COSC 6385 – Computer Architecture
Introduction and Organizational Issues
Edgar Gabriel
Spring 2012

Organizational issues (I)
• Classes:
  – Tuesday, 11.30am – 1.00pm, H 30
  – Thursday, 11.30am – 1.00pm, H 30
• Evaluation as planned right now
  – 1 homework: 25%
  – 3 quizzes: 75% (25% each)
• About pop quizzes:
  – Not planned right now. However, they can be introduced at any time during the semester, depending on student participation
    • unannounced, unspecified number of mini quizzes
    • at the beginning of a lecture, 10 minutes only
    • open book, focusing on the last lecture only

Organizational issues (II)
• In case of questions:
  – email: [email protected]
  – Tel: (713) 743 3358
  – Office hours: PGH 524, Monday, 11am-11.45am or by appointment
• All slides available on the website:
  – http://www.cs.uh.edu/~gabriel/cosc6385_s12/
  – Videos of some lectures will be posted on the course web page

Organizational issues (III)
• TA for the course:
  – Kshitij Mehta, PGH 526, [email protected]
• Tentative dates for the quizzes:
  – 1st quiz: Tuesday, Feb 21
  – 2nd quiz: Tuesday, March 27
  – 3rd quiz: Thursday, April 26
• Homework
  – Announced: Thursday, Feb 23
  – Due on: Friday, March 9

Contents
• Textbook: John L. Hennessy, David A. Patterson, “Computer Architecture – A Quantitative Approach”, 4th Edition, Morgan Kaufmann Publishers

Contents (II)
• Most of chapters 1 to 5
• Appendix A, B, C
• Selected sections regarding
  – Storage systems
  – Vector Processors
• Selected literature on multi-core processors
• Selected literature on virtualization

Contents (III)
Jan. 17         Overview, Motivation, Organization
Jan. 19         Performance Measurement
Jan. 24         Instruction Set Architectures
Jan. 26         Memory Hierarchy (I)
Jan. 31         Memory Hierarchy (II)
Feb. 2          Pipelining (I)
Feb. 7          Pipelining (II)
Feb. 14         Recap for 1st quiz, exercises (I)
Feb. 16         Tomasulo's algorithm (I)
Feb. 21         1st quiz
Feb. 23         Homework announcement
Feb. 28         Tomasulo's algorithm (II)
Mar. 1          Discussion of midterm quiz
Mar. 6          ILP with software approaches
Mar. 8          Vector processors
Mar. 13 and 15  Spring break, no lecture
Mar. 20         Exercises (II); homework discussion
Mar. 22         Multi-processor systems (I): Flynn's taxonomy, cache coherence protocols
Mar. 27         2nd quiz
Mar. 29         Multi-processor systems (II): distributed shared memory and directory protocols
Apr. 3          Multi-processor systems (III): synchronization problems
Apr. 5          Multi-processor systems (IV): multi-core and multi-threading
Apr. 10         Multi-processor systems (V): Nvidia GPU
Apr. 12         Virtualization
Apr. 17         File I/O
Apr. 19         Recap for final quiz
Apr. 24         History of Computers
Apr. 26         Final quiz

Why learn about Computer Architecture?
    for (i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
• Every loop iteration requires 3 memory operations
  – 2 loads
  – 1 store
• For a microprocessor with a frequency of 2 GHz, this loop requires a memory bandwidth of
    $$3 \cdot 4\,\text{Bytes} \cdot 2 \cdot 10^{9}\,\text{s}^{-1} = 24\,\text{GBytes/s}$$
  (3 memory operations of 4 Bytes each in every cycle) to satisfy one Floating Point Unit (FPU)
• Most modern processors have 2 FPUs and two or more Integer Units which could work in parallel

Memory technology (www.kingston.com/newtech)
• Memory bandwidth
    $$SB_{max} = SB_{BUS} \cdot f_{BUS} \cdot \text{Op}/\text{Cycle}$$
  with
    $SB_{max}$: max. memory bandwidth
    $SB_{BUS}$: bandwidth of the memory bus (64 Bit = 8 Bytes)
    $f_{BUS}$: frequency of the memory bus

Memory bandwidth
Name          Frequency of memory bus [MHz]   Max. bandwidth
PC100 SDRAM   100                             800 MB/s
PC133 SDRAM   133                             1.1 GB/s
PC1600 DDR    100                             1.6 GB/s
PC2100 DDR    133                             2.1 GB/s
PC2700 DDR    166                             2.7 GB/s
PC3200 DDR    200                             3.2 GB/s
PC3700 DDR    233                             3.7 GB/s
PC4200 DDR    266                             4.2 GB/s

Memory modules (cont.)
• Dual Channel Memory: 2 I/O channels between memory controller and memory module
• DDR2 and DDR3: further evolution of the DDR technology
Name        Frequency of memory bus   Bandwidth of a module   Dual channel bandwidth
PC2-3200    400 MHz                   3.2 GB/s                6.4 GB/s
PC2-4200    533 MHz                   4.2 GB/s                8.4 GB/s
PC2-5300    667 MHz                   5.3 GB/s                10.6 GB/s
PC2-6400    800 MHz                   6.4 GB/s                12.8 GB/s
PC3-8500    1066 MHz                  8.5 GB/s                17.0 GB/s
PC3-10600   1333 MHz                  10.6 GB/s               21.2 GB/s
PC3-12800   1600 MHz                  12.8 GB/s               25.6 GB/s

Memory hierarchies
                              Size          Access time [cycles]
Backup (tape)                 TB, PB
Primary data storage (disk)   ~ 100 GB      > 10^6
Main memory                   ~ 1-4 GB      100 - 1000
Caches                        ~ 1-4 MB      2 - 50
Register                      < 256 Words   1 - 2

Memory hierarchies
• Do I have to care about memory hierarchies?
• Example: Matrix-multiply of two dense matrices
  – “Trivial” code
    for (i = 0; i < dim; i++) {
        for (j = 0; j < dim; j++) {
            for (k = 0; k < dim; k++) {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }

Matrix-multiply
• Performance of the trivial implementation on a 2.2 GHz AMD Opteron with 2 GB main memory and 1 MB 2nd level cache
Matrix dimension   Execution time [sec]   Performance [MFLOPS]
256x256            0.118                  284
512x512            2.05                   130

Matrix-multiply (II)
• Peak floating point performance of the processor
    $$2 \cdot (2.2 \cdot 10^{9})\ \text{floating point operations/sec} = 4.4 \cdot 10^{9}\ \text{FLOPS} = 4.4\ \text{GFLOPS}$$
  – 2: number of floating point units
  – 2.2 * 10^9: frequency of the processor
  – 4.4 GFLOPS: theoretical floating point peak performance of the processor
  → assuming that each FPU can finish an operation per cycle
• Where are the missing FLOPS between theoretical peak and achieved performance?
  – Memory wait time

Blocked code
    for (i = 0; i < dim; i += block) {
        for (j = 0; j < dim; j += block) {
            for (k = 0; k < dim; k += block) {
                for (ii = i; ii < (i + block); ii++) {
                    for (jj = j; jj < (j + block); jj++) {
                        for (kk = k; kk < (k + block); kk++) {
                            c[ii][jj] += a[ii][kk] * b[kk][jj];
                        }
                    }
                }
            }
        }
    }

Performance of the blocked code
Matrix dimension   Block   Execution time [sec]   Performance [MFLOPS]   “Trivial” [MFLOPS]
256x256            4       0.065                  513                    284
                   8       0.046                  726
                   16      0.051                  657
                   32      0.043                  777
                   64      0.049                  677
                   128     0.113                  296
512x512            4       0.686                  391                    130
                   8       0.422                  635
                   16      0.447                  599
                   32      0.501                  535
                   64      1.00                   266
                   128     0.994                  269

Top 500 List (www.top500.org)
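
Worked example: bandwidth arithmetic
The bandwidth estimate for the vector-addition loop and the memory bus formula from earlier in these slides can be checked with a few lines of C. This is only a sketch: the function names `required_bandwidth` and `max_bus_bandwidth` are made up for illustration, and the first function assumes one loop iteration completes per clock cycle, as the slide's estimate does.

```c
#include <assert.h>

/* Bytes per second the memory system must deliver so that one FPU never
 * waits: memory operations per iteration * bytes per operand * iterations
 * per second. Assumption: one loop iteration per clock cycle. */
double required_bandwidth(int mem_ops_per_iter, int bytes_per_operand,
                          double clock_hz)
{
    assert(mem_ops_per_iter > 0 && bytes_per_operand > 0 && clock_hz > 0.0);
    return (double)mem_ops_per_iter * bytes_per_operand * clock_hz;
}

/* Peak bandwidth of a memory bus: bus width in bytes * bus frequency *
 * transfers per cycle (1 for SDRAM, 2 for DDR). */
double max_bus_bandwidth(int bus_width_bytes, double bus_hz,
                         int ops_per_cycle)
{
    assert(bus_width_bytes > 0 && bus_hz > 0.0 && ops_per_cycle > 0);
    return (double)bus_width_bytes * bus_hz * ops_per_cycle;
}
```

Calling `required_bandwidth(3, 4, 2.0e9)` gives the 24 GBytes/s of the loop example, while `max_bus_bandwidth(8, 200.0e6, 2)` gives the 3.2 GB/s listed for PC3200 DDR. Even that module delivers well under a seventh of what the simple loop could consume.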
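
Worked example: from execution time to MFLOPS
The MFLOPS columns in the matrix-multiply tables can be reproduced from the execution times. The sketch below assumes the usual convention of counting 2 * dim^3 floating point operations (one multiply and one add per innermost iteration); the slides do not state this, but it matches the listed numbers. The helper names are hypothetical.

```c
#include <assert.h>

/* Floating point operations of a dim x dim matrix multiply, assuming
 * one multiply and one add per innermost loop iteration. */
double matmul_flops(int dim)
{
    assert(dim > 0);
    return 2.0 * dim * dim * dim;
}

/* Achieved performance in millions of floating point operations/sec. */
double mflops(double flops, double seconds)
{
    assert(seconds > 0.0);
    return flops / seconds / 1.0e6;
}

/* Fraction of the theoretical peak, assuming each FPU can finish one
 * operation per cycle. */
double fraction_of_peak(double achieved_mflops, int num_fpus,
                        double clock_hz)
{
    assert(num_fpus > 0 && clock_hz > 0.0);
    double peak_mflops = num_fpus * clock_hz / 1.0e6;
    return achieved_mflops / peak_mflops;
}
```

For the 256x256 case, `mflops(matmul_flops(256), 0.118)` comes out at roughly 284 MFLOPS, and `fraction_of_peak(284.0, 2, 2.2e9)` shows that the trivial code reaches only about 6% of the 4.4 GFLOPS peak.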
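
Worked example: blocked matrix-multiply as a function
The blocked code on the slide assumes that `dim` is a multiple of `block`. The following sketch is a variant, not the version used for the measurements: it stores the matrices as flat row-major arrays of `double` and clamps each block at the matrix edge, so any positive block size works. The function names and the data layout are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Reference triple loop: c += a * b for dim x dim row-major matrices. */
void matmul_trivial(size_t dim, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < dim; i++)
        for (size_t j = 0; j < dim; j++)
            for (size_t k = 0; k < dim; k++)
                c[i * dim + j] += a[i * dim + k] * b[k * dim + j];
}

/* Blocked variant: the three outer loops walk over block x block tiles,
 * the three inner loops multiply one pair of tiles. Tiles at the edge
 * are clamped so dim need not be a multiple of block. */
void matmul_blocked(size_t dim, size_t block,
                    const double *a, const double *b, double *c)
{
    assert(block > 0);
    for (size_t i = 0; i < dim; i += block) {
        size_t i_end = min_sz(i + block, dim);
        for (size_t j = 0; j < dim; j += block) {
            size_t j_end = min_sz(j + block, dim);
            for (size_t k = 0; k < dim; k += block) {
                size_t k_end = min_sz(k + block, dim);
                for (size_t ii = i; ii < i_end; ii++)
                    for (size_t jj = j; jj < j_end; jj++)
                        for (size_t kk = k; kk < k_end; kk++)
                            c[ii * dim + jj] +=
                                a[ii * dim + kk] * b[kk * dim + jj];
            }
        }
    }
}
```

Both functions accumulate into `c`, so the caller has to zero it first. Which block size is fastest depends on the cache sizes of the machine, as the table of measurements suggests, so it is worth timing several values rather than assuming one.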