COSC 6385 – Computer Architecture
Introduction and Organizational Issues
Edgar Gabriel
Fall 2009

Organizational Issues (I)
• Classes:
  – Monday, 1.00pm – 2.30pm, SEC 202
  – Wednesday, 1.00pm – 2.30pm, SEC 202
• Evaluation:
  – 25% homework
  – 75% three quizzes (25% each)
• In case of questions:
  – email: [email protected]
  – Tel: (713) 743 3358
  – Office hours: PGH 524, Tue, 11am-12pm or by appointment
• All slides available on the website:
  – http://www.cs.uh.edu/~gabriel/cosc6385_f09/
  – Videos of some lectures will be posted on the course web page

Organizational Issues (II)
• TAs for the course:
  – Sarat Poluri, PGH 526, [email protected]
  – Anup Prakash, PGH 526, [email protected]
• Tentative dates for the quizzes:
  – Monday, September 21st
  – Wednesday, October 21st
  – Wednesday, December 2nd
• Homework:
  – Announced on Monday, September 28
  – Due on Wednesday, October 14

Contents
• Textbook: John L. Hennessy, David A. Patterson, “Computer Architecture – A Quantitative Approach”, 4th Edition, Morgan Kaufmann Publishers

Contents (II)
• Most of chapters 1 to 5
• Appendix A, B, C
• Selected sections regarding
  – Storage systems
  – Vector processors
• Selected literature on multi-core processors
• Selected literature on virtualization

Contents (III)
  Date      Topic
  Aug. 24   Overview, Motivation, Organization
  Aug. 26   Performance Measurement
  Aug. 31   Instruction Set Architectures
  Sep. 2    Memory Hierarchy (I)
  Sep. 7    Labor Day, no lectures
  Sep. 9    Memory Hierarchy (II) (online)
  Sep. 14   Pipelining (I)
  Sep. 16   Recap for 1st quiz
  Sep. 21   1st quiz
  Sep. 23   Pipelining (II) (online)
  Sep. 28   Homework announcement
  Sep. 30   Tomasulo's algorithm (I)
  Oct. 5    Tomasulo's algorithm (II)
  Oct. 7    ILP with software approaches
  Oct. 12   Discussion of 1st quiz
  Oct. 14   Recap for 2nd quiz; homework due
  Oct. 19   Vector processors
  Oct. 21   2nd quiz
  Oct. 26   Multi-processor systems (I)
  Oct. 28   Multi-processor systems (II)
  Nov. 2    Multi-processor systems (III)
  Nov. 4    Discussion of 2nd quiz
  Nov. 9    Multi-processor systems (IV)
  Nov. 11   Virtualization
  Nov. 16   File I/O
  Nov. 18   cancelled?
  Nov. 23   Recap for 3rd quiz
  Nov. 25   Thanksgiving holiday, no class
  Nov. 30   History of Computers
  Dec. 2    3rd quiz

Why learn about Computer Architecture?

    for (i=0; i<n; i++) {
        c[i] = a[i] + b[i];
    }

• Every loop iteration requires 3 memory operations
  – 2 loads
  – 1 store
• For a microprocessor with a frequency of 2 GHz, this loop requires a memory bandwidth of
      $3 \cdot 4\ \text{Bytes} \cdot 2 \cdot 10^{9}\ \text{s}^{-1} = 24\ \text{GBytes/s}$
  to satisfy one Floating Point Unit (FPU), assuming 4-byte operands and one loop iteration per cycle
• Most modern processors have 2 FPUs and two or more Integer Units which could work in parallel

Memory technology (www.kingston.com/newtech)
• Bandwidth of a memory module:
      $SB_{max} = SB_{BUS} \cdot f_{BUS} \cdot Op/Cycle$
  with
  – $SB_{max}$: max. memory bandwidth
  – $SB_{BUS}$: width of the memory bus (64 Bit = 8 Bytes)
  – $f_{BUS}$: frequency of the memory bus
  – $Op/Cycle$: transfers per bus cycle (1 for SDRAM, 2 for DDR)

Memory bandwidth
  Name           Frequency of memory bus [MHz]   Max. bandwidth
  PC100 SDRAM    100                             800 MB/s
  PC133 SDRAM    133                             1.1 GB/s
  PC1600 DDR     100                             1.6 GB/s
  PC2100 DDR     133                             2.1 GB/s
  PC2700 DDR     166                             2.7 GB/s
  PC3200 DDR     200                             3.2 GB/s
  PC3700 DDR     233                             3.7 GB/s
  PC4200 DDR     266                             4.2 GB/s

Memory modules (cont.)
• Dual Channel Memory: 2 I/O channels between memory controller and memory module
• DDR2 and DDR3: further evolution of the DDR technology

  Name         Frequency of memory bus   Bandwidth of a module   Dual-channel bandwidth
  PC2-3200     400 MHz                   3.2 GB/s                6.4 GB/s
  PC2-4200     533 MHz                   4.2 GB/s                8.4 GB/s
  PC2-5300     667 MHz                   5.3 GB/s                10.6 GB/s
  PC2-6400     800 MHz                   6.4 GB/s                12.8 GB/s
  PC3-8500     1066 MHz                  8.5 GB/s                17.0 GB/s
  PC3-10600    1333 MHz                  10.6 GB/s               21.2 GB/s
  PC3-12800    1600 MHz                  12.8 GB/s               25.6 GB/s

Memory hierarchies
  Level                         Size           Access time [cycles]
  Backup (tape)                 TB, PB
  Primary data storage (disk)   ~ 100 GB       > 10^6
  Main memory                   ~ 1-4 GB       100 - 1000
  Caches                        ~ 1-4 MB       2 - 50
  Register                      < 256 Words    1 - 2

Memory hierarchies (cont.)
• Do I have to care about memory hierarchies?
• Example: Matrix-multiply of two dense matrices
  – “Trivial” code

    for ( i=0; i<dim; i++ ) {
        for ( j=0; j<dim; j++ ) {
            for ( k=0; k<dim; k++ ) {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }

Matrix-multiply
• Performance of the trivial implementation on a 2.2 GHz AMD Opteron with 2 GB main memory and 1 MB 2nd level cache

  Matrix dimension   Execution time [sec]   Performance [MFLOPS]
  256x256            0.118                  284
  512x512            2.05                   130

Matrix-multiply (II)
• Peak floating point performance of the processor:
      $2 \cdot (2.2 \cdot 10^{9})\ \text{floating point operations/s} = 4.4 \cdot 10^{9}\ \text{FLOPS} = 4.4\ \text{GFLOPS}$
  – 2: number of floating point units
  – $2.2 \cdot 10^{9}$: frequency of the processor, assuming that each FPU can finish an operation per cycle
  – 4.4 GFLOPS: theoretical floating point peak performance of the processor
• Where are the missing FLOPS between theoretical peak and achieved performance?
  – Memory wait time

Blocked code

    for ( i=0; i<dim; i+=block ) {
        for ( j=0; j<dim; j+=block ) {
            for ( k=0; k<dim; k+=block ) {
                for ( ii=i; ii<(i+block); ii++ ) {
                    for ( jj=j; jj<(j+block); jj++ ) {
                        for ( kk=k; kk<(k+block); kk++ ) {
                            c[ii][jj] += a[ii][kk] * b[kk][jj];
                        }
                    }
                }
            }
        }
    }

Performance of the blocked code
  Matrix dimension   Block   Execution time [sec]   Performance [MFLOPS]   “Trivial” [MFLOPS]
  256x256            4       0.065                  513                    284
                     8       0.046                  726
                     16      0.051                  657
                     32      0.043                  777
                     64      0.049                  677
                     128     0.113                  296
  512x512            4       0.686                  391                    130
                     8       0.422                  635
                     16      0.447                  599
                     32      0.501                  535
                     64      1.00                   266
                     128     0.994                  269

Top 500 List (www.top500.org)

IBM Roadrunner
• First computer to surpass the 1 Petaflop (10^15 FLOPS) barrier
• Installed at Los Alamos National Laboratory
• Hybrid architecture
• 13,824 AMD Opteron cores
• 116,640 IBM PowerXCell 8i cores
• Costs: $120 million
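The bandwidth arithmetic on the slides (the 24 GBytes/s needed by the vector-add loop, and the peak rates in the memory bandwidth table) can be reproduced with a few lines of C. This is only a minimal sketch: the function names `required_bandwidth` and `module_bandwidth` are hypothetical and not part of the course material, and the first function assumes one loop iteration per clock cycle, as the slide's formula implies.

```c
#include <assert.h>

/*
 * Bandwidth in Bytes/s needed to feed one FPU, assuming one loop
 * iteration completes per clock cycle (assumption taken from the
 * slide's formula, not a measured value).
 */
double required_bandwidth(double mem_ops_per_iter,
                          double bytes_per_operand,
                          double clock_hz)
{
    assert(mem_ops_per_iter >= 0.0 && bytes_per_operand > 0.0 && clock_hz > 0.0);
    return mem_ops_per_iter * bytes_per_operand * clock_hz;
}

/*
 * Peak bandwidth of a memory module in Bytes/s:
 * bus width [Bytes] * bus frequency [Hz] * transfers per bus cycle
 * (1 for SDRAM, 2 for DDR).
 */
double module_bandwidth(double bus_width_bytes,
                        double bus_freq_hz,
                        double transfers_per_cycle)
{
    assert(bus_width_bytes > 0.0 && bus_freq_hz > 0.0 && transfers_per_cycle > 0.0);
    return bus_width_bytes * bus_freq_hz * transfers_per_cycle;
}
```

For example, `required_bandwidth(3, 4, 2.0e9)` evaluates the slide's 3 memory operations of 4 Bytes at 2 GHz and gives 2.4e10 Bytes/s (24 GBytes/s), while `module_bandwidth(8, 200.0e6, 2)` gives 3.2e9 Bytes/s for PC3200 DDR, about 7.5 times less than the loop would need.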
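To experiment with the trivial and blocked matrix-multiply loops from the slides, the following sketch wraps both in functions and adds the MFLOPS calculation that appears to underlie the performance tables (2 * dim^3 floating point operations divided by the execution time). The names `matmul_naive`, `matmul_blocked`, `mflops`, and `min_size` are hypothetical. Unlike the slide code, this version stores the matrices as flat row-major arrays and clamps the block bounds, so `dim` does not have to be a multiple of `block`; both are choices made for this sketch, not the course's reference implementation.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* "Trivial" triple loop: c += a * b for dim x dim row-major matrices. */
void matmul_naive(size_t dim, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < dim; i++)
        for (size_t j = 0; j < dim; j++)
            for (size_t k = 0; k < dim; k++)
                c[i * dim + j] += a[i * dim + k] * b[k * dim + j];
}

/* Blocked variant: works on block x block tiles to improve cache reuse. */
void matmul_blocked(size_t dim, size_t block,
                    const double *a, const double *b, double *c)
{
    assert(block > 0);
    for (size_t i = 0; i < dim; i += block) {
        size_t i_end = min_size(i + block, dim);   /* clamp the last tile */
        for (size_t j = 0; j < dim; j += block) {
            size_t j_end = min_size(j + block, dim);
            for (size_t k = 0; k < dim; k += block) {
                size_t k_end = min_size(k + block, dim);
                for (size_t ii = i; ii < i_end; ii++)
                    for (size_t jj = j; jj < j_end; jj++)
                        for (size_t kk = k; kk < k_end; kk++)
                            c[ii * dim + jj] += a[ii * dim + kk] * b[kk * dim + jj];
            }
        }
    }
}

/* MFLOPS for a dim x dim multiply: one multiply and one add per inner step. */
double mflops(size_t dim, double seconds)
{
    assert(seconds > 0.0);
    double n = (double)dim;
    return 2.0 * n * n * n / seconds / 1.0e6;
}
```

As a consistency check against the slides, `mflops(256, 0.118)` comes out at roughly 284, matching the trivial 256x256 entry. Timing the two functions on a given machine (for example with `clock()` from `<time.h>`) will generally give different absolute numbers than the 2009 Opteron results.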