UH COSC 6385 - COSC 6385 Introduction and Organizational Issues

COSC 6385 – Computer Architecture
Introduction and Organizational Issues
Edgar Gabriel
Fall 2009

Organizational issues (I)
• Classes:
  – Monday, 1.00pm – 2.30pm, SEC 202
  – Wednesday, 1.00pm – 2.30pm, SEC 202
• Evaluation:
  – 25% homework
  – 75% three quizzes (25% each)
• In case of questions:
  – email: [email protected]
  – Tel: (713) 743 3358
  – Office hours: PGH 524, Tue, 11am-12pm or by appointment
• All slides available on the website:
  – http://www.cs.uh.edu/~gabriel/cosc6385_f09/
  – Videos of some lectures will be posted on the course web page

Organizational issues (II)
• TAs for the course:
  – Sarat Poluri, PGH 526, [email protected]
  – Anup Prakash, PGH 526, [email protected]
• Tentative dates for the quizzes:
  – Monday, September 21st
  – Wednesday, October 21st
  – Wednesday, December 2nd
• Homework:
  – Announced on Monday, September 28
  – Due on Wednesday, October 14

Contents
• Textbook: John L. Hennessy, David A. Patterson, “Computer Architecture – A Quantitative Approach”, 4th Edition, Morgan Kaufmann Publishers

Contents (II)
• Most of chapters 1 to 5
• Appendix A, B, C
• Selected sections regarding
  – Storage systems
  – Vector processors
• Selected literature on multi-core processors
• Selected literature on virtualization

Contents (III)
Aug. 24   Overview, Motivation, Organization
Aug. 26   Performance Measurement
Aug. 31   Instruction Set Architectures
Sep. 2    Memory Hierarchy (I)
Sep. 7    Labor Day, no lectures
Sep. 9    Memory Hierarchy (II) (online)
Sep. 14   Pipelining (I)
Sep. 16   Recap for 1st quiz
Sep. 21   1st quiz
Sep. 23   Pipelining (II) (online)
Sep. 28   Homework announcement
Sep. 30   Tomasulo's algorithm (I)
Oct. 5    Tomasulo's algorithm (II)
Oct. 7    ILP with software approaches
Oct. 12   Discussion of 1st quiz
Oct. 14   Recap for 2nd quiz; homework due
Oct. 19   Vector processors
Oct. 21   2nd quiz
Oct. 26   Multi-processor systems (I)
Oct. 28   Multi-processor systems (II)
Nov. 2    Multi-processor systems (III)
Nov. 4    Discussion of 2nd quiz
Nov. 9    Multi-processor systems (IV)
Nov. 11   Virtualization
Nov. 16   File I/O
Nov. 18   cancelled?
Nov. 23   Recap for 3rd quiz
Nov. 25   Thanksgiving holiday, no class
Nov. 30   History of Computers
Dec. 2    3rd quiz

Why learn about Computer Architecture?

    for (i=0; i<n; i++) {
        c[i] = a[i] + b[i];
    }

• Every loop iteration requires 3 memory operations
  – 2 loads
  – 1 store
• For a microprocessor with a frequency of 2 GHz, this loop requires
  $3 \times 4\,\text{Bytes} \times 2 \times 10^{9}\,\text{s}^{-1} = 24\,\text{GBytes/s}$
  to satisfy one Floating Point Unit (FPU), i.e. 3 memory operations of 4 Bytes each for every one of the $2 \times 10^{9}$ iterations per second
• Most modern processors have 2 FPUs and two or more Integer Units which can work in parallel

Memory technology (www.kingston.com/newtech)
• Bandwidth of a memory module:
  $SB_{max} = SB_{BUS} \times f_{BUS} \times \text{Op}/\text{Cycle}$
  with
  – $SB_{max}$: max. memory bandwidth
  – $SB_{BUS}$: bandwidth (width) of the memory bus (64 Bit = 8 Bytes)
  – $f_{BUS}$: frequency of the memory bus

Memory bandwidth
Name          Frequency of memory bus (MHz)   Max. bandwidth
PC100 SDRAM   100                             800 MB/s
PC133 SDRAM   133                             1.1 GB/s
PC1600 DDR    100                             1.6 GB/s
PC2100 DDR    133                             2.1 GB/s
PC2700 DDR    166                             2.7 GB/s
PC3200 DDR    200                             3.2 GB/s
PC3700 DDR    233                             3.7 GB/s
PC4200 DDR    266                             4.2 GB/s

Memory modules (cont.)
• Dual Channel Memory: 2 I/O channels between memory controller and memory module
• DDR2 and DDR3: further evolution of the DDR technology

Name        Frequency of memory bus   Bandwidth of a module   Dual channel bandwidth
PC2-3200    400 MHz                   3.2 GB/s                6.4 GB/s
PC2-4200    533 MHz                   4.2 GB/s                8.4 GB/s
PC2-5300    667 MHz                   5.3 GB/s                10.6 GB/s
PC2-6400    800 MHz                   6.4 GB/s                12.8 GB/s
PC3-8500    1066 MHz                  8.5 GB/s                17.0 GB/s
PC3-10600   1333 MHz                  10.6 GB/s               21.2 GB/s
PC3-12800   1600 MHz                  12.8 GB/s               25.6 GB/s

Memory hierarchies
Level                         Size          Access time [cycles]
Backup (tape)                 TB, PB
Primary data storage (disk)   ~ 100 GB      > 10^6
Main memory                   ~ 1-4 GB      100 - 1000
Caches                        ~ 1-4 MB      2 - 50
Register                      < 256 Words   1 - 2

Memory hierarchies
• Do I have to care about memory hierarchies?
• Example: matrix-multiply of two dense matrices
  – “Trivial” code

    for ( i=0; i<dim; i++ ) {
        for ( j=0; j<dim; j++ ) {
            for ( k=0; k<dim; k++ ) {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }

Matrix-multiply
• Performance of the trivial implementation on a 2.2 GHz AMD Opteron with 2 GB main memory and 1 MB 2nd-level cache

Matrix dimension   Execution time [sec]   Performance [MFLOPS]
256x256            0.118                  284
512x512            2.05                   130

Matrix-multiply (II)
• Peak floating point performance of the processor:
  $2 \times (2.2 \times 10^{9})$ floating point operations/sec $= 4.4 \times 10^{9}$ FLOPS $= 4.4$ GFLOPS
  – $2$: number of floating point units
  – $2.2 \times 10^{9}$: frequency of the processor
  – $4.4$ GFLOPS: theoretical floating point peak performance of the processor
  → assuming that each FPU can finish one operation per cycle
• Where are the missing FLOPS between theoretical peak and achieved performance?
  – Memory wait time
Blocked code

    for ( i=0; i<dim; i+=block ) {
        for ( j=0; j<dim; j+=block ) {
            for ( k=0; k<dim; k+=block ) {
                for ( ii=i; ii<(i+block); ii++ ) {
                    for ( jj=j; jj<(j+block); jj++ ) {
                        for ( kk=k; kk<(k+block); kk++ ) {
                            c[ii][jj] += a[ii][kk] * b[kk][jj];
                        }
                    }
                }
            }
        }
    }

Performance of the blocked code
Matrix dimension   Block   Execution time [sec]   Performance [MFLOPS]   “Trivial” [MFLOPS]
256x256            4       0.065                  513                    284
                   8       0.046                  726
                   16      0.051                  657
                   32      0.043                  777
                   64      0.049                  677
                   128     0.113                  296
512x512            4       0.686                  391                    130
                   8       0.422                  635
                   16      0.447                  599
                   32      0.501                  535
                   64      1.00                   266
                   128     0.994                  269

Top 500 List (www.top500.org)

IBM Roadrunner
• First computer to surpass the 1 Petaflop (2^50 FLOPS) barrier
• Installed at Los Alamos National Laboratory
• Hybrid architecture
• 13,824 AMD Opteron cores
• 116,640 IBM PowerXCell 8i cores
• Costs: $120 million

