
Principles of Computer Architecture
Miles Murdocca and Vincent Heuring
Chapter 10: Trends in Computer Architecture

Chapter Contents
Instruction Frequency
Complexity of Assignments
Speedup and Efficiency
Example
Four-Stage Instruction Pipeline
Pipeline Behavior
Filling the Load Delay Slot
Call-Return Behavior
SPARC Registers
Overlapping Register Windows
Example: Compiled C Program
gcc Generated SPARC Code
gcc Generated SPARC Code (cont’)
Effect of Compiler Optimization
The PowerPC 601 Architecture
128-Bit IA-64 Instruction Word
Parallel Speedup and Amdahl’s Law
Efficiency and Throughput
Flynn Taxonomy
Network Topologies
Crossbar
Crosspoint Settings
Three-Stage Clos Network
12-Channel Three-Stage Clos Network with n = p = 6
12-Channel Three-Stage Clos Network with n = p = 2
12-Channel Three-Stage Clos Network with n = p = 4
12-Channel Three-Stage Clos Network with n = p = 3
C function computes (x² + y²) × y²
Dependency Graph
Matrix Multiplication
Matrix Multiplication Dependency Graph
The Connection Machine CM-1
CM-1 Router Network
CM-1 Processing Element
The Connection Machine CM-5
Partitions on the CM-5
Fat Tree
Parallel Processing in Sega Genesis
Sega Genesis Architecture

10-1 Chapter 10 - Trends in Computer Architecture
Department of Information Technology, Radford University
ITEC 352 Computer Organization

10-2 Chapter Contents
10.1 Quantitative Analyses of Program Execution
10.2 From CISC to RISC
10.3 Pipelining the Datapath
10.4 Overlapping Register Windows
10.5 Multiple Instruction Issue (Superscalar) Machines – The PowerPC
10.6 Case Study: The PowerPC™ 601 as a Superscalar Architecture
10.7 VLIW Machines
10.8 Case Study: The Intel IA-64 (Merced) Architecture
10.9 Parallel Architecture
10.10 Case Study: Parallel Processing in the Sega Genesis

10-3 Instruction Frequency
• Frequency of occurrence of instruction types for a variety of languages. The percentages do not sum to 100 due to roundoff. (Adapted from Knuth, D. E., An Empirical Study of FORTRAN Programs, Software—Practice and Experience, 1, 105-133, 1971.)

10-4 Complexity of Assignments
• Percentages showing complexity of assignments and procedure calls. (Adapted from Tanenbaum, A., Structured Computer Organization, 4/e, Prentice Hall, Upper Saddle River, New Jersey, 1999.)

10-5 Speedup and Efficiency
• Speedup S is the ratio of the time needed to execute a program without an enhancement to the time required with an enhancement.
• Time T is computed as the instruction count IC times the number of cycles per instruction CPI times the cycle time.
• Substituting T into the speedup percentage calculation above yields:

10-6 Example
• Example: Estimate the speedup obtained by replacing a CPU having an average CPI of 5 with another CPU having an average CPI of 3.5, with the clock period increased from 100 ns to 120 ns.
• The previous equation becomes:

10-7 Four-Stage Instruction Pipeline

10-8 Pipeline Behavior
• Pipeline behavior during a memory reference and during a branch.

10-9 Filling the Load Delay Slot
• SPARC code, (a) with a nop inserted, and (b) with srl migrated to nop position.

10-10 Call-Return Behavior
• Call-return behavior as a function of nesting depth and time (Adapted from Stallings, W., Computer Organization and Architecture: Designing for Performance, 4/e, Prentice Hall, Upper Saddle River, 1996).

10-11 SPARC Registers
• User view of RISC I registers.

10-12 Overlapping Register Windows

10-13 Example: Compiled C Program
• Source code for C program to be compiled with gcc.

10-14 gcc Generated SPARC Code

10-15 gcc Generated SPARC Code (cont’)

10-16 Effect of Compiler Optimization
• SPARC code generated with the -O optimization flag:

10-17 The PowerPC 601 Architecture

10-18 128-Bit IA-64 Instruction Word

10-19 Parallel Speedup and Amdahl’s Law
• In the context of parallel processing, speedup can be computed:
• Amdahl’s law, for p processors and a fraction f of unparallelizable code:
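As a minimal sketch of the speedup calculations described on the slides above, the Python below computes execution time as IC × CPI × cycle time, applies it to the CPI 5 versus CPI 3.5 example, and evaluates Amdahl's law. The function names are hypothetical. The percentage form (time saved relative to the enhanced time) and the expression 1 / (f + (1 - f) / p) are the commonly used forms and are assumed here, not quoted from the slides. The instruction count of 1,000,000 and the values f = 0.1, p = 10 are arbitrary illustrative choices.

```python
def execution_time(instruction_count, cpi, cycle_time):
    """T = IC * CPI * cycle time (any consistent time unit)."""
    return instruction_count * cpi * cycle_time


def speedup_ratio(time_without, time_with):
    """Speedup S: time without the enhancement divided by time with it."""
    return time_without / time_with


def speedup_percent(time_without, time_with):
    """Assumed percentage form: time saved relative to the enhanced time."""
    return (time_without - time_with) / time_with * 100.0


def amdahl_speedup(f, p):
    """Assumed form of Amdahl's law: S = 1 / (f + (1 - f) / p).

    f is the fraction of the code that cannot be parallelized,
    p is the number of processors.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (f + (1.0 - f) / p)


if __name__ == "__main__":
    # Hypothetical instruction count; it is the same for both CPUs,
    # so it cancels out of the speedup.
    ic = 1_000_000

    # Slide example: CPI 5 at 100 ns per cycle versus CPI 3.5 at 120 ns.
    t_old = execution_time(ic, 5.0, 100)   # time in ns
    t_new = execution_time(ic, 3.5, 120)   # time in ns

    print(f"speedup ratio:   {speedup_ratio(t_old, t_new):.2f}")    # 1.19
    print(f"speedup percent: {speedup_percent(t_old, t_new):.1f}")  # 19.0

    # Illustrative Amdahl case: 10% serial code on 10 processors.
    print(f"Amdahl speedup:  {amdahl_speedup(0.1, 10):.2f}")        # 5.26
```

In the CPU example the slower clock (120 ns instead of 100 ns) offsets part of the CPI improvement, so the net gain is about 19 percent. In the Amdahl case, a 10 percent serial fraction limits 10 processors to a speedup of roughly 5.3, well short of 10.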

