NYU CSCI-GA 2243 - Lecture Notes

G22.2243-001 High Performance Computer Architecture
Spring 2006
1/19/2006

High Performance Computer Architecture – Spring 2006
• Instructor: Dr. Mohammad Banikazemi
  – Adjunct Assistant Professor
  – Research Staff Member at IBM T. J. Watson Research Center
  – Email: [email protected]
• Lecture: Wednesdays, 5:00pm-7:00pm, 101 CIWW
• Office Hours: Wednesdays, 7:00pm-8:00pm, and by appointment
  – Room: 401 CIWW
  – Phone: 8-3081
• Mailing List: [email protected]
• Slides based on slides by Prof. Vijay Karamcheti

Outline
• Motivation and Introduction
• Course organization
• Administrative stuff
  – Workload and grading
• Brief overview of the SimpleScalar toolset
• Fundamentals of computer design
  – Target markets
  – Technology trends
  – Cost vs. price
  – Measuring and reporting performance
  – Quantitative principles of computer design
  – Price performance
[Hennessy/Patterson, CA:AQA (3rd Edition): Chapter 1]

Growth in Microprocessor Performance
• Emergence of microprocessors (late 70s)
  – Improvements in IC technology
  – 35% improvement each year
• Reduced Instruction Set Computers (early 80s)
  – Instruction level parallelism
  – Use of caches
• Commercial success
  – Mass production
  – High level programming
  – Standardized OS
[Figure: Technology; Architecture and organization; Compiler]

From the Intel 386 to the Pentium 4
• Intel 386, introduced 1985
  – 275,000 transistors, 1 micron
  – 16 MHz clock speed
• Intel Pentium III, introduced 1999
  – 9.5M transistors, 0.25 micron
  – 600 MHz clock speed
• Intel Pentium 4 Prescott, introduced late 2004
  – 125M transistors, 0.09 micron
  – 2.8-3.8 GHz clock speed

Intel Pentium III Microarchitecture
[Block diagram: L2 cache, ITLB, DTLB, SIMD unit, and the units listed below]
• IFU: Instruction fetch unit
• ID: Instruction decoder
• MIS: Micro-instruction sequencer
• BTB: Branch target buffer
• RAT: Register alias table
• RS: Reservation station
• IEU: Integer execution unit
• FEU: FP execution unit
• DCU: Data cache unit
• ROB: Reorder buffer
• MOB: Memory ordering buffer
• MIU: Memory interface unit

Computer Architecture Topics
• Instruction Set Architecture: addressing modes, formats
• Processor Design: pipelining, hazard resolution, superscalar, reordering, ILP, branch prediction, speculation
• Memory Hierarchy: L1 cache, L2 cache, DRAM
  – Cache design: block size, associativity
  – Coherence, bandwidth, latency
  – Emerging technologies, interleaving
• Input/Output and Storage: disks and tape, RAID
• VLSI

Computer Architecture Topics (cont'd)
[Figure: processor (P) and memory (M) nodes connected by an interconnection network]
• Networks, Interconnections, and Multiprocessors
  – Topologies, routing, bandwidth, latency, reliability
  – Network interfaces
  – Shared memory or message passing

What is Ahead?
• Today’s desktop microprocessors (e.g., Pentium 4 Extreme Edition)
  – 178 million transistors, 90 nanometer technology, 3.73 GHz clock speed
  – Internally: “hyper-pipelining”, multithreading, 128-bit SIMD instructions
• The future
  – Greater instruction level parallelism
  – Bigger caches, and more levels of cache
  – Multiple processors per chip (“multicore”)
  – Complete systems on a chip
  – Reducing the power consumption
  – High performance interconnects
• Breakdown into desktop, enterprise, and embedded target markets
  – Different performance criteria
• This course provides the background for you to design, analyze, and effectively use such systems

Topic Coverage
• Textbook
  – Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 3rd Edition, 2003
• Fundamentals of Computer Design (Chapter 1): 1 lecture
• Instruction Set Architecture (Chapter 2): 0.5 lectures
• Pipelining Basics (Appendix A): 2 lectures
• Instruction Level Parallelism (Chapters 3 and 4): 5 lectures
• Memory Hierarchy (Chapter 5): 2 lectures
• Multiprocessors (Chapter 6): 1.5 lectures
• Interconnection Networks (Chapter 8): 2 lectures
• Other topics such as multicore and power-aware architectures
• Relevant computer architecture research papers

Course Workload
• Lectures (reading assignments from text)
• Four programming assignments (50% of course grade)
  – Build a simulator for a multiple-issue, out-of-order execution microprocessor (Pentium III class) in stages
  – Use this simulator to understand the impact of architectural techniques
• Homework assignments (20% of course grade)
• Final exam (30% of course grade)
• Sample questions will be provided in class
• Academic misconduct is taken very seriously
  – http://www.cs.nyu.edu/web/Academic/Graduate/academic_integrity.html

SimpleScalar Toolset
• Comprehensive collection of tools for evaluating new architectural techniques
  – Possible to define new instruction-set architectures (support for Alpha, PISA)
  – Modules for writing your own execution-driven simulators: bpred, caches, statistics collection, program loading, functional unit construction, …
  – Many papers at recent architecture conferences use SimpleScalar

SimpleScalar Toolset (cont’d)
• For the course assignments we will be using a small subset of the tools
  – You will be using an instructional ISA called PISA, which closely resembles MIPS 64 (described in the textbook)
  – PISA executables are produced using GNU cross-compiler tools
• Goal of the assignments: to understand the issues involved in implementing, and to assess the potential benefits of, architectural techniques used in modern-day microprocessors
  – E.g., branch prediction in modern-day microprocessors
    • At what stage during instruction execution is branch prediction used? (It takes some time to figure out that an instruction is a branch.)
    • How should the branch predictor be updated with information about seen branches?
    • What impact does prediction have on performance?

Background Survey
• Programming in C/C++
  – Required for using the SimpleScalar toolset
• Use of Unix systems
  – SimpleScalar installs available for SPARC/Solaris and x86/Linux
• Prior coursework
  – Logic design: bits, gates, combinational and sequential logic; adders, multipliers
  – Computer organization and assembly-level programming: ALUs, MIPS-like data path (without pipelining), register versus memory operations, buses, I/O

Fundamentals of Computer Design

Context for Designing New Architectures
What is the target market?
• Desktop computing
  – General-purpose applications
  – Performance improvements must be traded off against cost
  – Performance metric of interest is usually response time
• Enterprise servers
  – Cost is important but not as much of a concern
  – Performance is paramount; metric of interest is usually (not always) throughput
• Embedded

