NYU CSCI-GA 2243 - Lecture Notes


G22.2243-001
High Performance Computer Architecture
Fall 2007
9/11/2007

High Performance Computer Architecture – Spring 2006
• Instructor: Dr. Mohammad Banikazemi
  Adjunct Assistant Professor
  Research Staff Member at IBM T. J. Watson Research Center
  Email: [email protected]
• Lecture: Wednesdays, 5:00pm-7:00pm, 402 CIWW
• Office Hours: Wednesdays, 7:00pm-8:00pm and by appointment
  Room: 401 CIWW
  Phone: 8-3081
• Mailing List: [email protected]

Outline
• Motivation and Introduction
• Course organization
• Administrative stuff
  – Workload and grading
• Brief overview of SimpleScalar toolset
• Fundamentals of computer design
  – Target markets
  – Technology trends
  – Cost vs. price
  – Measuring and reporting performance
  – Quantitative principles of computer design
  – Price performance
[Hennessy/Patterson CA:AQA (4th Edition): Chapter 1]

Growth in Microprocessor Performance
• Emergence of microprocessors (late 70s):
  – Improvements in IC technology
  – 35% improvement each year
• Reduced Instruction Set Computer (early 80s):
  – Instruction level parallelism
  – Use of caches
• Commercial success:
  – Mass production
  – High level programming
  – Standardized OS
• Technology, architecture and organization, compiler

From the Intel 386 to the Pentium 4
• Intel 386, introduced 1985
  – 275,000 transistors, 1 micron
  – 16 MHz clock speed
• Intel Pentium III, introduced 1999
  – 9.5M transistors, 0.25 micron
  – 600 MHz clock speed
• Intel Pentium 4 Prescott, introduced late 2004
  – 125M transistors, 0.09 micron
  – 2.8-3.8 GHz clock speed

Intel Pentium III Microarchitecture
[Figure: block diagram with the units IFU, ID, MIS, BTB, ITLB, DTLB, DCU, L2 cache, RAT, RS, ROB, IEU, FEU, SIMD, MIU, MOB]
• IFU: Instruction fetch unit
• ID: Instruction dispatch
• MIS: Micro-instruction sequencer
• BTB: Branch target buffer
• RAT: Register alias table
• RS: Reservation station
• IEU: Int. execution unit
• FEU: FP execution unit
• DCU: Data cache unit
• ROB: Reorder buffer
• MOB: Memory ordering buffer
• MIU: Mem interface unit

Computer Architecture Topics
[Figure: topic map; labels listed below]
• Instruction Set Architecture
• Processor Design
• Pipelining, Hazard Resolution, Superscalar, Reordering, ILP, Branch Prediction, Speculation
• Addressing modes, formats
• Memory Hierarchy
• Cache Design; Block size, Associativity
• L1 Cache, L2 Cache, DRAM
• Coherence, Bandwidth, Latency
• Input/Output and Storage
• Disks and Tape; RAID
• Emerging Technologies, Interleaving
• VLSI

Computer Architecture Topics
Networks, Interconnections, and Multiprocessors
[Figure: processor (P) and memory (M) nodes attached to an interconnection network]
• Shared Memory or Message Passing
• Network Interfaces
• Interconnection Network: Topologies, Routing, Bandwidth, Latency, Reliability

What is Ahead?
• Today’s desktop microprocessors (e.g., Pentium 4 Extreme Edition)
  – 178 million transistors, 90 nanometer technology, 3.73 GHz clock speed
  – Internally: “hyper-pipelining”, multithreading, 128-bit SIMD instructions
• The future (some are already here!)
  – Greater instruction level parallelism
  – Bigger caches, and more levels of cache
  – Multiple processors per chip (“multicore”)
• Complete systems on a chip
  – Reducing the power consumption
  – High performance interconnects
• Breakdown into desktop, enterprise, and embedded target markets
  – Different performance criteria
• This course provides the background for you to design, analyze, and effectively use such systems

Topic Coverage
• Textbook
  – Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th Edition
• Fundamentals of Computer Design: 1 lecture
• Instruction Set Architecture: 0.5 lectures
• Pipelining Basics: 2 lectures
• Instruction Level Parallelism: 5 lectures
• Memory Hierarchy: 2 lectures
• Multiprocessors: 1.5 lectures
• Interconnection Networks: 2 lectures
• Other topics such as multicore and power-aware architectures
• Relevant computer architecture research papers

Course Workload
• Lectures (reading assignments from text)
• Four programming assignments (50% of course grade)
  – Build a simulator for a multiple-issue, out-of-order execution microprocessor in stages (Pentium III class)
  – Use this simulator to understand the impact of architectural techniques
• Homework assignments (20% of course grade)
• Final exam (30% of course grade)
• Sample questions will be provided in class
• Academic misconduct is taken very seriously
  http://www.cs.nyu.edu/web/Academic/Graduate/academic_integrity.html

SimpleScalar Toolset
• Comprehensive collection of tools for evaluating new architectural techniques
  – Possible to define new instruction-set architectures (support for Alpha, PISA)
  – Modules for writing your own execution-driven simulators
    • bpred, caches, statistics collection, program loading, functional unit construction, …
  – Many papers at recent architecture conferences use SimpleScalar
[Figure: Simulator Core]

SimpleScalar Toolset (cont’d)
• For the course assignments we will be using a small subset of the tools
  – You will be using an instructional ISA called PISA
    • Closely resembles MIPS 64 (described in the textbook)
  – PISA executables are produced using GNU cross-compiler tools
• Goal of the assignments: to understand the issues involved in implementing architectural techniques used in modern-day microprocessors, and to assess their potential benefits
  – E.g., branch prediction in modern-day microprocessors
    • At what stage during instruction execution is branch prediction used?
      – It takes some time to figure out that an instruction is a branch
    • How should the branch predictor be updated with information about seen branches?
    • What impact does prediction have on performance?

Background Survey
• Programming in C/C++
  – Required for using the SimpleScalar toolset
• Use of Unix systems
  – SimpleScalar installs available for SPARC/Solaris and x86/Linux
• Prior coursework
  – Logic design:
    • Bits, gates, combinational and sequential logic
    • Adders, multipliers
  – Computer organization and assembly-level programming
    • ALUs, MIPS-like data path (without pipelining)
    • Register versus memory operations
    • Buses, I/O

Fundamentals of Computer Design

Context for Designing New Architectures
What is the target market?
• Desktop computing
  – General-purpose applications
  – Performance improvements must be traded off against cost
  – Performance metric of interest is usually response time
• Enterprise servers
  – Cost is important but not as much of a concern
  – Performance is paramount; the metric of interest is usually (not always) throughput
• Embedded computers
  – Cost is paramount
  – Power concerns
  – Specialized applications

Designing New
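
The branch prediction questions raised on the SimpleScalar slides (when is the predictor consulted, how is it updated, what does it buy) can be made concrete with a small sketch. The code below is not SimpleScalar's bpred module and is not the course's assignment code; it is a minimal illustration of one common scheme, a table of 2-bit saturating counters indexed by low-order program counter bits. The class name TwoBitPredictor, the function accuracy, the table size, the initial counter value, the 4-byte instruction alignment, and the trace are all assumptions made for illustration.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by low-order PC bits."""

    def __init__(self, table_bits=4):
        self.mask = (1 << table_bits) - 1
        # Counter values 0-1 predict "not taken", 2-3 predict "taken".
        # Starting every entry at 1 (weakly not taken) is an arbitrary choice.
        self.counters = [1] * (1 << table_bits)

    def _index(self, pc):
        # Assumes 4-byte aligned instructions, so the two low bits carry no information.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        # Consulted at fetch time, before the branch outcome is known.
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        # Called once the branch resolves; the counter saturates at 0 and 3.
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Replay (pc, taken) pairs: predict first, then update with the real outcome."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


# Hypothetical trace: a single loop branch at 0x400, taken 9 times and then
# not taken once (loop exit), with the whole loop executed 3 times.
loop_trace = [(0x400, i % 10 != 9) for i in range(30)]
loop_accuracy = accuracy(TwoBitPredictor(), loop_trace)
print(f"prediction accuracy: {loop_accuracy:.3f}")
```

On this trace the predictor mispredicts the very first iteration and each of the three loop exits, so 26 of 30 branches are predicted correctly. The second counter bit is what keeps a single loop exit from flipping the prediction for the next entry into the loop; a 1-bit scheme would mispredict twice per loop execution instead.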

