UMBC CMSC 611 - Benchmarks & Instruction Set Architecture


CMSC 611: Advanced Computer Architecture
Benchmarks & Instruction Set Architecture

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides
Some material adapted from Hennessy & Patterson / © 2003 Elsevier Science

Using MIPS
• MIPS = Millions of Instructions Per Second
– one of the simplest metrics
– valid only in a limited context

  MIPS (native) = Instruction count / (Execution time × 10^6)

• The use of MIPS is simple and intuitive: faster machines have bigger MIPS
• There are three problems with MIPS:
– MIPS specifies the instruction execution rate but not the capabilities of the instructions
– MIPS varies between programs on the same computer
– MIPS can vary inversely with performance (see the next example)

Example
Consider a machine with the following three instruction classes and CPI:

Instruction class    CPI for this instruction class
A                    1
B                    2
C                    3

Now suppose we measure the code for the same program from two different compilers and obtain the following data:

              Instruction count (in billions) per class
Code from     A      B      C
Compiler 1    5      1      1
Compiler 2    10     1      1

Assume that the machine's clock rate is 500 MHz. Which code sequence will execute faster according to MIPS? According to execution time?

Answer:
Using the formula: CPU clock cycles = Σ (CPI_i × C_i), summed over the n instruction classes
Sequence 1: CPU clock cycles = (5×1 + 1×2 + 1×3) × 10^9 = 10×10^9 cycles
Sequence 2: CPU clock cycles = (10×1 + 1×2 + 1×3) × 10^9 = 15×10^9 cycles

Example (Cont.)
Using the formula: Execution time = CPU clock cycles / Clock rate
Sequence 1: Execution time = (10×10^9)/(500×10^6) = 20 seconds
Sequence 2: Execution time = (15×10^9)/(500×10^6) = 30 seconds
Therefore compiler 1 generates a faster program.

Using the formula: MIPS = Instruction count / (Execution time × 10^6)
Sequence 1: MIPS = ((5 + 1 + 1)×10^9)/(20×10^6) = 350
Sequence 2: MIPS = ((10 + 1 + 1)×10^9)/(30×10^6) = 400

Although compiler 2 has a higher MIPS rating, the code generated by compiler 1 runs faster.

Native, Peak and Relative MIPS, & FLOPS
• Peak MIPS is obtained by choosing an instruction mix that minimizes the CPI, even if the mix is impractical
• To make MIPS more meaningful across different instruction sets, relative MIPS was introduced to compare machines against an agreed-upon reference machine (e.g. the VAX 11/780):

  Relative MIPS = (Execution time_reference / Execution time_unrated) × MIPS_reference

• With the fast development of computer technology, the reference machine cannot be guaranteed to exist
• Relative MIPS is practical for evolving designs of the same computer
• With the introduction of supercomputers built around speeding up floating-point computation, the term MFLOPS was introduced, analogous to MIPS

Synthetic Benchmarks
• Synthetic benchmarks are artificial programs constructed to match the characteristics of a large set of real programs
• Whetstone (scientific programs, in Algol then Fortran) and Dhrystone (systems programs, in Ada then C) are the most popular synthetic benchmarks
• Whetstone performance is measured in "Whetstones per second" – the number of executions of one iteration of the Whetstone benchmark

Synthetic Benchmark Drawbacks
1. They do not reflect user interest, since they are not real applications
2. They do not reflect real program behavior (e.g. memory access patterns)
3. Compiler and hardware can inflate the performance of these programs far beyond what the same optimizations achieve for real programs

Dhrystone Examples
• By assuming word alignment in string copy, a 20-30% performance improvement could be achieved
– although 99.70-99.98% of typical string copies could NOT use such an optimization
• Compiler optimization could easily discard 25% of the Dhrystone code by eliminating single-iteration loops and expanding procedures inline

Final Performance Remarks
• Designing for performance only, without considering cost, is unrealistic
– in the supercomputing industry, performance is the primary and dominant goal
– low-end personal and embedded computers are extremely cost driven
• Performance depends on three major factors:
– number of instructions
– cycles consumed by instruction execution
– clock cycle time
• The art of computer design lies not in plugging numbers into a performance equation, but in accurately determining how design alternatives will affect performance and cost

Introduction
• To command a computer's hardware, you must speak its language
• Instructions: the "words" of a machine's language
• Instruction set: its "vocabulary"
• The MIPS instruction set is used as a case study
[Figure: software sits above the instruction set, which sits above the hardware – Dave Patterson]

Instruction Set Architecture
• Once you learn one machine language, it is easy to pick up others:
– common fundamental operations
– all designers have the same goals: simplify building hardware, maximize performance, minimize cost
• Goals:
– introduce design alternatives
– present a taxonomy of ISA alternatives, with some qualitative assessment of pros and cons
– present and analyze some instruction set measurements
– address the issue of languages and compilers and their bearing on instruction set architecture
– show some example ISAs

Interface Design
• A good interface:
– lasts through many implementations (portability, compatibility)
– is used in many different ways (generality)
– provides convenient functionality to higher levels
– permits an efficient implementation at lower levels
• Design decisions must take into account:
– technology
– machine organization
– programming languages
– compiler technology
– operating systems
[Figure: one interface serving implementations 1, 2, and 3 over time – Dave Patterson]

Memory ISAs
• Terms
– Result = Operand <operation> Operand
• Stack
– operate on the top stack elements, push the result back on the stack
• Memory-Memory
– operands (and possibly also the result) in memory

Register ISAs
• Accumulator Architecture
– common in early stored-program computers, when hardware was expensive
– the machine has only one register (the accumulator) involved in all math & logic operations: Accumulator = Accumulator op Memory
• Extended Accumulator Architecture (8086)
– dedicated registers for specific operations, e.g. stack and array index registers, added
• General-Purpose Register Architecture
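The compiler-comparison arithmetic in the MIPS example above can be checked with a short script. This is a sketch, not part of the slides; the helper name `perf` and the dictionary layout are my own choices.

```python
def perf(mix, cpi, clock_hz):
    """Return (execution time in seconds, native MIPS) for an instruction mix.

    mix: absolute instruction count per class; cpi: cycles per instruction
    per class. Uses the slide formulas:
      CPU clock cycles = sum(CPI_i * C_i)
      Execution time   = CPU clock cycles / Clock rate
      MIPS             = Instruction count / (Execution time * 10^6)
    """
    cycles = sum(cpi[c] * n for c, n in mix.items())
    instr = sum(mix.values())
    time_s = cycles / clock_hz
    return time_s, instr / (time_s * 1e6)

CPI = {"A": 1, "B": 2, "C": 3}
CLOCK_HZ = 500e6          # 500 MHz, as in the example
B = 1e9                   # slide counts are given in billions

t1, mips1 = perf({"A": 5 * B, "B": 1 * B, "C": 1 * B}, CPI, CLOCK_HZ)
t2, mips2 = perf({"A": 10 * B, "B": 1 * B, "C": 1 * B}, CPI, CLOCK_HZ)
# t1 = 20 s, mips1 = 350; t2 = 30 s, mips2 = 400:
# compiler 2 "wins" on MIPS, yet its code runs 10 seconds slower.
```

Running it reproduces the slide's point that MIPS can vary inversely with performance.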
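The relative-MIPS definition above is a one-line computation; the following sketch uses hypothetical numbers (the 100 s / 4 s timings are invented for illustration, and treating the reference machine as rated at 1 MIPS follows the slides' VAX 11/780 example).

```python
def relative_mips(time_ref, time_unrated, mips_ref):
    # Relative MIPS =
    #   (Execution time_reference / Execution time_unrated) * MIPS_reference
    return (time_ref / time_unrated) * mips_ref

# Hypothetical: a program takes 100 s on the reference machine (rated 1 MIPS)
# and 4 s on the machine being rated, giving a relative rating of 25 MIPS.
rating = relative_mips(time_ref=100, time_unrated=4, mips_ref=1)
```

The ratio form makes the slides' caveat concrete: the rating is only meaningful while the reference machine (and its timing for this program) still exists.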
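The stack style of ISA described under Memory ISAs can be illustrated with a toy interpreter. The opcode names (PUSH/ADD/POP) and the interpreter itself are invented for illustration and do not correspond to any real machine.

```python
def run(program, memory):
    """Execute a list of (opcode, operand...) tuples on a toy stack machine.

    All arithmetic operates on the top stack elements and pushes the
    result back on the stack, as in the stack ISA described above.
    """
    stack = []
    for op, *arg in program:
        if op == "PUSH":                 # memory -> top of stack
            stack.append(memory[arg[0]])
        elif op == "ADD":                # pop two operands, push the sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "POP":                # top of stack -> memory
            memory[arg[0]] = stack.pop()
    return memory

# "C = A + B" expressed with only top-of-stack operands:
mem = run([("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")],
          {"A": 3, "B": 4, "C": 0})
```

Note that no instruction names a register: the operand locations are implicit in the stack, which is what distinguishes this class from the register ISAs above.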

