
Pipelining to Superscalar
Prof. Mikko H. Lipasti
University of Wisconsin-Madison
Lecture notes based on notes by John P. Shen
Updated by Mikko Lipasti

Forecast
– Limits of pipelining
– The case for superscalar
– Instruction-level parallel machines
– Superscalar pipeline organization
– Superscalar pipeline design

Limits of Pipelining
IBM RISC Experience
– Control and data dependences add 15%
– Best case CPI of 1.15, IPC of 0.87
– Deeper pipelines (higher frequency) magnify dependence penalties
This analysis assumes 100% cache hit rates
– Hit rates approach 100% for some programs
– Many important programs have much worse hit rates
– Later!

Processor Performance
In the 1980's (decade of pipelining):
– CPI: 5.0 => 1.15
In the 1990's (decade of superscalar):
– CPI: 1.15 => 0.5 (best case)
In the 2000's (decade of multicore):
– Marginal CPI improvement

$$ \text{Processor Performance} = \frac{\text{Time}}{\text{Program}} = \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{(code size)}} \times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{(CPI)}} \times \underbrace{\frac{\text{Time}}{\text{Cycle}}}_{\text{(cycle time)}} $$

Amdahl's Law
h = fraction of time in serial code
f = fraction that is vectorizable
v = speedup for f
Overall speedup:

$$ \text{Speedup} = \frac{1}{1 - f + \frac{f}{v}} $$

[Figure: No. of Processors (1 to N) vs. Time; regions labeled h and 1 - h, 1 - f and f]

Revisit Amdahl's Law
Sequential bottleneck
Even if v is infinite
– Performance limited by nonvectorizable portion (1-f)

$$ \lim_{v \to \infty} \frac{1}{1 - f + \frac{f}{v}} = \frac{1}{1 - f} $$

[Figure: No. of Processors (1 to N) vs. Time; regions labeled h and 1 - h, 1 - f and f]

Pipelined Performance Model
g = fraction of time pipeline is filled
1-g = fraction of time pipeline is not filled (stalled)
[Figure: Pipeline Depth (1 to N); regions labeled 1-g and g]
Tyranny of Amdahl's Law [Bob Colwell]
– When g is even slightly below 100%, a big performance hit will result
– Stalled cycles are the key adversary and must be minimized as much as possible

Motivation for Superscalar [Agerwala and Cocke]
[Figure: speedup plot with a "Typical Range" marked]
Speedup jumps from 3 to 4.3 for N=6, f=0.8, but s=2 instead of s=1 (scalar)

Superscalar Proposal
Moderate tyranny of Amdahl's Law
– Ease sequential bottleneck
– More generally applicable
– Robust (less sensitive to f)
– Revised Amdahl's Law:

$$ \text{Speedup} = \frac{1}{\frac{1 - f}{s} + \frac{f}{v}} $$

Limits on Instruction Level Parallelism (ILP)
Weiss and Smith [1984]        1.58
Sohi and Vajapeyam [1987]     1.81
Tjaden and Flynn [1970]       1.86  (Flynn's bottleneck)
Tjaden and Flynn [1973]       1.96
Uht [1986]                    2.00
Smith et al. [1989]           2.00
Jouppi and Wall [1988]        2.40
Johnson [1991]                2.50
Acosta et al. [1986]          2.79
Wedig [1982]                  3.00
Butler et al. [1991]          5.8
Melvin and Patt [1991]        6
Wall [1991]                   7     (Jouppi disagreed)
Kuck et al. [1972]            8
Riseman and Foster [1972]     51    (no control dependences)
Nicolau and Fisher [1984]     90    (Fisher's optimism)

Superscalar Proposal
Go beyond single instruction pipeline, achieve IPC > 1
Dispatch multiple instructions per cycle
Provide more generally applicable form of concurrency (not just vectors)
Geared for sequential code that is hard to parallelize otherwise
Exploit fine-grained or instruction-level parallelism (ILP)

Classifying ILP Machines [Jouppi, DECWRL 1991]
Baseline scalar RISC
– Issue parallelism = IP = 1
– Operation latency = OP = 1
– Peak IPC = 1
[Figure: pipeline diagram; successive instructions 1 to 6 vs. time in cycles (of baseline machine), 0 to 9; stages IF, DE, EX, WB]

Classifying ILP Machines [Jouppi, DECWRL 1991]
Superpipelined: cycle time = 1/m of baseline
– Issue parallelism = IP = 1 inst / minor cycle
– Operation latency = OP = m minor cycles
– Peak IPC = m instr / major cycle (m x speedup?)
[Figure: pipeline diagram; instructions 1 to 6; stages IF, DE, EX, WB]

Classifying ILP Machines [Jouppi, DECWRL 1991]
Superscalar:
– Issue parallelism = IP = n inst / cycle
– Operation latency = OP = 1 cycle
– Peak IPC = n instr / cycle (n x speedup?)
[Figure: pipeline diagram; instructions 1 to 9; stages IF, DE, EX, WB]

Classifying ILP Machines [Jouppi, DECWRL 1991]
VLIW: Very Long Instruction Word
– Issue parallelism = IP = n inst / cycle
– Operation latency = OP = 1 cycle
– Peak IPC = n instr / cycle = 1 VLIW / cycle
[Figure: pipeline diagram; stages IF, DE, EX, WB]

Classifying ILP Machines [Jouppi, DECWRL 1991]
Superpipelined-Superscalar
– Issue parallelism = IP = n inst / minor cycle
– Operation latency = OP = m minor cycles
– Peak IPC = n x m instr / major cycle
[Figure: pipeline diagram; instructions 1 to 9; stages IF, DE, EX, WB]

Superscalar vs. Superpipelined
Roughly equivalent performance
– If n = m then both have about the same IPC
– Parallelism exposed in space vs. time
[Figure: SUPERSCALAR and SUPERPIPELINED pipeline diagrams vs. Time in Cycles (of Base Machine), 0 to 13; key: IFetch, Dcode, Execute, Writeback]

Superpipelining: Result Latency
Superpipelining - Jouppi, 1989 essentially describes a pipelined execution stage
[Figure: Jouppi's base machine, underpipelined machine, superpipelined machine]
Underpipelined machines cannot issue instructions as fast as they are executed
Note - key characteristic of superpipelined machines is that results are not available to M-1 successive instructions

Superscalar Challenges
[Figure: superscalar pipeline block diagram; blocks: I-cache, Branch Predictor, FETCH, Instruction Buffer, DECODE, Integer, Floating-point, Media, Memory, Reorder Buffer, Store Queue, COMMIT, D-cache; labels: Instruction, Register, Data]
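
The numbers quoted on the Amdahl's Law and Agerwala and Cocke slides can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, and the revised law is assumed to have the form Speedup = 1 / ((1 - f)/s + f/v), with N on the Agerwala and Cocke slide playing the role of v. That form is chosen because it reproduces the quoted jump from 3 to 4.3.

```python
def amdahl_speedup(f, v):
    """Classic Amdahl's Law: a fraction f of the work is sped up by a factor v."""
    return 1.0 / ((1.0 - f) + f / v)


def revised_speedup(f, v, s):
    """Assumed revised form: the remaining (1 - f) portion is also sped up by s."""
    return 1.0 / ((1.0 - f) / s + f / v)


def limit_speedup(f):
    """Upper bound on classic speedup as v grows without limit."""
    return 1.0 / (1.0 - f)


def ipc_from_cpi(cpi):
    """IPC is the reciprocal of CPI."""
    return 1.0 / cpi


if __name__ == "__main__":
    f, n = 0.8, 6  # values quoted on the Agerwala and Cocke slide
    print(f"scalar (s=1):      {revised_speedup(f, n, 1):.2f}")
    print(f"superscalar (s=2): {revised_speedup(f, n, 2):.2f}")
    print(f"limit as v grows:  {limit_speedup(f):.2f}")
    print(f"IPC at CPI 1.15:   {ipc_from_cpi(1.15):.2f}")
```

With s = 1 the revised form reduces to the classic law, so the scalar case gives a speedup of 3 and the s = 2 case gives about 4.3, matching the slide. The same reciprocal relationship gives the IPC of 0.87 quoted for a CPI of 1.15.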


UW-Madison ECE/CS 752 - Pipelining to Superscalar
