COMP 206: Computer Architecture and Implementation
Montek Singh
Wed, Sep 14, 2005
Topic: Pipelining Basics

Reading: Appendix A (HP3)

Lecture Overview
- Pipelining Basics
- Introduction to the concept of a pipelined processor
- Pipelined Datapath
- Pipeline example: Load Instruction

Pipelining: It's Natural!
Laundry Example:
- Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
- Washer takes 30 minutes
- Dryer takes 40 minutes
- "Folder" takes 20 minutes

Sequential Laundry
Sequential laundry takes 6 hours for 4 loads: 4 × (30 + 40 + 20) = 360 minutes.
If they learned pipelining, how long would laundry take?
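Both laundry schedules can be checked with a small timing model. The sketch below is a minimal Python illustration, not taken from the slides; the function names `sequential_time` and `pipelined_time` are hypothetical, and it assumes one washer, one dryer, and one folder, all four loads ready at 6 PM, and that a load finished at one stage can wait (say, in a basket) until the next machine is free.

```python
def sequential_time(stage_times, n_tasks):
    """Each task runs through every stage before the next task starts."""
    return n_tasks * sum(stage_times)


def pipelined_time(stage_times, n_tasks):
    """Finish time of the last task when stages overlap, one resource per stage.

    Hypothetical model: a task that finishes a stage may wait for the
    next stage's resource to become free.
    """
    free_at = [0] * len(stage_times)  # time each stage's resource is next free
    finish = 0
    for _ in range(n_tasks):
        ready = 0  # every task is available at time 0 (6 PM)
        for s, duration in enumerate(stage_times):
            start = max(ready, free_at[s])  # wait for both the task and the resource
            ready = start + duration        # the task leaves stage s at this time
            free_at[s] = ready              # resource s is busy until then
        finish = ready
    return finish


laundry = [30, 40, 20]  # washer, dryer, "folder", in minutes
print(sequential_time(laundry, 4) / 60)  # 6.0 hours
print(pipelined_time(laundry, 4) / 60)   # 3.5 hours
```

The speedup here is 360/210 ≈ 1.7 rather than the 3 suggested by the number of stages, because the 40-minute dryer sets the pace and the pipeline has to fill and drain. With five 1-cycle stages, `pipelined_time([1] * 5, 3)` returns 7, the same 7 cycles that three loads take on a five-stage processor pipeline.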
[Figure: sequential laundry schedule. Task order A, B, C, D against time from 6 PM to midnight; each load takes 30 + 40 + 20 minutes, and the next load starts only after the previous one is folded.]

Pipelined Laundry: Start work ASAP
Pipelined laundry takes 3.5 hours for 4 loads: 30 + 4 × 40 + 20 = 210 minutes, because after the first wash the 40-minute dryer sets the pace.
[Figure: pipelined laundry schedule. Task order A, B, C, D against time from 6 PM; segments of 30, 40, 40, 40, 40, and 20 minutes.]

Pipelining Lessons
- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
- Pipeline rate is limited by the slowest pipeline stage
- Multiple tasks operate simultaneously
- Potential speedup = number of pipe stages
- Unbalanced lengths of pipe stages reduce speedup
- Time to "fill" the pipeline and time to "drain" it reduce speedup

The Five Stages of a RISC Instruction
- Ifetch: Instruction Fetch. Fetch the instruction from the Instruction Memory
- Reg/Dec: Registers Fetch and Instruction Decode
- Exec: Calculate the memory address
- Mem: Read the data from the Data Memory
- WrB: Write the data back to the register file

        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
Load    Ifetch   Reg/Dec  Exec     Mem      WrB

Key Ideas Behind Instruction Pipelining
The load instruction has 5 stages:
- Five independent functional units, one to work on each stage
- Each functional unit is used only once!
- A second load can start doing Ifetch as soon as the first load finishes its Ifetch stage
- Each load still takes five cycles to complete
  - The latency of a single load is still 5 cycles
- The throughput is much higher
  - CPI approaches 1
  - Cycle time is ~1/5th the cycle time of the single-cycle implementation
- Instructions start executing before previous instructions complete execution

Pipelining the LOAD Instruction
The five independent pipeline stages are:
- Read next instruction: the Ifetch stage
- Decode instruction and fetch register values: the Reg/Dec stage
- Execute the operation: the Exec stage
- Access data memory: the Mem stage
- Write data to destination register: the WrB stage
One instruction enters the pipeline every cycle, and one instruction comes out of the pipeline (completed) every cycle.
The "effective" CPI is 7/3: three loads complete in 7 cycles (5 cycles for the first load, then one more cycle per additional load). For n loads it is (n + 4)/n, which tends to 1 as n grows, with a cycle time of ~1/5 that of the single-cycle implementation.

Clock   Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
1st lw  Ifetch   Reg/Dec  Exec     Mem      WrB
2nd lw           Ifetch   Reg/Dec  Exec     Mem      WrB
3rd lw                    Ifetch   Reg/Dec  Exec     Mem      WrB

The Four Stages of R-type
- Ifetch: Instruction fetch. Fetch the instruction from the instruction memory
- Reg/Dec: Registers fetch and instruction decode
- Exec: ALU operates on the two register operands
- WrB: Write the ALU output back to the register file

        Cycle 1  Cycle 2  Cycle 3  Cycle 4
R-type  Ifetch   Reg/Dec  Exec     WrB

We have a problem called