
Slide titles:
  COMP 206: Computer Architecture and Implementation
  Lecture Overview
  Pipelining: It’s Natural!
  Sequential Laundry
  Pipelined Laundry: Start work ASAP
  Pipelining Lessons
  The Five Stages of a RISC Instruction
  Key Ideas Behind Instruction Pipelining
  Pipelining the LOAD Instruction
  The Four Stages of R-type
  Pipelining the R-type and Load Instructions
  Important Observations
  Solution: Delay R-type’s Write by 1 Cycle
  The Four Stages of Store
  The Four Stages of Beq
  A Pipelined Datapath
  The Instruction Fetch Stage
  Detailed View of the Instruction Fetch Unit
  The Decode / Register Fetch Stage
  Detailed View of the Fetch/Decode Stage
  Load’s Address Calculation Stage
  Detailed View of the Execution Unit
  Load’s Memory Access Stage
  Load’s Write Back Stage

COMP 206: Computer Architecture and Implementation
Montek Singh
Wed, Sep 14, 2005
Topic: Pipelining Basics

Lecture Overview
Reading: Appendix A (HP3)
- Pipelining Basics
- Introduction to the concept of pipelined processor
- Pipelined Datapath
- Pipeline example: Load Instruction

Pipelining: It’s Natural!
Laundry Example:
- Ann, Brian, Cathy, Dave (loads A, B, C, D) each have one load of clothes to wash, dry, and fold
- Washer takes 30 minutes
- Dryer takes 40 minutes
- “Folder” takes 20 minutes

- Sequential laundry takes 6 hours for 4 loads
- If they learned pipelining, how long would laundry take?
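The laundry question can be checked with a little arithmetic on the stage times given in the example. The following is a minimal sketch, not part of the original slides: the names `STAGE_MINUTES`, `sequential_minutes`, and `pipelined_minutes` are hypothetical, and the pipelined version assumes a load moves to the next machine as soon as both the load and the machine are free, with room to wait between machines.

```python
# Stage times from the laundry example: washer, dryer, "folder" (minutes).
STAGE_MINUTES = [30, 40, 20]


def sequential_minutes(stages, loads):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stages)


def pipelined_minutes(stages, loads):
    """Each load starts a stage as soon as it and that stage are both free."""
    # stage_free[s] is the time at which stage s finishes its latest load.
    stage_free = [0] * len(stages)
    for _ in range(loads):
        ready = 0  # time at which this load is ready for its next stage
        for s, duration in enumerate(stages):
            start = max(ready, stage_free[s])
            stage_free[s] = start + duration
            ready = stage_free[s]
    return stage_free[-1]


print(sequential_minutes(STAGE_MINUTES, 4))  # 360 minutes (6 hours)
print(pipelined_minutes(STAGE_MINUTES, 4))   # 210 minutes (3.5 hours)
```

The pipelined total is 30 + 4 × 40 + 20 minutes: the 40-minute dryer is the slowest stage, so once the pipeline is full it sets the rate at which loads finish.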
[Figure: Sequential Laundry. Time axis from 6 PM to midnight; task order A, B, C, D; each load takes 30, 40, and 20 minutes, one load after another.]

Pipelined Laundry: Start work ASAP
- Pipelined laundry takes 3.5 hours for 4 loads
[Figure: time axis from 6 PM to midnight; task order A, B, C, D; overall timeline of 30, 40, 40, 40, 40, 20 minutes.]

Pipelining Lessons
[Figure: pipelined laundry timeline, time axis from 6 PM to 9; task order A, B, C, D; 30, 40, 40, 40, 40, 20 minutes.]
- Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
- Pipeline rate is limited by the slowest pipeline stage
- Multiple tasks operate simultaneously
- Potential speedup = number of pipe stages
- Unbalanced lengths of pipe stages reduce speedup
- Time to “fill” the pipeline and time to “drain” it reduce speedup

The Five Stages of a RISC Instruction
- Ifetch: Instruction Fetch
  - Fetch the instruction from the Instruction Memory
- Reg/Dec: Register Fetch and Instruction Decode
- Exec: Calculate the memory address
- Mem: Read the data from the Data Memory
- WrB: Write the data back to the register file

        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
Load    Ifetch   Reg/Dec  Exec     Mem      WrB

Key Ideas Behind Instruction Pipelining
- The load instruction has 5 stages:
  - Five independent functional units to work on each stage
  - Each functional unit is used only once!
- A second load can start doing Ifetch as soon as the first load finishes its Ifetch stage
- Each load still takes five cycles to complete
  - The latency of a single load is still 5 cycles
- The throughput is much higher
  - CPI approaches 1
  - Cycle time is ~1/5th the cycle time of the single-cycle implementation
- Instructions start executing before previous instructions complete execution
[Figure: the five load stages (Ifetch, Reg/Dec, Exec, Mem, WrB), annotated with CPI and cycle time.]

Pipelining the LOAD Instruction
- The five independent pipeline stages are:
  - Read next instruction: The Ifetch stage
  - Decode instruction and fetch register values: The Reg/Dec stage
  - Execute the operation: The Exec stage
  - Access data memory: The Mem stage
  - Write data to destination register: The WrB stage
- One instruction enters the pipeline every cycle
  - One instruction comes out of the pipeline (completed) every cycle
  - The “effective” CPI for the three loads shown is 7/3 (7 cycles for 3 instructions; tends to 1 as more instructions follow); ~1/5 cycle time

Clock    Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
1st lw   Ifetch   Reg/Dec  Exec     Mem      WrB
2nd lw            Ifetch   Reg/Dec  Exec     Mem      WrB
3rd lw                     Ifetch   Reg/Dec  Exec     Mem      WrB

The Four Stages of R-type
- Ifetch: Instruction fetch
  - Fetch the instruction from the instruction memory
- Reg/Dec: Register fetch and instruction decode
- Exec: ALU operates on the two register operands
- WrB: Write the ALU output back to the register file

         Cycle 1  Cycle 2  Cycle 3  Cycle 4
R-type   Ifetch   Reg/Dec  Exec     WrB

We have a problem called
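The effective CPI of 7/3 quoted for the three-load example, and the claim that it tends to 1, both follow from counting cycles in an ideal pipeline. The sketch below is an illustration under stated assumptions rather than anything from the slides: it assumes one instruction enters per cycle with no stalls or hazards, and the names `total_cycles`, `effective_cpi`, and `schedule` are hypothetical.

```python
from fractions import Fraction

# The five load stages named in the slides.
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "WrB"]


def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Ideal pipeline: n_stages cycles to fill, then one more per instruction."""
    return n_stages + (n_instructions - 1)


def effective_cpi(n_instructions):
    """Total cycles divided by instructions completed (n_instructions >= 1)."""
    return Fraction(total_cycles(n_instructions), n_instructions)


def schedule(n_instructions):
    """Stage occupied by each instruction in each cycle ('' when idle)."""
    width = total_cycles(n_instructions)
    rows = []
    for i in range(n_instructions):
        row = [""] * width
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i enters the pipeline in cycle i + 1
        rows.append(row)
    return rows


print(effective_cpi(3))            # 7/3
print(float(effective_cpi(1000)))  # 1.004
for row in schedule(3):
    print(" ".join(f"{cell:8}" for cell in row))
```

With many instructions the four fill cycles are amortized, which is why the effective CPI approaches 1 even though each load still takes five cycles.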


UNC-Chapel Hill COMP 206 - Pipelining Basics
