14:332:331 Computer Architecture and Assembly Language
Spring 2006
Week 12: Introduction to Pipelined Datapath
[Adapted from Dave Patterson’s UCB CS152 slides and Mary Jane Irwin’s PSU CSE331 slides]

Outline:
  Review: Multicycle Data and Control Path
  Review: RTL Summary
  Review: Multicycle Datapath FSM
  Review: FSM Implementation
  Single Cycle Disadvantages & Advantages
  Multicycle Advantages & Disadvantages
  The Five Stages of Load Instruction
  Single Cycle vs. Multiple Cycle Timing
  Pipelined MIPS Processor
  Single Cycle, Multiple Cycle, vs. Pipeline
  Pipelining the MIPS ISA
  MIPS Pipeline Datapath Modifications
  MIPS Pipeline Control Path Modifications
  Graphically Representing MIPS Pipeline
  Why Pipeline? For Throughput!
  Can pipelining get us into trouble?
  A Unified Memory Would Be a Structural Hazard
  How About Register File Access?
  Register Usage Can Cause Data Hazards
  One Way to “Fix” a Data Hazard
  Another Way to “Fix” a Data Hazard
  Loads Can Cause Data Hazards
  Stores Can Cause Data Hazards
  Forwarding with Load-use Data Hazards
  Branch Instructions Cause Control Hazards
  One Way to “Fix” a Control Hazard
  Other Pipeline Structures Are Possible
  Sample Pipeline Alternatives
  Summary
  Performance
  Two notions of “performance”
  Definitions
  Example
  Basis of Evaluation
  SPEC95
  Metrics of performance
  Aspects of CPU Performance
  CPI
  Example (RISC processor)
  Amdahl's Law
  Summary: Evaluating Instruction Sets?

Review: Multicycle Data and Control Path

[Figure: multicycle datapath and control path. Datapath elements: PC, Memory (instructions or data), IR, MDR, Register File, A, B, ALU, ALUout, Sign Extend, two Shift left 2 units, ALU control. Control FSM outputs: IRWrite, MemtoReg, MemWrite, MemRead, IorD, PCWrite, PCWriteCond, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp, PCSource. Instruction fields used: Instr[31-26], Instr[25-0], Instr[15-0], Instr[5-0], together with PC[31-28].]

Review: RTL Summary

Instr fetch (all instructions):
  IR = Memory[PC]; PC = PC + 4;
Decode (all instructions):
  A = Reg[IR[25-21]];
  B = Reg[IR[20-16]];
  ALUOut = PC + (sign-extend(IR[15-0]) << 2);
Execute:
  R-type:  ALUOut = A op B;
  Mem Ref: ALUOut = A + sign-extend(IR[15-0]);
  Branch:  if (A==B) PC = ALUOut;
  Jump:    PC = PC[31-28] || (IR[25-0] << 2);
Memory access:
  R-type:  Reg[IR[15-11]] = ALUOut;
  Mem Ref: MDR = Memory[ALUOut]; or Memory[ALUOut] = B;
Write-back:
  Mem Ref: Reg[IR[20-16]] = MDR;

Review: Multicycle Datapath FSM

[Figure: state diagram with states 0-9, arranged by step (Instr Fetch, Decode, Execute, Memory Access, Write Back) and branching on Op.]

Unless otherwise assigned: PCWrite, IRWrite, MemWrite, RegWrite = 0; others = X

Control outputs by state:
- Instr Fetch: IorD=0, MemRead, IRWrite, ALUSrcA=0, ALUSrcB=01, PCSource=00, ALUOp=00, PCWrite
- Decode: ALUSrcA=0, ALUSrcB=11, ALUOp=00, PCWriteCond=0
- Execute (Op = lw or sw): ALUSrcA=1, ALUSrcB=10, ALUOp=00, PCWriteCond=0
- Execute (Op = R-type): ALUSrcA=1, ALUSrcB=00, ALUOp=10, PCWriteCond=0
- Execute (Op = beq): ALUSrcA=1, ALUSrcB=00, ALUOp=01, PCSource=01, PCWriteCond
- Execute (Op = j): PCSource=10, PCWrite
- Memory Access (Op = lw): MemRead, IorD=1, PCWriteCond=0
- Memory Access (Op = sw): MemWrite, IorD=1, PCWriteCond=0
- R-type register write: RegDst=1, RegWrite, MemtoReg=0, PCWriteCond=0
- Write Back (lw): RegDst=0, RegWrite, MemtoReg=1, PCWriteCond=0

Review: FSM Implementation

[Figure: combinational control logic plus a state register driven by the system clock. Inputs: the current state and Inst[31-26] (Op0 through Op5). Outputs: Next State, PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSourceB, ALUSourceA, RegWrite, RegDst.]

Single Cycle Disadvantages & Advantages

- Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
- Is wasteful of area, since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle
but
- Is simple and easy to understand

[Figure: single cycle implementation timing. Cycle 1: lw. Cycle 2: sw, with the rest of the cycle wasted.]

Multicycle Advantages & Disadvantages

- Uses the clock cycle efficiently: the clock cycle is timed to accommodate the slowest instruction step
  - balance the amount of work to be done in each step
  - restrict each step to use only one major functional unit
- Multicycle implementations allow
  - functional units to be used more than once per instruction, as long as they are used on different clock cycles
  - faster clock rates
  - different instructions to take a different number of clock cycles
but
- Requires additional internal state registers, muxes, and more complicated (FSM) control

The Five Stages of Load Instruction

- IFetch: Instruction Fetch and Update PC
- Dec: Registers Fetch and Instruction Decode
- Exec: Execute R-type; calculate memory address
- Mem: Read/write the data from/to the Data Memory
- WB: Write the data back to the register file

[Figure: lw occupying Cycle 1 through Cycle 5 as IFetch, Dec, Exec, Mem, WB.]

Single Cycle vs. Multiple Cycle Timing

[Figure: multiple cycle implementation over Cycles 1-10: lw takes IFetch, Dec, Exec, Mem, WB; sw takes IFetch, Dec, Exec, Mem; the R-type instruction then begins with IFetch. Single cycle implementation: lw in Cycle 1; sw in Cycle 2, with the rest of the cycle wasted.]

Note: the multicycle clock period is longer than 1/5 of the single cycle clock period because of stage flip-flop overhead.

Pipelined MIPS Processor

- Start the next instruction while still working on the current one
  - improves throughput: the total amount of work done in a given time
  - instruction latency (execution time, delay time, response time), the time from the start of an instruction to its completion, is not reduced

[Figure: lw, sw, and an R-type instruction each pass through IFetch, Dec, Exec, Mem, WB in overlapping cycles (axis labeled Cycle 1 through Cycle 8).]

Single Cycle, Multiple Cycle, vs. Pipeline

[Figure: timing of the three implementations. Single cycle: Load in Cycle 1; Store in Cycle 2, with the rest of the cycle wasted. Multiple cycle (Cycles 1-10): lw (IFetch, Dec, Exec, Mem, WB), sw (IFetch, Dec, Exec, Mem), then IFetch of the R-type instruction. Pipeline: lw, sw, and R-type each pass through IFetch, Dec, Exec, Mem, WB in overlapping cycles; one cycle is labeled "wasted cycle".]

Pipelining the MIPS ISA

What makes it easy
- all instructions are the same length (32 bits)
- few instruction formats (three) with symmetry across formats
- memory operations can occur only in loads and stores
- operands must be aligned in memory so a single data transfer requires only one memory access

What makes it hard
- structural hazards: what if we had only one memory
- control hazards: what about branches
- data hazards: what if an instruction’s input operands depend on the output of a previous instruction

MIPS Pipeline Datapath
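The cycle counts behind the single cycle, multiple cycle, and pipeline timing comparison earlier in these slides can be reproduced with a short sketch. The function and variable names below are hypothetical. The per-instruction step counts are read off the multicycle RTL summary (lw uses all five steps, sw and R-type finish after four, beq and j after three), and the pipeline count assumes an ideal five-stage pipeline with no hazards or stalls.

```python
# Minimal sketch: count clock cycles for an instruction sequence under the
# three implementations compared in the slides. All names are illustrative.

STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]

# Steps each instruction class needs in the multicycle design
# (assumption taken from the RTL summary: lw=5, sw=4, R-type=4, beq=3, j=3).
MULTICYCLE_STEPS = {"lw": 5, "sw": 4, "R-type": 4, "beq": 3, "j": 3}


def single_cycle_cycles(program):
    """One (long) clock cycle per instruction."""
    return len(program)


def multicycle_cycles(program):
    """Each instruction takes only the steps it needs, one after another."""
    return sum(MULTICYCLE_STEPS[op] for op in program)


def pipeline_cycles(program, stages=len(STAGES)):
    """Ideal pipeline: fill it once, then one instruction completes per cycle."""
    if not program:
        return 0
    return len(program) + stages - 1


def pipeline_diagram(program):
    """Text diagram: each instruction starts one cycle after the previous one."""
    rows = []
    for i, op in enumerate(program):
        cells = ["      "] * i + [f"{stage:<6}" for stage in STAGES]
        rows.append(f"{op:<7}" + " ".join(cells).rstrip())
    return "\n".join(rows)


program = ["lw", "sw", "R-type"]
print(single_cycle_cycles(program))  # 3
print(multicycle_cycles(program))    # 13
print(pipeline_cycles(program))      # 7
print(pipeline_diagram(program))
```

Cycle counts alone do not give execution time, because the clock period differs: the single cycle clock must fit the slowest instruction, while the multicycle and pipelined clocks only need to fit the slowest step or stage plus register overhead.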
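The data hazard question raised under "What makes it hard" (an instruction whose input operands depend on the output of a previous instruction) can be illustrated with a small dependence check. This is a sketch under stated assumptions, not the detection logic of any real pipeline: the `Instr` record, the `raw_dependences` function, and the default window of two preceding instructions are all hypothetical choices made for illustration.

```python
from typing import NamedTuple, Optional, Tuple

# Hypothetical record for one instruction: its text, the register it writes
# (None if it writes no register), and the registers it reads.
class Instr(NamedTuple):
    text: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def raw_dependences(program, window=2):
    """Return (producer_index, consumer_index, register) for every case where
    an instruction reads a register written by one of the `window` instructions
    just before it. The window size is an assumption, and the check ignores
    intervening writes to the same register."""
    found = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - window), j):
            dest = program[i].dest
            if dest is not None and dest in consumer.srcs:
                found.append((i, j, dest))
    return found


program = [
    Instr("add $1, $2, $3", "$1", ("$2", "$3")),
    Instr("sub $4, $1, $5", "$4", ("$1", "$5")),
    Instr("and $6, $1, $7", "$6", ("$1", "$7")),
    Instr("or  $8, $1, $9", "$8", ("$1", "$9")),
]

for producer, consumer, reg in raw_dependences(program):
    print(f"{program[consumer].text!r} reads {reg} written by {program[producer].text!r}")
```

With the default window, the sub and and instructions are flagged because they read $1 shortly after add writes it, while the or instruction is far enough behind the add to fall outside the window.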