CS152 – Computer Architecture and Engineering
Fall 2004
Lecture 10: Basic MIPS Pipelining Review

• Recap last lecture
• The Five Stages of Load Instruction
• Pipelined MIPS Processor
• Single Cycle, Multiple Cycle, vs. Pipeline
• Multiple Cycle v. Pipeline, Bandwidth v. Latency
• Pipelining the MIPS ISA
• MIPS Pipeline Datapath Modifications
• Graphically Representing MIPS Pipeline
• Why Pipeline? For Throughput!
• Administrivia
• Important Observation
• Solution 1: Insert “Bubble” into the Pipeline
• Solution 2: Delay R-type’s Write by One Cycle
• Can Pipelining Get Us Into Trouble?
• A Single Memory Would Be a Structural Hazard
• Register Usage Can Cause Data Hazards
• Loads Can Cause Data Hazards
• One Way to “Fix” a Data Hazard
• Forwarding with Load-use Data Hazards
• Branch Instructions Cause Control Hazards
• One Way to “Fix” a Control Hazard
• MIPS Pipeline Control Path Modifications
• Control Settings
• Other Pipeline Structures Are Possible
• Sample Pipeline Alternatives (for ARM ISA)
• Peer Instruction
• Designing a Pipelined Processor
• Brain storm on bugs (if time permits)
• Summary

CS 152 L10 Pipeline Intro (1) Fall 2004 © UC Regents

John Lazzaro (www.cs.berkeley.edu/~lazzaro)
Dave Patterson (www.cs.berkeley.edu/~patterson)
[Adapted from Mary Jane Irwin’s slides www.cse.psu.edu/~cg431]

Recap last lecture
• Customers: measure to buy
• Architects: measure for design
• Tools: Performance Equation, CPI
  $\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$
• Energy:
  $E_{0 \to 1} = \frac{1}{2} C V_{dd}^2$, $E_{1 \to 0} = \frac{1}{2} C V_{dd}^2$
• Amdahl’s Law’s lesson: Balance
  $\text{Speedup}_{whole} = \frac{1}{(1 - \%\,\text{affected}) + \%\,\text{affected} / \text{Speedup}_{part}}$

The Five Stages of Load Instruction
• IFetch: Instruction Fetch and Update PC
• Dec: Registers Fetch and Instruction Decode
• Exec: Execute R-type; calculate memory address
• Mem: Read/write the data from/to the Data Memory
• WB: Write the result data into the register file

        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
lw      IFetch   Dec      Exec     Mem      WB

Pipelined MIPS Processor
• Start the next instruction while still working on the current one
  - improves throughput or bandwidth: total amount of work done in a given time (average instructions per second or per clock)
  - instruction latency is not reduced (time from the start of an instruction to its completion)
  - pipeline clock cycle (pipeline stage time) is limited by the slowest stage
  - for some instructions, some stages are wasted cycles

        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8
lw      IFetch   Dec      Exec     Mem      WB
sw               IFetch   Dec      Exec     Mem      WB
R-type                    IFetch   Dec      Exec     Mem      WB

Single Cycle, Multiple Cycle, vs. Pipeline
[Figure: clock timing of three implementations. Single Cycle Implementation: Load in Cycle 1, Store in Cycle 2, with a segment labeled “Waste”. Multiple Cycle Implementation: lw occupies Cycles 1 to 5 (IFetch, Dec, Exec, Mem, WB), sw occupies Cycles 6 to 9 (IFetch, Dec, Exec, Mem), and the R-type begins IFetch in Cycle 10. Pipeline Implementation: lw, sw, and R-type each pass through IFetch, Dec, Exec, Mem, WB, starting one cycle apart; some stages are “wasted” cycles.]

Multiple Cycle v. Pipeline, Bandwidth v. Latency
[Figure: the Multiple Cycle and Pipeline timing diagrams from the previous slide, for the sequence lw, sw, R-type over Cycles 1 to 10.]
• Latency per lw = 5 clock cycles for both
• Bandwidth of lw is 1 per clock cycle (IPC) for pipeline vs. 1/5 IPC for multicycle
• Pipelining improves instruction bandwidth, not instruction latency

Pipelining the MIPS ISA
• What makes it easy
  - all instructions are the same length (32 bits): easier to fetch in 1st stage and decode in 2nd stage
  - few instruction formats (three) with symmetry across formats: can begin reading register file in 2nd stage
  - memory operations can occur only in loads and stores: can use the execute stage to calculate memory addresses
  - each MIPS instruction writes at most one result and does so near the end of the pipeline
• What makes it hard
  - structural hazards: what if we had only one memory?
  - control hazards: what about branches?
  - data hazards: what if an instruction’s input operands depend on the output of a previous instruction?

MIPS Pipeline Datapath Modifications
• What do we need to add/modify in our MIPS datapath?
  - registers between pipeline stages to isolate them
[Figure: MIPS datapath with PC, Instruction Memory, Add (PC + 4), Register File, Sign Extend (16 to 32 bits), Shift left 2, Add, ALU, Data Memory, and multiplexers, driven by the System Clock. Pipeline registers IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB separate the five stages IF: IFetch, ID: Dec, EX: Execute, MEM: MemAccess, WB: WriteBack.]

Graphically Representing MIPS Pipeline
[Figure: one instruction drawn as the sequence IM, Reg, ALU, DM, Reg.]
• Can help with answering questions like:
  - how many cycles does it take to execute this code?
  - what is the ALU doing during cycle 4?
  - is there a hazard, why does it occur, and how can it be fixed?

Why Pipeline? For Throughput!
[Figure: instruction order (Inst 0 to Inst 4) against time in clock cycles; each instruction is drawn as IM, Reg, ALU, DM, Reg and starts one cycle after the previous one. The initial cycles are labeled “Time to fill the pipeline”.]
• Once the pipeline is full, one instruction is completed every cycle

Administrivia
• Lab 2 demo Friday, due Monday
• Feedback on team effort
  - How did it work? Change before pipeline?
• Reading Chapter 6, sections 6.1 to 6.4 for today, 6.5 to 6.9 for next 2 lectures
• Midterm Tue Oct 12 5:30 - 8:30 in 101 Morgan (you asked for it)
  - Northwest corner of campus, near Arch and Hearst
• Midterm review Sunday Oct 10, 7 PM, 306 Soda
• Bring 1 page, handwritten notes, both sides
• Nothing electronic: no calculators, cell phones, pagers, …
• Meet at LaVal’s Northside afterwards for Pizza

Important Observation
• Each functional unit can only be used once per instruction (since 4 other instructions are executing)
• If a functional unit is used at different stages by different instructions, hazards result:
  - Load uses Register File’s Write Port during its 5th stage
  - R-type uses Register File’s Write Port
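
The bandwidth-versus-latency comparison made earlier in this lecture (5 cycles of latency per lw in both designs, but 1 instruction per clock for the pipeline vs. 1/5 for multicycle) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the pipeline is assumed ideal (no hazards or stalls), and the per-instruction multicycle counts are taken from the timing figure (lw = 5 cycles, sw = 4 cycles).

```python
# Minimal sketch, not from the lecture: helper names are hypothetical.
# Assumes an ideal 5-stage pipeline with no hazards or stalls.

PIPELINE_DEPTH = 5  # IFetch, Dec, Exec, Mem, WB

# Multicycle cycle counts as drawn in the lecture's timing figure.
MULTICYCLE_CYCLES = {"lw": 5, "sw": 4}


def multicycle_total(instrs):
    """Total cycles when each instruction runs to completion before the next starts."""
    return sum(MULTICYCLE_CYCLES[op] for op in instrs)


def pipelined_total(instrs, depth=PIPELINE_DEPTH):
    """Total cycles for an ideal pipeline: fill once, then finish one instruction per cycle."""
    n = len(instrs)
    return 0 if n == 0 else depth + (n - 1)


if __name__ == "__main__":
    program = ["lw"] * 100
    for name, total in (("multicycle", multicycle_total(program)),
                        ("pipelined", pipelined_total(program))):
        print(f"{name}: {total} cycles, IPC = {len(program) / total:.2f}")
```

For a long run of lw instructions the multicycle IPC stays at 1/5, while the pipelined IPC approaches 1 as the fill time is amortized; the latency of any single lw is still 5 cycles in both cases.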