Chapter 4
The Processor

Performance Issues
- Longest delay determines clock period
  - Critical path: load instruction
  - Instruction memory → register file → ALU → data memory → register file
- Not feasible to vary period for different instructions
- Violates design principle
  - Making the common case fast
- We will improve performance by pipelining

§4.5 An Overview of Pipelining

Pipelining Analogy
- Pipelined laundry: overlapping execution
  - Parallelism improves performance
- Four loads: Speedup = 8 / 3.5 = 2.3
- Non-stop: Speedup = 2n / (0.5n + 1.5) ≈ 4 = number of stages

MIPS Pipeline
Five stages, one step per stage
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register

§4.6 Pipelined Datapath and Control

MIPS Pipelined Datapath
[Figure: MIPS pipelined datapath, with the MEM and WB stages labeled]
- Right-to-left flow leads to hazards

Execution in a Pipelined Datapath
[Figure: five lw instructions over clock cycles CC1 to CC9. Each passes through IF, ID, EX, MEM, WB (IM, Reg, ALU, DM, Reg), starting one cycle after the previous one; the steady state is marked.]

Pipeline Performance
- Assume time for stages is
  - 100ps for register read or write
  - 200ps for other stages
- Compare pipelined datapath with single-cycle datapath

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200ps        100ps          200ps   200ps          100ps           800ps
sw        200ps        100ps          200ps   200ps          -               700ps
R-format  200ps        100ps          200ps   -              100ps           600ps
beq       200ps        100ps          200ps   -              -               500ps

Pipeline Performance
[Figure: timing of the same instruction sequence, single-cycle (Tc = 800ps) vs. pipelined (Tc = 200ps)]

Pipeline Speedup
- If all stages are balanced, i.e., all take the same time
  - Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
- If not balanced, speedup is less
- Speedup due to increased throughput
  - Latency (time for each instruction) does not decrease

Mixed Instructions in the Pipeline
[Figure: lw followed by add over clock cycles CC1 to CC6. lw uses IM, Reg, ALU, DM, Reg; add uses IM, Reg, ALU, Reg.]

Pipelining and ISA Design
- MIPS ISA designed for pipelining
  - All instructions are 32 bits
    - Easier to fetch and decode in one cycle
    - cf. x86: 1- to 17-byte instructions
  - Few and regular instruction formats
    - Can decode and read registers in one step
  - Load/store addressing
    - Can calculate address in 3rd stage, access memory in 4th stage
  - Alignment of memory operands
    - Memory access takes only one cycle

Pipeline Principles
- All instructions that share a pipeline must have the same stages in the same order
  - Therefore, add does nothing during MEM stage
  - sw does nothing during WB stage
- All intermediate values must be latched each cycle
- There is no functional block reuse
  - Example: we need 2 adders and ALU (like in single-cycle)
[Figure: stages IF, ID, EX, MEM, WB mapped to IM, Reg, ALU, DM, Reg]

Pipeline Registers
- Need registers between stages
  - To hold information produced in previous cycle

Pipeline Operation
- Cycle-by-cycle flow of instructions through the pipelined datapath
  - "Single-clock-cycle" pipeline diagram
    - Shows pipeline usage in a single cycle
    - Highlight resources used
  - cf. "multi-clock-cycle" diagram
    - Graph of operation over time
- We'll look at "single-clock-cycle" diagrams for load & store

IF for Load, Store, …
[Figure]

ID for Load, Store, …
[Figure]

EX for Load
[Figure]

MEM for Load
[Figure]

WB for Load
[Figure, annotated: wrong register number]

Corrected Datapath for Load
[Figure]

EX for Store
[Figure]

MEM for Store
[Figure]

WB for Store
[Figure]

Multi-Cycle Pipeline Diagram
- Form showing resource usage
[Figure]

Multi-Cycle Pipeline Diagram
- Traditional form
[Figure]

Single-Cycle Pipeline Diagram
- State of pipeline in a given cycle
[Figure]

Pipelined Control (Simplified)
[Figure]

Pipelined Control
- Control signals derived from instruction
  - As in single-cycle implementation

Pipelined Control
[Figure]

Pipelined Control Signals

             Execution stage control lines    Memory stage control lines   Write-back stage control lines
Instruction  RegDst  ALUOp1  ALUOp0  ALUSrc   Branch  MemRead  MemWrite    RegWrite  MemtoReg
R-format     1       1       0       0        0       0        0           1         0
lw           0       0       0       1        0       1        0           1         1
sw           x       0       0       1        0       0        1           0         x
beq          x       0       1       0        1       0        0           0         x
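
The Pipelined Control Signals table can be read as a lookup from instruction class to three groups of control bits, one group per stage that consumes them. The following is a minimal sketch of that reading, not a hardware description. The names `CONTROL`, `EX_SIGNALS`, `MEM_SIGNALS`, `WB_SIGNALS` and `control_for` are illustrative and do not come from the slides, and "x" stands for a don't-care value. The final `x` for beq's MemtoReg is an assumption based on RegWrite being 0 for that instruction.

```python
# Signal names grouped by the stage that uses them, as in the table headings.
EX_SIGNALS = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc")
MEM_SIGNALS = ("Branch", "MemRead", "MemWrite")
WB_SIGNALS = ("RegWrite", "MemtoReg")

# Control values per instruction class; "x" means don't care.
# Each tuple holds the (EX, MEM, WB) bit strings in the order listed above.
CONTROL = {
    "R-format": ("1100", "000", "10"),
    "lw":       ("0001", "010", "11"),
    "sw":       ("x001", "001", "0x"),
    # beq never writes a register (RegWrite = 0), so MemtoReg is assumed "x".
    "beq":      ("x010", "100", "0x"),
}


def control_for(instr):
    """Return the control lines for one instruction class, grouped by stage."""
    ex, mem, wb = CONTROL[instr]
    return {
        "EX": dict(zip(EX_SIGNALS, ex)),
        "MEM": dict(zip(MEM_SIGNALS, mem)),
        "WB": dict(zip(WB_SIGNALS, wb)),
    }


if __name__ == "__main__":
    for name in CONTROL:
        print(name, control_for(name))
```

For example, `control_for("lw")["MEM"]` shows MemRead set and MemWrite clear, while `control_for("sw")["WB"]` shows RegWrite clear, which is why sw does nothing in the WB stage.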
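
The numbers on the Pipelining Analogy and Pipeline Performance slides earlier in the deck can be checked with a few lines of arithmetic. The sketch below uses the stage latencies from the performance table and assumes an ideal pipeline with no hazards or stalls, so the first instruction finishes after five cycles and each later one finishes one cycle after the previous one. All function and variable names are illustrative, not from the slides.

```python
# Stage latencies in picoseconds, taken from the Pipeline Performance table.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Stages each instruction class actually uses, per the same table.
USES = {
    "lw": ("IF", "ID", "EX", "MEM", "WB"),
    "sw": ("IF", "ID", "EX", "MEM"),
    "R-format": ("IF", "ID", "EX", "WB"),
    "beq": ("IF", "ID", "EX"),
}


def total_ps(instr):
    """Sum of the stage latencies used by one instruction class."""
    return sum(STAGE_PS[stage] for stage in USES[instr])


def single_cycle_tc():
    """Single-cycle clock period: set by the slowest instruction (lw)."""
    return max(total_ps(instr) for instr in USES)


def pipelined_tc():
    """Pipelined clock period: set by the slowest stage."""
    return max(STAGE_PS.values())


def single_cycle_time(n):
    """Time in ps to run n instructions on the single-cycle datapath."""
    return n * single_cycle_tc()


def pipelined_time(n):
    """Time in ps to run n instructions, assuming no hazards or stalls."""
    return (len(STAGE_PS) + n - 1) * pipelined_tc()


def laundry_speedup(n):
    """Laundry analogy: 2n hours sequential vs. 0.5n + 1.5 hours pipelined."""
    return (2 * n) / (0.5 * n + 1.5)


if __name__ == "__main__":
    print(single_cycle_tc(), pipelined_tc())
    print(single_cycle_time(3), pipelined_time(3))
    print(round(laundry_speedup(4), 1))
```

Under these assumptions the long-run speedup approaches 800/200 = 4 rather than 5, because the stages are not balanced, and the latency of a single lw rises from 800ps to 5 × 200ps = 1000ps, consistent with the point that pipelining improves throughput, not latency.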