UCLA COMSCI M151B - lec5-c4

This preview shows page 1-2-15-16-17-32-33 out of 33 pages.

Chapter 4: The Processor

Performance Issues
  Longest delay determines clock period
    Critical path: load instruction
    Instruction memory -> register file -> ALU -> data memory -> register file
  Not feasible to vary period for different instructions
  Violates design principle
    Making the common case fast
  We will improve performance by pipelining

§4.5 An Overview of Pipelining

Pipelining Analogy
  Pipelined laundry: overlapping execution
    Parallelism improves performance
  Four loads:
    Speedup = 8/3.5 = 2.3
  Non-stop:
    Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages

MIPS Pipeline
  Five stages, one step per stage
    1. IF: Instruction fetch from memory
    2. ID: Instruction decode & register read
    3. EX: Execute operation or calculate address
    4. MEM: Access memory operand
    5. WB: Write result back to register

§4.6 Pipelined Datapath and Control

MIPS Pipelined Datapath
  [Figure: pipelined datapath, with the WB and MEM paths marked]
  Right-to-left flow leads to hazards

Execution in a Pipelined Datapath
  [Figure: five lw instructions over CC1 to CC9, each passing through IF (IM), ID (Reg), EX (ALU), MEM (DM), WB (Reg); steady state marked]

Pipeline Performance
  Assume time for stages is
    100ps for register read or write
    200ps for other stages
  Compare pipelined datapath with single-cycle datapath

  Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
  lw        200ps        100ps          200ps   200ps          100ps           800ps
  sw        200ps        100ps          200ps   200ps                          700ps
  R-format  200ps        100ps          200ps                  100ps           600ps
  beq       200ps        100ps          200ps                                  500ps

Pipeline Performance
  [Figure: timing of single-cycle (Tc = 800ps) versus pipelined (Tc = 200ps) execution]

Pipeline Speedup
  If all stages are balanced
    i.e., all take the same time
    Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
  If not balanced, speedup is less
  Speedup due to increased throughput
    Latency (time for each instruction) does not decrease

Mixed Instructions in the Pipeline
  [Figure: lw and add over CC1 to CC6; lw uses IM, Reg, ALU, DM, Reg; add uses IM, Reg, ALU, Reg]

Pipelining and ISA Design
  MIPS ISA designed for pipelining
    All instructions are 32 bits
      Easier to fetch and decode in one cycle
      cf. x86: 1- to 17-byte instructions
    Few and regular instruction formats
      Can decode and read registers in one step
    Load/store addressing
      Can calculate address in 3rd stage, access memory in 4th stage
    Alignment of memory operands
      Memory access takes only one cycle

Pipeline Principles
  All instructions that share a pipeline must have the same stages in the same order
    Therefore, add does nothing during the MEM stage
    sw does nothing during the WB stage
  All intermediate values must be latched each cycle
  There is no functional block reuse
    Example: we need 2 adders and an ALU (like in single-cycle)
  [Figure: stages IF, ID, EX, MEM, WB over IM, Reg, ALU, DM, Reg]

Pipeline Registers
  Need registers between stages
    To hold information produced in previous cycle

Pipeline Operation
  Cycle-by-cycle flow of instructions through the pipelined datapath
    "Single-clock-cycle" pipeline diagram
      Shows pipeline usage in a single cycle
      Highlight resources used
    cf. "multi-clock-cycle" diagram
      Graph of operation over time
  We'll look at "single-clock-cycle" diagrams for load & store

Single-clock-cycle diagrams (figure slides)
  IF for Load, Store, ...
  ID for Load, Store, ...
  EX for Load
  MEM for Load
  WB for Load (figure annotation: wrong register number)
  Corrected Datapath for Load
  EX for Store
  MEM for Store
  WB for Store

Multi-Cycle Pipeline Diagram
  Form showing resource usage

Multi-Cycle Pipeline Diagram
  Traditional form

Single-Cycle Pipeline Diagram
  State of pipeline in a given cycle

Pipelined Control (Simplified)
  [Figure]

Pipelined Control
  Control signals derived from instruction
    As in single-cycle implementation

Pipelined Control
  [Figure]

Pipelined Control Signals

              Execution Stage Control Lines     Memory Stage Control Lines   Write Back Stage Control Lines
  Instruction RegDst  ALUOp1  ALUOp0  ALUSrc    Branch  MemRead  MemWrite    RegWrite  MemtoReg
  R-format    1       1       0       0         0       0        0           1         0
  lw          0       0       0       1         0       1        0           1         1
  sw          x       0       0       1         0       0        1           0         x
  beq         x       0       1       0         1       0        0           0
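
The Pipeline Performance figures can be checked with a few lines of arithmetic. The sketch below is only an illustration: the stage latencies and the stages each instruction class uses come from the table in the slides, while the function names and the (stages + n - 1) cycle count for n back-to-back instructions with no stalls are assumptions introduced here.

```python
# Stage latencies in picoseconds, taken from the Pipeline Performance slide.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Stages each instruction class actually needs, per the same table.
USES = {
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
    "sw": ["IF", "ID", "EX", "MEM"],
    "R-format": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}


def instruction_time(instr):
    """Sum of the stage latencies an instruction class uses (the Total time column)."""
    return sum(STAGE_PS[stage] for stage in USES[instr])


def single_cycle_clock():
    """Single-cycle clock period: set by the slowest instruction (lw)."""
    return max(instruction_time(instr) for instr in USES)


def pipelined_clock():
    """Pipelined clock period: set by the slowest stage."""
    return max(STAGE_PS.values())


def single_cycle_total(n):
    """Time for n instructions when each one takes a full single-cycle period."""
    return n * single_cycle_clock()


def pipelined_total(n):
    """Time for n instructions in an ideal pipeline (assumes no stalls or hazards)."""
    return (len(STAGE_PS) + n - 1) * pipelined_clock()


if __name__ == "__main__":
    for instr in USES:
        print(instr, instruction_time(instr), "ps")
    for n in (3, 1_000_000):
        ratio = single_cycle_total(n) / pipelined_total(n)
        print(f"n={n}: speedup {ratio:.2f}")
```

For long instruction streams the ratio approaches 800/200 = 4 rather than the 5 that perfectly balanced stages would give, which is the "if not balanced, speedup is less" point from the Pipeline Speedup slide.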
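
The multi-clock-cycle diagram in its traditional form is easy to regenerate as text, which helps when the slide figures are unavailable. This is a minimal sketch assuming an ideal five-stage pipeline with one instruction issued per cycle and no stalls; `pipeline_diagram` and `format_diagram` are hypothetical helper names, not part of the lecture material.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Return one row per instruction; row[c] is the stage occupied in cycle c+1.

    Assumes instruction i enters IF in cycle i+1 and never stalls.
    """
    n_cycles = len(instructions) + len(STAGES) - 1 if instructions else 0
    rows = []
    for i in range(len(instructions)):
        row = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage
        rows.append(row)
    return rows


def format_diagram(instructions):
    """Render the diagram as aligned plain text with CC1, CC2, ... column headers."""
    rows = pipeline_diagram(instructions)
    n_cycles = len(rows[0]) if rows else 0
    lines = ["instr " + " ".join(f"CC{c + 1:<3}" for c in range(n_cycles))]
    for name, row in zip(instructions, rows):
        lines.append(f"{name:<6}" + " ".join(f"{cell:<5}" for cell in row))
    return "\n".join(line.rstrip() for line in lines)


if __name__ == "__main__":
    # Five back-to-back loads, as in the "Execution in a Pipelined Datapath" slide.
    print(format_diagram(["lw"] * 5))
```

Reading a single column of the output gives the single-clock-cycle view: in CC5 every stage is occupied, which corresponds to the steady state marked on the slide.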
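
The control-signal table groups its nine lines by the stage that consumes them. The sketch below encodes that grouping so it can be queried per instruction. Two things are assumptions rather than content of the slides: the final beq value is cut off in the source and is filled in here as a don't-care, and `carried_after` models the usual textbook arrangement in which control bits travel through the pipeline registers and each group is dropped once its stage has used it.

```python
X = None  # don't-care value, written "x" in the table

EX_LINES = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc")
MEM_LINES = ("Branch", "MemRead", "MemWrite")
WB_LINES = ("RegWrite", "MemtoReg")

# Rows in the column order of the Pipelined Control Signals table.
TABLE = {
    "R-format": (1, 1, 0, 0, 0, 0, 0, 1, 0),
    "lw":       (0, 0, 0, 1, 0, 1, 0, 1, 1),
    "sw":       (X, 0, 0, 1, 0, 0, 1, 0, X),
    # ASSUMPTION: the last beq value is truncated in the source; X is used here.
    "beq":      (X, 0, 1, 0, 1, 0, 0, 0, X),
}


def control_for(instr):
    """Split an instruction's control values into EX, MEM, and WB groups."""
    values = dict(zip(EX_LINES + MEM_LINES + WB_LINES, TABLE[instr]))
    return {
        "EX": {name: values[name] for name in EX_LINES},
        "MEM": {name: values[name] for name in MEM_LINES},
        "WB": {name: values[name] for name in WB_LINES},
    }


def carried_after(instr, stage):
    """Control groups still needed after `stage` (one of ID, EX, MEM, WB).

    ASSUMPTION: groups are carried forward in pipeline registers and
    discarded once the stage that consumes them has finished.
    """
    order = ["ID", "EX", "MEM", "WB"]
    groups = control_for(instr)
    return {g: groups[g] for g in order[order.index(stage) + 1:]}


if __name__ == "__main__":
    print(control_for("lw"))
    print(carried_after("sw", "EX"))
```

For example, `carried_after("lw", "MEM")` leaves only the write-back group, since RegWrite and MemtoReg are the only lines lw still needs in its last stage.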

