UCLA COMSCI M151B - Lecture 8

Week 5 - Monday

4_4 Single Cycle Datapath / Pipelining

Performance Issues
- The longest delay determines the clock period.
- The critical path is the load instruction: instruction memory > read operands from the register file > compute the effective address in the ALU > access the data memory > write back to the register file.
- It is not feasible to vary the period for different instructions.
- We need to make the common case fast; a single long clock period violates that design principle.
- Pipelining will improve performance!
  - Keep the clock period small.
  - Keep CPI as close to 1 as possible.

Pipelining Analogy
- Overlapping execution: parallelism improves performance.
- Example: we have 4 tasks (washing, drying, folding, storing) and 4 loads of laundry.
- Say each task takes 1/2 hour.
- If we start at 6pm and don't overlap the loads, each load takes 2 hours and we work until 2am.
- With pipelining, as soon as one load moves on to the next task, the following load can start the task that was just freed up.
- Starting at 6pm, the first load takes 2 hours and each of the next three loads finishes 1/2 hour later.
- Key idea: overlap tasks that use different resources.

MIPS Pipeline
Five stages, one step per stage:
1. IF : instruction fetch - grab the instruction from memory
2. ID : instruction decode & register read
3. EX : execute the operation or calculate the address - the effective address is computed in the ALU
4. MEM: access the memory operand - load/store memory
5. WB : write the result back to a register

Execution in a Pipelined Datapath
- Basically the laundry analogy, with each task corresponding to a stage.
- Steady state is when every resource (stage) is busy working on some instruction.

4_5 Pipeline Performance
- The goal of pipelining is to reduce the clock cycle time.
- Assume the stage times are 100ps for register read or write and 200ps for the other stages.
- Compare the pipelined datapath with the single-cycle datapath. Single-cycle instruction latencies:
  lw : 200 + 100 + 200 + 200 + 100 = 800ps
  sw : 200 + 100 + 200 + 200 +   0 = 700ps
  R  : 200 + 100 + 200 +   0 + 100 = 600ps
  beq: 200 + 100 + 200 +   0 +   0 = 500ps
- The speedup is real, but we'll be paying for it in complexity.

Pipeline Speedup
- Speedup depends on how well balanced the pipeline stages are: if we split a stage into two pieces of 95% and 5%, there is essentially no speedup, because the clock is still set by the 95% piece.
- Speedup comes from increased throughput; latency (the time for each instruction) is not decreased.
- If the stages are balanced, i.e. they all take the same time, we get the full speedup:
  Time between instructions (pipelined) = Time between instructions (non-pipelined) / Number of stages
- The sketch below works through these numbers.
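A quick back-of-the-envelope sketch of these numbers (Python; the 100ps/200ps stage times are from the lecture, but the variable names and table layout are just illustrative):

```python
# Back-of-the-envelope check of the 800/700/600/500 ps numbers and the
# pipelined cycle time. Stage times are from the lecture; the structure here
# is only for illustration.
STAGE_TIME = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}  # picoseconds

# Stages each instruction class actually exercises in the single-cycle datapath.
USES = {
    "lw":  ["IF", "ID", "EX", "MEM", "WB"],
    "sw":  ["IF", "ID", "EX", "MEM"],
    "R":   ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}

# Single-cycle: the clock period must cover the slowest instruction (lw).
single_cycle = {op: sum(STAGE_TIME[s] for s in stages) for op, stages in USES.items()}
single_cycle_period = max(single_cycle.values())       # 800 ps

# Pipelined: the clock period only has to cover the slowest *stage*,
# so a new instruction can start every 200 ps.
pipelined_period = max(STAGE_TIME.values())             # 200 ps

print(single_cycle)                           # {'lw': 800, 'sw': 700, 'R': 600, 'beq': 500}
print(single_cycle_period / pipelined_period) # 4.0
```

Because the register read/write stages (100ps) are shorter than the 200ps stages, the pipelined cycle time is set by the slowest stage, so the throughput gain is 4x rather than the ideal 5x; this is the same imbalance effect as the 95%/5% split above.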
Mixed Instructions in the Pipeline
- An add instruction doesn't need to go into data memory.
- A load word executes in 5 cycles; an add executes in 4 cycles.
- This creates a conflict at the last stage: after the load finishes in data memory it writes to the registers, and after the add finishes in the ALU it also writes to the registers.
- We need regulation when instructions share the pipeline, or we have to increase resources.

Pipelining and ISA Design
- An ISA that is relatively simple (RISC) is much easier to pipeline than CISC.
- MIPS is designed for pipelining:
  - All instructions are the same length, so they are easier to fetch and decode in one cycle.
  - There are few instruction formats, so we can decode and read registers in one step.
  - Load/store addressing is simpler because we can calculate the address early on (in the 3rd stage) and then access memory in the 4th stage.
  - Memory operands are aligned, so a memory access takes only one cycle.

Pipeline Principles
- Force any instructions that share a pipeline to have the same stages in the same order.
  - Therefore, add does nothing during the MEM stage and sw does nothing during the WB stage; the instruction still occupies the stage as a placeholder so it doesn't conflict (see the sketch below).
- All intermediate values must be latched each cycle.
  - This ensures that back-to-back instructions don't interfere with each other's signals.
- Pipelining impedes block reuse, so each stage needs its own independent hardware.
  - Example: we need two adders (the ALU can't also be the PC incrementer) and separate instruction and data memories.
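To make the "same stages, same order" rule concrete, here is a small sketch (not lecture code; the three-instruction program is made up) that prints which stage each instruction occupies in each cycle. Note that add still occupies MEM, and sw still occupies WB, as do-nothing placeholders:

```python
# Minimal occupancy chart for the 5-stage pipeline: each instruction walks
# through IF, ID, EX, MEM, WB in order, even when it has nothing to do in a
# stage (add in MEM, sw in WB act as placeholders).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
program = ["lw", "add", "sw"]          # hypothetical 3-instruction snippet

for cycle in range(len(program) + len(STAGES) - 1):
    row = []
    for i, inst in enumerate(program):
        stage = cycle - i              # instruction i enters the pipeline at cycle i
        if 0 <= stage < len(STAGES):
            row.append(f"{inst}:{STAGES[stage]}")
    print(f"cycle {cycle + 1}:", "  ".join(row))
```

In cycle 5 of the printout, lw is writing back while add sits in MEM as a no-op placeholder; because no two instructions ever occupy the same stage in the same cycle, the register-write conflict described under "Mixed Instructions" never arises.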

