UCLA COMSCI M151B - Lecture 8

Week 5 - Monday

4_4 Single Cycle Datapath / Pipelining

Performance Issues
- The longest delay determines the clock period.
- The critical path is the load instruction: instruction memory > read operands from the register file > compute the effective address in the ALU > access the data memory > write back to the register file.
- It is not feasible to vary the period for different instructions.
- We need to make the common case fast; a single long clock period violates that design principle.
- Pipelining will improve performance!
  - Keep the clock period small.
  - Keep CPI as close to 1 as possible.

Pipelining Analogy
- Overlapping execution: parallelism improves performance.
- Example: we have 4 tasks (washing, drying, folding, storing) and 4 loads of laundry.
- Say each task takes 1/2 hour.
- If we start at 6pm and don't overlap the loads, each load takes 2 hours and we work until 2am.
- With pipelining, as soon as one load moves on to the next task, the following load can start the task that was just freed up.
- Starting at 6pm, the first load takes 2 hours and each of the next three loads finishes 1/2 hour later.
- Key idea: overlap tasks that use different resources.

MIPS Pipeline
Five stages, one step per stage:
1. IF : instruction fetch - grab the instruction from memory
2. ID : instruction decode & register read
3. EX : execute the operation or calculate the address - the effective address is computed in the ALU
4. MEM: access the memory operand - load/store memory
5. WB : write the result back to a register

Execution in a Pipelined Datapath
- Basically the laundry analogy, with each task corresponding to a stage.
- Steady state is when every resource (stage) is busy working on some instruction.

4_5 Pipeline Performance
- The goal of pipelining is to reduce the clock cycle time.
- Assume the stage times are 100ps for register read or write and 200ps for the other stages.
- Compare the pipelined datapath with the single-cycle datapath. Single-cycle instruction latencies:
  lw : 200 + 100 + 200 + 200 + 100 = 800ps
  sw : 200 + 100 + 200 + 200 +   0 = 700ps
  R  : 200 + 100 + 200 +   0 + 100 = 600ps
  beq: 200 + 100 + 200 +   0 +   0 = 500ps
- The speedup is real, but we'll be paying for it in complexity.

Pipeline Speedup
- Speedup depends on how well balanced the pipeline stages are: if we split a stage into two pieces of 95% and 5%, there is essentially no speedup, because the clock is still set by the 95% piece.
- Speedup comes from increased throughput; latency (the time for each instruction) is not decreased.
- If the stages are balanced, i.e. they all take the same time, we get the full speedup:
  Time between instructions (pipelined) = Time between instructions (non-pipelined) / Number of stages
- The sketch below works through these numbers.
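A quick back-of-the-envelope sketch of these numbers (Python; the 100ps/200ps stage times are from the lecture, but the variable names and table layout are just illustrative):

```python
# Back-of-the-envelope check of the 800/700/600/500 ps numbers and the
# pipelined cycle time. Stage times are from the lecture; the structure here
# is only for illustration.
STAGE_TIME = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}  # picoseconds

# Stages each instruction class actually exercises in the single-cycle datapath.
USES = {
    "lw":  ["IF", "ID", "EX", "MEM", "WB"],
    "sw":  ["IF", "ID", "EX", "MEM"],
    "R":   ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}

# Single-cycle: the clock period must cover the slowest instruction (lw).
single_cycle = {op: sum(STAGE_TIME[s] for s in stages) for op, stages in USES.items()}
single_cycle_period = max(single_cycle.values())       # 800 ps

# Pipelined: the clock period only has to cover the slowest *stage*,
# so a new instruction can start every 200 ps.
pipelined_period = max(STAGE_TIME.values())             # 200 ps

print(single_cycle)                           # {'lw': 800, 'sw': 700, 'R': 600, 'beq': 500}
print(single_cycle_period / pipelined_period) # 4.0
```

Because the register read/write stages (100ps) are shorter than the 200ps stages, the pipelined cycle time is set by the slowest stage, so the throughput gain is 4x rather than the ideal 5x; this is the same imbalance effect as the 95%/5% split above.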
Mixed Instructions in the Pipeline
- An add instruction doesn't need to go into data memory.
- A load word executes in 5 cycles; an add executes in 4 cycles.
- This creates a conflict at the last stage: after the load finishes in data memory it writes to the registers, and after the add finishes in the ALU it also writes to the registers.
- We need regulation when instructions share the pipeline, or we have to increase resources.

Pipelining and ISA Design
- An ISA that is relatively simple (RISC) is much easier to pipeline than CISC.
- MIPS is designed for pipelining:
  - All instructions are the same length, so they are easier to fetch and decode in one cycle.
  - There are few instruction formats, so we can decode and read registers in one step.
  - Load/store addressing is simpler because we can calculate the address early on (in the 3rd stage) and then access memory in the 4th stage.
  - Memory operands are aligned, so a memory access takes only one cycle.

Pipeline Principles
- Force any instructions that share a pipeline to have the same stages in the same order.
  - Therefore, add does nothing during the MEM stage and sw does nothing during the WB stage; the instruction still occupies the stage as a placeholder so it doesn't conflict (see the sketch below).
- All intermediate values must be latched each cycle.
  - This ensures that back-to-back instructions don't interfere with each other's signals.
- Pipelining impedes block reuse, so each stage needs its own independent hardware.
  - Example: we need two adders (the ALU can't also be the PC incrementer) and separate instruction and data memories.
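To make the "same stages, same order" rule concrete, here is a small sketch (not lecture code; the three-instruction program is made up) that prints which stage each instruction occupies in each cycle. Note that add still occupies MEM, and sw still occupies WB, as do-nothing placeholders:

```python
# Minimal occupancy chart for the 5-stage pipeline: each instruction walks
# through IF, ID, EX, MEM, WB in order, even when it has nothing to do in a
# stage (add in MEM, sw in WB act as placeholders).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
program = ["lw", "add", "sw"]          # hypothetical 3-instruction snippet

for cycle in range(len(program) + len(STAGES) - 1):
    row = []
    for i, inst in enumerate(program):
        stage = cycle - i              # instruction i enters the pipeline at cycle i
        if 0 <= stage < len(STAGES):
            row.append(f"{inst}:{STAGES[stage]}")
    print(f"cycle {cycle + 1}:", "  ".join(row))
```

In cycle 5 of the printout, lw is writing back while add sits in MEM as a no-op placeholder; because no two instructions ever occupy the same stage in the same cycle, the register-write conflict described under "Mixed Instructions" never arises.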

