UMD CMSC 411 - Unit 3 – Instruction Pipelining - D2392186

Home> Schools> University of Maryland, College Park> Computer Science (CMSC) > CMSC 411> Unit 3 – Instruction Pipelining

DOC PREVIEW

UMD CMSC 411 - Unit 3 – Instruction Pipelining

School name University of Maryland, College Park

Course Cmsc 411- Computer Systems Architecture

Pages 17

This preview shows page 1-2-3-4-5-6 out of 17 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

So far Computer Systems Architecture CMSC 411 Unit 3 Instruction Pipelining Alan Sussman September 21 2004 What we mean by computer performance How to measure it How instruction sets are designed How the design influences performance CMSC 411 Alan Sussman What s next What is pipelining A variety of hardware and compiler techniques to speed the execution of programs What is pipelining Section A 1 How does MIPS divide instructions into stages or cycles A 1 What kinds of overheads are there in pipelining A 1 How much speedup do we get A 1 What are structural hazards data hazards and control hazards A 2 How are these techniques used to reduce stalls Pipelining is an implementation technique whereby multiple instructions are overlapped in execution In other words at any given moment in the execution of a computer program many different instructions are at various stages of completion Example Car wash data forwarding A 2 instruction reordering A 2 compiler approaches to reduce branch delays A 2 CMSC 411 Alan Sussman 3 CMSC 411 Alan Sussman 4 A MIPS implementation without pipelining Throughput Recall from CMSC 311 that instructions execute in different stages or cycles The number of instructions that complete per unit time Instructions take many clock cycles Ideally every clock cycle we want a new instruction to begin and end Instruction fetch cycle IF fetch the instruction from memory and update the program counter PC to point to the next instruction Note We re not using the NPC register that the book introduces IR Mem PC PC PC 4 This is how we will improve throughput CMSC 411 Alan Sussman 2 5 CMSC 411 A Sussman from D O Leary CMSC 411 Alan Sussman 6 1 MIPS w o pipelining cont MIPS w o pipelining cont Execution cycle EX Use the ALU If memory reference ALUOutput A Imm If register register ALU instruction ALUOutput A op B If register immediate ALU instruction ALUOutput A op Imm If branch instruction compute the branch address and check the branch condition ALUOutput PC Imm 2 Cond A op 0 but PC or Imm should be adjusted down by 4 to make this work right Instruction decode cycle ID Put the operands in pipeline registers A and B Signextend the low order 16 bits of the IR and store in pipeline register Imm This sometimes holds the immediate constant A Regs IR6 10 B Regs IR11 15 Imm IR16 16 IR16 31 CMSC 411 Alan Sussman 7 CMSC 411 Alan Sussman 8 MIPS w o pipelining cont MIPS w o pipelining cont Memory access cycle MEM finish loads stores and branches Load LMD Mem ALUOutput Store Mem ALUOutput B Branch if Cond then PC ALUOutput else PC is ok Write back cycle WB update the registers Register register ALU instruction Regs IR16 20 ALUOutput Register immediate ALU instruction Regs IR11 15 ALUOutput Load instruction Regs IR11 15 LMD CMSC 411 Alan Sussman 9 10 A pipelined implementation of MIPS Some notes Suppose we have 5 load store instructions to execute all involving different registers and memory locations Fig A 1 All instructions take 4 5 cycles Some of the temporary registers could be eliminated but they become convenient in a minute for pipelining Could build the architecture so that these 5 cycles are one clock cycle not 5 We assume that there are separate data paths for instruction memory and for data memory so loads and stores don t interfere with instruction fetch usually implemented via separate caches CMSC 411 Alan Sussman CMSC 411 Alan Sussman Inst 1 i IF i 1 i 2 i 3 i 4 11 CMSC 411 A Sussman from D O Leary 2 3 Clock cycle 4 5 ID EX MEM IF 6 7 8 9 WB ID EX MEM IF ID EX WB IF ID EX MEM IF ID EX MEM WB CMSC 411 Alan Sussman WB MEM WB 12 2 With pipeline registers Communication paths and timing Figure A 3 Figure A 17 CMSC 411 Alan Sussman 13 CMSC 411 Alan Sussman Ideal throughput from pipelining Example 1 Suppose we have 100 load store instructions to execute and we arrange to have no register conflicts Time for original MIPS implementation 100 instructions 5 cycles per instruction 500 cycles Time for pipelined MIPS implementation 1st instruction takes 5 cycles The others each finish 1 cycle later than the preceding one Time 5 99 104 cycles Speedup 500 104 5 This is the ideal case life is not that simple CMSC 411 Alan Sussman A more realistic case Example 2 Suppose that in our original MIPS implementation we could run the stages this fast IF 10ns ID 8ns EX 7ns MEM 10ns WB 5ns Then time for original MIPS implementation of 100 load store instructions 100 instructions 40ns per instruction 4000 ns 15 CMSC 411 Alan Sussman 16 Even more realistic case Example 2 cont Time for pipelined MIPS implementation We have to synchronize the stages so we need to run the clock at 10 ns 1st instruction takes 50 ns The others each finish 1 cycle later than the preceding one Example 3 The original MIPS implementation doesn t always need to use the MEM cycle IF 10ns ID 8ns EX 7ns MEM 10ns WB 5ns Suppose that only 30 of instructions use memory access So on average for every 100 instructions we have about 70 that use 4 stages and 30 that use 5 Time 50 ns 99 10 ns 1040 ns Speedup 4000 1040 3 85 CMSC 411 Alan Sussman 14 17 CMSC 411 A Sussman from D O Leary CMSC 411 Alan Sussman 18 3 Example 3 cont Overhead of pipelining Time for original MIPS implementation 70 instructions 30 ns per instruction 30 instructions 40 ns per instruction 3300 ns We just summarized the two major overhead costs in pipelining Time for pipelined MIPS implementation We have to synchronize the stages so we need to run the clock at 10 ns and we need 5 cycles for every instruction 1st instruction takes 50 ns The others each finish 1 cycle later than the preceding one Time 50 ns 99 10 ns 1040 ns Unfortunately the speedup of pipelining is reduced even further by hazards that cause bubbles in the pipeline Speedup 3300 1040 3 17 CMSC 411 Alan Sussman making the time for every stage equal the time for the longest stage making the time for every instruction equal the time for the longest instruction not quite true but true for a wide range of instructions 19 Pipeline hazards cause stalls CMSC 411 Alan Sussman 20 Pipeline hazards What causes delays in instruction completion When some instruction is unable to complete on schedule we must Structural hazards are hardware delays Example memory does not respond to a request as fast as it is expected to Data hazards arise when data can be predicted to be unready at the time it is needed Example an instruction needs a register that a previous instruction is still modifying Control hazards arise when we

View Full Document