UMD CMSC 411 - Unit 3 – Instruction Pipelining - D2392186

Home> Schools> University of Maryland, College Park> Computer Science (CMSC) > CMSC 411> Unit 3 – Instruction Pipelining

DOC PREVIEW

UMD CMSC 411 - Unit 3 – Instruction Pipelining

School name University of Maryland, College Park

Course Cmsc 411- Computer Systems Architecture

Pages 17

This preview shows page 1-2-3-4-5-6 out of 17 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

CMSC 411 - A. Sussman (from D. O'Leary) 1Computer Systems ArchitectureCMSC 411Unit 3 – Instruction PipeliningAlan SussmanSeptember 21, 2004CMSC 411 - Alan Sussman 2So far• What we mean by computer performance• How to measure it• How instruction sets are designed• How the design influences performanceCMSC 411 - Alan Sussman 3What’s next• A variety of hardware and compiler techniques to speed the execution of programs– What is pipelining? (Section A.1)– How does MIPS divide instructions into stages or cycles? (A.1)– What kinds of overheads are there in pipelining? (A.1)– How much speedup do we get? (A.1)– What are structural hazards, data hazards, and control hazards? (A.2)– How are these techniques used to reduce stalls:• data forwarding? (A.2)• instruction reordering? (A.2)• compiler approaches to reduce branch delays? (A.2)CMSC 411 - Alan Sussman 4What is pipelining?• Pipelining is an implementation technique whereby multiple instructions are overlapped in execution• In other words, at any given moment in the execution of a computer program, many different instructions are at various stages of completion!• Example: Car washCMSC 411 - Alan Sussman 5Throughput• The number of instructions that completeper unit time• Instructions take many clock cyclesIdeally, every clock cycle, we want a new instruction to begin (and end)• This is how we will improve throughputCMSC 411 - Alan Sussman 6A MIPS implementation without pipelining• Recall from CMSC 311 that instructions execute in different stages or cycles– Instruction fetch cycle (IF): fetch the instruction from memory and update the program counter (PC) to point to the next instruction. Note: We’re not using the NPCregister that the book introduces.IR ← Mem[PC]PC ← PC + 4CMSC 411 - A. Sussman (from D. O'Leary) 2CMSC 411 - Alan Sussman 7MIPS w/o pipelining (cont.)– Instruction decode cycle (ID) : Put the operands in pipeline registers A and B. Sign-extend the low order 16 bits of the IR and store in pipeline register Imm. (This sometimes holds the "immediate" constant.)A ← Regs[IR6..10]B ← Regs[IR11..15]Imm ← ((IR16) 16##IR16..31)CMSC 411 - Alan Sussman 8MIPS w/o pipelining (cont.)– Execution cycle (EX) : Use the ALU– If memory reference:ALUOutput ← A + Imm– If register-register ALU instruction:ALUOutput ← A op B– If register-immediate ALU instruction:ALUOutput ← A op Imm– If branch instruction: compute the branch address and check the branch condition:ALUOutput ← PC + (Imm << 2)Cond ← (A op 0)(but PC or Imm should be adjusted down by 4 to make this work right).CMSC 411 - Alan Sussman 9MIPS w/o pipelining (cont.)– Memory access cycle (MEM) : finish loads, stores, and branches:Load: LMD ← Mem[ALUOutput]Store: Mem[ALUOutput] ← BBranch: if Cond then PC ← ALUOutputelse PC is okCMSC 411 - Alan Sussman 10MIPS w/o pipelining (cont.)– Write-back cycle (WB) : update the registersRegister-register ALU instruction: Regs[IR16..20] ← ALUOutputRegister-immediate ALU instruction: Regs[IR11..15] ← ALUOutputLoad instruction: Regs[IR11..15] ← LMDCMSC 411 - Alan Sussman 11Some notes• All instructions take 4-5 cycles• Some of the temporary registers could be eliminated, but they become convenient in a minute for pipelining• Could build the architecture so that these 5 cycles are one clock cycle, not 5• We assume that there are separate data paths for instruction memory and for data memory (so loads and stores don’t interfere with instruction fetch), usually implemented via separate cachesCMSC 411 - Alan Sussman 12A pipelined implementation of MIPSSuppose we have 5 load/store instructions to execute, all involving different registers and memory locations. Fig. A.1WBMEMEXIDIFi+4WBMEMEXIDIFi+3WBMEMEXIDIFi+2WBMEMEXIDIFi+1WBMEMEXIDIFi987654321Inst. #Clock cycleCMSC 411 - A. Sussman (from D. O'Leary) 3CMSC 411 - Alan Sussman 13With pipeline registersFigure A.3CMSC 411 - Alan Sussman 14Communication paths and timingFigure A.17CMSC 411 - Alan Sussman 15Ideal throughput from pipelining• Example 1: Suppose we have 100 load/store instructions to execute, and we arrange to have no register conflicts• Time for original MIPS implementation: 100 instructions × 5 cycles per instruction = 500 cycles• Time for pipelined MIPS implementation: 1st instruction takes 5 cycles. The others each finish 1 cycle later than the preceding one.Time = 5 + 99 = 104 cycles• Speedup = 500/104 ≈ 5• This is the “ideal case” - life is not that simpleCMSC 411 - Alan Sussman 16A more realistic case• Example 2: Suppose that in our original MIPS implementation, we could run the stages this fast:– IF - 10ns– ID - 8ns– EX - 7ns– MEM - 10ns– WB - 5ns• Then time for original MIPS implementation of 100 load/store instructions: – 100 instructions × 40ns per instruction = 4000 nsCMSC 411 - Alan Sussman 17Example 2 (cont.)• Time for pipelined MIPS implementation: We have to synchronize the stages, so we need to run the clock at 10 ns• 1st instruction takes 50 ns. The others each finish 1 cycle later than the preceding one.– Time = 50 ns + 99*10 ns = 1040 ns• Speedup = 4000/1040 ≈ 3.85CMSC 411 - Alan Sussman 18Even more realistic case• Example 3: The original MIPS implementation doesn't always need to use the MEM cycle– IF -10ns– ID - 8ns– EX - 7ns– MEM - 10ns– WB - 5ns• Suppose that only 30% of instructions use memory access. So, on average, for every 100 instructions, we have about 70 that use 4 stages and 30 that use 5.CMSC 411 - A. Sussman (from D. O'Leary) 4CMSC 411 - Alan Sussman 19Example 3 (cont.)• Time for original MIPS implementation:– 70 instructions × 30 ns per instruction +30 instructions × 40 ns per instruction =3300 ns• Time for pipelined MIPS implementation: We have to synchronize the stages, so we need to run the clock at 10 ns, and we need 5 cycles for everyinstruction.– 1st instruction takes 50 ns. The others each finish 1 cycle later than the preceding one– Time = 50 ns + 99*10 ns = 1040 ns• Speedup = 3300/1040 ≈ 3.17CMSC 411 - Alan Sussman 20Overhead of pipelining• We just summarized the two major overhead costs in pipelining:– making the time for every stage equal the time for the longest stage– making the time for every instruction equal the time for the longest instruction (not quite true, but true for a wide range of instructions)• Unfortunately, the speedup of pipelining is

View Full Document