Unformatted text preview:

inst eecs berkeley edu cs61c CS61C Machine Structures Lecture 31 Pipelined Execution part II 2004 11 10 Lecturer PSOE Dan Garcia www cs berkeley edu ddgarcia The Incredibles Election Data is now available Purple America www princeton edu rvdb JAVA election2004 www usatoday com news politicselections vote2004 countymap htm CS61C L31 Pipelined Execution part II 1 Garcia Fall 2004 UCB Review Pipeline 1 2 Optimal Pipeline Each stage is executing part of an instruction each clock cycle One inst finishes during each clock cycle On average execute far more quickly What makes this work Similarities between instructions allow us to use same stages for all instructions generally Each stage takes about the same amount of time as all others little wasted time CS61C L31 Pipelined Execution part II 2 Garcia Fall 2004 UCB Review Pipeline 2 2 Pipelining is a BIG IDEA widely used concept What makes it less than perfect Structural hazards suppose we had only one cache Need more HW resources Control hazards need to worry about branch instructions Delayed branch Data hazards an instruction depends on a previous instruction CS61C L31 Pipelined Execution part II 3 Garcia Fall 2004 UCB Control Hazard Branching 1 7 Time clock cycles ALU I n I D Reg Reg beq s I D Reg Reg t Instr 1 r I D Reg Reg Instr 2 O I D Reg Reg Instr 3 r I D Reg Reg d Instr 4 e r Where do we do the compare for the branch ALU ALU ALU ALU CS61C L31 Pipelined Execution part II 4 Garcia Fall 2004 UCB Control Hazard Branching 2 7 We put branch decision making hardware in ALU stage therefore two more instructions after the branch will always be fetched whether or not the branch is taken Desired functionality of a branch if we do not take the branch don t waste any time and continue executing normally if we take the branch don t execute any instructions after the branch just go to the desired label CS61C L31 Pipelined Execution part II 5 Garcia Fall 2004 UCB Control Hazard Branching 3 7 Initial Solution Stall until decision is made insert no op instructions those that accomplish nothing just take time Drawback branches take 3 clock cycles each assuming comparator is put in ALU stage CS61C L31 Pipelined Execution part II 6 Garcia Fall 2004 UCB Control Hazard Branching 4 7 Optimization 1 move asynchronous comparator up to Stage 2 as soon as instruction is decoded Opcode identifies is as a branch immediately make a decision and set the value of the PC if necessary Benefit since branch is complete in Stage 2 only one unnecessary instruction is fetched so only one no op is needed Side Note This means that branches are idle in Stages 3 4 and 5 CS61C L31 Pipelined Execution part II 7 Garcia Fall 2004 UCB Control Hazard Branching 5 7 Insert a single no op bubble I Time clock cycles beq Reg I D Reg Reg ALU add I ALU n s t r D Reg ALU O lw D Reg bub Reg I ble r d e Impact 2 clock cycles per branch r instruction slow CS61C L31 Pipelined Execution part II 8 Garcia Fall 2004 UCB Control Hazard Branching 6 7 Optimization 2 Redefine branches Old definition if we take the branch none of the instructions after the branch get executed by accident New definition whether or not we take the branch the single instruction immediately following the branch gets executed called the branch delay slot The term Delayed Branch means we always execute inst after branch CS61C L31 Pipelined Execution part II 9 Garcia Fall 2004 UCB Control Hazard Branching 7 7 Notes on Branch Delay Slot Worst Case Scenario can always put a no op in the branch delay slot Better Case can find an instruction preceding the branch which can be placed in the branch delay slot without affecting flow of the program re ordering instructions is a common method of speeding up programs compiler must be very smart in order to find instructions to do this usually can find such an instruction at least 50 of the time Jumps also have a delay slot CS61C L31 Pipelined Execution part II 10 Garcia Fall 2004 UCB Example Nondelayed vs Delayed Branch Nondelayed Branch or 8 9 10 Delayed Branch add 1 2 3 add 1 2 3 sub 4 5 6 sub 4 5 6 beq 1 4 Exit beq 1 4 Exit or xor 10 1 11 xor 10 1 11 Exit CS61C L31 Pipelined Execution part II 11 8 9 10 Exit Garcia Fall 2004 UCB Data Hazards 1 2 Consider the following sequence of instructions add t0 t1 t2 sub t4 t0 t3 and t5 t0 t6 or t7 t0 t8 xor t9 t0 t10 CS61C L31 Pipelined Execution part II 12 Garcia Fall 2004 UCB Data Hazards 2 2 ALU Dependencies backwards in time are hazards Time clock cycles I n IF ID RF EX MEM WB s add t0 t1 t2 I Reg Reg D t I D Reg Reg r sub t4 t0 t3 ALU I D Reg Reg D Reg I Reg ALU CS61C L31 Pipelined Execution part II 13 Reg ALU r or t7 t0 t8 d e xor t9 t0 t10 r I ALU O and t5 t0 t6 D Reg Garcia Fall 2004 UCB Data Hazard Solution Forwarding Forward result from one stage to another and t5 t0 t6 or t7 t0 t8 xor t9 t0 t10 Reg Reg D Reg I Reg D Reg I Reg D Reg I Reg ALU I sub t4 t0 t3 D ALU Reg WB ALU I EX MEM ALU ID RF ALU add t0 t1 t2 IF D Reg or hazard solved by register hardware CS61C L31 Pipelined Execution part II 14 Garcia Fall 2004 UCB Data Hazard Loads 1 4 Dependencies backwards in time are hazards I Reg I sub t3 t0 t2 EX MEM WB D Reg Reg ALU ID RF ALU lw t0 0 t1 IF D Reg Can t solve with forwarding Must stall instruction dependent on load then forward more hardware CS61C L31 Pipelined Execution part II 15 Garcia Fall 2004 UCB Data Hazard Loads 2 4 Hardware must stall pipeline Called interlock I sub t3 t0 t2 and t5 t0 t4 or t7 t0 t6 CS61C L31 Pipelined Execution part II 16 WB D Reg Reg bub ble I D bub ble Reg D bub ble I Reg ALU Reg MEM ALU I EX ALU ID RF ALU lw t0 0 t1 IF Reg Reg D Garcia Fall 2004 UCB Data Hazard Loads 3 4 Instruction slot after a load is called load delay slot If that instruction uses the result of the load then the hardware interlock will stall it for one cycle If the compiler puts an unrelated instruction in that slot then no stall Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot except the latter uses more code space CS61C L31 Pipelined Execution part II 17 Garcia Fall 2004 UCB Data Hazard …


View Full Document

Berkeley COMPSCI 61C - Lecture Notes

Documents in this Course
SIMD II

SIMD II

8 pages

Midterm

Midterm

7 pages

Lecture 7

Lecture 7

31 pages

Caches

Caches

7 pages

Lecture 9

Lecture 9

24 pages

Lecture 1

Lecture 1

28 pages

Lecture 2

Lecture 2

25 pages

VM II

VM II

4 pages

Midterm

Midterm

10 pages

Load more
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?