Unformatted text preview:

inst eecs berkeley edu cs61c Review Pipeline 1 2 CS61C Machine Structures Optimal Pipeline Lecture 30 Pipelined Execution part II Each stage is executing part of an instruction each clock cycle One inst finishes during each clock cycle On average execute far more quickly Lecturer PSOE Dan Garcia www cs berkeley edu ddgarcia Games to learn Recent studies show that there may be a place for computer games in traditional K 12 classrooms There is data that shows dropout rates are lower SATs enjoyment interest up What makes this work Similarities between instructions allow us to use same stages for all instructions generally Each stage takes about the same amount of time as all others little wasted time www technologyreview com articles 05 04 wo wo 040605krotoski asp CS61C L30 P ipelined Execution part II 1 Garcia 2005 U CB Review Pipeline 2 2 Time clock cycles I n I D Reg Reg beq s I D Reg Reg t Instr 1 r I D Reg Reg Instr 2 O I Reg D Reg Instr 3 r I D Reg Reg d Instr 4 e r Where do we do the compare for the branch ALU widely used concept ALU What makes it less than perfect ALU Structural hazards suppose we had only one cache Need more HW resources ALU ALU Control hazards need to worry about branch instructions Delayed branch Data hazards an instruction depends on a previous instruction Garcia 2005 U CB Control Hazard Branching 1 7 Pipelining is a BIG IDEA CS61C L30 P ipelined Execution part II 3 CS61C L30 P ipelined Execution part II 2 Garcia 2005 U CB CS61C L30 P ipelined Execution part II 4 Garcia 2005 U CB Control Hazard Branching 2 7 Control Hazard Branching 3 7 We put branch decision making hardware in ALU stage Initial Solution Stall until decision is made therefore two more instructions after the branch will always be fetched whether or not the branch is taken Desired functionality of a branch if we do not take the branch don t waste any time and continue executing normally if we take the branch don t execute any instructions after the branch just go to the desired label CS61C L30 P ipelined Execution part II 5 Garcia 2005 U CB insert no op instructions those that accomplish nothing just take time Drawback branches take 3 clock cycles each assuming comparator is put in ALU stage CS61C L30 P ipelined Execution part II 6 Garcia 2005 U CB Control Hazard Branching 4 7 Control Hazard Branching 5 7 Optimization 1 Garcia 2005 U CB O lw D Reg bub Reg I ble r d e Impact 2 clock cycles per branch r instruction slow ALU CS61C L30 P ipelined Execution part II 7 ALU Benefit since branch is complete in Stage 2 only one unnecessary instruction is fetched so only one no op is needed Side Note This means that branches are idle in Stages 3 4 and 5 I Insert a single no op bubble n Time clock cycles s I D Reg Reg add t r I D Reg Reg beq ALU move asynchronous comparator up to Stage 2 as soon as instruction is decoded Opcode identifies is as a branch immediately make a decision and set the value of the PC if necessary CS61C L30 P ipelined Execution part II 8 Control Hazard Branching 6 7 Control Hazard Branching 7 7 Optimization 2 Redefine branches Notes on Branch Delay Slot Old definition if we take the branch none of the instructions after the branch get executed by accident New definition whether or not we take the branch the single instruction immediately following the branch gets executed called the branch delay slot Worst Case Scenario can always put a no op in the branch delay slot Better Case can find an instruction preceding the branch which can be placed in the branch delay slot without affecting flow of the program re ordering instructions is a common method of speeding up programs compiler must be very smart in order to find instructions to do this usually can find such an instruction at least 50 of the time Jumps also have a delay slot The term Delayed Branch means we always execute inst after branch CS61C L30 P ipelined Execution part II 9 Garcia 2005 U CB Example Nondelayed vs Delayed Branch Nondelayed Branch or 8 9 10 Delayed Branch add 1 2 3 Garcia 2005 U CB CS61C L30 P ipelined Execution part II 10 Garcia 2005 U CB Data Hazards 1 2 Consider the following sequence of instructions add 1 2 3 sub 4 5 6 sub 4 5 6 beq 1 4 Exit beq 1 4 Exit or xor 10 1 11 xor 10 1 11 add t0 t1 t2 8 9 10 sub t4 t0 t3 and t5 t0 t6 or t7 t0 t8 xor t9 t0 t10 Exit CS61C L30 P ipelined Execution part II 11 Exit Garcia 2005 U CB CS61C L30 P ipelined Execution part II 12 Garcia 2005 U CB Data Hazards 2 2 Data Hazard Solution Forwarding Forward result from one stage to another D Reg I Reg ALU D Reg D Reg I Reg D Reg I Reg D Reg I Reg D or t7 t0 t8 xor t9 t0 t10 Reg Reg or hazard solved by register hardware CS61C L30 P ipelined Execution part II 13 Garcia 2005 U CB Data Hazard Loads 1 4 Dependencies backwards in time are hazards MEM lw t0 0 t1 WB Reg Reg D IF ID RF I Reg I sub t3 t0 t2 Reg CS61C L30 P ipelined Execution part II 15 Data Hazard Loads 3 4 Reg bub ble D bub ble Reg D bub ble I Reg Reg CS61C L30 P ipelined Execution part II 16 Reg D Garcia 2005 U CB Data Hazard Loads 4 4 Stall is equivalent to nop sub t3 t0 t2 If the compiler puts an unrelated instruction in that slot then no stall and t5 t0 t4 Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot except the latter uses more code space Garcia 2005 U CB or t7 t0 t6 CS61C L30 P ipelined Execution part II 18 Reg bub bub ble ble bub ble bub ble bub ble I Reg D Reg I Reg D Reg I Reg ALU nop D ALU If that instruction uses the result of the load then the hardware interlock will stall it for one cycle I ALU lw t0 0 t1 ALU Instruction slot after a load is called load delay slot CS61C L30 P ipelined Execution part II 17 WB Reg or t7 t0 t6 Garcia 2005 U CB MEM D I and t5 t0 t4 Can t solve with forwarding Must stall instruction dependent on load then forward more hardware EX ALU D Data Hazard Loads 2 4 Hardware must stall pipeline Called interlock ALU I sub t3 t0 t2 EX Garcia 2005 U CB ALU Reg CS61C L30 P ipelined Execution part II 14 ALU ID RF I ALU IF ALU lw …


View Full Document

Berkeley COMPSCI 61C - Lecture Notes

Documents in this Course
SIMD II

SIMD II

8 pages

Midterm

Midterm

7 pages

Lecture 7

Lecture 7

31 pages

Caches

Caches

7 pages

Lecture 9

Lecture 9

24 pages

Lecture 1

Lecture 1

28 pages

Lecture 2

Lecture 2

25 pages

VM II

VM II

4 pages

Midterm

Midterm

10 pages

Load more
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?