DOC PREVIEW
Berkeley COMPSCI 61C - Pipelined Execution, part I

This preview shows page 1 out of 4 pages.

Save
View full document
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

inst eecs berkeley edu cs61c CS61C Machine Structures ALU 2004 11 10 Lecturer PSOE Dan Garcia ALU ALU www cs berkeley edu ddgarcia The Incredibles ALU Election Data is now available Purple America www princeton edu rvdb JAVA election2004 www usatoday com news politicselections vote2004 countymap htm CS61C L31 P ipelined Execution part II 1 Time clock cycles I n I D Reg Reg beq s I D Reg Reg t Instr 1 r I D Reg Reg Instr 2 O I Reg D Reg Instr 3 r I D Reg Reg d Instr 4 e r Where do we do the compare for the branch ALU Lecture 31 Pipelined Execution part II Control Hazard Branching 1 7 Garcia Fall 2004 U CB CS61C L31 P ipelined Execution part II 4 Garcia Fall 2004 U CB Control Hazard Branching 2 7 Control Hazard Branching 3 7 We put branch decision making hardware in ALU stage Initial Solution Stall until decision is made therefore two more instructions after the branch will always be fetched whether or not the branch is taken Desired functionality of a branch if we do not take the branch don t waste any time and continue executing normally if we take the branch don t execute any instructions after the branch just go to the desired label CS61C L31 P ipelined Execution part II 5 Garcia Fall 2004 U CB Control Hazard Branching 4 7 CS61C L31 P ipelined Execution part II 6 Garcia Fall 2004 U CB Control Hazard Branching 5 7 Optimization 1 O lw D Reg bub Reg I ble r d e Impact 2 clock cycles per branch r instruction slow ALU Garcia Fall 2004 U CB ALU Benefit since branch is complete in Stage 2 only one unnecessary instruction is fetched so only one no op is needed Side Note This means that branches are idle in Stages 3 4 and 5 I Insert a single no op bubble n Time clock cycles s I D Reg Reg add t r I D Reg Reg beq ALU move asynchronous comparator up to Stage 2 as soon as instruction is decoded Opcode identifies is as a branch immediately make a decision and set the value of the PC if necessary CS61C L31 P ipelined Execution part II 7 insert no op instructions those that accomplish nothing just take time Drawback branches take 3 clock cycles each assuming comparator is put in ALU stage CS61C L31 P ipelined Execution part II 8 Garcia Fall 2004 U CB Control Hazard Branching 6 7 Control Hazard Branching 7 7 Optimization 2 Redefine branches Notes on Branch Delay Slot Worst Case Scenario can always put a no op in the branch delay slot Better Case can find an instruction preceding the branch which can be placed in the branch delay slot without affecting flow of the program Old definition if we take the branch none of the instructions after the branch get executed by accident New definition whether or not we take the branch the single instruction immediately following the branch gets executed called the branch delay slot re ordering instructions is a common method of speeding up programs compiler must be very smart in order to find instructions to do this usually can find such an instruction at least 50 of the time Jumps also have a delay slot The term Delayed Branch means we always execute inst after branch CS61C L31 P ipelined Execution part II 9 Garcia Fall 2004 U CB Example Nondelayed vs Delayed Branch Nondelayed Branch or 8 9 10 Delayed Branch add 1 2 3 add 1 2 3 sub 4 5 6 sub 4 5 6 beq 1 4 Exit beq 1 4 Exit or xor 10 1 11 xor 10 1 11 CS61C L31 P ipelined Execution part II 10 Garcia Fall 2004 U CB Data Hazards 1 2 Consider the following sequence of instructions add t0 t1 t2 sub t4 t0 t3 8 9 10 and t5 t0 t6 or t7 t0 t8 xor t9 t0 t10 Exit Exit CS61C L31 P ipelined Execution part II 11 Garcia Fall 2004 U CB Data Hazards 2 2 Data Hazard Solution Forwarding Reg I Reg ALU D or t7 t0 t8 xor t9 t0 t10 Reg Reg D Reg I Reg D Reg I Reg D Reg I Reg ALU D and t5 t0 t6 WB D ALU Reg I EX MEM ALU Reg Reg ALU D ALU I ALU Reg ID RF I sub t4 t0 t3 ALU CS61C L31 P ipelined Execution part II 13 I IF ALU add t0 t1 t2 ALU r or t7 t0 t8 d e xor t9 t0 t10 r Garcia Fall 2004 U CB Forward result from one stage to another Dependencies backwards in time are hazards Time clock cycles I n IF ID RF EX MEM WB s add t0 t1 t2 I Reg Reg D t I Reg D Reg r sub t4 t0 t3 O and t5 t0 t6 CS61C L31 P ipelined Execution part II 12 D Reg Reg or hazard solved by register hardware Garcia Fall 2004 U CB CS61C L31 P ipelined Execution part II 14 Garcia Fall 2004 U CB Data Hazard Loads 1 4 Dependencies backwards in time are hazards lw t0 0 t1 WB Reg D IF ID RF I Reg I sub t3 t0 t2 Reg CS61C L31 P ipelined Execution part II 15 Garcia Fall 2004 U CB Data Hazard Loads 3 4 Reg D bub ble I Reg Reg D and t5 t0 t4 Reg bub bub ble ble bub ble bub ble bub ble I Reg D Reg I Reg D Reg I Reg ALU sub t3 t0 t2 D ALU I D Reg or t7 t0 t6 CS61C L31 P ipelined Execution part II 18 Garcia Fall 2004 U CB Administrivia First MIPS design did not interlock and stall on load use data hazard Real reason for name behind MIPS Microprocessor without Interlocked Pipeline Stages Word Play on acronym for Millions of Instructions Per Second also called MIPS CS61C L31 P ipelined Execution part II 19 bub ble Reg ALU Historical Trivia D Garcia Fall 2004 U CB ALU nop Garcia Fall 2004 U CB bub ble Data Hazard Loads 4 4 Stall is equivalent to nop If that instruction uses the result of the load then the hardware interlock will stall it for one cycle CS61C L31 P ipelined Execution part II 17 Reg CS61C L31 P ipelined Execution part II 16 lw t0 0 t1 Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot except the latter uses more code space WB Reg or t7 t0 t6 Instruction slot after a load is called load delay slot If the compiler puts an unrelated instruction in that slot then no stall MEM D I and t5 t0 t4 Can t solve with forwarding Must stall instruction dependent on load then forward more hardware EX ALU D Reg ALU sub t3 t0 t2 MEM ALU I EX ALU Reg ALU ID RF I ALU lw t0 0 t1 …


View Full Document

Berkeley COMPSCI 61C - Pipelined Execution, part I

Documents in this Course
SIMD II

SIMD II

8 pages

Midterm

Midterm

7 pages

Lecture 7

Lecture 7

31 pages

Caches

Caches

7 pages

Lecture 9

Lecture 9

24 pages

Lecture 1

Lecture 1

28 pages

Lecture 2

Lecture 2

25 pages

VM II

VM II

4 pages

Midterm

Midterm

10 pages

Load more
Loading Unlocking...
Login

Join to view Pipelined Execution, part I and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Pipelined Execution, part I and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?