DOC PREVIEW
Berkeley COMPSCI 61C - Lecture Notes

This preview shows page 1 out of 4 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS61C L30 Pipelined Exec ution, part II (1) Garcia 2005 © U CBLecturer PSOE Dan Garciawww.cs.berkeley.edu/~ddgarciainst.eecs.berkeley.edu/~cs61cCS61C : Machine StructuresLecture 30 – Pipelined Execution, part IIwww.technologyreview.com/articles/05/04/wo/wo_040605krotoski.aspGames to learn?! ⇒Recent studies show thatthere may be a place for computergames in traditional K-12 classrooms.There is data that shows dropout ratesare lower, SATs, enjoyment, interest up!CS61C L30 Pipelined Exec ution, part II (2) Garcia 2005 © U CBReview: Pipeline (1/2)• Optimal Pipeline• Each stage is executing part of aninstruction each clock cycle.• One inst. finishes during each clock cycle.• On average, execute far more quickly.• What makes this work?• Similarities between instructions allow usto use same stages for all instructions(generally).• Each stage takes about the same amountof time as all others: little wasted time.CS61C L30 Pipelined Exec ution, part II (3) Garcia 2005 © U CBReview: Pipeline (2/2)• Pipelining is a BIG IDEA• widely used concept• What makes it less than perfect?• Structural hazards: suppose we hadonly one cache?⇒ Need more HW resources• Control hazards: need to worry aboutbranch instructions? ⇒ Delayed branch• Data hazards: an instruction dependson a previous instruction?CS61C L30 Pipelined Exec ution, part II (4) Garcia 2005 © U CBControl Hazard: Branching (1/7)Where do we do the compare for the branch? I$beqInstr 1Instr 2Instr 3Instr 4ALU I$Reg D$ RegALU I$Reg D$ RegALU I$Reg D$ RegALUReg D$ RegALU I$Reg D$ RegInstr.OrderTime (clock cycles)CS61C L30 Pipelined Exec ution, part II (5) Garcia 2005 © U CBControl Hazard: Branching (2/7)• We put branch decision-makinghardware in ALU stage• therefore two more instructions after thebranch will always be fetched, whetheror not the branch is taken• Desired functionality of a branch• if we do not take the branch, don’t wasteany time and continue executingnormally• if we take the branch, don’t execute anyinstructions after the branch, just go tothe desired labelCS61C L30 Pipelined Exec ution, part II (6) Garcia 2005 © U CBControl Hazard: Branching (3/7)• Initial Solution: Stall until decision ismade• insert “no-op” instructions: those thataccomplish nothing, just take time• Drawback: branches take 3 clock cycleseach (assuming comparator is put inALU stage)CS61C L30 Pipelined Exec ution, part II (7) Garcia 2005 © U CBControl Hazard: Branching (4/7)• Optimization #1:• move asynchronous comparator up toStage 2• as soon as instruction is decoded(Opcode identifies is as a branch),immediately make a decision and set thevalue of the PC (if necessary)• Benefit: since branch is complete inStage 2, only one unnecessaryinstruction is fetched, so only one no-opis needed• Side Note: This means that branches areidle in Stages 3, 4 and 5.CS61C L30 Pipelined Exec ution, part II (8) Garcia 2005 © U CB• Insert a single no-op (bubble)Control Hazard: Branching (5/7)addbeqlwALU I$Reg D$ RegALU I$Reg D$ RegALUReg D$ Reg I$Instr.OrderTime (clock cycles)bubble• Impact: 2 clock cycles per branchinstruction ⇒ slowCS61C L30 Pipelined Exec ution, part II (9) Garcia 2005 © U CBControl Hazard: Branching (6/7)• Optimization #2: Redefine branches• Old definition: if we take the branch,none of the instructions after the branchget executed by accident• New definition: whether or not we takethe branch, the single instructionimmediately following the branch getsexecuted (called the branch-delay slot)• The term “Delayed Branch” meanswe always execute inst after branchCS61C L30 Pipelined Exec ution, part II (10) Garcia 2005 © U CBControl Hazard: Branching (7/7)• Notes on Branch-Delay Slot• Worst-Case Scenario: can always put ano-op in the branch-delay slot• Better Case: can find an instructionpreceding the branch which can beplaced in the branch-delay slot withoutaffecting flow of the program- re-ordering instructions is a commonmethod of speeding up programs- compiler must be very smart in order tofind instructions to do this- usually can find such an instruction at least50% of the time- Jumps also have a delay slot…CS61C L30 Pipelined Exec ution, part II (11) Garcia 2005 © U CBExample: Nondelayed vs. Delayed Branchadd $1 ,$2,$3sub $4, $5,$6beq $1, $4, Exitor $8, $9 ,$10xor $10, $1,$11Nondelayed Branchadd $1 ,$2,$3sub $4, $5,$6beq $1, $4, Exitor $8, $9 ,$10xor $10, $1,$11Delayed BranchExit: Exit:CS61C L30 Pipelined Exec ution, part II (12) Garcia 2005 © U CBData Hazards (1/2)add $t0, $t1, $t2sub $t4, $t0 ,$t3and $t5, $t0 ,$t6or $t7, $t0 ,$t8xor $t9, $t0 ,$t10• Consider the following sequence ofinstructionsCS61C L30 Pipelined Exec ution, part II (13) Garcia 2005 © U CB Dependencies backwards in time are hazardsData Hazards (2/2)sub $t4,$t0,$t3ALUI$ Reg D$ Regand $t5,$t0,$t6ALUI$Reg D$ Regor $t7,$t0,$t8I$ALUReg D$ Regxor $t9,$t0,$t10ALUI$Reg D$ Regadd $t0,$t1,$t2IF ID/RF EX MEM WBALUI$Reg D$RegInstr.OrderTime (clock cycles)CS61C L30 Pipelined Exec ution, part II (14) Garcia 2005 © U CB• Forward result from one stage to anotherData Hazard Solution: Forwardingsub $t4,$t0,$t3ALUI$Reg D$ Regand $t5,$t0,$t6ALUI$Reg D$ Regor $t7,$t0,$t8I$ALUReg D$ Regxor $t9,$t0,$t10ALUI$Reg D$ Regadd $t0,$t1,$t2IF ID/RF EX MEM WBALUI$Reg D$Reg “or” hazard solved by register hardwareCS61C L30 Pipelined Exec ution, part II (15) Garcia 2005 © U CB• Dependencies backwards in time arehazardsData Hazard: Loads (1/4)sub $t3,$t0,$t2ALUI$Reg D$ Reglw $t0,0($t1)IF ID/RF EX MEM WBALUI$Reg D$Reg• Can’t solve with forwarding• Must stall instruction dependent onload, then forward (more hardware)CS61C L30 Pipelined Exec ution, part II (16) Garcia 2005 © U CB• Hardware must stall pipeline• Called “interlock”Data Hazard: Loads (2/4)sub $t3,$t0,$t2ALUI$Reg D$ Regbubbleand $t5,$t0,$t4ALUI$Reg D$ Regbubbleor $t7,$t0,$t6I$ALUReg D$bubblelw $t0, 0($t1)IF ID/RF EX MEM WBALUI$Reg D$RegCS61C L30 Pipelined Exec ution, part II (17) Garcia 2005 © U CBData Hazard: Loads (3/4)• Instruction slot after a load is called“load delay slot”• If that instruction uses the result of theload, then the hardware interlock willstall it for one cycle.• If the compiler puts an unrelatedinstruction in that slot, then no stall• Letting the hardware stall theinstruction in the delay slot isequivalent to putting a nop in the slot(except the latter


View Full Document

Berkeley COMPSCI 61C - Lecture Notes

Documents in this Course
SIMD II

SIMD II

8 pages

Midterm

Midterm

7 pages

Lecture 7

Lecture 7

31 pages

Caches

Caches

7 pages

Lecture 9

Lecture 9

24 pages

Lecture 1

Lecture 1

28 pages

Lecture 2

Lecture 2

25 pages

VM II

VM II

4 pages

Midterm

Midterm

10 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?