DOC PREVIEW
Berkeley COMPSCI 61C - Pipelined Execution, part I

This preview shows page 1 out of 4 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS61C L31 Pipelined Exec ution, part II (1) Garcia, Fall 2004 © UCBLecturer PSOE Dan Garciawww.cs.berkeley.edu/~ddgarciainst.eecs.berkeley.edu/~cs61cCS61C : Machine StructuresLecture 31 – Pipelined Execution, part II2004-11-10The Incredibles!⇒Election Data isnow available…Purple America!www.princeton.edu/~rvdb/JAVA/election2004/www.usatoday.com/news/politicselections/vote2004/countymap.htmCS61C L31 Pipelined Exec ution, part II (4) Garcia, Fall 2004 © UCBControl Hazard: Branching (1/7)Where do we do the compare for the branch? I$beqInstr 1Instr 2Instr 3Instr 4ALU I$Reg D$ RegALU I$Reg D$ RegALU I$Reg D$ RegALUReg D$ RegALU I$Reg D$ RegInstr.OrderTime (clock cycles)CS61C L31 Pipelined Exec ution, part II (5) Garcia, Fall 2004 © UCBControl Hazard: Branching (2/7)• We put branch decision-makinghardware in ALU stage• therefore two more instructions after thebranch will always be fetched, whetheror not the branch is taken• Desired functionality of a branch• if we do not take the branch, don’t wasteany time and continue executingnormally• if we take the branch, don’t execute anyinstructions after the branch, just go tothe desired labelCS61C L31 Pipelined Exec ution, part II (6) Garcia, Fall 2004 © UCBControl Hazard: Branching (3/7)• Initial Solution: Stall until decision ismade• insert “no-op” instructions: those thataccomplish nothing, just take time• Drawback: branches take 3 clock cycleseach (assuming comparator is put inALU stage)CS61C L31 Pipelined Exec ution, part II (7) Garcia, Fall 2004 © UCBControl Hazard: Branching (4/7)• Optimization #1:• move asynchronous comparator up toStage 2• as soon as instruction is decoded(Opcode identifies is as a branch),immediately make a decision and set thevalue of the PC (if necessary)• Benefit: since branch is complete inStage 2, only one unnecessaryinstruction is fetched, so only one no-opis needed• Side Note: This means that branches areidle in Stages 3, 4 and 5.CS61C L31 Pipelined Exec ution, part II (8) Garcia, Fall 2004 © UCB• Insert a single no-op (bubble)Control Hazard: Branching (5/7)addbeqlwALU I$Reg D$ RegALU I$Reg D$ RegALUReg D$ Reg I$Instr.OrderTime (clock cycles)bubble• Impact: 2 clock cycles per branchinstruction ⇒ slowCS61C L31 Pipelined Exec ution, part II (9) Garcia, Fall 2004 © UCBControl Hazard: Branching (6/7)• Optimization #2: Redefine branches• Old definition: if we take the branch,none of the instructions after the branchget executed by accident• New definition: whether or not we takethe branch, the single instructionimmediately following the branch getsexecuted (called the branch-delay slot)• The term “Delayed Branch” meanswe always execute inst after branchCS61C L31 Pipelined Exec ution, part II (10) Garcia, Fall 2004 © UCBControl Hazard: Branching (7/7)• Notes on Branch-Delay Slot• Worst-Case Scenario: can always put ano-op in the branch-delay slot• Better Case: can find an instructionpreceding the branch which can beplaced in the branch-delay slot withoutaffecting flow of the program- re-ordering instructions is a commonmethod of speeding up programs- compiler must be very smart in order tofind instructions to do this- usually can find such an instruction at least50% of the time- Jumps also have a delay slot…CS61C L31 Pipelined Exec ution, part II (11) Garcia, Fall 2004 © UCBExample: Nondelayed vs. Delayed Branchadd $1 ,$2,$3sub $4, $5,$6beq $1, $4, Exitor $8, $9 ,$10xor $10, $1,$11Nondelayed Branchadd $1 ,$2,$3sub $4, $5,$6beq $1, $4, Exitor $8, $9 ,$10xor $10, $1,$11Delayed BranchExit: Exit:CS61C L31 Pipelined Exec ution, part II (12) Garcia, Fall 2004 © UCBData Hazards (1/2)add $t0, $t1, $t2sub $t4, $t0 ,$t3and $t5, $t0 ,$t6or $t7, $t0 ,$t8xor $t9, $t0 ,$t10• Consider the following sequence ofinstructionsCS61C L31 Pipelined Exec ution, part II (13) Garcia, Fall 2004 © UCB Dependencies backwards in time are hazardsData Hazards (2/2)sub $t4,$t0,$t3ALUI$ Reg D$ Regand $t5,$t0,$t6ALUI$Reg D$ Regor $t7,$t0,$t8I$ALUReg D$ Regxor $t9,$t0,$t10ALUI$Reg D$ Regadd $t0,$t1,$t2IF ID/RF EX MEM WBALUI$Reg D$RegInstr.OrderTime (clock cycles)CS61C L31 Pipelined Exec ution, part II (14) Garcia, Fall 2004 © UCB• Forward result from one stage to anotherData Hazard Solution: Forwardingsub $t4,$t0,$t3ALUI$Reg D$ Regand $t5,$t0,$t6ALUI$Reg D$ Regor $t7,$t0,$t8I$ALUReg D$ Regxor $t9,$t0,$t10ALUI$Reg D$ Regadd $t0,$t1,$t2IF ID/RF EX MEM WBALUI$Reg D$Reg “or” hazard solved by register hardwareCS61C L31 Pipelined Exec ution, part II (15) Garcia, Fall 2004 © UCB• Dependencies backwards in time arehazardsData Hazard: Loads (1/4)sub $t3,$t0,$t2ALUI$Reg D$ Reglw $t0,0($t1)IF ID/RF EX MEM WBALUI$Reg D$Reg• Can’t solve with forwarding• Must stall instruction dependent onload, then forward (more hardware)CS61C L31 Pipelined Exec ution, part II (16) Garcia, Fall 2004 © UCB• Hardware must stall pipeline• Called “interlock”Data Hazard: Loads (2/4)sub $t3,$t0,$t2ALUI$Reg D$ Regbubbleand $t5,$t0,$t4ALUI$Reg D$ Regbubbleor $t7,$t0,$t6I$ALUReg D$bubblelw $t0, 0($t1)IF ID/RF EX MEM WBALUI$Reg D$RegCS61C L31 Pipelined Exec ution, part II (17) Garcia, Fall 2004 © UCBData Hazard: Loads (3/4)• Instruction slot after a load is called“load delay slot”• If that instruction uses the result of theload, then the hardware interlock willstall it for one cycle.• If the compiler puts an unrelatedinstruction in that slot, then no stall• Letting the hardware stall theinstruction in the delay slot isequivalent to putting a nop in the slot(except the latter uses more code space)CS61C L31 Pipelined Exec ution, part II (18) Garcia, Fall 2004 © UCBData Hazard: Loads (4/4)• Stall is equivalent to nopsub $t3,$t0,$t2and $t5,$t0,$t4or $t7,$t0,$t6I$ALUReg D$lw $t0, 0($t1)ALUI$Reg D$RegbubblebubblebubblebubblebubbleALUI$Reg D$RegALUI$Reg D$RegnopCS61C L31 Pipelined Exec ution, part II (19) Garcia, Fall 2004 © UCB Historical Trivia• First MIPS design did not interlock andstall on load-use data hazard• Real reason for name behind MIPS:Microprocessor withoutInterlockedPipelineStages• Word Play on acronym forMillions of Instructions Per Second,also called MIPSCS61C L31 Pipelined Exec ution, part II (20) Garcia, Fall 2004 © UCBAdministrivia• No lab this week (wed, thu or fri)• Due to Veterans Day holiday on Thursday.• The lab is posted as a take-home lab;show TA your results in the following lab.•


View Full Document

Berkeley COMPSCI 61C - Pipelined Execution, part I

Documents in this Course
SIMD II

SIMD II

8 pages

Midterm

Midterm

7 pages

Lecture 7

Lecture 7

31 pages

Caches

Caches

7 pages

Lecture 9

Lecture 9

24 pages

Lecture 1

Lecture 1

28 pages

Lecture 2

Lecture 2

25 pages

VM II

VM II

4 pages

Midterm

Midterm

10 pages

Load more
Download Pipelined Execution, part I
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Pipelined Execution, part I and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Pipelined Execution, part I 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?