Berkeley COMPSCI 152 - Lecture 16 Dynamic Scheduling: Scoreboards and Tomasulo


Slide titles:
CS152 Computer Architecture and Engineering Lecture 16 Dynamic Scheduling: Scoreboards and Tomasulo
The Big Picture: Where are We Now?
Recall: Compiler techniques for parallelism
Recall: Can we somehow make CPI closer to 1?
Recall: Revised FP Loop Minimizing Stalls
Recall: Unrolled Loop That Minimizes Stalls
Recall: Software Pipelining Example
Recall: Software Pipelining with Loop Unrolling in VLIW
Recall: Can we use HW to get CPI closer to 1?
Scoreboard: a bookkeeping technique
Scoreboard Architecture (CDC 6600)
Scoreboard Implications
Four Stages of Scoreboard Control
Slide 14
Three Parts of the Scoreboard
Scoreboard Example
Detailed Scoreboard Pipeline Control
Scoreboard Example: Cycles 1 through 7
Scoreboard Example: Cycle 8a (First half of clock cycle)
Scoreboard Example: Cycle 8b (Second half of clock cycle)
Scoreboard Example: Cycles 9 through 22
Faster than light computation (skip a couple of cycles)
Scoreboard Example: Cycle 61
Scoreboard Example: Cycle 62
Review: Scoreboard Example: Cycle 62
CDC 6600 Scoreboard
How are WAR and WAW hazards handled in Scoreboard?
Administrivia
Slide 48
Administrivia: Pentium-4 Architecture!
Another Dynamic Algorithm: Tomasulo Algorithm
Tomasulo Algorithm vs. Scoreboard
Tomasulo Organization
Reservation Station Components
Three Stages of Tomasulo Algorithm
Tomasulo Example
Tomasulo Example Cycles 1 through 16
Slide 72
Tomasulo Example Cycle 55
Tomasulo Example Cycle 56
Tomasulo Example Cycle 57
Compare to Scoreboard Cycle 62
Tomasulo v. Scoreboard (IBM 360/91 v. CDC 6600)
Tomasulo Analysis
Summary #1/2
Summary #2/2


CS152
Computer Architecture and Engineering
Lecture 16
Dynamic Scheduling: Scoreboards and Tomasulo

April 2, 2003
John Kubiatowicz (www.cs.berkeley.edu/~kubitron)
lecture slides: http://inst.eecs.berkeley.edu/~cs152/

4/02/03 ©UCB Spring 2003 CS152 / Kubiatowicz Lec16.1


The Big Picture: Where are We Now?

° The Five Classic Components of a Computer
  [Diagram: Processor (Control, Datapath), Memory, Input, Output]
° Today's Topics:
  • Recap last lecture/Review Scoreboard
  • Administrivia
  • Tomasulo scheduling algorithm
  • Tomasulo loop unrolling


Recall: Compiler techniques for parallelism

° Loop unrolling => Multiple iterations of loop in software:
  • Amortizes loop overhead over several iterations
  • Gives more opportunity for scheduling around stalls
° Software Pipelining => Take one instruction from each of several iterations of the loop
  • Software overlapping of loop iterations
  • Today will show hardware overlapping of loop iterations
° Very Long Instruction Word machines (VLIW) => Multiple operations coded in single, long instruction
  • Requires sophisticated compiler to decide which operations can be done in parallel
  • Trace scheduling => find common path and schedule code as if branches didn't exist (+ add "fixup code")
° All of these require additional registers


Recall: Can we somehow make CPI closer to 1?

Let's assume full pipelining:
If we have a 4-cycle instruction, then we need 3 instructions between a producing instruction and its use:

    multf $F0,$F2,$F4      multf $F0,$F2,$F4      ld    $F0,0($r5)
    delay-1                delay-1                delay-1
    delay-2                delay-2                multf $F4,$F0,$F3
    delay-3                sw    $F0,4($R2)
    addf  $F6,$F10,$F0

[Pipeline diagram: stages Fetch, Decode, Ex1, Ex2, Ex3, Ex4, WB for multf, delay1, delay2, delay3, addf, marking the earliest forwarding point for 4-cycle instructions and the earliest forwarding point for 1-cycle instructions]


Recall: Revised FP Loop Minimizing Stalls

    Instruction         Execute    Instruction          Use latency
    producing result    latency    using result         in cycles
    FP ALU op           4          Another FP ALU op    3
    FP ALU op           4          Store double         2
    Load double         2          FP ALU op            1

    1 Loop: LD    F0,0(R1)
    2       stall
    3       ADDD  F4,F0,F2
    4       SUBI  R1,R1,8
    5       BNEZ  R1,Loop    ;delayed branch
    6       SD    8(R1),F4   ;altered when move past SUBI

Swap BNEZ and SD by changing address of SD

6 clocks: CPI = 6/5 = 1.2
Unroll loop 4 times code to make faster?


Recall: Unrolled Loop That Minimizes Stalls

    1  Loop: LD    F0,0(R1)
    2        LD    F6,-8(R1)
    3        LD    F10,-16(R1)
    4        LD    F14,-24(R1)
    5        ADDD  F4,F0,F2
    6        ADDD  F8,F6,F2
    7        ADDD  F12,F10,F2
    8        ADDD  F16,F14,F2
    9        SD    0(R1),F4
    10       SD    -8(R1),F8
    11       SD    -16(R1),F12
    12       SUBI  R1,R1,#32
    13       BNEZ  R1,LOOP
    14       SD    8(R1),F16   ; 8-32 = -24

14 clock cycles, or 3.5 per iteration
CPI = 14/14 = 1

° What assumptions made when moved code?
  • OK to move store past SUBI even though changes register
  • OK to move loads before stores: get right data?
  • When is it safe for compiler to do such changes?

When safe to move instructions?


Recall: Software Pipelining Example

Before: Unrolled 3 times
    LD    F0,0(R1)
    ADDD  F4,F0,F2
    SD    0(R1),F4
    LD    F6,-8(R1)
    ADDD  F8,F6,F2
    SD    -8(R1),F8
    LD    F10,-16(R1)
    ADDD  F12,F10,F2
    SD    -16(R1),F12
    SUBI  R1,R1,#24
    BNEZ  R1,LOOP

After: Software Pipelined
    SD    0(R1),F4     ; Stores M[i]
    ADDD  F4,F0,F2     ; Adds to M[i-1]
    LD    F0,-16(R1)   ; Loads M[i-2]
    SUBI  R1,R1,#8
    BNEZ  R1,LOOP

• Symbolic Loop Unrolling
  – Maximize result-use distance
  – Less code space than unrolling
  – Fill & drain pipe only once per loop vs. once per each unrolled iteration in loop unrolling

[Figure: overlapped ops vs. time, comparing SW Pipeline with Loop Unrolled]


Recall: Software Pipelining with Loop Unrolling in VLIW

    Memory            Memory           FP                 FP      Int. op/          Clock
    reference 1       reference 2      operation 1        op. 2   branch
    LD F0,-48(R1)     ST 0(R1),F4      ADDD F4,F0,F2                                1
    LD F6,-56(R1)     ST -8(R1),F8     ADDD F8,F6,F2              SUBI R1,R1,#24    2
    LD F10,-40(R1)    ST 8(R1),F12     ADDD F12,F10,F2            BNEZ R1,LOOP      3

° Software pipelined
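The loop transformations recalled on these slides can be sketched in Python to check that they compute the same result. This is only an illustrative model, not code from the lecture: it assumes the loop adds a scalar (the value held in F2) to every element of an array, walking from the last element down, and the function names (`add_scalar_rolled`, `add_scalar_unrolled4`, `add_scalar_software_pipelined`, `cpi`) are hypothetical. Local variables are named after the registers in the listings to make the correspondence visible; the model says nothing about timing, which comes only from the cycle counts quoted on the slides.

```python
def add_scalar_rolled(x, s):
    """Original loop: one element per iteration, last element first."""
    y = list(x)
    i = len(y) - 1
    while i >= 0:
        f0 = y[i]          # LD   F0,0(R1)
        f4 = f0 + s        # ADDD F4,F0,F2
        y[i] = f4          # SD   0(R1),F4
        i -= 1             # SUBI R1,R1,8 ; BNEZ R1,Loop
    return y


def add_scalar_unrolled4(x, s):
    """Loop unrolled 4 times: all loads, then all adds, then all stores.

    Assumption of this sketch: len(x) is a multiple of 4.
    """
    if len(x) % 4 != 0:
        raise ValueError("sketch assumes a length that is a multiple of 4")
    y = list(x)
    i = len(y) - 1
    while i >= 0:
        # Four loads grouped together, each into its own register.
        f0, f6, f10, f14 = y[i], y[i - 1], y[i - 2], y[i - 3]
        # Four independent adds.
        f4, f8, f12, f16 = f0 + s, f6 + s, f10 + s, f14 + s
        # Four stores.
        y[i], y[i - 1], y[i - 2], y[i - 3] = f4, f8, f12, f16
        i -= 4             # SUBI R1,R1,#32
    return y


def add_scalar_software_pipelined(x, s):
    """Software-pipelined loop: store M[i], add for M[i-1], load M[i-2].

    Assumption of this sketch: len(x) >= 2, so the pipe can be filled.
    """
    y = list(x)
    n = len(y)
    if n < 2:
        raise ValueError("sketch assumes at least two elements")
    # Prologue (fill the pipe once).
    f0 = y[n - 1]          # load for the last element
    f4 = f0 + s            # add for the last element
    f0 = y[n - 2]          # load for the next element down
    i = n - 1
    # Steady state: each pass touches three different elements.
    while i >= 2:
        y[i] = f4          # SD   0(R1),F4    ; stores M[i]
        f4 = f0 + s        # ADDD F4,F0,F2    ; adds to M[i-1]
        f0 = y[i - 2]      # LD   F0,-16(R1)  ; loads M[i-2]
        i -= 1             # SUBI R1,R1,#8
    # Epilogue (drain the pipe once).
    y[1] = f4
    y[0] = f0 + s
    return y


def cpi(clock_cycles, instructions):
    """Cycles per instruction for one pass through a loop body."""
    return clock_cycles / instructions
```

With the figures from the slides, `cpi(6, 5)` gives 1.2 for the scheduled loop (5 instructions plus one stall) and `cpi(14, 14)` gives 1.0 for the unrolled loop, which spends 14 / 4 = 3.5 clocks per original iteration. The three functions return identical lists for the same input, which is the property a compiler must preserve when it reorders the loads, adds and stores.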

