DOC PREVIEW
Berkeley COMPSCI 152 - Lecture Notes

This preview shows page 1-2-3-4-30-31-32-33-34-62-63-64-65 out of 65 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 65 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 65 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 65 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 65 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 65 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 65 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 65 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 65 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 65 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 65 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 65 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 65 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 65 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 65 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS152 Computer Architecture and Engineering Lecture 16 Dynamic Scheduling, Speculation, and ILPReview: Compiler techniques for parallelismReview: Dynamic hardware for out-of-order executionThe Big Picture: Where are We Now?Three Stages of Tomasulo AlgorithmTomasulo OrganizationTomasulo Loop ExampleLoop ExampleLoop Example Cycle 1Loop Example Cycle 2Loop Example Cycle 3What does this mean physically?Loop Example Cycle 4Loop Example Cycle 5Loop Example Cycle 6Loop Example Cycle 7Loop Example Cycle 8Slide 18Loop Example Cycle 9Loop Example Cycle 10Loop Example Cycle 11Loop Example Cycle 12Loop Example Cycle 13Loop Example Cycle 14Loop Example Cycle 15Loop Example Cycle 16Loop Example Cycle 17Loop Example Cycle 18Loop Example Cycle 19Loop Example Cycle 20Why can Tomasulo overlap iterations of loops?Recall: Unrolled Loop That Minimizes StallsAdministriviaAdministrivia: Be careful about clock edges in lab5!Why issue in order?Branches must be resolved quickly for loop overlap!Independent “Fetch” unitPrediction: Branches, Dependencies, DataDynamic Branch PredictionSimple dynamic prediction: Branch Target Buffer (BTB)Slide 41Slide 42BHT AccuracyCorrelating BranchesSlide 45Accuracy of Different SchemesHW support for More ILPNow what about exceptions???Relationship between precise interrupts and specultation:HW support for precise interruptsFour Steps of Speculative Tomasulo AlgorithmTomasulo With Reorder buffer:Dynamic Scheduling in PowerPC 604 and Pentium ProSlide 54Dynamic Scheduling in Pentium ProLimits to Multi-Issue MachinesLimits to ILPSlide 58Upper Limit to ILP: Ideal MachineMore Realistic HW: Branch ImpactMore Realistic HW: Register Impact (rename regs)More Realistic HW: Alias ImpactRealistic HW for ‘9X: Window ImpactBraniac vs. Speed Demon(1993)Summary3/31/99 ©UCB Spring 1999CS152 / Kubiatowicz Lec16.1CS152Computer Architecture and EngineeringLecture 16Dynamic Scheduling, Speculation, and ILPMarch 31, 1999John Kubiatowicz (http.cs.berkeley.edu/~kubitron)lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/3/31/99 ©UCB Spring 1999CS152 / Kubiatowicz Lec16.2Review: Compiler techniques for parallelism°Loop unrolling  Multiple iterations of loop in software:•Amortizes loop overhead over several iterations•Gives more opportunity for scheduling around stalls°Software Pipelining  Take one instruction from each of several iterations of the loop•Software overlapping of loop iterations•Today will show hardware overlapping of loop iterations°Very Long Instruction Word machines (VLIW)  Multiple operations coded in single, long instruction•Requires sophisticated compiler to decide which operations can be done in parallel•Trace scheduling  find common path and schedule code as if branches didn’t exist (+ add “fixup code”)°All of these require additional registers3/31/99 ©UCB Spring 1999CS152 / Kubiatowicz Lec16.3Review: Dynamic hardware for out-of-order execution°HW exploitation of ILP•Works when can’t know dependence at compile time.•Code for one machine runs well on another°Scoreboard (ala CDC 6600 in 1963)•Centralized control structure•No register renaming, no forwarding•Pipeline stalls for WAR and WAW hazards.•Are these fundamental limitations??? (No)°Reservation stations (ala IBM 360/91 in 1966)•Distributed control structures•Implicit renaming of registers (dispatched pointers)•WAR and WAW hazards eliminated by register renaming•Results broadcast to all reservation stations for RAW•Reservation stations are like additional registers3/31/99 ©UCB Spring 1999CS152 / Kubiatowicz Lec16.4°The Five Classic Components of a Computer°Today’s Topics: •Recap last lecture•Hardware loop unrolling with Tomasulo algorithm•Administrivia•Speculation, branch prediction•Reorder buffersThe Big Picture: Where are We Now? ControlDatapathMemoryProcessorInputOutput3/31/99 ©UCB Spring 1999CS152 / Kubiatowicz Lec16.5Three Stages of Tomasulo Algorithm1. Issue—get instruction from FP Op Queue If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).2. Execution—operate on operands (EX) When both operands ready then execute; if not ready, watch Common Data Bus for result3. Write result—finish execution (WB) Write on Common Data Bus to all awaiting units; mark reservation station available°Normal data bus: data + destination (“go to” bus)°Common data bus: data + source (“come from” bus)•64 bits of data + 4 bits of Functional Unit source address•Write if matches expected Functional Unit (produces result)•Does the broadcast3/31/99 ©UCB Spring 1999CS152 / Kubiatowicz Lec16.6Tomasulo OrganizationFP addersFP addersAdd1Add2Add3FP multipliersFP multipliersMult1Mult2From MemFP RegistersReservation StationsCommon Data Bus (CDB)To MemFP OpQueueLoad BuffersStore BuffersLoad1Load2Load3Load4Load5Load63/31/99 ©UCB Spring 1999CS152 / Kubiatowicz Lec16.7Tomasulo Loop ExampleLoop: LD F0 0 R1MULTD F4 F0 F2SD F4 0 R1SUBI R1 R1 #8BNEZ R1 Loop°Assume Multiply takes 4 clocks°Assume first load takes 8 clocks (cache miss), second load takes 1 clock (hit)°To be clear, will show clocks for SUBI, BNEZ°Reality: integer instructions ahead3/31/99 ©UCB Spring 1999CS152 / Kubiatowicz Lec16.8Loop ExampleInstruction status: ExecWriteITER Instruction j k IssueCompResultBusy AddrFu1 LD F0 0 R1 Load1 No1 MULTD F4 F0 F2 Load2 No1 SD F4 0 R1 Load3 No2 LD F0 0 R1 Store1 No2 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 NoReservation Stations:S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 No SUBI R1 R1 #8Mult2 No BNEZ R1 LoopRegister result statusClockR1F0 F2 F4 F6 F8 F10 F12 ... F300 80Fu3/31/99 ©UCB Spring 1999CS152 / Kubiatowicz Lec16.9Instruction status: ExecWriteITER Instruction j k IssueCompResultBusy AddrFu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 Load2 No1 SD F4 0 R1 Load3 No2 LD F0 0 R1 Store1 No2 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 NoReservation Stations:S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 No SUBI R1 R1 #8Mult2 No BNEZ R1 LoopRegister result statusClockR1F0 F2 F4 F6 F8 F10 F12 ... F301 80FuLoad1Loop Example Cycle 13/31/99 ©UCB Spring 1999CS152 / Kubiatowicz Lec16.10Instruction status: ExecWriteITER Instruction j k IssueCompResultBusy AddrFu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2


View Full Document

Berkeley COMPSCI 152 - Lecture Notes

Documents in this Course
Quiz 5

Quiz 5

9 pages

Memory

Memory

29 pages

Quiz 5

Quiz 5

15 pages

Memory

Memory

29 pages

Memory

Memory

35 pages

Memory

Memory

15 pages

Quiz

Quiz

6 pages

Midterm 1

Midterm 1

20 pages

Quiz

Quiz

12 pages

Memory

Memory

33 pages

Quiz

Quiz

6 pages

Homework

Homework

19 pages

Quiz

Quiz

5 pages

Memory

Memory

15 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?