Berkeley COMPSCI 152 - Lecture 18


CS152 Computer Architecture and Engineering
Lecture 18: Dynamic Scheduling (Cont), Speculation, and ILP
November 2, 2001
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/
11/02/01 ©UCB Fall 2001 CS152 / Kubiatowicz Lec18.1

Slide titles:
• The Big Picture: Where are We Now?
• Review: Scoreboard Architecture (CDC 6600)
• Review: Four Stages of Scoreboard Control
• Review: Tomasulo Organization
• Recall: Reservation Station Components
• Recall: Three Stages of Tomasulo Algorithm
• Recall: Comparison of two techniques
• Tomasulo Loop Example
• Loop Example
• Loop Example Cycle 1 through Cycle 3
• What does this mean physically?
• Loop Example Cycle 4 through Cycle 20
• Why can Tomasulo overlap iterations of loops?
• Recall: Unrolled Loop That Minimizes Stalls
• Administrivia
• Why issue in-order?
• Now what about exceptions???
• HW support for precise interrupts
• Four Steps of Speculative Tomasulo Algorithm
• Tomasulo With Reorder buffer
• Memory Disambiguation: Handling RAW Hazards in memory
• Hardware Support for Memory Disambiguation
• Memory Disambiguation
• What about FETCH? Independent "Fetch" unit
• Branches must be resolved quickly for loop overlap!
• Prediction: Branches, Dependencies, Data
• Dynamic Branch Prediction
• Simple dynamic prediction: Branch Target Buffer (BTB)
• BHT Accuracy
• Correlating Branches
• Accuracy of Different Schemes
• HW support for More ILP
• Limits to Multi-Issue Machines
• Limits to ILP
• Upper Limit to ILP: Ideal Machine
• More Realistic HW: Branch Impact
• More Realistic HW: Register Impact (rename regs)
• More Realistic HW: Alias Impact
• Realistic HW for '9X: Window Impact
• Brainiac vs. Speed Demon (1993)
• Summary #1/2
• Summary #2/2

The Big Picture: Where are We Now?
° The Five Classic Components of a Computer
  [Figure labels: Control, Datapath, Memory, Processor, Input, Output]
° Today's Topics:
  • Recap last lecture
  • Hardware loop unrolling with Tomasulo algorithm
  • Administrivia
  • Speculation, branch prediction
  • Reorder buffers

Review: Scoreboard Architecture (CDC 6600)
[Figure labels: Registers, Functional Units (FP Mult, FP Divide, FP Add, Integer), Memory, SCOREBOARD]

Review: Four Stages of Scoreboard Control
° Issue: decode instructions & check for structural hazards
  • Instructions issued in program order (for hazard checking)
  • Don't issue if structural hazard
  • Don't issue if instruction is output dependent on any previously issued but uncompleted instruction (no WAW hazards)
° Read operands: wait until no data hazards, then read operands
  • All real dependencies (RAW hazards) resolved in this stage
  • No forwarding of data in this model!
° Execution: operate on operands (EX)
  • The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.
° Write result: finish execution (WB)
  • Stall until no WAR hazards with previous instructions.
    Example:
      DIVD  F0,F2,F4
      ADDD  F10,F0,F8
      SUBD  F8,F8,F14
    CDC 6600 scoreboard would stall SUBD until ADDD reads operands

Review: Tomasulo Organization
[Figure labels: FP Op Queue, FP Registers, From Mem, To Mem, Load Buffers (Load1 to Load6), Store Buffers, Reservation Stations, FP adders (Add1, Add2, Add3), FP multipliers (Mult1, Mult2), Common Data Bus (CDB)]

Recall: Reservation Station Components
Op: Operation to perform in the unit (e.g., + or –)
Vj, Vk: Value of source operands
  • Store buffers have a V field: the result to be stored
Qj, Qk: Reservation stations producing source registers (value to be written)
  • Note: No ready flags as in Scoreboard; Qj,Qk=0 => ready
  • Store buffers only have Qi for RS producing result
Busy: Indicates reservation station or FU is busy
Register result status (or "Rename Table"):
  • Mapping from user-visible registers to reservation stations or value

Recall: Three Stages of Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
   If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).
2. Execution: operate on operands (EX)
   When both operands ready then execute; if not ready, watch Common Data Bus for result
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting units; mark reservation station available
° Normal data bus: data + destination ("go to" bus)
° Common data bus: data + source ("come from" bus)
  • 64 bits of data + 4 bits of Functional Unit source address
  • Write if matches expected Functional Unit (produces result)
  • Does the broadcast

Recall: Comparison of two techniques
Instruction status:
                            Scoreboard                                    Tomasulo
Instruction    j     k      Issue  Read Oper  Exec Comp  Write Result    Issue  Exec Compl  Write Result
LD     F6      34+   R2     1      2          3          4               1      3           4
LD     F2      45+   R3     5      6          7          8               2      4           5
MULTD  F0      F2    F4     6      9          19         20              3      15          16
SUBD   F8      F6    F2     7      9          11         12              4      7           8
DIVD   F10     F0    F6     8      21         61         62              5      56          57
ADDD   F6      F8    F2     13     14         16         22              6      10          11
• In-order issue
• Out-of-order execution
• Out-of-order completion => Problem with precise interrupts!

Tomasulo Loop Example
Loop: LD     F0  0   R1
      MULTD  F4  F0  F2
      SD     F4  0   R1
      SUBI   R1  R1  #8
      BNEZ   R1  Loop
° Assume Multiply takes 4 clocks
° Assume first load takes 8 clocks (cache miss), second load takes 1 clock (hit)
° To be clear, will show clocks for SUBI, BNEZ
° Reality: integer instructions ahead

Loop Example
Instruction status:
ITER  Instruction   j    k     Issue  Exec Comp  Write Result
1     LD     F0     0    R1
1     MULTD  F4     F0   F2
1     SD     F4     0    R1
2     LD     F0     0    R1
2     MULTD  F4     F0   F2
2     SD     F4     0    R1

Name    Busy  Addr  Fu
Load1   No
Load2   No
Load3   No
Store1  No
Store2  No
Store3  No

Reservation Stations:
                       S1  S2  RS
Time  Name   Busy  Op  Vj  Vk  Qj  Qk    Code:
      Add1   No                          LD     F0  0   R1
      Add2   No                          MULTD  F4  F0  F2
      Add3   No                          SD     F4  0   R1
      Mult1
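
The reservation station fields (Vj, Vk, Qj, Qk), the register result status table, and the Common Data Bus broadcast recalled above can be sketched in Python. This is a minimal illustration under stated assumptions, not the lecture's own code: the names `Station`, `Machine`, `issue`, `complete`, and `demo` are hypothetical, the register values are made up, timing is not modeled (no cycle counts or latencies), and execution and write-result are folded into one step. It runs the DIVD/ADDD/SUBD sequence from the scoreboard WAR example to show that, because ADDD captures the value of F8 at issue, SUBD can write F8 before ADDD executes.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Assumed operation table for illustration only.
OPS = {"ADDD": operator.add, "SUBD": operator.sub,
       "MULTD": operator.mul, "DIVD": operator.truediv}


@dataclass
class Station:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None     # producing station; None plays the role of "Q = 0 => ready"
    qk: Optional[str] = None


class Machine:
    def __init__(self, regs, station_names):
        self.regs = dict(regs)
        self.status = {}         # register result status: register -> producing station
        self.stations = {n: Station(n) for n in station_names}

    def _source(self, reg):
        """Return (value, None) if the register is current, else (None, producer tag)."""
        tag = self.status.get(reg)
        if tag is not None:
            return None, tag
        return self.regs[reg], None

    def issue(self, station, op, dest, src1, src2):
        """Stage 1: claim a free station, capture operands or tags, rename dest."""
        if self.stations[station].busy:
            return False         # structural hazard: issue must stall
        vj, qj = self._source(src1)   # read sources before renaming dest
        vk, qk = self._source(src2)
        self.stations[station] = Station(station, True, op, vj, vk, qj, qk)
        self.status[dest] = station
        return True

    def complete(self, station):
        """Stages 2 and 3 folded together: execute if ready, then broadcast."""
        rs = self.stations[station]
        if not rs.busy or rs.qj is not None or rs.qk is not None:
            return None          # still watching the bus for an operand
        result = OPS[rs.op](rs.vj, rs.vk)
        # "Come from" bus: every waiter compares the source tag with its Q fields.
        for other in self.stations.values():
            if other.qj == station:
                other.vj, other.qj = result, None
            if other.qk == station:
                other.vk, other.qk = result, None
        # Only registers still mapped to this station take the value.
        for reg in [r for r, tag in self.status.items() if tag == station]:
            self.regs[reg] = result
            del self.status[reg]
        self.stations[station] = Station(station)   # mark station available
        return result


def demo():
    regs = {"F0": 0.0, "F2": 8.0, "F4": 2.0, "F8": 3.0, "F10": 0.0, "F14": 1.0}
    m = Machine(regs, ["Add1", "Add2", "Mult1"])
    m.issue("Mult1", "DIVD", "F0", "F2", "F4")
    m.issue("Add1", "ADDD", "F10", "F0", "F8")    # waits on Mult1, holds old F8
    m.issue("Add2", "SUBD", "F8", "F8", "F14")
    m.complete("Add1")     # not ready yet: returns None, nothing changes
    m.complete("Add2")     # SUBD finishes first and overwrites F8
    m.complete("Mult1")    # broadcast wakes up Add1
    m.complete("Add1")     # ADDD uses the F8 value captured at issue
    return m
```

Calling `demo()` leaves F8 holding the SUBD result while F10 reflects the older F8 value, which is the WAR case the scoreboard handles by stalling SUBD.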

