UCD ECS 201A - Lecture 3 - Tomasulo Algorithm, Dynamic Branch Prediction, VLIW

Lecture 3: Tomasulo Algorithm, Dynamic Branch Prediction, VLIW, Software Pipelining, and Limits to ILP
Assignments
Review: Summary
Review: Three Parts of the Scoreboard
Review: Scoreboard Example Cycle 3
Review: Scoreboard Example Cycle 9
Review: Scoreboard Example Cycle 17
Review: Scoreboard Example Cycle 62
Review: Scoreboard Summary
Another Dynamic Algorithm: Tomasulo Algorithm
Tomasulo Algorithm vs. Scoreboard
Tomasulo Organization
Reservation Station Components
Three Stages of Tomasulo Algorithm
Tomasulo Example Cycle 0
Tomasulo Example Cycle 1
Tomasulo Example Cycle 2
Tomasulo Example Cycle 3
Tomasulo Example Cycle 4
Tomasulo Example Cycle 5
Tomasulo Example Cycle 6
Tomasulo Example Cycle 7
Tomasulo Example Cycle 8
Tomasulo Example Cycle 9
Tomasulo Example Cycle 10
Tomasulo Example Cycle 11
Tomasulo Example Cycle 12
Tomasulo Example Cycle 13
Tomasulo Example Cycle 14
Tomasulo Example Cycle 15
Tomasulo Example Cycle 16
Tomasulo Example Cycle 55
Tomasulo Example Cycle 56
Tomasulo Example Cycle 57
Compare to Scoreboard Cycle 62
Tomasulo v. Scoreboard (IBM 360/91 v. CDC 6600)
Tomasulo Drawbacks
Tomasulo Loop Example
Loop Example Cycle 0
Loop Example Cycle 1
Loop Example Cycle 2
Loop Example Cycle 3
Loop Example Cycle 4
Loop Example Cycle 5
Loop Example Cycle 6
Loop Example Cycle 7
Loop Example Cycle 8
Loop Example Cycle 9
Loop Example Cycle 10
Loop Example Cycle 11
Loop Example Cycle 12
Loop Example Cycle 13
Loop Example Cycle 14
Loop Example Cycle 15
Loop Example Cycle 16
Loop Example Cycle 17
Loop Example Cycle 18
Loop Example Cycle 19
Loop Example Cycle 20
Loop Example Cycle 21
Tomasulo Summary
Dynamic Branch Prediction
Slide 63
BHT Accuracy
Correlating Branches
Slide 66
Accuracy of Different Schemes (Figure 4.21, page 272)
Re-evaluating Correlation
Need Address at Same Time as Prediction
HW support for More ILP
Dynamic Branch Prediction Summary
Slide 72
Slide 73
Four Steps of Speculative Tomasulo Algorithm
Renaming Registers
Dynamic Scheduling in PowerPC 604 and Pentium Pro
Slide 77
Dynamic Scheduling in Pentium Pro
Getting CPI < 1: Issuing Multiple Instructions/Cycle
Slide 80
Review: Unrolled Loop that Minimizes Stalls for Scalar
Loop Unrolling in Superscalar
Multiple Issue Challenges
Loop Unrolling in VLIW
Trace Scheduling
Advantages of HW (Tomasulo) vs. SW (VLIW) Speculation
Superscalar v. VLIW
Intel/HP “Explicitly Parallel Instruction Computing (EPIC)”
Dynamic Scheduling in Superscalar
Slide 90
Performance of Dynamic SS
Software Pipelining
Software Pipelining Example
Limits to Multi-Issue Machines
Slide 95
Limits to ILP
Slide 97
Upper Limit to ILP: Ideal Machine (Figure 4.38, page 319)
More Realistic HW: Branch Impact (Figure 4.40, page 323)
Selective History Predictor
More Realistic HW: Register Impact (Figure 4.44, page 328)
More Realistic HW: Alias Impact (Figure 4.46, page 330)
Realistic HW for ’9X: Window Impact (Figure 4.48, page 332)
Brainiac vs. Speed Demon (1993)
3 1996 Era Machines
SPECint95base Performance (July 1996)
SPECfp95base Performance (July 1996)
3 1997 Era Machines
SPECint95base Performance (Oct. 1997)
SPECfp95base Performance (Oct. 1997)
Summary

FTC.W99 1
Lecture 3: Tomasulo Algorithm, Dynamic Branch Prediction, VLIW, Software Pipelining, and Limits to ILP
Prof. Fred Chong
ECS 250A Computer Architecture
Winter 1999
(Adapted from Patterson CS252 Copyright 1998 UCB)

Assignments
• Read Ch 5
• Problem Set 3 out on Wed
• Problem Set 2 back soon
• Proposal comments by e-mail soon

Review: Summary
• Instruction Level Parallelism (ILP) in SW or HW
• Loop level parallelism is easiest to see
• SW parallelism: dependencies are defined by the program; they become hazards only if the HW cannot resolve them
• SW dependencies and compiler sophistication determine whether the compiler can unroll loops
  – Memory dependencies are the hardest to determine
• HW exploiting ILP
  – Works when dependencies cannot be known at compile time
  – Code for one machine runs well on another
• Key idea of Scoreboard: allow instructions behind a stall to proceed (Decode => Issue instr & read operands)
  – Enables out-of-order execution => out-of-order completion

Review: Three Parts of the Scoreboard
1. Instruction status: which of 4 steps the instruction is in (Issue, Read operands, Execution complete, Write result)
2. Functional unit status: indicates the state of the functional unit (FU). 9 fields for each functional unit:
   Busy: indicates whether the unit is busy or not
   Op: operation to perform in the unit (e.g., + or –)
   Fi: destination register
   Fj, Fk: source-register numbers
   Qj, Qk: functional units producing source registers Fj, Fk
   Rj, Rk: flags indicating when Fj, Fk are ready
3. Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register

Review: Scoreboard Example Cycle 3
Instruction status
Instruction        Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2   1      2              3
LD    F2  45+ R3
MULTD F0  F2  F4
SUBD  F8  F6  F2
DIVD  F10 F0  F6
ADDD  F6  F8  F2
Functional unit status
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
      Integer  Yes   Load  F6         R2                            Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status
Clock   F0      F2      F4      F6      F8      F10     F12  ...  F30
3   FU                          Integer
• Issue MULT? No: issue is in order, and the second LD is still stalled on a structural hazard (the Integer unit is busy)

Review: Scoreboard Example Cycle 9
Instruction status
Instruction        Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2   1      2              3                   4
LD    F2  45+ R3   5      6              7                   8
MULTD F0  F2  F4   6      9
SUBD  F8  F6  F2   7      9
DIVD  F10 F0  F6   8
ADDD  F6  F8  F2
Functional unit status
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
      Integer  No
10    Mult1    Yes   Mult  F0    F2   F4                       Yes  Yes
      Mult2    No
2     Add      Yes   Sub   F8    F6   F2                       Yes  Yes
      Divide   Yes   Div   F10   F0   F6   Mult1               No   Yes
Register result status
Clock   F0      F2      F4      F6      F8      F10     F12  ...  F30
9   FU  Mult1                           Add     Divide
• Read operands for MULT & SUBD? Issue ADDD?

Review: Scoreboard Example Cycle 17
Instruction status
Instruction        Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2   1      2              3                   4
LD    F2  45+ R3   5      6              7                   8
MULTD F0  F2  F4   6      9
SUBD  F8  F6  F2   7      9              11                  12
DIVD  F10 F0  F6   8
ADDD  F6  F8  F2   13     14             16
Functional unit status
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
      Integer  No
2     Mult1    Yes   Mult  F0    F2   F4                       Yes  Yes
      Mult2    No
      Add      Yes   Add   F6    F8   F2                       Yes  Yes
      Divide   Yes   Div   F10   F0   F6   Mult1               No   Yes
Register result status
Clock   F0      F2      F4      F6      F8      F10     F12  ...  F30
17  FU  Mult1                   Add             Divide
• Write result of ADDD? No, WAR hazard: DIVD has not yet read F6

Review: Scoreboard Example Cycle 62
Instruction status
Instruction        Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2   1      2              3                   4
LD    F2  45+ R3   5      6              7                   8
MULTD F0  F2  F4   6      9              19                  20
SUBD  F8  F6  F2   7      9              11                  12
DIVD  F10 F0  F6   8      21             61                  62
ADDD  F6  F8  F2   13     14             16                  22
Functional unit status
                           dest  S1   S2   FU for j  FU
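The scoreboard example's instruction status columns can be cross-checked with a small simulator. Below is a minimal Python sketch, not code from the lecture. It assumes the execution latencies implied by the example's timing (load 1 cycle, add/subtract 2, multiply 10, divide 40), one in-order issue per cycle, and that every stage decision is made from the scoreboard state at the start of the cycle, so a value written in cycle t is first read in cycle t+1. It models the three parts of the scoreboard (instruction status as the cycle fields of `Instr`, functional unit status as `FUStatus`, register result status as `result`) and the usual checks: structural and WAW hazards at issue, RAW at read operands, and WAR at write result. All names (`simulate`, `Instr`, `FUStatus`, `war_hazard`, `LATENCY`, `UNITS_FOR`, `example_program`) are hypothetical.

```python
# Hypothetical scoreboard sketch; all names here are illustrative.
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed latencies (cycles from read operands to execution complete).
LATENCY = {"LD": 1, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}
# Functional units each opcode may use, as in the example's FU status table.
UNITS_FOR = {"LD": ["Integer"], "ADDD": ["Add"], "SUBD": ["Add"],
             "MULTD": ["Mult1", "Mult2"], "DIVD": ["Divide"]}


@dataclass
class Instr:
    op: str
    fi: str                         # destination register
    fj: Optional[str]               # source registers (None if unused)
    fk: Optional[str]
    issue: Optional[int] = None     # stage cycles, filled in by simulate()
    read: Optional[int] = None
    complete: Optional[int] = None
    write: Optional[int] = None


@dataclass
class FUStatus:                     # one row of the functional unit status table
    instr: int                      # index of the instruction occupying the unit
    fi: str
    fj: Optional[str]
    fk: Optional[str]
    qj: Optional[str]               # unit that will produce Fj (None = available)
    qk: Optional[str]
    rj: bool                        # Fj ready and not yet read
    rk: bool


def war_hazard(fu: Dict[str, Optional[FUStatus]], s: FUStatus) -> bool:
    """True if some other unit still has to read the old value of s.fi."""
    return any(o is not None and o is not s and
               ((o.fj == s.fi and o.rj) or (o.fk == s.fi and o.rk))
               for o in fu.values())


def simulate(prog: List[Instr], max_cycles: int = 500) -> List[Instr]:
    fu: Dict[str, Optional[FUStatus]] = {u: None for us in UNITS_FOR.values() for u in us}
    result: Dict[str, str] = {}     # register result status: register -> writing unit
    nxt = 0                         # next instruction to issue (issue is in order)
    for cycle in range(1, max_cycles + 1):
        # Phase 1: decide every action from the state at the start of the cycle.
        issue_unit = None
        if nxt < len(prog):
            free = [u for u in UNITS_FOR[prog[nxt].op] if fu[u] is None]
            if free and prog[nxt].fi not in result:      # structural and WAW checks
                issue_unit = free[0]
        reads = [u for u, s in fu.items()                # RAW check: operands ready
                 if s is not None and prog[s.instr].read is None and s.rj and s.rk]
        writes = [u for u, s in fu.items()               # completed earlier, no WAR
                  if s is not None and prog[s.instr].complete is not None
                  and prog[s.instr].complete < cycle and not war_hazard(fu, s)]
        # Phase 2: apply the actions decided above.
        if issue_unit is not None:
            ins = prog[nxt]
            qj = result.get(ins.fj) if ins.fj else None
            qk = result.get(ins.fk) if ins.fk else None
            fu[issue_unit] = FUStatus(nxt, ins.fi, ins.fj, ins.fk, qj, qk,
                                      qj is None, qk is None)
            result[ins.fi] = issue_unit
            ins.issue = cycle
            nxt += 1
        for u in reads:
            prog[fu[u].instr].read = cycle
            fu[u].rj = fu[u].rk = False                  # operands consumed
        for s in fu.values():
            ins = prog[s.instr] if s is not None else None
            if ins and ins.read is not None and ins.read + LATENCY[ins.op] == cycle:
                ins.complete = cycle
        for u in writes:
            s = fu[u]
            prog[s.instr].write = cycle
            for o in fu.values():                        # wake up units waiting on u
                if o is not None and o.qj == u:
                    o.qj, o.rj = None, True
                if o is not None and o.qk == u:
                    o.qk, o.rk = None, True
            if result.get(s.fi) == u:
                del result[s.fi]
            fu[u] = None
        if all(i.write is not None for i in prog):
            break
    return prog


def example_program() -> List[Instr]:
    """The six-instruction sequence from the scoreboard example."""
    return [Instr("LD", "F6", None, "R2"), Instr("LD", "F2", None, "R3"),
            Instr("MULTD", "F0", "F2", "F4"), Instr("SUBD", "F8", "F6", "F2"),
            Instr("DIVD", "F10", "F0", "F6"), Instr("ADDD", "F6", "F8", "F2")]
```

Calling `simulate(example_program())` fills in the Issue, Read operands, Execution complete and Write result cycles for each instruction. Under these assumptions it reproduces the Cycle 62 instruction status, e.g. DIVD at 8, 21, 61, 62 and ADDD at 13, 14, 16, 22, where ADDD's write waits until the cycle after DIVD reads F6. Splitting each cycle into a decide phase and an apply phase means no stage sees another stage's effects from the same cycle.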

