Berkeley COMPSCI 252 - Lecture 16: Instruction Level Parallelism and Dynamic Execution

CS252 Graduate Computer Architecture
Lecture 16: Instruction Level Parallelism and Dynamic Execution #1
March 16, 2001
Prof. David A. Patterson
Computer Science 252, Spring 2001

Outline: Recall from Pipelining Review; Ideas to Reduce Stalls; Instruction-Level Parallelism (ILP); Data Dependence and Hazards; Name Dependence #1: Anti-dependence; Name Dependence #2: Output dependence; ILP and Data Hazards; Control Dependencies; Control Dependence Ignored; Exception Behavior; Data Flow; CS 252 Administrivia; Advantages of Dynamic Scheduling; HW Schemes: Instruction Parallelism; Dynamic Scheduling Step 1; A Dynamic Algorithm: Tomasulo's Algorithm; Tomasulo Algorithm; Tomasulo Organization; Reservation Station Components; Three Stages of Tomasulo Algorithm; Tomasulo Example (cycles 1-16 and 55-57); Tomasulo Drawbacks; Tomasulo Loop Example (cycles 1-20); Why can Tomasulo overlap iterations of loops?; Tomasulo's scheme offers 2 major advantages; What about Precise Interrupts?; Relationship between precise interrupts and speculation; HW support for precise interrupts; Four Steps of Speculative Tomasulo Algorithm; What are the hardware complexities with reorder buffer (ROB)?; Summary
Recall from Pipelining Review (Lec 16.2)
• Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls
  – Ideal pipeline CPI: measure of the maximum performance attainable by the implementation
  – Structural hazards: HW cannot support this combination of instructions
  – Data hazards: instruction depends on the result of a prior instruction still in the pipeline
  – Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)

Ideas to Reduce Stalls (Lec 16.3)
Chapter 3 techniques:
  Dynamic scheduling: data hazard stalls
  Dynamic branch prediction: control stalls
  Issuing multiple instructions per cycle: ideal CPI
  Speculation: data and control stalls
  Dynamic memory disambiguation: data hazard stalls involving memory
Chapter 4 techniques:
  Loop unrolling: control hazard stalls
  Basic compiler pipeline scheduling: data hazard stalls
  Compiler dependence analysis: ideal CPI and data hazard stalls
  Software pipelining and trace scheduling: ideal CPI and data hazard stalls
  Compiler speculation: ideal CPI, data and control stalls

Instruction-Level Parallelism (ILP) (Lec 16.4)
• Basic Block (BB) ILP is quite small
  – BB: a straight-line code sequence with no branches in except to the entry and no branches out except at the exit
  – Average dynamic branch frequency of 15% to 25% => only 4 to 7 instructions execute between a pair of branches
  – Plus, instructions in a BB are likely to depend on each other
• To obtain substantial performance enhancements, we must exploit ILP across multiple basic blocks
• Simplest: loop-level parallelism, exploiting parallelism among iterations of a loop
  – Vector is one way
  – If not vector, then either dynamic via branch prediction or static via loop unrolling by the compiler

Data Dependence and Hazards (Lec 16.5)
• InstrJ is data dependent on InstrI if InstrJ tries to read an operand before InstrI writes it:
    I: add r1,r2,r3
    J: sub r4,r1,r3
• or InstrJ is data dependent on InstrK, which is in turn dependent on InstrI
• Caused by a "true dependence" (compiler term)
• If a true dependence causes a hazard in the pipeline, it is called a Read After Write (RAW) hazard

Data Dependence and Hazards, continued (Lec 16.6)
• Dependences are a property of programs
• The presence of a dependence indicates the potential for a hazard, but the actual hazard and the length of any stall are properties of the pipeline
• Importance of data dependences:
  1) indicates the possibility of a hazard
  2) determines the order in which results must be calculated
  3) sets an upper bound on how much parallelism can possibly be exploited
• Today: HW schemes to avoid hazards

Name Dependence #1: Anti-dependence (Lec 16.7)
• Name dependence: when 2 instructions use the same register or memory location, called a name, but there is no flow of data between the instructions associated with that name; there are 2 versions of name dependence
• InstrJ writes an operand before InstrI reads it:
    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
  Called an "anti-dependence" by compiler writers; it results from reuse of the name "r1"
• If an anti-dependence causes a hazard in the pipeline, it is called a Write After Read (WAR) hazard

Name Dependence #2: Output dependence (Lec 16.8)
• InstrJ writes an operand before InstrI writes it:
    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
  Called an "output dependence" by compiler writers; this also results from reuse of the name "r1"
• If an output dependence causes a hazard in the pipeline, it is called a Write After Write (WAW) hazard
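To make the three dependence types concrete, here is a minimal Python sketch (not part of the lecture; the (label, destination, sources) tuple encoding of instructions is an assumption made here for illustration) that walks the slides' anti-dependence example and reports RAW, WAR, and WAW dependences exactly as defined above.

# Minimal sketch: report the RAW / WAR / WAW dependences in a short
# instruction sequence, using the definitions from the slides above.
# Instructions are modelled as (label, destination, sources); this
# encoding is an illustrative assumption, not the lecture's notation.

program = [
    ("I: sub r4,r1,r3", "r4", ("r1", "r3")),   # the anti-dependence example
    ("J: add r1,r2,r3", "r1", ("r2", "r3")),
    ("K: mul r6,r1,r7", "r6", ("r1", "r7")),
]

last_writer = {}          # register -> label of the most recent instruction writing it
readers_since_write = {}  # register -> labels reading it since its last write

for label, dst, srcs in program:
    # RAW (true dependence): this instruction reads a value an earlier one produced
    for reg in srcs:
        if reg in last_writer:
            print(f"{label}: RAW (true dependence) on {last_writer[reg]} via {reg}")
    # WAR (anti-dependence): this instruction overwrites a name still being read
    for reader in readers_since_write.get(dst, []):
        print(f"{label}: WAR (anti-dependence) on {reader} via {dst}")
    # WAW (output dependence): this instruction overwrites a name already written
    if dst in last_writer:
        print(f"{label}: WAW (output dependence) on {last_writer[dst]} via {dst}")
    # Bookkeeping: record this write and the registers this instruction reads
    last_writer[dst] = label
    readers_since_write[dst] = []
    for reg in srcs:
        readers_since_write.setdefault(reg, []).append(label)

Run on the anti-dependence example it reports the WAR hazard from I to J on r1 and the RAW hazard from J to K on r1; swapping in the output-dependence example from the next slide reports a WAW hazard on r1 between I and J instead.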
ILP and Data Hazards (Lec 16.9)
• HW/SW must preserve program order: the order in which instructions would execute if executed sequentially, one at a time, as determined by the original source program
• HW/SW goal: exploit parallelism by preserving program order only where it affects the outcome of the program
• Instructions involved in a name dependence can execute simultaneously if the name used in the instructions is changed so that the instructions do not conflict
  – Register renaming resolves name dependences for registers
  – Done either by the compiler or by HW

Control Dependencies (Lec 16.10)
• Every instruction is control dependent on some set of branches, and, in general, these control dependencies must be preserved to preserve program order
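The register renaming mentioned on the ILP and Data Hazards slide above can be sketched in a few lines of Python. The "p0, p1, ..." physical names below are an assumption made for illustration; Tomasulo's hardware, covered later in the lecture, achieves the same effect with reservation-station tags rather than an explicit map.

# Minimal register-renaming sketch: every write gets a fresh physical name
# and sources are read through the current mapping, so WAR and WAW name
# dependences disappear and only true (RAW) dependences remain.

from itertools import count

program = [
    ("sub", "r1", ("r4", "r3")),   # output-dependence example: r1 is reused
    ("add", "r1", ("r2", "r3")),
    ("mul", "r6", ("r1", "r7")),
]

fresh = count()     # supplies new physical register numbers
rename_map = {}     # architectural register -> current physical name

def phys(reg):
    # Sources are read through the current mapping; live-in values get a name on first use.
    if reg not in rename_map:
        rename_map[reg] = f"p{next(fresh)}"
    return rename_map[reg]

for op, dst, srcs in program:
    new_srcs = [phys(s) for s in srcs]    # rename sources first (reads precede the write)
    rename_map[dst] = f"p{next(fresh)}"   # the write gets a brand-new physical name
    print(f"{op} {rename_map[dst]}, {', '.join(new_srcs)}")

In the renamed output the two writes that both targeted r1 now go to different physical names, so the WAW dependence is gone, while mul still reads the value produced by add, preserving the true dependence; this is why renaming lets the sub and the add execute in either order.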

