Berkeley COMPSCI 252 - Lec 5 – Out-of-Order Completion




Slide titles:
  EECS 252 Graduate Computer Architecture Lec 5 – Out-of-Order Completion
  Review
  Outline
  Pipelining with Reg. Reservations
  Resolving Structural Hazards
  Basic Issue Model
  Hazard Resolution
  Example
  Cray-1 Discussion
  Pipelining with Scoreboarding
  Scoreboard Operation
  Slide 12
  Discussion
  Case Study: MIPS R4000 (200 MHz)
  Case Study: MIPS R4000
  MIPS R4000 Floating Point
  MIPS FP Pipe Stages
  R4000 Performance
  Advanced Pipelining and Instruction Level Parallelism (ILP)
  Can we make CPI closer to 1?
  FP Loop: Where are the Hazards?
  FP Loop Showing Stalls
  Revised FP Loop Minimizing Stalls
  Unroll Loop Four Times (straightforward way)
  Unrolled Loop That Minimizes Stalls
  Getting CPI < 1: Issuing Multiple Instructions/Cycle
  SuperScalar Issue Rules
  Loop Unrolling in Superscalar
  VLIW: Very Large Instruction Word
  Loop Unrolling in VLIW
  Summary

EECS 252 Graduate Computer Architecture
Lec 5 – Out-of-Order Completion
David Culler
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~culler
http://www-inst.eecs.berkeley.edu/~cs252
2/1/2005, CS252 SP05, Lec 5 OOC

Review
• Data stationary pipeline control
  – Micro-instruction & PC track down the pipe
  – Accumulate state
• Implementing bubbles, stalls, forwarding, multicycle operations
• Branch prediction
  – Static vs dynamic
  – N-bit saturating counters
  – Local and global history
  – Correlated predictors, Tournament, GSHARE
  – Branch target buffers, return address predictors

Outline
• Relax pipeline design to allow out-of-order completions
  – Cray-1: register reservations
• Relax pipeline to allow out-of-order issue
  – CDC 6600: Scoreboard
• Compiler optimizations for ILP
• Superscalar issue
• Maybe go back and finish exceptions

Pipelining with Reg. Reservations
• Assumptions
  1. Multiple pipelined function units of different latency
     » able to accept operations at issue rate
     » may be exceptions (e.g., divide)
  2. Issue instructions in order
  3. Operand fetch in order
  4. Completion out of order
     » short ops may bypass long ones
  5. Some shared resources (e.g., reg write port)
• Implications
  – WAR hazard still resolved by pipeline flow (2 & 3)
  – RAW, WAW, and structural still present
• Design philosophy (à la Cray)
  – Resolve hazards as instruction is issued into pipeline
  – Pipeline is non-blocking

Resolving Structural Hazards
• With static pipeline flow, resource usage is known in advance
• Instruction requires X at t ticks after issue
• If reservationX[t] is clear, issue inst and set bit
• Otherwise, delay till clear
• At each tick the reservationX[] shifts by one, so will eventually clear
• Multiple resources? Range of delays?
[Figure: “shift reg.” for resource X; labels: NOW, delay till resource is used, required resource]

Basic Issue Model
• Issue unit checks for all hazards
  – Structural, RAW, WAW
• Holds issue while hazards exist
• Upon issue, register values provided to F.U.
• Executes to completion without blocking
[Figure: pipeline diagram; labels: Instr. Fetch, Op Fetch & Issue, rD, valA, valB, op]

Hazard Resolution
• Structural
  – Op code => resource usage
  – Check resource resv
  – Set on issue
• Data
  – Add reservation bit on each register
  – Check RegRsv for source and destination registers
  – Hold issue till clear
  – Set bit on destination register
  – Clear bit on dest reg. write
• Questions:
  – Forwarding?
[Figure: pipeline diagram; labels: Instr. Fetch, Op Fetch & Issue, rD, valA, valB, op; note: Motorola 88000 “scoreboard” [sic]]

Example
  Add r1 := r2 + r3
  Add r2 := r2 + 4
  Lod r5 := mem[r1+16]
  Lod r6 := mem[r1+32]
  Mul r7 := r5 * r6
  Bnz r1, foo
  Sub r7 := r0 - r0
[Figure: pipeline diagram; labels: Instr. Fetch, Op Fetch & Issue, rD, valA, valB, op]

Cray-1 Discussion
• Technological Assumptions
• Why no forwarding?
• Longevity of the ISA?
• Instruction cache?
  – Four blocks (RR) of 16x4 “parcels”
  – Issue delayed on miss
    » 2 CP for change of block
• Branch delays?
  – Branch op code delayed till second parcel is obtained
  – 5 clocks (reg zero, nz, pos, neg)
• I/O system?

Pipelining with Scoreboarding
• Assumptions
  1. Multiple function units of different latency
     – Especially non-pipelined units
  2. Issue instructions whenever FU available, unless would cause multiple outstanding writes to same register
     – Operand fetch out of order
     – Completion out of order
  3. Some shared resources (e.g., reg write port)
• Implications
  – Need to resolve RAW, WAR, WAW and structural
• Design philosophy (à la CDC 6600)
  – Issue unit tracks all outstanding dependences
  – Holds issue if structural or WAW hazard
  – Informs FUs when hazards resolved
  – FUs fetch operands from register file and proceed

Scoreboard Operation
• Issue
  – Hold while FU unavailable or destination register reserved (by FU f)
• Read operands
  – SB informs FU with all sources available to fetch & go
  – Limited by read ports
• Write back
  – SB schedules one FU to write
  – Waits until no FU is waiting to fetch (old version) of reg
[Figure: scoreboard pipeline diagram; labels: Instr. Fetch, Issue & Resolve, op fetch, ex, Scoreboard, FU, rD, rA, rB, op, valA, valB]

Example
  Add r1 := r2 + r3
  Add r2 := r2 + 4
  Lod r5 := mem[r1+16]
  Lod r6 := mem[r1+32]
  Mul r7 := r5 * r6
  Bnz r1, foo
  Sub r7 := r0 - r0
[Figure: scoreboard pipeline diagram; labels: Instr. Fetch, Issue & Resolve, op fetch, ex, Scoreboard, FU]

Discussion
• Technological Assumptions
• Extend to allow forwarding?
• How do loads and stores work?
• Instruction cache?
• I/O system?

Case Study: MIPS R4000 (200 MHz)
• 8 Stage Pipeline:
  – IF: first half of fetching of instruction; PC selection happens here as well as initiation of instruction cache access.
  – IS: second half of access to instruction cache.
  – RF: instruction decode and register fetch, hazard checking and also instruction cache hit detection.
  – EX: execution, which includes effective address calculation, ALU operation, and branch target computation and condition evaluation.
  – DF: data fetch, first half of access to data cache.
  – DS: second half of access to data cache.
  – TC: tag check, determine whether the data cache access hit.
  – WB: write back for loads and register-register operations.
• 8 Stages: What is impact on Load delay? Branch delay? Why?
[Figure: pipeline datapath; labels: instr mem, reg, ALU, data mem, reg; stages: IF IS RF EX DF DS TC WB]

Case Study: MIPS R4000
[Figure: two staggered pipeline timing diagrams, each showing successive instructions entering IF IS RF EX DF DS TC WB one cycle apart; captions: TWO Cycle Load Latency, THREE Cycle Branch]
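The reservation shift register from the "Resolving Structural Hazards" slide can be sketched in a few lines of Python. This is only an illustrative model, not anything from the lecture: the names `ReservationShiftRegister` and `issue_cycles`, the default depth of 8, and the assumption of one shared resource with at most one issue attempt per tick are all hypothetical choices made for the sketch.

```python
class ReservationShiftRegister:
    """Reservation bits for one shared resource X (hypothetical model).

    bits[t] is True when X is already claimed t ticks from NOW.
    """

    def __init__(self, depth: int) -> None:
        self.bits = [False] * depth

    def can_issue(self, t: int) -> bool:
        # An instruction needing X at t ticks after issue may go
        # only if reservationX[t] is clear. Assumes t < depth.
        return not self.bits[t]

    def reserve(self, t: int) -> None:
        # Set the bit on issue.
        self.bits[t] = True

    def tick(self) -> None:
        # Each tick shifts the register by one toward NOW, so every
        # reservation eventually falls off the end and clears.
        self.bits = self.bits[1:] + [False]


def issue_cycles(latencies, depth=8):
    """Return the cycle at which each instruction issues (in order).

    latencies[i] is how many ticks after issue instruction i needs
    the shared resource, e.g. a register write port.
    """
    port = ReservationShiftRegister(depth)
    cycle = 0
    issued = []
    for t in latencies:
        # Delay issue till the required slot is clear.
        while not port.can_issue(t):
            port.tick()
            cycle += 1
        port.reserve(t)
        issued.append(cycle)
        # Assumed here: at most one instruction issues per cycle.
        port.tick()
        cycle += 1
    return issued


print(issue_cycles([3, 2, 2]))  # [0, 2, 3]
```

In this run the second instruction cannot issue at cycle 1, because it would reach the resource at cycle 3, the same cycle as the first, so it is held for one tick. Handling several resources, as the slide's closing question suggests, would take one such register per resource with all of them checked before issue.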

