Berkeley COMPSCI 152 - Lecture 13 Static Pipeline Scheduling Compiler Optimizations


School: University of California, Berkeley
Course: COMPSCI 152 - Computer Architecture and Engineering
Pages: 45


CS152 Computer Architecture and Engineering
Lecture 13: Static Pipeline Scheduling, Compiler Optimizations

Outline:
• Recall: Data Hazard Solution: Forwarding
• Recall: Resolve RAW by “forwarding” (or bypassing)
• FYI: MIPS R3000 clocking discipline
• MIPS R3000 Instruction Pipeline
• Thus, only 2 levels of forwarding
• Recall: Examples of stalls/bubbles
• Recall: Freeze above & Bubble Below
• Recall: Achieving Precise Exceptions
• Recall: What about memory operations?
• MIPS R3000 Multicycle Operations
• Case Study: MIPS R4000 (200 MHz)
• Case Study: MIPS R4000
• MIPS R4000 Floating Point
• MIPS FP Pipe Stages
• Recall: Compute CPI?
• R4000 Performance
• Administrivia
• Administrivia: Pentium-4 Architecture!
• Can we somehow make CPI closer to 1?
• FP Loop: Where are the Hazards?
• FP Loop Showing Stalls
• Revised FP Loop Minimizing Stalls
• Unroll Loop Four Times (straightforward way)
• Unrolled Loop That Minimizes Stalls
• Getting CPI < 1: Issuing Multiple Instructions/Cycle
• Slide 27
• Loop Unrolling in Superscalar
• Limits of Superscalar
• Loop Unrolling in VLIW
• Software Pipelining
• Software Pipelining Example
• Software Pipelining with Loop Unrolling in VLIW
• Can we use HW to get CPI closer to 1?
• Problems?
• The Big Picture: Where are We Now?
• Scoreboard: a bookkeeping technique
• Scoreboard Architecture (CDC 6600)
• Scoreboard Implications
• Four Stages of Scoreboard Control
• Slide 41
• Three Parts of the Scoreboard
• CDC 6600 Scoreboard
• Summary #1/2: Compiler techniques for parallelism
• Summary #2/2

March 15, 2004
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
Lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/
3/15/04 ©UCB Spring 2004 CS152 / Kubiatowicz

Recall: Data Hazard Solution: Forwarding
• “Forward” the result from one stage to another
• The “or” needs no forwarding if register read/write are defined properly (write in the first half of the cycle, read in the second half)
Instruction sequence (program order):
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11
[Figure: pipeline diagram, instruction order vs. time (clock cycles) through IF, ID/RF, EX, MEM, WB]
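To make the “or needs no forwarding” remark concrete, here is a small sketch (not from the lecture) that walks the five-instruction sequence above and reports where each source operand comes from. It assumes a classic IF, ID/RF, EX, MEM, WB pipeline issuing one instruction per cycle, results produced at the end of EX, operands needed at the start of EX, and a register file written in the first half of a cycle and read in the second. The helper names `parse`, `operand_source`, and `analyze` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    rd: str
    rs: str
    rt: str


def parse(line: str) -> Instr:
    """Parse 'op rd,rs,rt' (three-register MIPS ALU format only)."""
    op, args = line.split(maxsplit=1)
    rd, rs, rt = (a.strip() for a in args.split(","))
    return Instr(op, rd, rs, rt)


def operand_source(distance: Optional[int]) -> str:
    """Where a consumer gets an operand whose producer is `distance` instructions earlier.

    Assumed timing: IF, ID/RF, EX, MEM, WB; one instruction per cycle.
    """
    if distance == 1:
        return "forward from EX/MEM"  # producer is in MEM while consumer is in EX
    if distance == 2:
        return "forward from MEM/WB"  # producer is in WB while consumer is in EX
    if distance == 3:
        # consumer reads registers in ID/RF in the same cycle the producer writes back
        return "register file (write first half, read second half)"
    return "register file"  # no recent producer, or distance >= 4


def analyze(program: List[str]) -> List[Tuple[str, str, str]]:
    instrs = [parse(line) for line in program]
    report = []
    for i, ins in enumerate(instrs):
        for reg in (ins.rs, ins.rt):
            distance = None
            for j in range(i - 1, -1, -1):  # search backwards: nearest writer wins
                if instrs[j].rd == reg:
                    distance = i - j
                    break
            report.append((ins.op, reg, operand_source(distance)))
    return report


PROGRAM = [
    "add r1,r2,r3",
    "sub r4,r1,r3",
    "and r6,r1,r7",
    "or r8,r1,r9",
    "xor r10,r1,r11",
]

if __name__ == "__main__":
    for op, reg, src in analyze(PROGRAM):
        print(f"{op:4s} {reg:4s} <- {src}")
```

Only `sub` and `and` need a bypass path; `or` is covered by the split-cycle register file, which is why two levels of forwarding suffice.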
Recall: Resolve RAW by “forwarding” (or bypassing)
° Detect the nearest valid write to an operand register and forward the value into the operand latches, bypassing the remainder of the pipe
  • Add mux inputs (paths) from the pipeline registers
  • Data Forwarding = Data Bypassing
[Figure: pipelined datapath (PC, IAU, I mem, Regs, A/B operand latches, ALU, D mem) with op/rw/rs/rt fields in each pipeline register and a forwarding mux]

FYI: MIPS R3000 clocking discipline
° 2-phase non-overlapping clocks
° Each pipeline stage is two (level-sensitive) latches
[Figure: two-phase clock waveforms (phi1, phi2) contrasted with an edge-triggered clock]

MIPS R3000 Instruction Pipeline
Stages: Inst Fetch | Decode, Reg. Read | ALU / E.A. | Memory | Write Reg
[Figure: resource usage per clock phase: TLB, I-cache, RF, ALU operation / E.A., TLB, D-cache, WB]
° The register file is written in phase 1 and read in phase 2, which eliminates the bypass from WB

Thus, only 2 levels of forwarding
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11
[Figure: pipeline diagram, instruction order vs. time (clock cycles) through IF, ID/RF, EX, MEM, WB]
° With the MIPS R3000 pipeline, there is no need to forward from the WB stage

Recall: Examples of stalls/bubbles
° Exceptions: Flush everything above
  • Prevent instructions following the exception from committing state
  • Freeze fetch until the exception is resolved
° Stalls: Introduce brief stalls into the pipeline
  • Decode stage recognizes that the current instruction cannot proceed
  • Freeze the fetch stage
  • Introduce a “bubble” into the EX stage (instead of forwarding the stalled instruction)
  • Can stall until the condition is resolved
  • Examples:
    - mfhi, mflo: need to wait for the multiply/divide unit to finish
    - “Break” instruction for Lab5: stall until the release line is received
    - The load delay slot is handled this way as well

Recall: Freeze above & Bubble Below
° Flush accomplished by setting an “invalid” bit in the pipeline
[Figure: pipelined datapath with “freeze” applied to the stages above the stalled instruction and a “bubble” injected below it]
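The “detect the nearest valid write” rule from the forwarding slide can be sketched as a priority check over the pipeline registers. This is a hypothetical Python model, not the course datapath: `PipeReg`, `forward_select`, and `operand_value` are illustrative names, the `valid` flag stands in for the “invalid” bit used to flush or bubble a stage, and register 0 is never forwarded because MIPS hardwires it to zero. It assumes the instruction in EX/MEM is an ALU operation; a load there has no data yet, which is the case handled by stalling instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PipeReg:
    """Hypothetical view of the fields one pipeline register carries for forwarding."""
    valid: bool      # False for a bubble or a flushed ("invalid") instruction
    reg_write: bool  # True if this instruction will write the register file
    rw: int          # destination register number
    result: int      # ALU result latched in this pipeline register


def forward_select(src: int, ex_mem: PipeReg, mem_wb: PipeReg) -> str:
    """Choose the operand-mux input for source register `src`.

    EX/MEM is checked before MEM/WB so the nearest (most recent) valid write wins.
    Register 0 is never forwarded: MIPS hardwires it to zero.
    """
    def writes_src(p: PipeReg) -> bool:
        return p.valid and p.reg_write and src != 0 and p.rw == src

    if writes_src(ex_mem):
        return "EX/MEM"
    if writes_src(mem_wb):
        return "MEM/WB"
    return "REGFILE"


def operand_value(src: int, ex_mem: PipeReg, mem_wb: PipeReg, regfile: List[int]) -> int:
    """Value that reaches the operand latch after the forwarding mux."""
    sel = forward_select(src, ex_mem, mem_wb)
    if sel == "EX/MEM":
        return ex_mem.result
    if sel == "MEM/WB":
        return mem_wb.result
    return regfile[src]


if __name__ == "__main__":
    regs = [0] * 32
    regs[1] = 5                          # stale value of r1 in the register file
    older = PipeReg(True, True, 1, 20)   # earlier write of r1, now in MEM/WB
    newer = PipeReg(True, True, 1, 10)   # later write of r1, now in EX/MEM
    print(operand_value(1, newer, older, regs))  # 10: the nearest write wins
```

Checking EX/MEM first matters when two in-flight instructions write the same register: the younger write must win, exactly as the slide's “nearest valid write” wording requires.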
Recall: Achieving Precise Exceptions
° Use the pipeline to sort this out!
  • Pass exception status along with the instruction.
  • Keep track of PCs for every instruction in the pipeline.
  • Don’t act on an exception until it reaches the WB stage
° Handle interrupts through a “faulting noop” in the IF stage
° When an instruction reaches the end of the MEM stage:
  • Save PC → EPC, Interrupt vector addr → PC
  • Turn all instructions in earlier stages into noops!
[Figure: program flow vs. time for four overlapped instructions (IFetch, Dcd, Exec, Mem, WB), marking where exceptions arise: Inst TLB fault, Bad Inst, Overflow, Data TLB]

Recall: What about memory operations?
[Figure: pipelined datapath with A/B operand latches, op Rd Ra Rb fields, D mem, and Rd written back to the register file]
° If instructions are initiated in order and operations always occur in the same stage, there can be no hazards between memory operations!
° What about data dependence on loads?
    R1 <- R4 + R5
    R2 <- Mem[ R2 + I ]
    R3 <- R2 + R1
  => “Delayed Loads”
° Can recognize this in the decode stage and introduce a bubble while stalling the fetch stage (hint for lab 4!)
° Tricky situation:
    R1 <- Mem[ R2 + I ]
    Mem[R3+34] <- R1
  Handle with a bypass in the memory stage!

MIPS R3000 Multicycle Operations
Ex: Multiply, Divide, Cache Miss
° Use the control word of the local stage to step through the multicycle operation
° Stall all stages above the multicycle operation in the pipeline
° Drain (bubble) the stages below it
° Alternatively, launch multiply/divide to an autonomous unit and only stall the pipe if an instruction attempts to get the result before it is ready
  - This means stalling mflo/mfhi in the decode stage if multiply/divide is still executing
  - Extra credit in Lab 5 does this
[Figure: datapath for a multicycle mul Rd Ra Rb, with Rd written back to the register file]

Case Study: MIPS R4000 (200 MHz)
° 8 Stage Pipeline:
  • IF – first half of fetching of instruction; PC selection happens here as well as initiation of instruction cache access.
  • IS – second half of access to instruction cache.
  • RF – instruction decode and register fetch, hazard checking and also instruction cache hit detection.
  • EX – execution, which includes effective address calculation, ALU operation, and branch target computation and condition evaluation.
  • DF – data fetch, first half of access to data cache.
  • DS – second half of access to
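The bookkeeping on the “Achieving Precise Exceptions” slide above can be sketched as a tiny cycle-by-cycle model. This is a hypothetical Python illustration, not the lab datapath: `InFlight` and `run` are made-up names, each instruction's faulting stage is supplied as input, and the pipe is a stall-free IF, ID, EX, MEM, WB pipeline. A fault is only recorded in the instruction's status when detected and is acted on at the end of MEM, so the trap always goes to the oldest faulting instruction even when a younger one faulted earlier in time.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

STAGES = ["IF", "ID", "EX", "MEM", "WB"]
MEM = STAGES.index("MEM")


@dataclass
class InFlight:
    pc: int
    fault_stage: Optional[str]    # hypothetical input: stage in which this instruction faults
    status: Optional[str] = None  # exception status passed down the pipe with the instruction


def run(program: List[Tuple[int, Optional[str]]]) -> Optional[Tuple[int, int]]:
    """Return (EPC, cycle) for the exception that is taken, or None if none is.

    One instruction is fetched per cycle and there are no stalls. Faults are
    recorded when detected but acted on only at the end of MEM.
    """
    pipe: List[Optional[InFlight]] = [None] * len(STAGES)
    fetched = 0
    cycle = 0
    while fetched < len(program) or any(p is not None for p in pipe):
        pipe = [None] + pipe[:-1]  # every instruction moves down one stage
        if fetched < len(program):
            pc, fault_stage = program[fetched]
            pipe[0] = InFlight(pc, fault_stage)
            fetched += 1
        for s, ins in enumerate(pipe):  # detect: record status, do not trap yet
            if ins is not None and ins.status is None and ins.fault_stage == STAGES[s]:
                ins.status = "exception in " + STAGES[s]
        ins = pipe[MEM]
        if ins is not None and ins.status is not None:
            # Save PC -> EPC; the younger instructions in IF/ID/EX would become noops
            return ins.pc, cycle
        cycle += 1
    return None
```

For example, `run([(0x100, "MEM"), (0x104, "IF")])` returns `(0x100, 3)`: the second instruction's fetch fault is seen first, in cycle 1, but the trap is taken for the older instruction.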

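The decode-stage interlock for the “Delayed Loads” case on the “What about memory operations?” slide above, and why the load-then-store case needs only a memory-stage bypass, can be sketched as follows. The model is hypothetical (`Instr`, `needs_load_stall`, and `count_bubbles` are illustrative names) and assumes classic five-stage timing: load data is ready only after MEM, ALU and address operands are needed at the start of EX, and store data is not needed until MEM.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    kind: str                         # "alu", "load", or "store"
    rd: Optional[int]                 # destination register (None for stores)
    ex_srcs: Tuple[int, ...]          # registers needed at the start of EX (ALU operands or address base)
    store_data: Optional[int] = None  # register a store writes to memory (needed only in MEM)


def needs_load_stall(in_decode: Instr, in_ex: Instr) -> bool:
    """Decode-stage check: stall if the instruction in EX is a load whose result
    the decoding instruction needs at the start of EX. Store data is deliberately
    not checked: it can be bypassed into the memory stage instead."""
    return (in_ex.kind == "load" and in_ex.rd not in (None, 0)
            and in_ex.rd in in_decode.ex_srcs)


def count_bubbles(program: List[Instr]) -> int:
    """Bubbles inserted for a straight-line sequence. One bubble per adjacent
    load-use pair suffices: it moves the consumer two slots behind the load,
    where MEM/WB forwarding applies, and a bubble never creates a new hazard."""
    return sum(needs_load_stall(cur, prev) for prev, cur in zip(program, program[1:]))


# R1 <- R4 + R5 ; R2 <- Mem[R2 + I] ; R3 <- R2 + R1
DELAYED_LOAD = [
    Instr("alu", 1, (4, 5)),
    Instr("load", 2, (2,)),
    Instr("alu", 3, (2, 1)),
]

# R1 <- Mem[R2 + I] ; Mem[R3 + 34] <- R1
LOAD_THEN_STORE = [
    Instr("load", 1, (2,)),
    Instr("store", None, (3,), store_data=1),
]

if __name__ == "__main__":
    print(count_bubbles(DELAYED_LOAD))     # 1
    print(count_bubbles(LOAD_THEN_STORE))  # 0
```

The store case avoids the stall because its address register (R3) is independent of the load; only the store data depends on R1, and that value is not needed until the store reaches MEM.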