Berkeley COMPSCI 152 - Introduction to Advanced Pipelining


Outline of slides:
CS152 Computer Architecture and Engineering, Lecture 14: Introduction to Advanced Pipelining
Review: Summary of Pipelining Basics
Recap: Pipeline Hazards
Recap: Data Hazards
Recap: Pipelined Processor for slides
Recap: Data Stationary Control
The Big Picture: Where are We Now?
Recap: Record of Pending Writes
Recap: Resolve RAW by forwarding
What about memory operations?
Compiler Avoiding Load Stalls:
What about Interrupts, Traps, Faults?
Exception Problem
Another look at the exception problem
Exception Handling
Resolution: Freeze above & Bubble Below
Administrivia
FYI: MIPS R3000 clocking discipline
MIPS R3000 Instruction Pipeline
Recall: Data Hazard on r1
MIPS R3000 Multicycle Operations
Is CPI = 1 for our pipeline?
Case Study: MIPS R4000 (200 MHz)
Case Study: MIPS R4000
MIPS R4000 Floating Point
MIPS FP Pipe Stages
R4000 Performance
Advanced Pipelining and Instruction Level Parallelism (ILP)
FP Loop: Where are the Hazards?
FP Loop Showing Stalls
Revised FP Loop Minimizing Stalls
Unroll Loop Four Times (straightforward way)
Unrolled Loop That Minimizes Stalls
Getting CPI < 1: Issuing Multiple Instructions/Cycle
Slide 35
Unrolled Loop that Minimizes Stalls for Scalar
Loop Unrolling in Superscalar
Limits of Superscalar
Loop Unrolling in VLIW
Software Pipelining
Software Pipelining Example
Software Pipelining with Loop Unrolling in VLIW
Trace Scheduling
HW Schemes: Instruction Parallelism
Slide 45
Scoreboard Implications
Performance of Dynamic SS
Prediction: Branches, Dependencies, Data
New era in computing?
Dynamic Branch Prediction
Need Address at Same Time as Prediction
Slide 51
Slide 52
BHT Accuracy
Correlating Branches
Slide 55
Accuracy of Different Schemes
Dynamic Branch Prediction Summary
HW support for More ILP
Slide 59
Slide 60
Dynamic Scheduling in PowerPC 604 and Pentium Pro
Slide 62
Dynamic Scheduling in Pentium Pro
Limits to Multi-Issue Machines
Slide 65
Brainiac vs. Speed Demon
3 Recent Machines
SPECint95base Performance (Oct. 1997)
SPECfp95base Performance (Oct. 1997)
Summary

3/17/99 ©UCB Spring 1999, CS152 / Kubiatowicz

Lec14.1
CS152 Computer Architecture and Engineering
Lecture 14: Introduction to Advanced Pipelining
March 17, 1999
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/

Lec14.2  Review: Summary of Pipelining Basics
° Pipelines pass control information down the pipe just as data moves down the pipe
° Forwarding/stalls are handled by local control
° Hazards limit performance
  • Structural: need more HW resources
  • Data: need forwarding, compiler scheduling
  • Control: early evaluation & PC, delayed branch, prediction
° Increasing the length of the pipe increases the impact of hazards; pipelining helps instruction bandwidth, not latency

Lec14.3  Recap: Pipeline Hazards
[Figure: staggered pipeline-stage timelines (IFetch/IF, DCD, OpFetch/OF, Exec/EX, Mem, WB, RS, Store, Jump) illustrating a structural hazard, a control hazard, and the RAW (read after write), WAW (write after write), and WAR (write after read) data hazards]

Lec14.4  Recap: Data Hazards
° Avoid some "by design"
  • eliminate WAR by always fetching operands early (DCD) in the pipe
  • eliminate WAW by doing all WBs in order (last stage, static)
° Detect and resolve the remaining ones
  • stall or forward (if possible)
[Figure: pipeline-stage timelines for the data hazard cases]

Lec14.5  Recap: Pipelined Processor for slides
° Separate control at each stage
° Stalls propagate backwards to freeze previous stages
° Bubbles are introduced into the pipeline by placing "Noops" into the local stage while stalling the previous stages
[Figure: pipelined datapath (Next PC, PC, Inst. Mem, IR, Reg. File, Exec, Mem Access, Data Mem) with per-stage control (Dcd Ctrl, Ex Ctrl, Mem Ctrl, WB Ctrl), staged instruction registers (IRex, IRmem, IRwb), a Valid bit, and Stalls/Bubbles signals]
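The RAW, WAW, and WAR categories from the recap slides can be sketched as a check on the registers that two instructions read and write. This is a minimal illustration, not code from the lecture: the `Instr` record and the `data_hazards` function are hypothetical names, and the sketch reports register dependences that could become hazards. Whether they actually stall a given pipeline depends on its stage layout, as slide Lec14.4 notes.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: at most one destination register."""
    text: str
    dest: Optional[str]
    srcs: FrozenSet[str]


def data_hazards(first: Instr, second: Instr) -> Set[str]:
    """Name the potential data hazards of `second` on the earlier `first`."""
    found: Set[str] = set()
    # RAW: the later instruction reads what the earlier one writes.
    if first.dest is not None and first.dest in second.srcs:
        found.add("RAW")
    # WAW: both instructions write the same register.
    if first.dest is not None and first.dest == second.dest:
        found.add("WAW")
    # WAR: the later instruction writes what the earlier one reads.
    if second.dest is not None and second.dest in first.srcs:
        found.add("WAR")
    return found


add = Instr("add r1, r2, r3", "r1", frozenset({"r2", "r3"}))
sub = Instr("sub r4, r1, r5", "r4", frozenset({"r1", "r5"}))
orr = Instr("or  r2, r6, r7", "r2", frozenset({"r6", "r7"}))

print(data_hazards(add, sub))  # sub reads r1, which add writes
print(data_hazards(add, orr))  # orr writes r2, which add reads
```

In an in-order pipeline that reads operands early and writes back in a single late stage, only the RAW case needs run-time handling; the other two are removed by design.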

Lec14.6  Recap: Data Stationary Control
° The Main Control generates the control signals during Reg/Dec
  • Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
  • Control signals for Mem (MemWr, Branch) are used 2 cycles later
  • Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: Main Control in Reg/Dec produces ExtOp, ALUOp, RegDst, ALUSrc, Branch, MemWr, MemtoReg, and RegWr. The signals travel with the instruction through the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr registers: ExtOp, ALUOp, RegDst, and ALUSrc are consumed in Exec; Branch and MemWr in Mem; MemtoReg and RegWr in Wr]

Lec14.7  The Big Picture: Where are We Now?
° The Five Classic Components of a Computer
[Figure: Processor (Control, Datapath), Memory, Input, Output]
° Today's Topics:
  • Recap last lecture
  • Review MIPS R3000 pipeline
  • Administrivia
  • Advanced Pipelining
  • SuperScalar, VLIW/EPIC

Lec14.8  Recap: Record of Pending Writes
° Current operand registers
° Pending writes
° hazard <= ((rs == rwex)  & regWex) OR
            ((rs == rwmem) & regWme) OR
            ((rs == rwwb)  & regWwb) OR
            ((rt == rwex)  & regWex) OR
            ((rt == rwmem) & regWme) OR
            ((rt == rwwb)  & regWwb)
[Figure: pipeline datapath (PC, I mem, Regs, alu, D mem) carrying an (op, rw) record at each stage alongside the current rs and rt operand fields]

Lec14.9  Recap: Resolve RAW by forwarding
° Detect the nearest valid write to an operand register and forward it into the op latches, bypassing the remainder of the pipe
  • Increase muxes to add paths from pipeline registers
  • Data Forwarding = Data Bypassing
[Figure: the same datapath with forward muxes added in front of the ALU operand latches]

Lec14.10  What about memory operations?
[Figure: pipeline with operand latches A and B, staged "op Rd Ra Rb" fields, result R, and Rd routed to the regfile]
° If instructions are initiated in order and operations always occur in the same stage, there can be no hazards between memory operations!
° What does delaying WB on arithmetic operations cost?
  • cycles?
  • hardware?
° What about data dependence on loads? ("Delayed Loads")
    R1 <- R4 + R5
    R2 <- Mem[ R2 + I ]
    R3 <- R2 + R1
° Can recognize this in the decode stage and introduce a bubble while stalling the fetch stage (hint for lab 5!)
° Tricky situation:
    R1 <- Mem[ R2 + I ]
    Mem[R3+34] <- R1
  Handle with bypass in memory
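
The hazard equation from Lec14.8, the nearest-write forwarding rule from Lec14.9, and the load bubble from Lec14.10 can be sketched together in a few lines of Python. This is an illustrative model under stated assumptions, not the lab 5 solution: `Pending`, `hazard`, `forward_source`, and `must_stall` are hypothetical names, the `is_load` flag is an added assumption about what decode tracks, and the special treatment of a hardwired zero register is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pending:
    """One pending-write record, kept for each stage after decode."""
    rw: int                # destination register number
    reg_w: bool            # True if this instruction will write rw
    is_load: bool = False  # assumption: decode can tell loads apart


def hazard(rs: int, rt: int, ex: Pending, mem: Pending, wb: Pending) -> bool:
    """Slide equation: some source register matches some pending write."""
    return any(p.reg_w and p.rw in (rs, rt) for p in (ex, mem, wb))


def forward_source(src: int, ex: Pending, mem: Pending, wb: Pending) -> Optional[str]:
    """Stage holding the nearest valid write to `src`, or None for the regfile."""
    for stage, p in (("ex", ex), ("mem", mem), ("wb", wb)):
        if p.reg_w and p.rw == src:
            return stage
    return None


def must_stall(rs: int, rt: int, ex: Pending) -> bool:
    """Load-use case: the loaded value is not ready yet, so insert a bubble."""
    return ex.is_load and ex.reg_w and ex.rw in (rs, rt)


# Slide example: R1 <- R4 + R5 ; R2 <- Mem[R2 + I] ; R3 <- R2 + R1
NOP = Pending(rw=0, reg_w=False)
add = Pending(rw=1, reg_w=True)
load = Pending(rw=2, reg_w=True, is_load=True)

# Third instruction in decode: the load is in ex, the add is in mem.
print(hazard(2, 1, load, add, NOP), must_stall(2, 1, load))
# After one bubble the load has moved to mem and the add to wb.
print(forward_source(2, NOP, load, add), forward_source(1, NOP, load, add))
```

Scanning the stages in the order ex, mem, wb is what makes the forwarded value the nearest one: if two in-flight instructions write the same register, the younger write wins.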

