Berkeley COMPSCI 152 - Lecture 12 - Complex Pipelines

CS 152 Computer Architecture and Engineering
Lecture 12 - Complex Pipelines
Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152
3/10/2009 CS152-Spring’09

Last time in Lecture 11
• Modern page-based virtual memory systems provide:
– Translation, Protection, Virtual memory.
• Translation and protection information stored in page tables, held in main memory
• Translation and protection information cached in “translation lookaside buffer” (TLB) to provide single cycle translation + protection check in common case
• VM interacts with cache design
– Physical cache tags require address translation before tag lookup, or use untranslated offset bits to index cache.
– Virtual tags do not require translation before cache hit/miss determination, but need to be flushed or extended with ASID to cope with context swaps. Also, must deal with virtual address aliases (usually by disallowing copies in cache).

Complex Pipelining: Motivation
Pipelining becomes complex when we want high performance in the presence of:
• Long latency or partially pipelined floating-point units
• Memory systems with variable access time
• Multiple arithmetic and memory units

Floating-Point Unit (FPU)
Much more hardware than an integer unit
Single-cycle FPU is a bad idea - why?
• it is common to have several FPU’s
• it is common to have different types of FPU’s: Fadd, Fmul, Fdiv, ...
• an FPU may be pipelined, partially pipelined or not pipelined
To operate several FPU’s concurrently the FP register file needs to have more read and write ports

Functional Unit Characteristics
[Figure: a fully pipelined unit (stages labeled 1 cyc, 1 cyc, 1 cyc) and a partially pipelined unit (stages labeled 2 cyc, 2 cyc)]
Functional units have internal pipeline registers
⇒ operands are latched when an instruction enters a functional unit
⇒ inputs to a functional unit (e.g., register file) can change during a long latency operation

Floating-Point ISA
Interaction between the floating-point datapath and the integer datapath is determined largely by the ISA
MIPS ISA
• separate register files for FP and Integer instructions; the only interaction is via a set of move instructions (some ISA’s don’t even permit this)
• separate load/store for FPR’s and GPR’s, but both use GPR’s for address calculation
• separate conditions for branches; FP branches are defined in terms of condition codes

Realistic Memory Systems
Latency of access to the main memory is usually much greater than one cycle and often unpredictable
Solving this problem is a central issue in computer architecture
Common approaches to improving memory performance
• separate instruction and data memory ports ⇒ self-modifying code might need explicit cache flush
• caches: single cycle except in case of a miss ⇒ stall
• interleaved memory: multiple memory accesses ⇒ bank conflicts
• split-phase memory operations ⇒ out-of-order responses

Multiple Functional Units in Pipeline
[Figure: pipeline with IF, ID, Issue, and WB stages; functional units ALU, Mem, Fadd, Fmul, Fdiv; register files GPR’s and FPR’s]

Complex Pipeline Control Issues
• Structural conflicts at the execution stage if some FPU or memory unit is not pipelined and takes more than one cycle
• Structural conflicts at the write-back stage due to variable latencies of different functional units
• Out-of-order write hazards due to variable latencies of different functional units
• How to handle exceptions?

Complex In-Order Pipeline
Delay writeback so all operations have same latency to W stage
– Write ports never oversubscribed (one inst. in & one inst. out every cycle)
– Stall pipeline on long latency operations, e.g., divides, cache misses
– Handle exceptions in-order at commit point
How to prevent increased writeback latency from slowing down single cycle integer operations?
[Figure: in-order pipeline with PC, Inst. Mem, Decode, X1, X2, X3, Data Mem, and W stages; GPRs and FPRs; FAdd, FMul, and FDiv units; unpipelined divider; commit point marked]

In-Order Superscalar Pipeline
• Fetch two instructions per cycle; issue both simultaneously if one is integer/memory and other is floating point
• Inexpensive way of increasing throughput, examples include Alpha 21064 (1992) & MIPS R5000 series (1996)
• Same idea can be extended to wider issue by duplicating functional units (e.g. 4-issue UltraSPARC) but regfile ports and bypassing costs grow quickly
[Figure: the same pipeline with a fetch width of 2 and Dual Decode; GPRs and FPRs; FAdd, FMul, and FDiv units; unpipelined divider; commit point marked]

Types of Data Hazards
Consider executing a sequence of rk ← ri op rj type of instructions

Data-dependence: Read-after-Write (RAW) hazard
  r3 ← r1 op r2
  r5 ← r3 op r4

Anti-dependence: Write-after-Read (WAR) hazard
  r3 ← r1 op r2
  r1 ← r4 op r5

Output-dependence: Write-after-Write (WAW) hazard
  r3 ← r1 op r2
  r3 ← r6 op r7

Register vs. Memory Dependence
Data hazards due to register operands can be determined at the decode stage, but data hazards due to memory operands can be determined only after computing the effective address
  store M[r1 + disp1] ← r2
  load r3 ← M[r4 + disp2]
Does (r1 + disp1) = (r4 + disp2) ?

Data Hazards: An Example
  I1 DIVD f6, f6, f4
  I2 LD f2, 45(r3)
  I3 MULTD f0, f2, f4
  I4 DIVD f8, f6, f2
  I5 SUBD f10, f0, f6
  I6 ADDD f6, f8, f2
[Figure annotations: RAW Hazards, WAR Hazards, WAW Hazards]

Instruction Scheduling
  I1 DIVD f6, f6, f4
  I2 LD f2, 45(r3)
  I3 MULTD f0, f2, f4
  I4 DIVD f8, f6, f2
  I5 SUBD f10, f0, f6
  I6 ADDD f6, f8, f2
[Figure: dependence graph over I1 to I6]
Valid orderings:
  in-order      I1 I2 I3 I4 I5 I6
  out-of-order  I2 I1 I3 I4 I5 I6
  out-of-order  I1 I2 I3 I5 I4 I6

Out-of-order Completion
In-order Issue
  Instruction              Latency
  I1 DIVD f6, f6, f4       4
  I2 LD f2, 45(r3)         1
  I3 MULTD f0, f2, f4      3
  I4 DIVD f8, f6, f2       4
  I5 SUBD f10, f0, f6      1
  I6 ADDD f6, f8, f2       1
  in-order comp      1 2 1 2 3 4 3 5 4 6 5 6
  out-of-order comp  1 2 2 3 1 4 3 5 5 4 6 6

CDC 6600 Seymour Cray, 1963
• A fast pipelined machine with 60-bit words
– 128 Kword main memory capacity, 32 banks
• Ten functional units (parallel, unpipelined)
– Floating Point: adder, 2 multipliers, divider
– Integer: adder, 2 incrementers, ...
• Hardwired control (no microcoding)
• Scoreboard for dynamic scheduling of instructions
• Ten Peripheral Processors for Input/Output
– a fast multi-threaded 12-bit integer ALU
• Very fast
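
The RAW, WAR, and WAW definitions from the "Types of Data Hazards" slide can be applied mechanically to the six-instruction sequence from "Data Hazards: An Example". The Python sketch below is an illustration only and is not part of the lecture: the names `Instr`, `PROGRAM`, and `find_hazards` are hypothetical, and the check is a conservative pairwise comparison of register names that ignores intervening writes (which does not matter for this particular sequence). It covers register operands only; as the "Register vs. Memory Dependence" slide notes, memory hazards would need the effective addresses.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """One instruction: a label, its destination register, its source registers."""
    name: str
    dest: str
    srcs: tuple


# The sequence from the "Data Hazards: An Example" slide.
PROGRAM = [
    Instr("I1", "f6", ("f6", "f4")),   # DIVD  f6, f6, f4
    Instr("I2", "f2", ("r3",)),        # LD    f2, 45(r3)
    Instr("I3", "f0", ("f2", "f4")),   # MULTD f0, f2, f4
    Instr("I4", "f8", ("f6", "f2")),   # DIVD  f8, f6, f2
    Instr("I5", "f10", ("f0", "f6")),  # SUBD  f10, f0, f6
    Instr("I6", "f6", ("f8", "f2")),   # ADDD  f6, f8, f2
]


def find_hazards(program):
    """Return (kind, earlier, later, register) for every ordered instruction pair.

    Assumption: a simple pairwise register-name comparison, not a model of
    any real pipeline's interlock logic.
    """
    hazards = []
    for i, early in enumerate(program):
        for late in program[i + 1:]:
            if early.dest in late.srcs:      # later reads what earlier writes
                hazards.append(("RAW", early.name, late.name, early.dest))
            if late.dest in early.srcs:      # later writes what earlier reads
                hazards.append(("WAR", early.name, late.name, late.dest))
            if late.dest == early.dest:      # both write the same register
                hazards.append(("WAW", early.name, late.name, late.dest))
    return hazards


if __name__ == "__main__":
    for kind, early, late, reg in find_hazards(PROGRAM):
        print(f"{kind}: {early} -> {late} on {reg}")
```

Under these assumptions the sketch reports seven RAW pairs (for example I2 to I3 on f2), three WAR pairs that all end at I6 on f6, and one WAW pair (I1 to I6 on f6). Both out-of-order orderings listed on the "Instruction Scheduling" slide keep every one of these pairs in its original relative order.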

