CS6810 School of Computing University of Utah

Pipelines
Today's topics:
• Evidence suggests there is some rust on this topic
  • hence spend a week and move on
  • also need some common terminology
• Attempt to present the ideal issues
  • with some discussion on why ideal isn't reality

Pipelining
• Computational assembly line
  ▪ each step does a small fraction (1/pipeline_depth) of the job
  ▪ concurrent execution of pipeline_depth instructions
    » performance is all about parallelism
• Vertical vs. horizontal concurrency
• Pipeline stage: 1 step in an N-step pipe
  ▪ 1 cycle per stage
    » synchronous design: slowest stage sets the clock rate
    » laminar is the target
• Simple model

Pipeline Benefit = Performance
• Ideal performance
  ▪ time-per-instruction = unpiped_instruction_time / #stages
    » asymptotic: overheads count
      • +10% typically achieved
• 2 ways to view this performance enhancement
  ▪ logical
    » work on several instructions at once
      • albeit in different stages of their execution
    » parallelism
      • average CPI reduced
  ▪ physical
    » shorter stages = increased frequency

Other Pipeline Benefits
• HW mechanism
  ▪ hidden from the SW so invisible to the user
  ▪ just viewed as a benefit
• No programming impact
  ▪ unless user needs the ultimate in performance
  ▪ usually left up to compiler scheduling & optimization
• Pipelines are everywhere
  ▪ key to staying on the Moore's law curve in the 80's
  ▪ 90's just moved to multiple pipelines
  ▪ frequency wars
    » push pipeline depth to lunatic fringe
      • problems
        – power ∝ frequency
        – overheads make ideal performance a bit optimistic

Consider MIPS64
• 5 steps in instruction execution
  ▪ fetch, decode, execute, mem, write-back
• Remember the ISA

Stages Vary by Instruction
• Stage 3
  ▪ execute reg-reg, or calculate effective address or branch target
    » for any instruction
      • only one role
• Stage 4
  ▪ only active on Load/Store/Jump/Branch
    » LMD ← Mem[ALUoutput] (load)
    » Mem[ALUoutput] ← SMD (store)
    » next PC = ALUoutput w/ condition (branch)
      • JUMP: no condition
• Stage 5
  ▪ Reg-Reg
    » Regs[IR16..20] ← ALUoutput
  ▪ Reg-Immediate
    » Regs[IR11..15] ← ALUoutput
  ▪ Load
    » Regs[IR11..15] ← memory data return

Example 5-stage Data-path

Inter-Stage Registers
• Pre-IF
  ▪ Next PC
• IF:ID
  ▪ PC+4
  ▪ IR: opcode, RS1, RS2, RD, imm16, function
  ▪ Wbmux value
• ID:EX
  ▪ PC+4
  ▪ IR1: Amux_sel, Bmux_sel, ALUop, Wbmux_sel, R/Wmem, Mmux_sel
  ▪ immediate data: 16 or 26 bits
• EX:Mem
  ▪ ALUout, SMD, mux selector indices, R vs. W command
• M:WB
  ▪ ALUout, LMD

How real was that?
• Depends
  ▪ real for simple architectures
    » woefully oversimplified for higher performance architectures
  ▪ not optimized
    » 2 ALUs
      • IF and EX
        – but ALUs are cheap so who cares?
    » Harvard architecture
      • separate instruction and data memories
        – typical at L1
        – but unified below that
    » 5x frequency for five stages
      • slowed down by inter-stage register overhead
• Data-path is only part of the architecture
  ▪ largest bit in terms of area
  ▪ easiest bit in terms of getting it right
  ▪ control path
    » FSM or microcode or both?

Control vs. Data Example
• Look at a few typical components

Control Path
• Each component has control points
  ▪ register: load or output enable
  ▪ mux/demux: select lines
  ▪ memory: R vs. W
  ▪ XU: opcode
• What vs. When
  ▪ when controlled by a clock
    » SDR vs. DDR
  ▪ what controlled by FSM or uCode control point values
• Note
  ▪ book ignores this for the most part
    » fine in a way
      • tends to consume a small amount of area and power
      • BUT tends to be the major problem
        – in terms of getting it right!!
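
To make the what-versus-when split on the Control Path slide concrete, here is a minimal sketch of an FSM in which each state asserts a set of control-point values ("what") and one state is visited per clock tick ("when"). The state names, the control-point names (`reg_out_en`, `alu_op`, `reg_load`), and the three-step add sequence are all hypothetical, chosen for illustration; they are not taken from the slides' figures.

```python
from enum import Enum


class State(Enum):
    READ_OPERANDS = 0
    EXECUTE = 1
    WRITE_BACK = 2


# "What": control-point values asserted in each state (hypothetical names).
CONTROL = {
    State.READ_OPERANDS: {"reg_out_en": True, "alu_op": None, "reg_load": False},
    State.EXECUTE: {"reg_out_en": False, "alu_op": "ADD", "reg_load": False},
    State.WRITE_BACK: {"reg_out_en": False, "alu_op": None, "reg_load": True},
}

# State transitions: a fixed three-step sequence for this one operation.
NEXT_STATE = {
    State.READ_OPERANDS: State.EXECUTE,
    State.EXECUTE: State.WRITE_BACK,
    State.WRITE_BACK: State.READ_OPERANDS,
}


def run_add(regs, rs1, rs2, rd):
    """Step the FSM once per clock tick ("when") to perform rd <- rs1 + rs2."""
    state = State.READ_OPERANDS
    a = b = alu_out = 0
    trace = []
    for _tick in range(len(State)):
        ctl = CONTROL[state]
        trace.append(state.name)
        if ctl["reg_out_en"]:        # register file drives the operand buses
            a, b = regs[rs1], regs[rs2]
        if ctl["alu_op"] == "ADD":   # execution unit opcode selects the operation
            alu_out = a + b
        if ctl["reg_load"]:          # destination register latches the result
            regs[rd] = alu_out
        state = NEXT_STATE[state]
    return trace


regs = {1: 2, 2: 3, 3: 0}
trace = run_add(regs, 1, 2, 3)
```

The data-path work (reading, adding, writing) is trivial here; the table of control-point values per state is the part that has to be exactly right, which is the point the slide makes about the control path.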
Example: FSM for a simple Add

Full Control Scenario

Pipeline Parallelism
• Best case: execute 5 instructions at once
  ▪ note pipeline fill and flush overhead
  ▪ in steady state
    » ideal speedup corresponds to 5x frequency
• Problem
  ▪ consider single I & D memory
    » step 4 & 5 have a resource conflict

Pipeline Characteristics
• Latency
  ▪ time it takes for an instruction to complete
    » worse w/ pipeline since latch delay is added to the critical path
    » dominant feature if lots of exceptions
      • steady state doesn't last for long
      • branch mispredicts, cache misses, real exceptions
• Throughput
  ▪ dominant feature if steady state is common
    » compiler tries hard to make this true
  ▪ e.g. no
    » cache misses
    » register misses
    » speculation failures
    » real exceptions

Example
• Unpipelined
  ▪ 5 steps: 50, 50, 60, 50, 50 ns respectively
  ▪ total 260 ns
• Turn it into a pipelined design
  ▪ 10 ns of "laminarity" penalty
  ▪ 5 ns delay due to latches
    » set-up, hold, and fall-through delays
• Hence
  ▪ must run at slowest stage rate: clock = 60 + 5 = 65 ns
  ▪ speedup 260/65 = 4x
    » rather than idealized 5x

Pipeline Hair
• Laminarity is hard
  ▪ depends a lot on FO4 budget
    » 20+ FO4 is somewhat easy
    » 13 FO4 or less has proven to be problematic
• Extra resources
  ▪ each stage needs its own
    » design drill
      • list all possible instruction resource needs
      • separate by stage
      • each stage needs its own private set
• Example
  ▪ PC modification can't use same ALU as arithmetic
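
The arithmetic on the Example slide, together with the fill and flush overhead noted under Pipeline Parallelism, can be sketched as below. The function names are illustrative, and the model assumes no stalls, so an N-instruction run through a k-stage pipe takes k + N - 1 cycles; real pipelines will do worse.

```python
def pipelined_clock_ns(stage_ns, latch_ns):
    """Clock period: every stage runs at the slowest stage's time plus latch delay."""
    return max(stage_ns) + latch_ns


def steady_state_speedup(stage_ns, latch_ns):
    """Unpipelined time per instruction divided by the pipelined clock period."""
    return sum(stage_ns) / pipelined_clock_ns(stage_ns, latch_ns)


def pipelined_time_ns(n_instructions, stage_ns, latch_ns):
    """Total time for n instructions with no stalls: (k + n - 1) cycles."""
    cycles = len(stage_ns) + n_instructions - 1
    return cycles * pipelined_clock_ns(stage_ns, latch_ns)


stages = [50, 50, 60, 50, 50]   # ns per step, from the Example slide
latch = 5                       # ns of latch overhead per stage

clock = pipelined_clock_ns(stages, latch)        # 60 + 5 = 65 ns
speedup = steady_state_speedup(stages, latch)    # 260 / 65 = 4.0
latency = pipelined_time_ns(1, stages, latch)    # 5 * 65 = 325 ns vs. 260 unpipelined
run_100 = pipelined_time_ns(100, stages, latch)  # 104 * 65 = 6760 ns
```

A single instruction takes 325 ns through the pipe versus 260 ns unpipelined, which is the latency penalty from the Pipeline Characteristics slide. Over 100 instructions the effective speedup is 26000/6760, a little under 4x, because fill and flush cycles are amortized rather than eliminated.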

