U of U CS 6810 - Dynamic Issue and HW Speculation


CS6810, School of Computing, University of Utah

Dynamic Issue & HW Speculation
Today’s topics:
  Superscalar pipelines
  Dynamic Issue
    Scoreboarding: control centric approach
    Tomasulo: data centric approach

Raising the IPC Ceiling
• w/ single-issue IPCmax = 1
  ➢ schedule as hard as you want and it’s still the asymptote
    » keeping things in order => lots of stalls
      • XU’s finish out of order anyway
    » when the transistor budget is high enough
      • just go with multiple issue
        – >= 4 issue common today ::= superscalar machines
• Superscalar issues: issuewidth = n
  ➢ need n way capability in all pipeline stages
    » fetch n – no worries, fetch cache line of instructions/cycle
    » decode n
      • get register values – problems?
    » execute n
      • problems?
    » mem n
      • problems? w/ out of order completion?
    » WB n
      • problems w/ out of order completion?

Fix OOO Completion Problem First
• Enter the ROB (re-order buffer)
  ➢ basic idea for now
    » issue instructions in-order
    » retire/commit instructions in order
    » use an intermediate buffer to hold results
      • since destructive action to register file or memory must happen in order
• Other ROB niceties
  ➢ helps w/
    » speculation
    » nullification
    » exceptions
  ➢ but first a simple example

Reorder Buffer In Action
See any problems?

Several Issues
• WB stage is now the commit stage
  ➢ ROB values move to the register file
    » whoops if tags are in the issue queue
      • those values need to be renamed to the register name
      • seems complex – can you think of a better way?
  ➢ IQ contains both register and tag fields
    » w/ 1 bit to select which is valid
      • initially tag is selected
      • when tag is retired – broadcast to IQ and invert selector on a match
      • what about tag values in the pipe
        – only need to worry about entry into EX stage
        – compares needed there as well
        – ROB is WB stage so that’s not a problem
        – MEM isn’t a problem either. WHY?
• Key observation
  ➢ all destructive operations are done by the ROB commit/retire

Nullification & Exceptions
• If an exception happens
  ➢ exception type is written to the ROB field
    » note that one instruction could generate an exception in multiple stages
      • only care about the first one so no overwrite is allowed
• If some instruction is speculative
  ➢ then predicate is written to the ROB field
  ➢ note: predicate covers branch delay slots and effectively supports nullification
• WB stage in reality
  ➢ try to retire n instructions per cycle
    » if none have pending predicates or exceptions then retire
    » in order retire => 1st member of n-instruction bundle w/ problem
      • retire the instructions before
      • nullify whatever is next in the bundle
      • take the exception and hold the rest

Decode Complexity
• ROB complicates ID significantly
  ➢ operand fetch now has two sources
    » register file or ROB field
      • hence an additional mux is required
  ➢ rename takes some time
    » structural issue requirements will help mitigate the performance penalty
• Bottom line
  ➢ ID will no longer be a single cycle stage
• For register poor ISA’s like x86
  ➢ ROB slots effectively provide a renamed register pool
    » actually it’s not the right choice
      • Why?
      • remember the front-end back-end x86 thing

ROB Hazard Removal
• RAW
  ➢ nothing changes here
    » no way you can use a value before it’s computed
    » unless the value is predicted and predicated
      • only some academic papers think this is a reasonable idea
    » hence instruction scheduling is required
• WAx (WAR and WAW)
  ➢ ROB renaming effectively removes this problem
    » as long as enough ROB slots exist
    » if not
      • then the instruction can’t be issued and a NOP is injected in the pipe
• Note
  ➢ stalling pipelines @ GHz frequencies is a problem
    » hence NOPs are dynamically generated and pushed through the pipe
    » any issues here?

EX Stages XU’s
• Typical separation of XU’s
  ➢ ALU (int +/-, shift, logical (AND, OR, XOR, NOT))
  ➢ int-multiply
  ➢ int-divide
  ➢ FP ops can be 32 or 64-bit (typically implement 64-bit)
    » FP-add-sub
    » FP-multiply
    » FP-divide or FP-invert (1/x)
    » FP-sqrt or FP-isqrt?
• Overlaps
  ➢ Branch and Mem ops can be handled with an ALU
  ➢ int mul or div can be handled by the FP equivalent
    » a common choice is to have an int-mul but not an int-div
      • why?
  ➢ actual choice influences structural issue rules

Structural Issue Rules
• Clearly vary by machine
• Example for a 6 issue machine
  ➢ 2 ALU
  ➢ 1 Branch
  ➢ 1 Int Mul or Divide
  ➢ 1 FP Add or Sub
  ➢ 1 Mem
• Why does this make sense?
  ➢ e.g. justification

Structural Issue Rules
• Clearly vary by machine
• Example for a 6 issue machine
  ➢ 2 ALU or 1 ALU and 1 Int-Mul
  ➢ 1 Branch
  ➢ 1 FP Mul or Divide
  ➢ 1 FP Add or Sub
  ➢ 1 Mem
• Why does this make sense?
  ➢ Look at instruction frequency and common effort
    » Branch average about every 6 instructions so need that
    » LD + ST about every 6 as well
    » seldom need FP Mul & Divide on same cycle
    » FP Add/Sub share exponent normalization
    » Int-Divide is done on the FP-Div unit

Dynamic Issue
• Until Now
  ➢ instructions have been issued in order
    » compiler thinks the world is sequential
    » HW must
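
The reorder-buffer idea from the earlier slides (issue in order, let XU's finish in any order, retire in order, record only the first exception per instruction) can be sketched in a few lines of Python. This is a minimal illustration, not the course's design: the names RobEntry, ReorderBuffer, issue, complete, and retire are all hypothetical, results are plain integers, and on an exception the sketch simply discards the faulting instruction and everything younger, which simplifies the "nullify whatever is next in the bundle and hold the rest" rule.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    """One ROB slot (field names are illustrative, not from the slides)."""
    dest: str                        # architectural register updated at commit
    value: Optional[int] = None      # result, filled in when execution finishes
    done: bool = False               # True once the XU has reported back
    exception: Optional[str] = None  # first exception type recorded, if any


class ReorderBuffer:
    def __init__(self, size: int, retire_width: int):
        self.size = size
        self.retire_width = retire_width
        self.entries = deque()       # oldest instruction sits at the left
        self.regfile = {}            # committed architectural state

    def issue(self, dest: str) -> Optional[RobEntry]:
        """Allocate a slot in program order; None means no slot is free."""
        if len(self.entries) == self.size:
            return None              # caller would inject a NOP instead
        entry = RobEntry(dest)
        self.entries.append(entry)
        return entry

    @staticmethod
    def complete(entry, value=None, exception=None):
        """Called by execution units, possibly out of program order."""
        if exception is not None and entry.exception is None:
            entry.exception = exception   # keep only the first exception
        entry.value = value
        entry.done = True

    def retire(self):
        """Commit up to retire_width finished instructions from the head.

        Returns (list of committed destinations, exception taken or None).
        """
        committed = []
        while self.entries and len(committed) < self.retire_width:
            head = self.entries[0]
            if not head.done:
                break                # in order: younger results must wait
            if head.exception is not None:
                self.entries.clear() # simplification: drop faulting + younger
                return committed, head.exception
            self.regfile[head.dest] = head.value  # the only destructive write
            committed.append(head.dest)
            self.entries.popleft()
        return committed, None


# Small demonstration: three instructions, the youngest finishes first.
rob = ReorderBuffer(size=4, retire_width=2)
a, b, c = rob.issue("r1"), rob.issue("r2"), rob.issue("r3")
rob.complete(c, value=30)
print(rob.retire())                  # ([], None) since the head is not done
rob.complete(a, value=10)
rob.complete(b, exception="overflow")
print(rob.retire())                  # (['r1'], 'overflow')
print(rob.regfile)                   # {'r1': 10}
```

Because the register file is written only inside retire, the value computed early for r3 never becomes architecturally visible once the older instruction faults, which is the "all destructive operations are done by the ROB commit/retire" observation.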

