U of U CS 6810 - Dynamic Issue & HW Speculation

Dynamic Issue & HW Speculation
School of Computing, University of Utah, CS6810

Today's topics
  - Superscalar pipelines
  - Dynamic Issue
      - Scoreboarding: control-centric approach
      - Tomasulo: data-centric approach

Raising the IPC Ceiling
  - w/ single issue, IPCmax = 1
      - schedule as hard as you want and it's still the asymptote
      - keeping things in order means lots of stalls
      - XUs (execution units) finish out of order anyway
  - when the transistor budget is high enough
      - just go with multiple issue
      - 4-issue is common today: superscalar machines
  - Superscalar issues (issue width = n)
      - need n-way capability in all pipeline stages
      - fetch n: no worries, fetch a cache line of instructions per cycle
      - decode n: get register values; problems
      - execute n: problems
      - mem n: problems w/ out-of-order completion
      - WB n: problems w/ out-of-order completion

Fix OOO Completion Problem First
  - Enter the ROB (re-order buffer)
  - basic idea for now
      - issue instructions in order
      - retire/commit instructions in order
      - use an intermediate buffer to hold results, since destructive actions on the register file or memory must happen in order
  - Other ROB niceties
      - helps w/ speculation, nullification, and exceptions
      - but first a simple example

Reorder Buffer In Action
  - See any problems?

Several Issues
  - WB stage is now the commit stage
      - ROB values move to the register file
      - whoops: if tags are in the issue queue, those values need to be renamed to the register name
      - seems complex; can you think of a better way?
  - IQ (issue queue) contains both register and tag fields, w/ 1 bit to select which is valid
      - initially the tag is selected
      - when the tag is retired, broadcast to the IQ and invert the selector on a match
  - what about tag values in the pipe?
      - only need to worry about entry into the EX stage; compares are needed there as well
      - ROB is the WB stage, so that's not a problem
      - MEM isn't a problem either. WHY?
  - Key observation: all destructive operations are done by the ROB commit/retire

Nullification & Exceptions
  - If an exception happens
      - exception type is written to the ROB field
      - note that one instruction could generate an exception in multiple stages
      - only care about the first one, so no overwrite is allowed
  - If some instruction is speculative
      - then the predicate is written to the ROB field
      - note: the predicate covers branch delay slots and effectively supports nullification

Decode Complexity
  - ROB complicates ID significantly
      - operand fetch now has two sources: register file or ROB field
      - hence an additional mux is required
      - rename takes some time
      - structural issue requirements will help mitigate the performance penalty
  - Bottom line: ID will no longer be a single-cycle stage

WB stage in reality
  - try to retire n instructions per cycle
      - if none have pending predicates or exceptions, then retire in order
  - 1st member of the n-instruction bundle w/ a problem
      - retire the instructions before it
      - nullify whatever is next in the bundle
      - take the exception and hold the rest
  - For register-poor ISAs like x86
      - ROB slots effectively provide a renamed register pool
      - actually it's not the right choice. Why? Remember the front-end/back-end x86 thing

ROB Hazard Removal
  - RAW
      - nothing changes here: no way you can use a value before it's computed
      - unless the value is predicted and predicated (only some academic papers think this is a reasonable idea)
      - hence instruction scheduling is required
  - WAx (WAR, WAW)
      - ROB renaming effectively removes this problem, as long as enough ROB slots exist
      - if not, then the instruction can't be issued and a NOP is injected in the pipe
      - stalling pipelines at GHz frequencies is a problem. Why?
      - hence NOPs are dynamically generated and pushed through the pipe
      - any issues here?

EX Stages: XUs
  - Typical separation of XUs
      - ALU: int, shift, logical (AND, OR, XOR, NOT)
      - int multiply
      - int divide
      - FP ops, can be 32 or 64 bit (typically implement 64 bit)
          - FP add/sub
          - FP multiply
          - FP divide or FP invert (1/x)
          - FP sqrt or FP isqrt
  - Overlaps
      - Branch and Mem ops can be handled with an ALU
      - int mul or div can be handled by the FP equivalent
      - Note: a common choice is to have an int mul but not an int div
  - actual choice influences structural issue rules

Structural Issue Rules
  - Clearly vary by machine
  - Example for a 6-issue machine

        Slots  Instruction class
        2      ALU, or 1 ALU and 1 Int Mul
        1      Branch
        1      FP Mul or Divide
        1      FP Add or Sub
        1      Mem

    (the first copy of this slide lists the third row as "Int Mul or Divide")
  - Why does this make sense? (e.g. justification)
      - Look at instruction frequency and common effort
      - Branch: on average about every 6 instructions, so need that
      - LD/ST: about every 6 as well
      - seldom need FP Mul and Divide on the same cycle
      - FP Add and Sub share exponent normalization
      - Int Divide is done on the FP Div unit

Dynamic Issue
  - Until now
      - instructions have been issued in order
      - compiler thinks the world is sequential
      - HW must fulfill that contract
  - Dynamic issue basics
      - use an instruction window (buffer) rather than a queue
      - choose the n instructions to issue such that dependencies are satisfied and structural rules are not violated
  - 2 methods
      - control-centric: Scoreboarding
      - data-centric: Tomasulo (text focus)

Dynamic Issue Context
  - Less viable in multi-core land
      - single-thread performance is no longer the Holy Grail
      - power wall is the fundamental constraint
  - dynamic issue consumes a lot of power
      - all the OOO and ROB stuff consumes a lot of power, e.g. the Issue Queue
  - thermal wall is also an issue
      - frequency derating is common
      - affects reliability and cost in a major way
      - with billion-transistor chips, if they're all active then the chip melts
      - interesting stat in a recent talk: C0 state is in play a very small percentage of the time
  - Hence
      - I previously spent a lot of time on this issue
      - this term we'll look at the conceptual side and skip the minutiae

Trends

Core Comparison
  - source: presentation by John Shalf, NERSC
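
The ROB idea from the "Fix OOO Completion Problem First", "Nullification Exceptions", and "WB stage in reality" slides can be sketched in a few lines of Python. This is only an illustrative model, not the course's design: the names `RobEntry`, `ReorderBuffer`, `issue`, `complete`, `flag_exception`, and `retire` are all hypothetical, predicates are left out, and "take the exception" is modeled simply as returning the exception type to the caller.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    tag: int                         # ROB slot identifier handed out at issue
    dest: str                        # architectural destination register
    value: Optional[int] = None      # result, held here until commit
    done: bool = False               # set when execution finishes
    exception: Optional[str] = None  # first exception type, if any


class ReorderBuffer:
    """Toy ROB: in-order issue, out-of-order completion, in-order retire."""

    def __init__(self, size):
        self.size = size
        self.entries = deque()       # oldest instruction sits at the left
        self.by_tag = {}
        self.next_tag = 0

    def issue(self, dest):
        """Allocate a slot in program order; None means no free ROB slot."""
        if len(self.entries) >= self.size:
            return None
        entry = RobEntry(tag=self.next_tag, dest=dest)
        self.next_tag += 1
        self.entries.append(entry)
        self.by_tag[entry.tag] = entry
        return entry.tag

    def complete(self, tag, value):
        """An execution unit finished, possibly out of order."""
        entry = self.by_tag[tag]
        entry.value = value
        entry.done = True

    def flag_exception(self, tag, kind):
        """Record only the first exception; later ones never overwrite it."""
        entry = self.by_tag[tag]
        if entry.exception is None:
            entry.exception = kind
        entry.done = True

    def retire(self, regfile, n):
        """Commit up to n finished entries from the head, strictly in order.

        Returns (number retired, exception type or None). Entries behind an
        excepting instruction are held, so they never touch the register file.
        """
        retired = 0
        while retired < n and self.entries:
            head = self.entries[0]
            if not head.done:
                break
            if head.exception is not None:
                return retired, head.exception
            regfile[head.dest] = head.value   # the only destructive write
            self.entries.popleft()
            del self.by_tag[head.tag]
            retired += 1
        return retired, None
```

Results written by `complete` stay in the buffer until `retire` reaches them, which is the point of the key observation on the "Several Issues" slide: the register file only changes at commit.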

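
The issue-queue trick from the "Several Issues" slide (a register field, a tag field, and one selector bit that flips when the tag retires) might look like this in Python. It is a minimal sketch under assumptions: `IqOperand`, `source`, and `broadcast_retire` are hypothetical names, and the compare at entry to the EX stage that the slide also mentions is not modeled.

```python
from dataclasses import dataclass


@dataclass
class IqOperand:
    reg: str               # architectural register name
    tag: int               # ROB tag of the producing instruction
    use_tag: bool = True   # selector bit: initially the tag is selected

    def source(self):
        """Where operand fetch should read this operand from."""
        return ("rob", self.tag) if self.use_tag else ("regfile", self.reg)


def broadcast_retire(issue_queue, retired_tag):
    """Broadcast a retired tag to the IQ; invert the selector on each match.

    Returns how many operands switched from the ROB to the register file.
    """
    flipped = 0
    for operand in issue_queue:
        if operand.use_tag and operand.tag == retired_tag:
            operand.use_tag = False
            flipped += 1
    return flipped
```

Because both fields are stored from the start, retirement never has to rewrite a tag into a register name; it only flips one bit per matching operand.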

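
The dynamic-issue rule from the "Dynamic Issue" slide (choose the n instructions to issue such that dependencies are satisfied and structural rules are not violated) can be sketched as a scan over the instruction window. The slot limits below follow the 6-issue example in the notes, but everything else is an assumption made for illustration: the names `Instr`, `SLOT_LIMITS`, and `select_issue` are hypothetical, the "1 ALU and 1 Int Mul" alternative is omitted, oldest-first priority is an arbitrary choice, and only RAW dependencies are checked because WAR/WAW are taken to be removed by ROB renaming.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Per-cycle slot limits, loosely following the 6-issue example (assumed).
SLOT_LIMITS = {"alu": 2, "branch": 1, "fp_muldiv": 1, "fp_addsub": 1, "mem": 1}


@dataclass
class Instr:
    name: str
    unit: str                  # key into SLOT_LIMITS
    srcs: Tuple[str, ...]      # source registers (RAW dependencies)
    dest: Optional[str] = None


def select_issue(window, ready_regs, limits=SLOT_LIMITS, width=6):
    """Pick up to `width` instructions from the window, oldest first.

    An instruction is chosen only if every source register is already in
    `ready_regs` and its execution-unit class still has a free slot this cycle.
    """
    used = {unit: 0 for unit in limits}
    chosen = []
    for instr in window:
        if len(chosen) == width:
            break
        if not all(src in ready_regs for src in instr.srcs):
            continue               # dependency not satisfied yet
        if used[instr.unit] >= limits[instr.unit]:
            continue               # structural rule would be violated
        used[instr.unit] += 1
        chosen.append(instr.name)
    return chosen
```

Unlike an in-order queue, the scan skips over a blocked instruction and keeps looking, which is what lets younger independent instructions issue ahead of an older stalled one.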