U of U CS 7810 - Core Design

Lecture 19: Core Design
• Today: issue queue, ILP, clock speed, ILP innovations

Wakeup Logic
[Figure: each issue-queue entry holds ready bits rdyL and rdyR and operand tags tagL and tagR. The result tags broadcast each cycle (tag1 … tagIW, where IW is the issue width) are compared (=) against tagL and tagR, and the comparator outputs are ORed into rdyL and rdyR.]

Selection Logic
[Figure: req lines from the issue window feed a tree of arbiter cells. Each cell passes anyreq up to its parent, receives enable from it, and returns grant to the chosen requester.]
• For multiple FUs, we will need sequential selectors

Structure Complexities
• Critical structures: register map tables, issue queue, LSQ, register file, register bypass
• Cycle time is heavily influenced by window size (physical register file size) and issue width (number of FUs)
• There is a conflict between the desire to increase IPC and the desire to increase clock speed
• We can achieve both if we use large structures and deep pipelining; but some structures can't be easily pipelined, and long-latency structures can also hurt IPC

Deep Pipelines
• What does it mean to have a 2-cycle wakeup, a 2-cycle bypass, and a 2-cycle register read?

Scaling Options
[Figure: three panels (Pipeline Scaling, Capacity Scaling, Replicated Capacity Scaling) built from blocks of a 20-entry IQ with 40 registers and 4 FUs and blocks of a 15-entry IQ with 30 registers and 3 FUs; annotations: 2-cycle wakeup, 2-cycle register read, 2-cycle bypass]

Recent Trends
• Not much change in structure capacities
• Not much change in cycle time
• Pipeline depths have become shorter (circuit delays have reduced); this is good for energy efficiency
• Optimal performance is observed at about 50 pipeline stages (we are currently at ~20 stages for energy reasons)
• Deep pipelines improve parallelism (helps if there's ILP); deep pipelines also increase the gap between dependent instructions (hurts when there is little ILP)

ILP Limits (Wall, 1993)

Techniques for High ILP
• Better branch prediction and fetch (trace cache)
  – cascading branch predictors?
• More physical registers, ROB, issue queue, LSQ
  – two-level regfile/IQ?
• Higher issue width
  – clustering?
• Lower average cache hierarchy access time
• Memory dependence prediction
• Latency tolerance techniques: ILP, MLP, prefetch, runahead, multi-threading

Impact of Mem-Dep Prediction
• In the perfect model, loads only wait for conflicting stores; in the naïve model, loads issue speculatively and must be squashed if a dependence is later discovered
[Figure from Chrysos and Emer, ISCA’98]

Clustering
[Figure: a shared Reg-rename & Instr steer stage feeds two clusters, each with its own IQ, Regfile, and two FUs]
Original code:
  r1 ← r2 + r3
  r4 ← r1 + r2
  r5 ← r6 + r7
  r8 ← r1 + r5
Renamed and steered:
  p21 ← p2 + p3
  p22 ← p21 + p2
  p42 ← p21
  p41 ← p56 + p57
  p43 ← p42 + p41
• 40 regs in each cluster
• r1 is mapped to both p21 and p42
  – this will influence steering and instruction commit
  – on average, only 8 registers are replicated

2Bc-gskew Branch Predictor
[Figure: BIM is indexed by Address; G0, G1, and Meta are indexed by Address+History; BIM, G0, and G1 feed a majority Vote, and Meta chooses between BIM and the Vote to produce Pred]
• 44 KB; 2-cycle access; used in the Alpha 21464

Rules
• On a correct prediction
  – if all agree, no update
  – if they disagree, strengthen the correct predictors and the chooser
• On a misprediction
  – update the chooser and recompute the prediction
  – on a correct (recomputed) prediction, strengthen the correct predictors
  – on a misprediction, update all predictors

Runahead (Mutlu et al., HPCA’03)
[Figure: Trace Cache, Current Rename, IssueQ, Regfile (128), Checkpointed Regfile (32), Retired Rename, ROB, FUs, L1 D, Runahead Cache]
When the oldest instruction is a cache miss, behave as if it causes a context switch:
• checkpoint the committed registers, rename table, return address stack, and branch history register
• assume a bogus value and start a new thread
• this thread cannot modify program state, but can prefetch

Memory Bottlenecks
• 128-entry window, real L2 → 0.77 IPC
• 128-entry window, perfect L2 → 1.69
• 2048-entry window, real L2 → 1.15
• 2048-entry window, perfect L2 → 2.02
• 128-entry window, real L2, runahead → 0.94
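The Wakeup Logic slide above can be modeled behaviorally as follows. The names IQEntry, wakeup, and comparator_count are hypothetical. The sketch assumes one tag per source operand and up to IW result tags broadcast per cycle. Hardware does every comparison in parallel within a single cycle; the loop only reproduces the outcome. The comparator count shows why window size and issue width weigh so heavily on cycle time.

```python
from dataclasses import dataclass


@dataclass
class IQEntry:
    """One issue-queue entry: two source tags and their ready bits (hypothetical names)."""
    tagL: int
    tagR: int
    rdyL: bool = False
    rdyR: bool = False

    def ready(self) -> bool:
        # The entry may request issue only when both operands are available.
        return self.rdyL and self.rdyR


def wakeup(window: list, result_tags: list) -> None:
    """Broadcast up to IW result tags and set ready bits on every tag match.

    Each `==` stands for one comparator; `any(...)` stands for the OR gate
    that feeds rdyL/rdyR. A ready bit, once set, stays set.
    """
    for entry in window:
        entry.rdyL = entry.rdyL or any(entry.tagL == t for t in result_tags)
        entry.rdyR = entry.rdyR or any(entry.tagR == t for t in result_tags)


def comparator_count(window_size: int, issue_width: int) -> int:
    # Two operand tags per entry, each compared against every broadcast tag.
    return 2 * window_size * issue_width


if __name__ == "__main__":
    window = [IQEntry(5, 7), IQEntry(5, 9, rdyR=True), IQEntry(8, 9)]
    wakeup(window, [5, 9])
    print([e.ready() for e in window])  # [False, True, False]
    print(comparator_count(64, 4))      # 512
```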
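The Selection Logic slide's arbiter tree and its "sequential selectors" remark can be sketched as follows. The sketch makes two assumptions. First, each arbiter cell grants its leftmost requester, which is position-based priority (for example, oldest-first in a compacting queue). Second, the tree has a hypothetical fan-in of 4. Each cell reports anyreq upward and passes enable down only to the child it picks. For multiple FUs, each selector removes its winner before the next selector runs.

```python
def arbiter_cell(reqs, enable):
    """One arbiter cell: report anyreq upward; if enabled, grant the leftmost requester."""
    anyreq = any(reqs)
    grants = [False] * len(reqs)
    if enable and anyreq:
        grants[reqs.index(True)] = True
    return anyreq, grants


def select_tree(reqs, fan_in=4):
    """Tree of arbiter cells over the issue window; returns a one-hot grant vector."""
    def resolve(lo, hi, enable):
        if hi - lo <= fan_in:
            # Leaf cell: sees the req lines of up to fan_in issue-window entries.
            return arbiter_cell(reqs[lo:hi], enable)
        step = -(-(hi - lo) // fan_in)  # ceiling division: size of each child's range
        bounds = [(s, min(s + step, hi)) for s in range(lo, hi, step)]
        # Children report anyreq upward, independently of enable ...
        child_any = [resolve(a, b, False)[0] for a, b in bounds]
        # ... and this cell sends enable down to the single child it picks.
        anyreq, child_enable = arbiter_cell(child_any, enable)
        grants = []
        for (a, b), en in zip(bounds, child_enable):
            grants += resolve(a, b, en)[1]
        return anyreq, grants

    return resolve(0, len(reqs), True)[1]


def select_for_fus(reqs, num_fus):
    """Sequential selectors: each FU's selector sees the requests the previous one left."""
    remaining = list(reqs)
    granted = []
    for _ in range(num_fus):
        grants = select_tree(remaining)
        if not any(grants):
            break
        idx = grants.index(True)
        granted.append(idx)
        remaining[idx] = False  # a granted entry cannot be selected twice in one cycle
    return granted
```

In hardware, chaining selectors in series lengthens the select path, and that is part of why issue width affects cycle time.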
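The Deep Pipelines and Recent Trends slides say that a multi-cycle wakeup loop widens the gap between dependent instructions. A toy scheduler makes that concrete. The sketch assumes wakeup_latency (a hypothetical parameter) is the minimum distance in cycles from a producer's issue to its consumer's issue: 1 means back-to-back issue and 2 models a 2-cycle wakeup. At most `width` ready instructions issue per cycle, oldest first. Independent instructions are unaffected, while a dependence chain takes about twice as many cycles.

```python
def issue_cycles(deps, wakeup_latency, width):
    """Issue cycle of each instruction (program order) in a toy scheduler.

    deps[i] lists the indices of older instructions that instruction i reads.
    """
    issued = [None] * len(deps)
    cycle = 0
    while None in issued:
        slots = width
        for i, srcs in enumerate(deps):
            if slots == 0:
                break
            if issued[i] is not None:
                continue
            # Ready once every producer issued at least wakeup_latency cycles ago.
            if all(issued[s] is not None and issued[s] + wakeup_latency <= cycle
                   for s in srcs):
                issued[i] = cycle
                slots -= 1
        cycle += 1
    return issued


if __name__ == "__main__":
    chain = [[], [0], [1], [2]]
    print(issue_cycles(chain, 1, 4))  # [0, 1, 2, 3]
    print(issue_cycles(chain, 2, 4))  # [0, 2, 4, 6]
```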
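The two extremes on the Impact of Mem-Dep Prediction slide can be compared with a small model. The sketch assumes every memory op has a known address and a hypothetical `ready` cycle at which it would execute. It also assumes that a load executing in the same cycle as its conflicting store still gets the right value. The predictor proposed in the cited paper is not modeled; the sketch only shows what the perfect and naïve models do.

```python
from dataclasses import dataclass


@dataclass
class MemOp:
    kind: str   # "ld" or "st"
    addr: int
    ready: int  # cycle at which the op could execute (hypothetical input)


def naive_squashes(ops):
    """Loads the naive model must squash: it issues every load as soon as it is
    ready, so a load is wrong whenever an older store to the same address executes
    strictly later."""
    squashed = []
    for i, op in enumerate(ops):
        if op.kind != "ld":
            continue
        if any(st.kind == "st" and st.addr == op.addr and st.ready > op.ready
               for st in ops[:i]):
            squashed.append(i)
    return squashed


def perfect_issue(ops):
    """Issue cycle of each load under the perfect model: it waits only for older
    stores to the same address (conflicting stores)."""
    times = {}
    for i, op in enumerate(ops):
        if op.kind == "ld":
            conflicts = [st.ready for st in ops[:i]
                         if st.kind == "st" and st.addr == op.addr]
            times[i] = max([op.ready] + conflicts)
    return times
```

Prediction aims to approach the perfect model's load timing without the naïve model's squashes.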
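The renaming on the Clustering slide can be sketched with a map table that records, for each logical register, its physical register in every cluster that holds a copy. The sketch rests on several assumptions:
- ClusteredRenamer is a hypothetical name.
- Steering decisions are supplied as input rather than computed by a heuristic.
- Cluster c owns physical registers numbered from c*40, with no free-list recycling or bounds checks.
- Source values not produced in the trace are assumed to start in the reading cluster.

The physical numbers therefore differ from the slide's p21/p42, but the structure matches: only r8 ← r1 + r5 in the second cluster needs a copy of r1.

```python
from itertools import count


class ClusteredRenamer:
    """Rename for a clustered core in which each cluster has its own register file."""

    def __init__(self, num_clusters=2, regs_per_cluster=40):
        # Hypothetical allocator: cluster c hands out c*regs_per_cluster, +1, +2, ...
        self.free = [count(c * regs_per_cluster) for c in range(num_clusters)]
        self.map = {}     # logical reg -> {cluster: physical reg}
        self.copies = []  # (dst phys, src phys) inter-cluster copies inserted so far

    def _source(self, reg, cluster):
        locs = self.map.setdefault(reg, {})
        if not locs:
            # Value not produced in this trace: assume it already lives in `cluster`.
            locs[cluster] = next(self.free[cluster])
        elif cluster not in locs:
            # Value lives only in another cluster: allocate a local copy.
            src_phys = next(iter(locs.values()))
            dst_phys = next(self.free[cluster])
            self.copies.append((dst_phys, src_phys))
            locs[cluster] = dst_phys
        return locs[cluster]

    def rename(self, dst, srcs, cluster):
        phys_srcs = [self._source(s, cluster) for s in srcs]
        # A new write makes every older copy of dst (in any cluster) stale.
        dst_phys = next(self.free[cluster])
        self.map[dst] = {cluster: dst_phys}
        return dst_phys, phys_srcs


if __name__ == "__main__":
    r = ClusteredRenamer()
    trace = [("r1", ["r2", "r3"], 0), ("r4", ["r1", "r2"], 0),
             ("r5", ["r6", "r7"], 1), ("r8", ["r1", "r5"], 1)]
    for dst, srcs, cl in trace:
        print(dst, "->", r.rename(dst, srcs, cl))
    print("copies:", r.copies)  # copies: [(43, 2)]
```

Because r1 ends up with two live physical copies, both must be tracked until they are overwritten. This is the steering and commit complication the slide mentions.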
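The 2Bc-gskew organization and its update Rules can be sketched behaviorally. BIM, G0, and G1 form a majority vote, and META chooses between BIM alone and the vote. The sketch makes several assumptions:
- The table size, history length, and index hashes are hypothetical placeholders, not the Alpha 21464's indexing functions.
- All counters are 2-bit and saturating.
- The chooser is trained only when BIM and the vote disagree, since otherwise the outcome says nothing about which component to prefer.

The rest follows the slide's rules.

```python
class TwoBcGskew:
    """Behavioral sketch of 2Bc-gskew (hypothetical sizes and hashes)."""

    def __init__(self, log_size=10, hist_bits=12):
        self.n = 1 << log_size
        self.hist_mask = (1 << hist_bits) - 1
        self.hist = 0
        # 2-bit counters; 1 = weakly not-taken (for META: weakly "use BIM").
        self.bim, self.g0, self.g1, self.meta = ([1] * self.n for _ in range(4))

    def _lookup(self, pc):
        h = self.hist
        idx = (pc % self.n,                          # BIM: address only
               (pc ^ h) % self.n,                    # G0: address + history
               (pc ^ (h << 2) ^ (h >> 3)) % self.n,  # G1: a differently skewed hash
               (pc ^ (h >> 1)) % self.n)             # META: address + history
        preds = (self.bim[idx[0]] >= 2, self.g0[idx[1]] >= 2, self.g1[idx[2]] >= 2)
        vote = sum(preds) >= 2                       # majority of BIM, G0, G1
        use_vote = self.meta[idx[3]] >= 2
        return idx, preds, vote, use_vote

    def predict(self, pc):
        _, preds, vote, use_vote = self._lookup(pc)
        return vote if use_vote else preds[0]

    @staticmethod
    def _train(table, i, up):
        # Move a 2-bit saturating counter one step up (True) or down (False).
        table[i] = min(3, table[i] + 1) if up else max(0, table[i] - 1)

    def update(self, pc, taken):
        idx, preds, vote, use_vote = self._lookup(pc)
        banks = ((self.bim, idx[0]), (self.g0, idx[1]), (self.g1, idx[2]))
        if (vote if use_vote else preds[0]) == taken:
            # Correct prediction: no update if all agree.
            if len(set(preds)) > 1:
                for (table, i), p in zip(banks, preds):
                    if p == taken:
                        self._train(table, i, taken)      # strengthen correct preds
                if preds[0] != vote:
                    self._train(self.meta, idx[3], use_vote)  # strengthen chooser
        else:
            # Misprediction: update the chooser, then recompute the prediction.
            if preds[0] != vote:
                self._train(self.meta, idx[3], not use_vote)
            recomputed = vote if self.meta[idx[3]] >= 2 else preds[0]
            for (table, i), p in zip(banks, preds):
                # Recomputed wrong: update all preds; else strengthen correct ones.
                if recomputed != taken or p == taken:
                    self._train(table, i, taken)
        self.hist = ((self.hist << 1) | int(taken)) & self.hist_mask
```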
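The Runahead slide's key idea is that, during runahead, results that depend on the blocking miss are bogus (INV), yet independent misses can still be issued as prefetches. The sketch shows how INV propagates. It makes several simplifying assumptions:
- Inst and runahead_prefetches are hypothetical names.
- Each load carries its address and a hypothetical `misses` flag.
- A load that misses during runahead also produces INV, because its data will not return in time.
- Checkpoint restore and the runahead cache are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Inst:
    dst: str
    srcs: List[str]
    is_load: bool = False
    addr: Optional[int] = None  # load address; unusable if a source register is INV
    misses: bool = False        # hypothetical: whether this load misses in the L2


def runahead_prefetches(window):
    """Addresses prefetched while running ahead of the blocking miss at window[0].

    Nothing architectural is written here, mirroring the rule that the runahead
    thread cannot modify program state; when the miss returns, the checkpoint is
    restored and all runahead results are discarded (not modeled).
    """
    invalid = {window[0].dst}  # the blocking load is given a bogus (INV) value
    prefetches = []
    for inst in window[1:]:
        if any(s in invalid for s in inst.srcs):
            invalid.add(inst.dst)         # INV propagates; INV addresses can't prefetch
        elif inst.is_load and inst.misses:
            prefetches.append(inst.addr)  # independent miss: issue it as a prefetch
            invalid.add(inst.dst)         # its data won't arrive during runahead
        else:
            invalid.discard(inst.dst)     # valid (but speculative) result
    return prefetches
```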
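The Memory Bottlenecks numbers read more easily in relative terms. The script below uses only the IPC values from that slide. Runahead speeds up the 128-entry, real-L2 baseline by about 1.22x. That recovers roughly 45% of the gap to a 2048-entry window with the same real L2.

```python
# IPC values copied from the "Memory Bottlenecks" slide.
IPC = {
    "128-entry window, real L2": 0.77,
    "128-entry window, perfect L2": 1.69,
    "2048-entry window, real L2": 1.15,
    "2048-entry window, perfect L2": 2.02,
    "128-entry window, real L2, runahead": 0.94,
}
BASE = "128-entry window, real L2"


def speedup(config: str) -> float:
    """IPC of `config` relative to the 128-entry, real-L2 baseline."""
    return IPC[config] / IPC[BASE]


def fraction_of_gap_closed(config: str, target: str) -> float:
    """Share of the baseline-to-`target` IPC gap that `config` recovers."""
    return (IPC[config] - IPC[BASE]) / (IPC[target] - IPC[BASE])


if __name__ == "__main__":
    for name in IPC:
        print(f"{name:38s} {IPC[name]:.2f} IPC  {speedup(name):.2f}x")
    print(f"{fraction_of_gap_closed('128-entry window, real L2, runahead', '2048-entry window, real L2'):.2f}")
```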