U of U CS 7810 - Core Design - D682992

School name University of Utah

Course CS 7810 - Advanced Computer Architecture

Pages 16

Lecture 19: Core Design
• Today: issue queue, ILP, clock speed, ILP innovations

Wakeup Logic
[Figure: wakeup logic. Each issue queue entry holds tagL/rdyL and tagR/rdyR; the operand tags are compared (=) against the broadcast tags tag1 … tagIW and the matches are ORed into the ready bits.]

Selection Logic
[Figure: selection logic. Issue window entries raise req and receive grant through arbiter cells (signals: req, grant, anyreq, enable).]
• For multiple FUs, will need sequential selectors

Structure Complexities
• Critical structures: register map tables, issue queue, LSQ, register file, register bypass
• Cycle time is heavily influenced by: window size (physical register size), issue width (#FUs)
• Conflict between the desire to increase IPC and clock speed
• Can achieve both if we use large structures and deep pipelining; but, some structures can't be easily pipelined and long-latency structures can also hurt IPC

Deep Pipelines
• What does it mean to have
    • 2-cycle wakeup?
    • 2-cycle bypass?
    • 2-cycle regread?

Scaling Options
[Figure: three scaling options, labeled Pipeline Scaling, Capacity Scaling, and Replicated Capacity Scaling. Block labels: 20-IQ, 40 Regs, F F F F (shown twice); 2-cycle wakeup, 2-cycle regread, 2-cycle bypass; 15-IQ, 30 Regs, F F F (shown three times).]

Recent Trends
• Not much change in structure capacities
• Not much change in cycle time
• Pipeline depths have become shorter (circuit delays have reduced); this is good for energy efficiency
• Optimal performance is observed at about 50 pipeline stages (we are currently at ~20 stages for energy reasons)
• Deep pipelines improve parallelism (helps if there's ILP); deep pipelines increase the gap between dependent instructions (hurts when there is little ILP)

ILP Limits (Wall 1993)

Techniques for High ILP
• Better branch prediction and fetch (trace cache): cascading branch predictors?
• More physical registers, ROB, issue queue, LSQ: two-level regfile/IQ?
• Higher issue width: clustering?
• Lower average cache hierarchy access time
• Memory dependence prediction
• Latency tolerance techniques: ILP, MLP, prefetch, runahead, multi-threading

Impact of Mem-Dep Prediction
• In the perfect model, loads only wait for conflicting stores; in the naïve model, loads issue speculatively and must be squashed if a dependence is later discovered
From Chrysos and Emer, ISCA'98

Clustering
[Figure: reg-rename & instr steer feeding two clusters, each with its own IQ, Regfile, and two FUs (F F).]
Original code:
r1 ← r2 + r3
r4 ← r1 + r2
r5 ← r6 + r7
r8 ← r1 + r5
Renamed and steered code:
p21 ← p2 + p3
p22 ← p21 + p2
p42 ← p21
p41 ← p56 + p57
p43 ← p42 + p41
• 40 regs in each cluster
• r1 is mapped to p21 and p42; this will influence steering and instr commit; on average, only 8 replicated regs

2Bc-gskew Branch Predictor
[Figure: Address indexes BIM; Address+History indexes Meta, G1, and G0; the BIM/G0/G1 outputs feed Vote, and Meta chooses the final Pred.]
• 44 KB; 2-cycle access; used in the Alpha 21464

Rules
• On a correct prediction
    • if all agree, no update
    • if they disagree, strengthen correct preds and chooser
• On a misprediction
    • update chooser and recompute the prediction
    • on a correct prediction, strengthen correct preds
    • on a misprediction, update all preds

Runahead (Mutlu et al., HPCA'03)
[Figure: Trace Cache, Current Rename, IssueQ, Regfile (128), Checkpointed Regfile (32), Retired Rename, ROB, FUs, L1 D, Runahead Cache.]
When the oldest instruction is a cache miss, behave like it causes a context-switch:
• checkpoint the committed registers, rename table, return address stack, and branch history register
• assume a bogus value and start a new thread
• this thread cannot modify program state, but can prefetch

Memory Bottlenecks
• 128-entry window, real L2 → 0.77 IPC
• 128-entry window, perfect L2 → 1.69
• 2048-entry window, real L2 → 1.15
• 2048-entry window, perfect L2 → 2.02
• 128-entry window, real L2, runahead → 0.94
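
The IPC figures in the Memory Bottlenecks list are easier to compare as ratios. The following is a minimal sketch and not part of the lecture: the configuration labels and the helpers `speedup` and `gain_recovered` are hypothetical names chosen for illustration, and the only data used are the five IPC values listed above.

```python
# IPC values copied from the Memory Bottlenecks list above.
# The configuration labels are hypothetical names used only for this sketch.
IPC = {
    "128_real_l2": 0.77,
    "128_perfect_l2": 1.69,
    "2048_real_l2": 1.15,
    "2048_perfect_l2": 2.02,
    "128_real_l2_runahead": 0.94,
}

BASELINE = "128_real_l2"


def speedup(config: str, baseline: str = BASELINE) -> float:
    """Return the IPC of `config` relative to `baseline`."""
    return IPC[config] / IPC[baseline]


def gain_recovered(config: str, target: str, baseline: str = BASELINE) -> float:
    """Fraction of the baseline-to-target IPC gain that `config` achieves."""
    return (IPC[config] - IPC[baseline]) / (IPC[target] - IPC[baseline])


if __name__ == "__main__":
    for name in IPC:
        print(f"{name:>22}: {speedup(name):.2f}x over {BASELINE}")
    frac = gain_recovered("128_real_l2_runahead", "2048_real_l2")
    print(f"runahead recovers {frac:.0%} of the 2048-entry window's gain")
```

With these numbers, runahead on a 128-entry window is about 1.22x the baseline IPC, which is roughly 45% of the improvement obtained by growing the window to 2048 entries with a real L2.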
