U of U CS 6810 - ILP Innovations


Lecture 9: ILP Innovations
• Today: handling memory dependences with the LSQ and innovations for each pipeline stage (Sections 3.9-3.10, detailed notes)
• Turn in HW3
• HW4 will be posted by tomorrow, due in a week

The Alpha 21264 Out-of-Order Implementation
[Figure: pipeline diagram; recoverable labels follow]
Branch prediction and instr fetch
Instr Fetch Queue:
    R1 ← R1+R2
    R2 ← R1+R3
    BEQZ R2
    R3 ← R1+R2
    R1 ← R3+R2
Decode & Rename
Reorder Buffer (ROB): Instr 1, Instr 2, Instr 3, Instr 4, Instr 5, Instr 6
Issue Queue (IQ):
    P33 ← P1+P2
    P34 ← P33+P3
    BEQZ P34
    P35 ← P33+P34
    P36 ← P35+P34
ALU, ALU, ALU
Register File: P1-P64
Results written to regfile and tags broadcast to IQ
Speculative Reg Map: R1: P36, R2: P34
Committed Reg Map: R1: P1, R2: P2

Out-of-Order Loads/Stores
    Ld R1 ← [R2]
    Ld R3 ← [R4]
    St R5 → [R6]
    Ld R7 ← [R8]
    Ld R9 ← [R10]
What if the issue queue also had load/store instructions? Can we continue executing instructions out-of-order?

Memory Dependence Checking
    Ld 0x abcdef
    Ld 0x abcdef
    St 0x abcd00
    Ld 0x abc000
    Ld 0x abcd00
• The issue queue checks for register dependences and executes instructions as soon as registers are ready
• Loads/stores access memory as well – must check for RAW, WAW, and WAR hazards for memory as well
• Hence, first check for register dependences to compute effective addresses; then check for memory dependences

Memory Dependence Checking
    Ld 0x abcdef
    Ld 0x abcdef
    St 0x abcd00
    Ld 0x abc000
    Ld 0x abcd00
• Load and store addresses are maintained in program order in the Load/Store Queue (LSQ)
• Loads can issue if they are guaranteed to not have true dependences with earlier stores
• Stores can issue only if we are ready to modify memory (can not recover if an earlier instr raises an exception)

The Alpha 21264 Out-of-Order Implementation
[Figure: pipeline diagram with loads/stores; recoverable labels follow]
Branch prediction and instr fetch
Instr Fetch Queue:
    R1 ← R1+R2
    R2 ← R1+R3
    BEQZ R2
    R3 ← R1+R2
    R1 ← R3+R2
    LD R4 ← 8[R3]
    ST R4 → 8[R1]
Decode & Rename
Reorder Buffer (ROB): Instr 1, Instr 2, Instr 3, Instr 4, Instr 5, Instr 6, Instr 7
Issue Queue (IQ):
    P33 ← P1+P2
    P34 ← P33+P3
    BEQZ P34
    P35 ← P33+P34
    P36 ← P35+P34
    P37 ← 8[P35]
    P37 → 8[P36]
ALU, ALU, ALU
Register File: P1-P64
Results written to regfile and tags broadcast to IQ
LSQ:
    P37 ← [P35 + 8]
    P37 → [P36 + 8]
ALU
D-Cache
Committed Reg Map: R1: P1, R2: P2
Speculative Reg Map: R1: P36, R2: P34

Improving Performance
• Techniques to increase performance:
    • pipelining
        • improves clock speed
        • increases number of in-flight instructions
    • hazard/stall elimination
        • branch prediction
        • register renaming
        • efficient caching
        • out-of-order execution with large windows
        • memory disambiguation
        • bypassing
    • increased pipeline bandwidth

Deep Pipelining
• Increases the number of in-flight instructions
• Decreases the gap between successive independent instructions
• Increases the gap between dependent instructions
• Depending on the ILP in a program, there is an optimal pipeline depth
• Tough to pipeline some structures; increases the cost of bypassing

Increasing Width
• Difficult to find more than four independent instructions
• Difficult to fetch more than six instructions (else, must predict multiple branches)
• Increases the number of ports per structure

Reducing Stalls in Fetch
• Better branch prediction
    • novel ways to index/update and avoid aliasing
    • cascading branch predictors
• Trace cache
    • stores instructions in the common order of execution, not in sequential order
    • in Intel processors, the trace cache stores pre-decoded instructions

Reducing Stalls in Rename/Regfile
• Larger ROB/register file/issue queue
• Virtual physical registers: assign virtual register names to instructions, but assign a physical register only when the value is made available
• Runahead: while a long instruction waits, let a thread run ahead to prefetch (this thread can deallocate resources more aggressively than a processor supporting precise execution)
• Two-level register files: values being kept around in the register file for precise exceptions can be moved to 2nd level

Stalls in Issue Queue
• Two-level issue queues: 2nd level contains instructions that are less likely to be woken up in the near future
• Value prediction: tries to circumvent RAW hazards
• Memory dependence prediction: allows a load to execute even if there are prior stores with unresolved addresses
• Load hit prediction: instructions are scheduled early, assuming that the load will hit in cache

Functional Units
• Clustering: allows quick bypass among a small group of functional units; FUs can also be associated with a subset of the register file and issue queue
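
The load-issue rule from the Memory Dependence Checking slides (a load may issue only when it is guaranteed to have no true dependence on an earlier store) can be illustrated with a small sketch. This is not the Alpha 21264's actual logic. The names `LSQEntry` and `load_can_issue` are hypothetical, addresses are compared for exact equality (real hardware must also account for access size and partial overlap), and a load that matches an earlier store is simply held back rather than being handed the store's data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LSQEntry:
    """One LSQ slot (hypothetical model). Entries are kept in program order."""
    op: str                # "Ld" or "St"
    addr: Optional[int]    # None until the effective address has been computed


def load_can_issue(lsq: List[LSQEntry], idx: int) -> bool:
    """Return True if the load at position idx is guaranteed to have no
    RAW (true) dependence on any earlier store in program order."""
    load = lsq[idx]
    if load.op != "Ld" or load.addr is None:
        return False  # not a load, or its own address is not ready yet
    for earlier in lsq[:idx]:
        if earlier.op != "St":
            continue  # earlier loads never block a load
        if earlier.addr is None:
            return False  # unresolved store address: it might alias
        if earlier.addr == load.addr:
            return False  # same address: true dependence on this store
    return True


# The five-instruction example from the slides, in program order.
lsq = [
    LSQEntry("Ld", 0xABCDEF),
    LSQEntry("Ld", 0xABCDEF),
    LSQEntry("St", 0xABCD00),
    LSQEntry("Ld", 0xABC000),
    LSQEntry("Ld", 0xABCD00),
]

for i, entry in enumerate(lsq):
    if entry.op == "Ld":
        print(i, hex(entry.addr), load_can_issue(lsq, i))
```

In this model the first two loads and the load from 0xabc000 are free to issue, while the final load from 0xabcd00 must wait because it reads the address written by the earlier store. If the store's address were still unknown, both loads after it would have to wait, which is the conservative behaviour that memory dependence prediction (listed under Stalls in Issue Queue) tries to relax.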

