U of U CS 7810 - Core Design

This preview shows page 1-2-3-4-5-6 out of 18 pages.

Lecture 18: Core Design
• Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue

The Alpha 21264 Out-of-Order Implementation
[Figure: pipeline diagram. Branch prediction and instr fetch -> Instr Fetch Queue -> Decode & Rename -> Reorder Buffer (ROB) and Issue Queue (IQ) -> ALU, ALU, ALU -> Register File P1-P64. Results written to regfile and tags broadcast to IQ.]

Fetched instructions:
R1 ← R1+R2
R2 ← R1+R3
BEQZ R2
R3 ← R1+R2
R1 ← R3+R2

Reorder Buffer (ROB): Instr 1, Instr 2, Instr 3, Instr 4, Instr 5, Instr 6

Renamed instructions in the Issue Queue (IQ):
P33 ← P1+P2
P34 ← P33+P3
BEQZ P34
P35 ← P33+P34
P36 ← P35+P34

Speculative Reg Map: R1 → P36, R2 → P34
Committed Reg Map: R1 → P1, R2 → P2

Rename
Before renaming:
A lr1 ← lr2 + lr3
B lr2 ← lr4 + lr5
C lr6 ← lr1 + lr3
D lr6 ← lr1 + lr2
RAR lr3, RAW lr1, WAR lr2, WAW lr6
Order: A ; BC ; D

After renaming:
A pr7 ← pr2 + pr3
B pr8 ← pr4 + pr5
C pr9 ← pr7 + pr3
D pr10 ← pr7 + pr8
RAR pr3, RAW pr7, WAR x, WAW x
Order: AB ; CD

Commit Example
Assume a processor with 6 logical regs and 10 physical regs

A lr1 ← lr2 + lr3     pr7 ← pr2 + pr3
B lr2 ← lr4 + lr5     pr8 ← pr4 + pr5
C lr6 ← lr1 + lr3     pr9 ← pr7 + pr3
D lr6 ← lr1 + lr2     pr10 ← pr7 + pr8
E lr3 ← lr6 + lr2     pr1 ← pr10 + pr8
F lr4 ← lr3 + lr4     pr2 ← pr1 + pr4

Map    Old    New
lr1    pr1    pr7
lr2    pr2    pr8
lr6    pr6    pr9
lr6    pr9    pr10
lr3    pr3    pr1
lr4    pr4    pr2

Out-of-Order Loads/Stores
Ld R1 ← [R2]
Ld R3 ← [R4]
St R5 → [R6]
Ld R7 ← [R8]
Ld R9 ← [R10]

Memory Dependence Checking
[Figure: the load/store sequence above (Ld, Ld, St, Ld, Ld) with example effective addresses: Ld 0x abcdef, St 0x abcd00, Ld 0x abc000, Ld 0x abcd00]
• The issue queue checks for register dependences and executes instructions as soon as registers are ready
• Loads/stores access memory as well – must check for RAW, WAW, and WAR hazards for memory as well
• Hence, first check for register dependences to compute effective addresses; then check for memory dependences

Memory Dependence Checking
[Figure: same load/store address example as on the previous slide]
• Load and store addresses are maintained in program order in the Load/Store Queue (LSQ)
• Loads can issue if they are guaranteed to not have true dependences with earlier stores
• Stores can issue only if we are ready to modify memory (can not recover if an earlier instr raises an exception)

The Alpha 21264 Out-of-Order Implementation
[Figure: the pipeline diagram from the earlier slide, extended with an LSQ, an ALU for address computation, and the D-Cache.]

Fetched instructions:
R1 ← R1+R2
R2 ← R1+R3
BEQZ R2
R3 ← R1+R2
R1 ← R3+R2
LD R4 ← 8[R3]
ST R4 → 8[R1]

Reorder Buffer (ROB): Instr 1, Instr 2, Instr 3, Instr 4, Instr 5, Instr 6, Instr 7

Renamed instructions in the Issue Queue (IQ):
P33 ← P1+P2
P34 ← P33+P3
BEQZ P34
P35 ← P33+P34
P36 ← P35+P34
P37 ← 8[P35]
P37 → 8[P36]

LSQ:
P37 ← [P35 + 8]
P37 → [P36 + 8]

Committed Reg Map: R1 → P1, R2 → P2
Speculative Reg Map: R1 → P36, R2 → P34

Speculative Issue
• Instr I1 leaves the issue queue at start of cycle 6; the instr then reads operands from the regfile, wires are traversed, instruction executes, result is available at end of cycle 8
• If operand availability is broadcast to issue queue in cycle 9, dependent instruction leaves in cycle 10
• This causes a 4-cycle gap between successive instrs
• Hence, if we know that the instruction takes a cycle to execute, the operand is broadcast to the issue queue in cycle 6 and the dependent instr leaves issue queue in cycle 7; the input operand is correctly bypassed at the FU

Load Hit Speculation
• The previous optimization assumes that we know the exact latency for every operation
• This is true for all ops except loads (cache hit or miss?)
• Assume hit and schedule accordingly; on a cache miss, must squash all speculatively issued instructions; an instruction therefore sits in the queue until load hits are determined

Register Rename Logic
[Figure: block diagram. Labels: Map Table, Dependence Check Logic, Mux, Logical Source Regs, Logical Dest Regs, Logical Source Reg, Physical Source Regs, Physical Dest Regs, Free Pool]

Map Table – RAM
[Figure: each entry holds a phys reg id (7 bits); Num entries = Num logical regs; shadow copies (shift register)]

Map Table – CAM
[Figure: each entry holds a logical reg id (5 bits) and a 1-bit valid field; Num entries = Num phys regs; shadow copies (1 bit)]

Wakeup Logic
[Figure: each issue queue entry holds rdyL, tagL, tagR, rdyR; broadcast tags tag1 ... tagIW are compared (=) against tagL and tagR, and the comparator outputs are ORed to set the ready bits]

Selection Logic
[Figure: arbiter cell tree over the issue window. Signals: req, grant, anyreq, enable]
• For multiple FUs, will need sequential selectors

Structure Complexities
• Critical structures: register map tables, issue queue, LSQ, register file, register bypass
• Cycle time is heavily influenced by: window size (physical register size), issue width (#FUs)
• Conflict between the desire to increase IPC and clock speed

ILP Limits (Wall 1993)
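
The rename and commit examples in these slides can be traced with a small simulation. The following is a minimal sketch, not the Alpha 21264's actual logic: the class name `Renamer`, the identity initial mapping (lr1 to pr1, ..., lr6 to pr6), the FIFO free pool, and the choice to commit A and B just before renaming E are all assumptions made so that the output matches the Map Old / New table (6 logical regs, 10 physical regs).

```python
from collections import deque


class Renamer:
    """Toy rename stage: speculative map table, free pool, and ROB (illustrative only)."""

    def __init__(self, num_logical=6, num_physical=10):
        # Assumption: lr_i initially maps to pr_i; the rest start in the free pool.
        self.map_table = {f"lr{i}": f"pr{i}" for i in range(1, num_logical + 1)}
        self.free_pool = deque(
            f"pr{i}" for i in range(num_logical + 1, num_physical + 1)
        )
        # Old mapping of each in-flight instruction, kept in program order.
        self.rob = deque()

    def rename(self, dest, src1, src2):
        """Return (new_dest, phys_src1, phys_src2, old_dest), or None to stall."""
        if not self.free_pool:
            return None  # no free physical register: rename must stall
        # Read the source mappings before updating the destination mapping,
        # so an instruction like lr4 <- lr3 + lr4 sees the old lr4.
        p1, p2 = self.map_table[src1], self.map_table[src2]
        old = self.map_table[dest]
        new = self.free_pool.popleft()
        self.map_table[dest] = new
        self.rob.append(old)
        return new, p1, p2, old

    def commit(self):
        """Commit the oldest instruction; its old mapping returns to the free pool."""
        old = self.rob.popleft()
        self.free_pool.append(old)
        return old


# Instructions A-F from the Commit Example slide: (dest, src1, src2).
program = {
    "A": ("lr1", "lr2", "lr3"),
    "B": ("lr2", "lr4", "lr5"),
    "C": ("lr6", "lr1", "lr3"),
    "D": ("lr6", "lr1", "lr2"),
    "E": ("lr3", "lr6", "lr2"),
    "F": ("lr4", "lr3", "lr4"),
}

r = Renamer()
renamed = {name: r.rename(*program[name]) for name in "ABCD"}
stalled = r.rename(*program["E"])  # free pool is empty, so this stalls
freed = [r.commit(), r.commit()]   # assumed: A and B commit here
renamed["E"] = r.rename(*program["E"])
renamed["F"] = r.rename(*program["F"])

for name, (new, p1, p2, old) in renamed.items():
    print(f"{name}: {new} <- {p1} + {p2}   (old mapping {old})")
```

In this sketch, E cannot be renamed until A commits and releases pr1, which is why the table shows lr3 moving from pr3 to pr1: a physical register is only recycled once the instruction that overwrote its logical register has committed.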

