COMP 206: Computer Architecture and ImplementationInstruction-Level ParallelismHardware Schemes for ILPDynamic SchedulingExplanation of IOut-of-order Execution and RenamingMemory ConsistencyFour Possibilities for Load/Store MotionMore on Load Bypassing and ForwardingLoad Bypassing in HardwareExample of Load BypassingHistory of Dynamic Scheduling1COMP 206:COMP 206:Computer Architecture and Computer Architecture and ImplementationImplementationMontek SinghMontek SinghMon, Oct 3, 2005Mon, Oct 3, 2005Topic: Topic: Instruction-Level ParallelismInstruction-Level Parallelism(Dynamic Scheduling: Introduction)(Dynamic Scheduling: Introduction)2Instruction-Level ParallelismInstruction-Level ParallelismRelevant Book Reading (HP3): Relevant Book Reading (HP3): Dynamic Scheduling (in hardware): Appendix A & Dynamic Scheduling (in hardware): Appendix A & Chapter 3Chapter 3Compiler Scheduling (in software): Chapter 4Compiler Scheduling (in software): Chapter 43Hardware Schemes for ILPHardware Schemes for ILPWhy do it in hardware at run time?Why do it in hardware at run time?Works when can’t know dependences at compile timeWorks when can’t know dependences at compile timeSimpler compilerSimpler compilerCode for one machine runs well on another machineCode for one machine runs well on another machineKey idea: Allow instructions behind stall to Key idea: Allow instructions behind stall to proceedproceedDIV.DDIV.DF0, F2, F4F0, F2, F4ADD.D ADD.D F10, F0, F8F10, F0, F8SUB.D SUB.D F8, F8, F14F8, F8, F14Enables out-of-order executionEnables out-of-order executionImplies out-of-order completionImplies out-of-order completionID stage check for both structural and data ID stage check for both structural and data dependencesdependences4Dynamic SchedulingDynamic SchedulingDIV.D F0, F2, F4ADD.D F10, F0, F8SUB.D F12, F8, F14DIV.D F0, F2, F4ADD.D F10, F0, F8SUB.D F12, F8, F14•7-cycle divider•4-cycle adder1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20In-order DIV.D F0, F2, F4 F D I E E E E E E E M WADD.D F10, F0, F8 F D I I I I I I I E E E E M WSUB.D F12, F8, F14 F D I I I I I I I I I I E E E E M WOut-of-order DIV.D F0, F2, F4 F D I E E E E E E E M WADD.D F10, F0, F8 F D I I I I I I I E E E E M WSUB.D F12, F8, F14 F D I E E E E M WInstructions are issued in order (leftmost Instructions are issued in order (leftmost II))Execution can begin out of order (leftmost Execution can begin out of order (leftmost EE))Execution can terminate out of order (Execution can terminate out of order (WW))What is What is I?I?5Explanation of Explanation of IITo be able to execute the SUB.D instructionTo be able to execute the SUB.D instructionA function unit must be availableA function unit must be availableAdder is free in exampleAdder is free in exampleThere should be no data hazards preventing early There should be no data hazards preventing early executionexecutionNone in this exampleNone in this exampleWe must be able to recognize the two previous conditionsWe must be able to recognize the two previous conditionsMust examine several instructions before deciding on what to Must examine several instructions before deciding on what to executeexecuteI represents the I represents the instruction windowinstruction window (or (or issue issue windowwindow) in which this examination happens) in which this examination happensIf every instruction starts execution in order, then If every instruction starts execution in order, then II is is superfluoussuperfluousOtherwise:Otherwise:Instruction enter the issue window in orderInstruction enter the issue window in orderSeveral instructions may be in issue window at any instantSeveral instructions may be in issue window at any instantExecution can begin out of orderExecution can begin out of order6Out-of-order Execution and Out-of-order Execution and RenamingRenamingWAW hazard on register F10: WAW hazard on register F10: prevents out-of-prevents out-of-order execution on machine like CDC 6600order execution on machine like CDC 6600If processor was capable of If processor was capable of register renaming:register renaming:the WAW hazard would be eliminatedthe WAW hazard would be eliminatedSUB.D could execute early as beforeSUB.D could execute early as beforeexample: IBM 360/91example: IBM 360/91DIV.D F0, F2, F4ADD.D F10, F0, F8SUB.D F10, F8, F14DIV.D F0, F2, F4ADD.D F10, F0, F8SUB.D F10, F8, F147Memory ConsistencyMemory ConsistencyMemory consistency refers to the order of main Memory consistency refers to the order of main memory accesses as compared to the order seen memory accesses as compared to the order seen in sequential (unpipelined) executionin sequential (unpipelined) executionStrong memory consistency: Strong memory consistency: All memory accesses are All memory accesses are made in strict program ordermade in strict program orderWeak memory consistency: Weak memory consistency: Memory accesses may be Memory accesses may be made out of order, provided that no dependences are made out of order, provided that no dependences are violatedviolatedWeak memory consistency is more desirableWeak memory consistency is more desirableleads to increased performanceleads to increased performanceIn what follows, ignore register hazardsIn what follows, ignore register hazardsQ:Q: When can two memory accesses be re- When can two memory accesses be re-ordered?ordered?8Four Possibilities for Load/Store Four Possibilities for Load/Store MotionMotionLoad-LoadLW R1, (R2)LW R3, (R4)Load-LoadLW R1, (R2)LW R3, (R4)Load-Load can always be Load-Load can always be interchanged (if no interchanged (if no volatilevolatiles)s)Load-Store and Store-Store are Load-Store and Store-Store are never interchangednever interchangedStore-Load is the only promising Store-Load is the only promising program transformationprogram transformationLoad is done earlier than Load is done earlier than planned, which can only helpplanned, which can only helpStore is done later than Store is done later than planned, which should cause no planned, which should cause no harmharmTwo variants of transformationTwo variants of transformationIf load is independent of store, If load is independent of store, we have load bypassingwe have load bypassingIf load is dependent on store If load is dependent on store through memory (e.g., (R1) == through memory (e.g., (R1) ==
View Full Document