Unformatted text preview:

COMP 206: Computer Architecture and ImplementationInstruction-Level ParallelismHardware Schemes for ILPDynamic SchedulingExplanation of IOut-of-order Execution and RenamingMemory ConsistencyFour Possibilities for Load/Store MotionMore on Load Bypassing and ForwardingLoad Bypassing in HardwareExample of Load BypassingHistory of Dynamic Scheduling1COMP 206:COMP 206:Computer Architecture and Computer Architecture and ImplementationImplementationMontek SinghMontek SinghMon, Oct 3, 2005Mon, Oct 3, 2005Topic: Topic: Instruction-Level ParallelismInstruction-Level Parallelism(Dynamic Scheduling: Introduction)(Dynamic Scheduling: Introduction)2Instruction-Level ParallelismInstruction-Level ParallelismRelevant Book Reading (HP3): Relevant Book Reading (HP3): Dynamic Scheduling (in hardware): Appendix A & Dynamic Scheduling (in hardware): Appendix A & Chapter 3Chapter 3Compiler Scheduling (in software): Chapter 4Compiler Scheduling (in software): Chapter 43Hardware Schemes for ILPHardware Schemes for ILPWhy do it in hardware at run time?Why do it in hardware at run time?Works when can’t know dependences at compile timeWorks when can’t know dependences at compile timeSimpler compilerSimpler compilerCode for one machine runs well on another machineCode for one machine runs well on another machineKey idea: Allow instructions behind stall to Key idea: Allow instructions behind stall to proceedproceedDIV.DDIV.DF0, F2, F4F0, F2, F4ADD.D ADD.D F10, F0, F8F10, F0, F8SUB.D SUB.D F8, F8, F14F8, F8, F14Enables out-of-order executionEnables out-of-order executionImplies out-of-order completionImplies out-of-order completionID stage check for both structural and data ID stage check for both structural and data dependencesdependences4Dynamic SchedulingDynamic SchedulingDIV.D F0, F2, F4ADD.D F10, F0, F8SUB.D F12, F8, F14DIV.D F0, F2, F4ADD.D F10, F0, F8SUB.D F12, F8, F14•7-cycle divider•4-cycle adder1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20In-order DIV.D F0, F2, F4 F D I E E E E E E E M WADD.D F10, F0, F8 F D I I I I I I I E E E E M WSUB.D F12, F8, F14 F D I I I I I I I I I I E E E E M WOut-of-order DIV.D F0, F2, F4 F D I E E E E E E E M WADD.D F10, F0, F8 F D I I I I I I I E E E E M WSUB.D F12, F8, F14 F D I E E E E M WInstructions are issued in order (leftmost Instructions are issued in order (leftmost II))Execution can begin out of order (leftmost Execution can begin out of order (leftmost EE))Execution can terminate out of order (Execution can terminate out of order (WW))What is What is I?I?5Explanation of Explanation of IITo be able to execute the SUB.D instructionTo be able to execute the SUB.D instructionA function unit must be availableA function unit must be availableAdder is free in exampleAdder is free in exampleThere should be no data hazards preventing early There should be no data hazards preventing early executionexecutionNone in this exampleNone in this exampleWe must be able to recognize the two previous conditionsWe must be able to recognize the two previous conditionsMust examine several instructions before deciding on what to Must examine several instructions before deciding on what to executeexecuteI represents the I represents the instruction windowinstruction window (or (or issue issue windowwindow) in which this examination happens) in which this examination happensIf every instruction starts execution in order, then If every instruction starts execution in order, then II is is superfluoussuperfluousOtherwise:Otherwise:Instruction enter the issue window in orderInstruction enter the issue window in orderSeveral instructions may be in issue window at any instantSeveral instructions may be in issue window at any instantExecution can begin out of orderExecution can begin out of order6Out-of-order Execution and Out-of-order Execution and RenamingRenamingWAW hazard on register F10: WAW hazard on register F10: prevents out-of-prevents out-of-order execution on machine like CDC 6600order execution on machine like CDC 6600If processor was capable of If processor was capable of register renaming:register renaming:the WAW hazard would be eliminatedthe WAW hazard would be eliminatedSUB.D could execute early as beforeSUB.D could execute early as beforeexample: IBM 360/91example: IBM 360/91DIV.D F0, F2, F4ADD.D F10, F0, F8SUB.D F10, F8, F14DIV.D F0, F2, F4ADD.D F10, F0, F8SUB.D F10, F8, F147Memory ConsistencyMemory ConsistencyMemory consistency refers to the order of main Memory consistency refers to the order of main memory accesses as compared to the order seen memory accesses as compared to the order seen in sequential (unpipelined) executionin sequential (unpipelined) executionStrong memory consistency: Strong memory consistency: All memory accesses are All memory accesses are made in strict program ordermade in strict program orderWeak memory consistency: Weak memory consistency: Memory accesses may be Memory accesses may be made out of order, provided that no dependences are made out of order, provided that no dependences are violatedviolatedWeak memory consistency is more desirableWeak memory consistency is more desirableleads to increased performanceleads to increased performanceIn what follows, ignore register hazardsIn what follows, ignore register hazardsQ:Q: When can two memory accesses be re- When can two memory accesses be re-ordered?ordered?8Four Possibilities for Load/Store Four Possibilities for Load/Store MotionMotionLoad-LoadLW R1, (R2)LW R3, (R4)Load-LoadLW R1, (R2)LW R3, (R4)Load-Load can always be Load-Load can always be interchanged (if no interchanged (if no volatilevolatiles)s)Load-Store and Store-Store are Load-Store and Store-Store are never interchangednever interchangedStore-Load is the only promising Store-Load is the only promising program transformationprogram transformationLoad is done earlier than Load is done earlier than planned, which can only helpplanned, which can only helpStore is done later than Store is done later than planned, which should cause no planned, which should cause no harmharmTwo variants of transformationTwo variants of transformationIf load is independent of store, If load is independent of store, we have load bypassingwe have load bypassingIf load is dependent on store If load is dependent on store through memory (e.g., (R1) == through memory (e.g., (R1) ==


View Full Document

UNC-Chapel Hill COMP 206 - Instruction-Level Parallelism

Download Instruction-Level Parallelism
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Instruction-Level Parallelism and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Instruction-Level Parallelism 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?