Unformatted text preview:

Advanced Compilers M. LamLecture 7Instruction SchedulingI Basic Block SchedulingII Global Scheduling (for Non-Numeric Code)Reading: Chapter 10.3 - 10.4Advanced Compilers 2 L7: Instruction SchedulingI. Scheduling Constraints• Data dependences• The operations must generate the same results as the correspond-ing ones in the original program.• Control dependences• All the operations executed in the original program must be executed in the optimized program• Resource constraints• No over-subscription of resources.Advanced Compilers 3 L7: Instruction SchedulingData Dependence• Must maintain order of accesses to potentially same locations• True dependence: write -> read (RAW hazard)a = ... = a• Output dependence: write -> write (WAW hazard)a = ...a = ...• Anti-dependence: read -> write (WAR hazard) = aa = • Data Dependence Graph• Nodes: operations• Edges: n1 -> n2 if n2 is data dependent on n1labeled by the execution length of n1Advanced Compilers 4 L7: Instruction SchedulingAnalysis on Memory Variables• Undecidable in generalread xread yA[x] = ...... = A[y] • Two memory accesses can potentially be the same unless proven otherwise• Classes of analysis• simple: base+offset1 = base+offset2?• “data dependence analysis”: Array accesses whose indices are affine expressions of loop indicesA[2i] = A[2i+1]?• interprocedural analysis: global = parameter?• pointer analysis: pointer1 = pointer2?• Data dependence analysis is useful for many other purposesAdvanced Compilers 5 L7: Instruction SchedulingResource Constraints• Each instruction type has a resource reservation table• Pipelined functional units: occupy only one slot• Non-pipelined functional units: multiple time slots• Instructions may use more than one resource• Multiple units of same resource• Limited instruction issue slots may also be managed like a resourceFunctional unitsTime012ld st alu fmpyfadd br...Advanced Compilers 6 L7: Instruction SchedulingExample of a Machine Model• Each instruction can execute 2 operations• 1 ALU operation or branch operationOp dst,src1,src2 executes in 1 clock• 1 load or store operationLD dst, addr result is available in 2 clockspipelined: can issue LD next clockST src, addr executes in 1 clock cycleAdvanced Compilers 7 L7: Instruction SchedulingBasic Block SchedulingLD R2,0(R1)ADD R3,R3,R2ST 4(R1),R2 ADD R3,R3,R4ST 0(R7),R7ST 12(R1),R32212memalui1i2i3i4i5i6i7LD R3,8(R1)11Advanced Compilers 8 L7: Instruction SchedulingWith Resource Constraints• NP-complete in general => Heuristics time!• List SchedulingREADY = nodes with 0 predecessorsLoop until READY is empty {Let n be the node in READY with highest prioritySchedule n in the earliest slot that satisfies precedence + resource constraintsUpdate predecessor count of n’s successor nodesUpdate READY}Advanced Compilers 9 L7: Instruction SchedulingList Scheduling• Scope: DAGs• Schedules operations in topological order• Never backtracks• Variations• Priority function for node n• delay: max delay slots from n to any node• critical path: max clocks from n to any node• resource requirements• source orderAdvanced Compilers 10 L7: Instruction SchedulingII. Introduction to Global Schedulingc = bif (a=0) goto LLD R6,0(R1)BEQZ R6,LLD R7,0(R2)ST 0(R3),R7 nopnope = d + dLD R8,0(R4)ST 0(R5),R8 nopL:B1B2B3L:ADD R8,R8,R8Advanced Compilers 11 L7: Instruction SchedulingResult of Code SchedulingLD R6,0(R1), LD R8,0(R4) ADD R8,R8,R8 BEQZ R6,LST 0(R5),R8, ST 0(R3),R7L:B1B3’B3ST 0(R5),R8LD R7,0(R2)Advanced Compilers 12 L7: Instruction SchedulingTerminologyControl equivalence• Two operations o1 and o2 are control equivalent if o1 is executed if and only if o2 is executed. Control dependence• An op o2 is control dependent on op o1if the execution of o2 depends on the outcome of o1.Speculation• An operation o1 is speculatively executed if it is executed before all the operations it control-dependent upon have been executed. • No exception, satisfy data dependencesAdvanced Compilers 13 L7: Instruction SchedulingCode MotionsGoal: Shorten execution time probabilisticallyMoving instructions up• Move instruction to a cut set (from entry)• Speculation: even when not anticipated.Moving instructions down• Move instruction to a cut set (from exit)• May execute extra instruction • Can duplicate code srcsrcAdvanced Compilers 14 L7: Instruction SchedulingA Note on Data Dependencesa = 0a = 1Advanced Compilers 15 L7: Instruction SchedulingGeneral-Purpose Applications• Lots of data dependences• Key performance factor: memory latencies• Move memory fetches up• Speculative memory fetches can be expensive• Control-intensive: get execution profile• Static estimation• Innermost loops are frequently executed: back edges are likely to be taken• Edges that branch to exit and exception routines are not likely to be taken• Dynamic profiling• Instrument code and measure using representative dataAdvanced Compilers 16 L7: Instruction SchedulingA Basic Global Scheduling Algorithm• Schedule innermost loops first• Only upward code motion• No creation of copies• Only one level of speculationAdvanced Compilers 17 L7: Instruction SchedulingProgram Representation• A region in a control flow graph is• a set of basic blocks and all the edges connecting these blocks,• such that control from outside the region must enter through a single entry block. • A function is represented as a hierarchy of regions• The whole control flow graph is a region• Each natural loop in the flow graph is a region• Natural loops are hierarchically nested• Schedule regions from inner to outer• treat inner loop as a black box unit,can schedule around it but not into it• ignore all the loop back edges --> get an acyclic graphAdvanced Compilers 18 L7: Instruction SchedulingAlgorithmCompute data dependences;For each region from inner to outer {For each basic block B in prioritized topological order {CandBlocks = ControlEquiv{B} ∪ Dominated-Successors{ControlEquiv{B}};CandInsts = ready operations in CandBlocks;For (t = 0, 1, ... until all operations from B are scheduled {For (n in CandInst in priority order) {if (n has no resource conflicts at time t) {S(n) = < B, t > Update resource commitmentsUpdate data dependences}}Update CandInsts;}}}• Priority functions• Non-speculative before speculativeAdvanced Compilers 19 L7: Instruction SchedulingExtensions•


View Full Document

Stanford CS 243 - Instruction Scheduling

Download Instruction Scheduling
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Instruction Scheduling and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Instruction Scheduling 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?