Unformatted text preview:

Copyright 2006, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Instruction Scheduling — combining scheduling with allocation — COMP 512 Rice University Spring 2006 In which, Keith tries to explain why a simple result excited him so much …. COMP 512, Fall 2006! 2!Combining Scheduling & Allocation Sometimes, combining two optimizations can produce solutions that cannot be obtained by solving them independently •! Requires bilateral interactions between optimizations !! Click and Cooper, “Combining Analyses, Combining Optimizations”, TOPLAS 17(2), March 1995. •! Combining two optimizations can be a challenge (SCCP) Scheduling & allocation are a classic example •! Scheduling changes variable lifetimes •! Renaming in the allocator changes dependences •! Spilling changes the underlying code false dependencesCOMP 512, Fall 2006! 3!Many authors have tried to combine allocation & scheduling •! Underallocate to leave “room” for the scheduler !! Can result in underutilization of registers •! Preallocate to use all registers !! Can create false dependences •! Solving the problems together can produce solutions that cannot be obtained by solving them independently !! See Click and Cooper, “Combining Analyses, Combining Optimizations”, TOPLAS 17(2), March 1995. In general, these papers try to combine global allocators with local or regional schedulers — an algorithmic mismatch Combining Scheduling & Allocation Before we go there, a long digression about how much improvement we might expect … COMP 512, Fall 2006! 4!Quick Review of Local Scheduling Given a sequence of machine operations, reorder the operations so that •! Data dependences are respected •! Execution time is minimized •! Demand for registers is kept below k Vocabulary: •! An operation is an indivisible command •! An instruction is a set of operations that issue in the same cycle •! A dependence graph is constructed to represent necessary delays (Nodes are operations; edges show the flow of values; edge weights represent operation latencies)COMP 512, Fall 2006! 5!Scheduling Example •! Many operations have non-zero latencies •! Modern machines can issue several operations per cycle •! Execution time is order-dependent (and has been since the 60’s) Assumed latencies (conservative) Operation Cycles load 3 store 3 loadI 1 add 1 mult 2 fadd 1 fmult 2 shift 1 branch 0 to 8 •! Loads & stores may or may not block >! Non-blocking "fill those issue slots •! Branch costs vary with path taken •! Branches typically have delay slots >! Fill slots with unrelated operations >! Percolates branch upward •! Scheduler should hide the latencies List scheduling is dominant algorithm COMP 512, Fall 2006! 6!Example w # w * 2 * x * y * z 1loadAI r0,@w! r14add r1,r1 ! r15loadAI r0,@x! r28mult r1,r2 ! r19loadAI r0,@y! r212mult r1,r2 ! r113loadAI r0,@z ! r216mult r1,r2! r1 18storeAI r1 ! r0,@w21r1 is free1loadAI r0,@w! r12loadAI r0,@x ! r23loadAI r0,@y! r34add r1,r1 ! r15mult r1,r2! r16loadAI r0,@z ! r27mult r1,r3 ! r19mult r1,r2! r111storeAI r1 ! r0,@w14r1 is freeSimple schedule Schedule loads early 2 registers, 20 cycles 3 registers, 13 cycles Reordering operations for speed is called instruction schedulingCOMP 512, Fall 2006! 7!Instruction Scheduling (The Abstract View) To capture properties of the code, build a dependence graph G •! Nodes n ! G are operations with type(n) and delay(n) •! An edge e = (n1,n2) ! G if & only if n2 uses the result of n1 a: loadAI r0,@w! r1b: add r1,r1 ! r1c: loadAI r0,@x! r2d: mult r1,r2 ! r1e: loadAI r0,@y! r2f: mult r1,r2 ! r1g: loadAI r0,@z ! r2h: mult r1,r2! r1i: storeAI r1 ! r0,@wThe Code a b c d e f g h i The Dependence Graph COMP 512, Fall 2006! 8!Instruction Scheduling (Definitions) A correct schedule S maps each n! N into a non-negative integer representing its cycle number, and 1. S(n) ! 0, for all n ! N, obviously 2. If (n1,n2) ! E, S(n1 ) + delay(n1 ) ! S(n2 ) 3. For each type of operation (functional unit), t, there are no more operations of type t in any cycle than the target machine can issue The length of a schedule S, denoted L(S), is L(S) = maxn ! N (S(n) + delay(n)) The goal is to find the shortest possible correct schedule. S is time-optimal if L(S) ! L(S1 ), for all other schedules S1 A schedule might also be optimal in terms of registers or power, or ….COMP 512, Fall 2006! 9!What’s so difficult? Critical Points •! All operands must be available when an operation issues •! Multiple operations can be ready (& often are …) •! Moving an operation can lengthen register lifetimes or it can shorten register lifetimes •! Operands can have multiple predecessors (not SSA) Together, these issues make scheduling hard (NP-Complete) Local scheduling is the simple case •! Restricted to straight-line code •! Consistent and predictable latencies COMP 512, Fall 2006! 10!Instruction Scheduling The big picture 1. Build a dependence graph, D 2. Compute a priority function over the nodes in D 3. Use list scheduling to construct a schedule, 1 cycle at a time a. Use a queue of operations that are ready b. At each cycle I. Choose a ready operation and schedule it II. Update the ready queue Local list scheduling •! The dominant algorithm for twenty years •! A greedy, heuristic, local techniqueCOMP 512, Fall 2006! 11!Local List Scheduling Cycle # 1 Ready # leaves of D Active # Ø while (Ready $ Active % Ø) if (Ready % Ø) then remove an op from Ready S(op) # Cycle Active # Active $ op Cycle # Cycle + 1 for each op & Active if (S(op) + delay(op) ! Cycle) then remove op from Active for each successor s of op in D if (s is ready) then Ready # Ready $ s Removal in priority order op has completed execution If successor’s operands are ready, put it on Ready Can improve efficiency by using a set of Queues (1 more than maximum delay on target machine) — see 412 notes COMP 512, Fall 2006! 12!Scheduling Example 1.! Build the dependence graph a: loadAI r0,@w! r1b: add r1,r1 ! r1c: loadAI r0,@x! r2d: mult r1,r2 ! r1e: loadAI r0,@y! r2f: mult r1,r2 ! r1g: loadAI r0,@z ! r2h: mult r1,r2! r1i: storeAI r1 ! r0,@wThe Code a b c d e f g h i The Dependence GraphCOMP 512, Fall 2006! 13!Scheduling Example 1.!


View Full Document

Rice COMP 512 - Lecture Notes

Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?