List SchedulingSlides by Todd C. MowryUpdated by Peter LeeCS745: Optimizing CompilersTodd C. MowryCS745: Instruction Scheduling -2-List Scheduling! The most common technique for scheduling instructionswithin a basic blockWe don’t need to worry about:• control flowWe do need to worry about:• data dependences• hardware resources! Even without control flow, the problem is still NP-hard…y = c + dx = a + bTodd C. MowryCS745: Instruction Scheduling -3-List Scheduling Algorithm:Inputs and OutputsAlgorithm reproduced from:•“An Experimental Evaluation of List Scheduling", Keith D. Cooper, Philip J.Schielke, and Devika Subramanian. Rice University, Department of ComputerScience Technical Report 98-326, September 1998.Inputs:Output:Data Precedence Graph (DPG)MachineParametersScheduled CodeI0---I3I10I7I2I1I8---I9---I4I6I11I5Cycle01234I0 I2I6I4I3 I8I1I5I9# of FUs:2 INT, 1 FPLatencies:add = 1 cycle, …Pipelining:1 add/cycle, …Todd C. MowryCS745: Instruction Scheduling -4-List Scheduling: The Basic Idea! Maintain a list of instructions that are ready to execute• data dependence constraints would be preserved• machine resources are available! Moving cycle-by-cycle through the schedule template:• choose instructions from the list & schedule them• update the list for the next cycleI2 I0Cycle012---Todd C. MowryCS745: Instruction Scheduling -5-What Makes Life Interesting: ChoiceEasy case:• all ready instructions can be scheduled this cycleInteresting case:• we need to pick a subset of the ready instructions! List scheduling makes choices based upon priorities• assigning priorities correctly is a key challengeI5 I1 I7I5 I1 I2 I7I0???Todd C. MowryCS745: Instruction Scheduling -6-! Two different kinds of edges:! Why distinguish them?• do they affect scheduling differently?! What about output dependences?Representing Data Dependences:The Data Precedence Graph (DPG)I0: x = 1;I1: y = x;I2: x = 2;I3: z = x;I2I0I3I1DPGCodetrue “edges”: E(read-after-write)e = (I0,I1)e = (I2,I3)xx“anti-edges”: E’(write-after-read)e’ = (I1,I2)Todd C. MowryCS745: Instruction Scheduling -7-Computing Priorities! Let’s start with just true dependences (i.e. “edges” in DPG)! Priority = latency-weighted depth in the DPGI0 I2I6I4I3 I8I1I5I9Todd C. MowryCS745: Instruction Scheduling -8-Computing Priorities (Cont.)! Now let’s also take anti-dependences into account• i.e. anti-edges in the set E’I0 I2I6I4I3 I8I1I5I9e’e’Todd C. MowryCS745: Instruction Scheduling -9-List Scheduling Algorithmcycle = 0;ready-list = root nodes in DPG; inflight-list = {};while ((|ready-list|+|inflight-list| > 0) && an issue slot is available) {for op = (all nodes in ready-list in descending priority order) {if (an FU exists for op to start at cycle) {remove o p from ready-l ist and add to inflight-list;add op to schedule at time cycle;if (op has an outgoing anti-edge)add all targets of op’s anti-edges that are ready to ready-list;}}cycle = cycle + 1;for op = (all nodes in inflight-list)if (op finishes at time cycle) {remove o p from infligh t-list;check nodes waiting for op & add to ready-list if all operands available;}}}Todd C. MowryCS745: Instruction Scheduling -10-Example! 2 identical fully-pipelined FUs! adds take 2 cycles; all other insts take 1 cycleI0: a = 1I1: f = a + xI2: b = 7I3: c = 9I4: g = f + bI5: d = 13I6: e = 19;I7: h = f + cI8: j = d + yI9: z = -1I10: JMP L1I1I8I5I6I4 I7I3I10I9I2I0Cycle01234561233 2 3444 56Todd C. MowryCS745: Instruction Scheduling -11-Example! 2 identical fully-pipelined FUs! adds take 2 cycles; all other insts take 1 cycleI0: a = 1I1: f = a + xI2: b = 7I3: c = 9I4: g = f + bI5: d = 13I6: e = 19;I7: h = f + cI8: j = d + yI9: z = -1I10: JMP L1I1I8I5I6I4 I7I3I10I9I2I0Cycle0123456I0 I2I1 I3I5 I9I4 I7I8 I6--- ---I101233 2 3444 56Todd C. MowryCS745: Instruction Scheduling -12-What if We Break Ties Differently?! 2 identical fully-pipelined FUs! adds take 2 cycles; all other insts take 1 cycleI0: a = 1I1: f = a + xI2: b = 7I3: c = 9I4: g = f + bI5: d = 13I6: e = 19;I7: h = f + cI8: j = d + yI9: z = -1I10: JMP L1I1I8I5I6I4 I7I3I10I9I2I0Cycle01234561233 2 3444 56Todd C. MowryCS745: Instruction Scheduling -13-What if We Break Ties Differently?! 2 identical fully-pipelined FUs! adds take 2 cycles; all other insts take 1 cycleI0: a = 1I1: f = a + xI2: b = 7I3: c = 9I4: g = f + bI5: d = 13I6: e = 19;I7: h = f + cI8: j = d + yI9: z = -1I10: JMP L1I1I8I5I6I4 I7I3I10I9I2I0Cycle0123456I0 I2I1 I5I3 I8I4 I7I9 I6I10Todd C. MowryCS745: Instruction Scheduling -14-Contrasting the Two Schedules! Breaking ties arbitrarily may not be the best approachI1I8I5I6I4 I7I3I10I9I2I0Cycle0123456I0 I2I1 I3I5 I9I4 I7I8 I6--- ---I10Cycle012345I0 I2I1 I5I3 I8I4 I7I9 I6I101233 2 3444 56Todd C. MowryCS745: Instruction Scheduling -15-Backward List SchedulingModify the algorithm as follows:• reverse the direction of all edges in the DPG• schedule the finish times of each operation• start times must still be used to ensure FU availabilityImpact of scheduling backwards:• clusters operations near the end (vs. the beginning)• may be either better or worse than forward
View Full Document