Unformatted text preview:

Static Code SchedulingCode SchedulingWhy Must the Compiler Schedule?ExampleSlide 5Code Scheduling StrategyScheduling ApproachesBranch SchedulingRecall from Architecture…Control HazardsData DependencesData HazardsSlide 13Multi-cycle InstructionsAvoiding data hazardsExample: Without SchedulingBasic Block Dependence DAGSExample – Build the DAGCreating a scheduleOperation PriorityComputing PrioritiesExample – Determine Height and CPExample – List SchedulingScheduling vs. Register AllocationRegister RenamingVLIWSample VLIW codeMulti-Issue Scheduling ExampleEarliest Latest SetsList Scheduling AlgorithmImproving Basic Block SchedulingStatic Code SchedulingCS 671April 1, 20082CS 671 – Spring 2008Code SchedulingScheduling or reordering instructions to improve performance and/or guarantee correctness•Important for dynamically-scheduled architectures•Crucial (assumed!) for statically-scheduled architectures, e.g. VLIW or EPICTakes into account anticipated latencies•Machine-specific, performed later in the optimization passHow does this contrast with our earlier exploration of code motion?3CS 671 – Spring 2008Many machines are pipelined and expose some aspects of pipelining to the user (compiler)Examples:•Branch delay slots!•Memory-access delays•Multi-cycle operationsSome machines don’t have scheduling hardwareWhy Must the Compiler Schedule?4CS 671 – Spring 2008ExampleAssume loads take 2 cycles and branches have a delay slot.____cyclesinstruction start timer2  [r1]r3  [r1+4]r4  r2 + r3r5  r2 + 1goto L1nop5CS 671 – Spring 2008ExampleAssume loads take 2 cycles and branches have a delay slot.____cyclesinstruction start timer2  [r1]r3  [r1+4]r5  r2 + 1goto L1r4  r2 + r36CS 671 – Spring 2008Code Scheduling StrategyGet resources operating in parallel•Integer data path•Integer multiply / divide hardware•FP adder, multiplier, dividerMethod•Fill with computations that do not require result or same hardware resourcesDrawbacks•Highly hardware dependentStart OpUse OpTry to fill7CS 671 – Spring 2008Scheduling ApproachesLocalBranch schedulingBasic-block schedulingGlobalCross-block schedulingSoftware pipeliningTrace schedulingPercolation scheduling8CS 671 – Spring 2008Branch SchedulingTwo problems: Branches often take some number of cycles to completeCan be a delay between a compare b and its associated branchA compiler will try to fill these slots with valid instructions (rather than nop)Delay slots – present in PA-RISC, SPARC, MIPSCondition delay – PowerPC, Pentium9CS 671 – Spring 2008Recall from Architecture…IF – Instruction FetchID – Instruction DecodeEX – ExecuteMA – Memory accessWB – Write backIFIFIFIDIDIDEXEXEXMAMAMAWBWBWB10CS 671 – Spring 2008Control HazardsIFIFID---EX---MA--- ---WBIF ID EX MA WBIF ID EX MA WBTaken BranchInstr + 1Branch TargetBranch Target + 111CS 671 – Spring 2008Data DependencesIf two operations access the same register, they are dependentTypes of data dependencesFlowOutput Antir1 = r2 + r3r4 = r1 * 6r1 = r2 + r3r1 = r4 * 6r1 = r2 + r3r2 = r5 * 612CS 671 – Spring 2008Data HazardsIFIFIDIDEXEXMAMA WBWBlw R1,0(R2)add R3,R1,R4stallMemory latency: data not ready13CS 671 – Spring 2008Data HazardsIFIFIDIDEX EX MAMA WBWBaddf R3,R1,R2addf R3,R3,R4stallEX EXAssumes floating point ops take 2 execute cyclesInstruction latency: execute takes > 1 cycle14CS 671 – Spring 2008Multi-cycle Instructions•Scheduling is particularly important for multi-cycle operations•Alpha instructions > 1 cycle latency (partial list)mull (32-bit integer multiply) 8mulq (64-bit integer multiply) 16addt (fp add) 4mult (fp multiply) 4divs (fp single-precision divide) 10divt (fp double-precision divide) 2315CS 671 – Spring 2008Avoiding data hazards•Move loads earlier and stores later (assuming this does not violate correctness) •Other stalls may require more sophisticated re-ordering, i.e. ((a+b)+c)+d becomes (a+b)+(c+d) •How can we do this in a systematic way??16CS 671 – Spring 2008Example: Without SchedulingStart TimeCodelw r1, wadd r1,r1,r1lw r2,xmult r1,r1,r2lw r2,ymult r1,r1,r2lw r2,zmult r1,r1,r2sw r1, aAssume:• memory instrs take 3 cycles• mult takes 2 cycles (to haveresult in register)• rest take 1 cycle____cycles17CS 671 – Spring 2008Basic Block Dependence DAGSNodes - instructionsEdges - dependence between I1 and I2•When we cannot determine whether there is a dependence, we must assume there is onea) lw R2, (R1)b) lw R3, (R1) 4c) R4  R2 + R3d) R5  R2 - 1a bd c22218CS 671 – Spring 2008Example – Build the DAGCodea lw r1, wb add r1,r1,r1c load r2,xd mult r1,r1,r2e load r2,yf mult r1,r1,r2g load r2,zh mult r1,r1,r2i sw r1, aAssume: memory instrs = 3 mult = 2 (to have result in register) rest = 1 cycle19CS 671 – Spring 2008Creating a schedule•Create a DAG of dependences•Determine priority•Schedule instructions with–Ready operands–Highest priority•Heuristics: If multiple possibilities, fall back on other priority functions20CS 671 – Spring 2008Operation PriorityPriority – Need a mechanism to decide which ops to schedule first (when you have choices)Common priority functions•Height – Distance from exit node–Give priority to amount of work left to do•Slackness – inversely proportional to slack–Give priority to ops on the critical path•Register use – priority to nodes with more source operands and fewer destination operands–Reduces number of live registers •Uncover – high priority to nodes with many children–Frees up more nodes•Original order – when all else fails21CS 671 – Spring 2008Computing PrioritiesHeight(n) =•exec(n) if n is a leaf•max(height(m)) + exec(n) for m, where m is a successor of nCritical path(s) = path through the dependence DAG with longest latency22CS 671 – Spring 2008Example – Determine Height and CPCodea lw r1, wb add r1,r1,r1c lw r2,xd mult r1,r1,r2e lw r2,yf mult r1,r1,r2g lw r2,zh mult r1,r1,r2i sw r1, aAssume: memory instrs = 3 mult = 2 = (to have result in register) rest = 1 cycleCritical path: _______23CS 671 – Spring 2008Example – List SchedulingCodea lw r1, wb add r1,r1,r1c lw r2,xd mult r1,r1,r2e lw r2,yf mult r1,r1,r2g lw r2,zh mult r1,r1,r2i sw r1, astartSchedule_____cycles24CS 671 – Spring 2008Scheduling vs. Register AllocationCodea lw r1  (r12)b lw r2  (r12+4)c r1  r1+r2d stw (r12)  r1e lw r1  (r12+8)f lw


View Full Document

UVA CS 671 - Static Code Scheduling

Download Static Code Scheduling
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Static Code Scheduling and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Static Code Scheduling 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?