Advanced Compilers CMPSCI 710 Spring 2003 Instruction SchedulingModern ArchitecturesInstruction SchedulingScheduling for Pipelined ArchitecturesGibbons & Muchnick, IGibbons & Muchnick, IIDependence DAGScheduling ExampleSlide 9Scheduling AlgorithmInstruction Scheduling HeuristicsScheduling Algorithm, ExampleSlide 13Slide 14Slide 15Slide 16Slide 17Slide 18Slide 19Scheduling Algorithm ComplexityEmpirical ResultsNext TimeUUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer ScienceEmery BergerUniversity of Massachusetts, AmherstAdvanced CompilersCMPSCI 710Spring 2003Instruction SchedulingUUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science2Modern ArchitecturesLots of features to increase performance and hide memory latencySuperscalarMultiple logic unitsMultiple issue2 or more instructions issued per cycleSpeculative executionBranch predictorsSpeculative loadsDeep pipelinesUUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science3Instruction SchedulingChallenges to achieving instruction-level parallelism:Structural hazards:Insufficient resources to exploit parallelismData hazardsInstruction depends on result of previous instruction still in pipelineControl hazardsBranches & jumps modify PCaffect which instructions should be in pipelineUUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science4Scheduling for Pipelined ArchitecturesCompiler reorders (“schedules”) instructions to maximize ILP = minimize stalls (“bubbles”) in pipelinePerform after code generation & register allocationFirst approach: [Hennessy & Gross 1983]O(n4), n = instructions in basic blockToday: [Gibbons & Muchnick 1986]O(n2)UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science5Gibbons & Muchnick, IAssumptions:Hardware hazard detectionAlgorithm not required to introduce nopsEach memory location referenced via offset of single base registerPointer may reference all of memoryLoad followed by add creates interlock (stall)Hazards only take single cycleUUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science6Gibbons & Muchnick, IIFor each basic block:Construct directed acyclic graph (DAG) using dependences between statementsNode = statement / instructionEdge (a,b) = statement a must execute before bSchedule instructions using the DAGUUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science7Dependence DAGCannot reorder two dependent instructionsData dependencies:True dependence (RAW = read-after-write)Instruction can’t be executed until all required operands availableAnti-dependence (WAR)Write must not occur before readOutput dependence (WAW)Earlier write cannot overwrite later oneUUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science8Scheduling Example1. r8 = [r12+8](4)2. r1 = r8 + 13. r2 = 24. call r14,r315. nop6. r9 = r1 + 1UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science9Scheduling Example1. r8 = [r12+8](4)2. r1 = r8 + 13. r2 = 24. call r14,r315. nop6. r9 = r1 + 1We can reschedule to remove nop in delay slot: (1,3,4,2,6)UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science10Scheduling AlgorithmConstruct dependence dag on basic blockPut roots in candidate setUse scheduling heuristics (in order) to select instructionTake into account terminating instruction of predecessor basic blocksWhile candidate set not emptyEvaluate all candidates and select best oneDelete scheduled instruction from candidate setAdd newly-exposed candidatesUUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science11Instruction Scheduling HeuristicsNP-complete ) we need heuristicsBias scheduler to prefer instructions:Interlock with dag successorsAllow other operations can proceedHave many successorsMore flexibility in schedulingProgress along critical pathFree registersReduce register pressureetc. (see ACDI p. 542)UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science12Scheduling Algorithm, ExampleExecTime(n):cycles to execute statement nLet ExecTime(6) = 2, ExecTime(others) = 1;assume instruction latency = 1Compute Delay(n):=ExecTime(n), if n is leaf=maxm 2 Succ(n) Delay(m)+1UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science13Scheduling Algorithm, ExampleStart at CurTime = 0ETime(n): earliest time node should be scheduled to avoid stallInitally 0Cands = {1,3}MCands: set of candidates with max delay time to end of blockECands:set whose earliest start time is at most current timeMCands = ECands = {3}UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science14Scheduling Algorithm, ExampleScheduled node 3Cands = {1}CurTime = 1ETime(4) = 1UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science15Scheduling Algorithm, ExampleScheduled node 1Cands = {2}CurTime = 2ETime(2) = 1, ETime(4) = 4UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science16Scheduling Algorithm, ExampleScheduled node 2Cands = {4}CurTime = 3ETime(4) = 4UUNIVERSITY OF NIVERSITY OF
View Full Document