Unformatted text preview:

Advanced Compilers CMPSCI 710 Spring 2003 Instruction SchedulingModern ArchitecturesInstruction SchedulingScheduling for Pipelined ArchitecturesGibbons & Muchnick, IGibbons & Muchnick, IIDependence DAGScheduling ExampleSlide 9Scheduling AlgorithmInstruction Scheduling HeuristicsScheduling Algorithm, ExampleSlide 13Slide 14Slide 15Slide 16Slide 17Slide 18Slide 19Scheduling Algorithm ComplexityEmpirical ResultsNext TimeUUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer ScienceEmery BergerUniversity of Massachusetts, AmherstAdvanced CompilersCMPSCI 710Spring 2003Instruction SchedulingUUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science2Modern ArchitecturesLots of features to increase performance and hide memory latencySuperscalarMultiple logic unitsMultiple issue2 or more instructions issued per cycleSpeculative executionBranch predictorsSpeculative loadsDeep pipelinesUUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science3Instruction SchedulingChallenges to achieving instruction-level parallelism:Structural hazards:Insufficient resources to exploit parallelismData hazardsInstruction depends on result of previous instruction still in pipelineControl hazardsBranches & jumps modify PCaffect which instructions should be in pipelineUUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science4Scheduling for Pipelined ArchitecturesCompiler reorders (“schedules”) instructions to maximize ILP = minimize stalls (“bubbles”) in pipelinePerform after code generation & register allocationFirst approach: [Hennessy & Gross 1983]O(n4), n = instructions in basic blockToday: [Gibbons & Muchnick 1986]O(n2)UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science5Gibbons & Muchnick, IAssumptions:Hardware hazard detectionAlgorithm not required to introduce nopsEach memory location referenced via offset of single base registerPointer may reference all of memoryLoad followed by add creates interlock (stall)Hazards only take single cycleUUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science6Gibbons & Muchnick, IIFor each basic block:Construct directed acyclic graph (DAG) using dependences between statementsNode = statement / instructionEdge (a,b) = statement a must execute before bSchedule instructions using the DAGUUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science7Dependence DAGCannot reorder two dependent instructionsData dependencies:True dependence (RAW = read-after-write)Instruction can’t be executed until all required operands availableAnti-dependence (WAR)Write must not occur before readOutput dependence (WAW)Earlier write cannot overwrite later oneUUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science8Scheduling Example1. r8 = [r12+8](4)2. r1 = r8 + 13. r2 = 24. call r14,r315. nop6. r9 = r1 + 1UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science9Scheduling Example1. r8 = [r12+8](4)2. r1 = r8 + 13. r2 = 24. call r14,r315. nop6. r9 = r1 + 1We can reschedule to remove nop in delay slot: (1,3,4,2,6)UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science10Scheduling AlgorithmConstruct dependence dag on basic blockPut roots in candidate setUse scheduling heuristics (in order) to select instructionTake into account terminating instruction of predecessor basic blocksWhile candidate set not emptyEvaluate all candidates and select best oneDelete scheduled instruction from candidate setAdd newly-exposed candidatesUUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science11Instruction Scheduling HeuristicsNP-complete ) we need heuristicsBias scheduler to prefer instructions:Interlock with dag successorsAllow other operations can proceedHave many successorsMore flexibility in schedulingProgress along critical pathFree registersReduce register pressureetc. (see ACDI p. 542)UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science12Scheduling Algorithm, ExampleExecTime(n):cycles to execute statement nLet ExecTime(6) = 2, ExecTime(others) = 1;assume instruction latency = 1Compute Delay(n):=ExecTime(n), if n is leaf=maxm 2 Succ(n) Delay(m)+1UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science13Scheduling Algorithm, ExampleStart at CurTime = 0ETime(n): earliest time node should be scheduled to avoid stallInitally 0Cands = {1,3}MCands: set of candidates with max delay time to end of blockECands:set whose earliest start time is at most current timeMCands = ECands = {3}UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science14Scheduling Algorithm, ExampleScheduled node 3Cands = {1}CurTime = 1ETime(4) = 1UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science15Scheduling Algorithm, ExampleScheduled node 1Cands = {2}CurTime = 2ETime(2) = 1, ETime(4) = 4UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS, A, AMHERST • MHERST • Department of Computer Science Department of Computer Science16Scheduling Algorithm, ExampleScheduled node 2Cands = {4}CurTime = 3ETime(4) = 4UUNIVERSITY OF NIVERSITY OF


View Full Document

UMass Amherst CMPSCI 710 - Advanced Compilers

Download Advanced Compilers
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Advanced Compilers and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Advanced Compilers 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?