Instruction Selection and Scheduling and Phase IVStructure of a Compiler A compiler is a lot of fast stuff followed by some hard problems → The hard stuff is mostly in optimization and code generation → For superscalars, its allocation & scheduling that count Instruction Selection Register Allocation Instruction Scheduling Scanner Parser Analysis & Optimization O(n log n) O(n) O(n) NP-Complete NP-Complete Either fast or NP-Complete words IR IR asm asm asm ∞ regs ∞ regs k regsStructure of a Compiler For the rest of CISC672, we assume the following model • Selection is fairly simple (problem of the 1980s) • Allocation & scheduling are complex • Operation placement is not yet critical (unified register set) What about the IR ? • Low-level, RISC-like IR called ILOC • Has “enough” registers • ILOC was designed for this stuff {Branches, compares, & labels Memory tags Hierarchy of loads & stores Provision for multiple ops/cycle Instruction Selection Instruction Scheduling Register Allocation Analysis & Optimization IR asm asm ∞ regs ∞ regs ∞ regs ∞ regs k regs asm IRDefinitions Instruction selection • Mapping IR into assembly code • Assumes a fixed storage mapping & code shape • Combining operations, using address modes Instruction scheduling • Reordering operations to hide latencies • Assumes a fixed program (set of operations) • Changes demand for registers Register allocation • Deciding which values will reside in registers • Changes the storage mapping, may add false sharing • Concerns about placement of data & memory operations These 3 problems are tightly coupled.The Big Picture How hard are these problems? Instruction selection • Can make locally optimal choices, with automated tool • Global optimality is (undoubtedly) NP-Complete Instruction scheduling • Single basic block ⇒ heuristics work well • General problem, with control flow ⇒ NP-Complete Register allocation • Single basic block, no spilling, & 1 register size ⇒ linear time • Whole procedure is NP-CompleteThe Big Picture Conventional wisdom says that we lose little by solving these problems independently Instruction selection • Use some form of pattern matching • Assume enough registers or target “important” values Instruction scheduling • Within a block, list scheduling is “close” to optimal • Across blocks, build framework to apply list scheduling Register allocation • Start from virtual registers & map “enough” into k • With targeting, focus on good priority heuristic This slide is full of “fuzzy” terms Optimal for > 85% of blocksThe Problem Writing a compiler is a lot of work • Would like to reuse components whenever possible • Would like to automate construction of components • Front end construction is largely automated • Middle is largely hand crafted • (Parts of ) back end can be automated Front End Back End Middle End Infrastructure Today’s lecture: Automating Instruction SelectionDefinitions Instruction selection • Mapping IR into assembly code • Assumes a fixed storage mapping & code shape • Combining operations, using address modes Instruction scheduling • Reordering operations to hide latencies • Assumes a fixed program (set of operations) • Changes demand for registers Register allocation • Deciding which values will reside in registers • Changes the storage mapping, may add false sharing • Concerns about placement of data & memory operationsThe Problem Modern computers (still) have many ways to do anything Consider register-to-register copy in ILOC • Obvious operation is i2i ri ⇒ rj • Many others exist • Human would ignore all of these • Algorithm must look at all of them & find low-cost encoding → Take context into account (busy functional unit?) addI ri,0 ⇒ rj subI ri,0 ⇒ rj lshiftI ri,0 ⇒ rj multI ri,1 ⇒ rj divI ri,1 ⇒ rj!rshiftI ri,0 ⇒ rj!orI ri,0 ⇒ rj!xorI ri,0 ⇒ rj … and others …The Goal Want to automate generation of instruction selectors Front End Back End Middle End Infrastructure Tables Pattern Matching Engine Back-end Generator Machine description Description-based retargetingThe Big Picture Need pattern matching techniques • Must produce good code (some metric for good ) • Must run quickly A treewalk code generator runs quickly How good was the code? xIDENT <a,ARP,4> IDENT <b,ARP,8> loadI 4 ⇒ r5 loadAO rarp,r5 ⇒ r6 loadI 8 ⇒ r7 loadAO rarp,r7 ⇒ r8 mult r6,r8 ⇒ r9 loadAI rarp,4 ⇒ r5 loadAI rarp,8 ⇒ r6 mult r5,r6 ⇒ r7 Tree Treewalk Code Desired CodeThe Big Picture Need pattern matching techniques • Must produce good code (some metric for good ) • Must run quickly A treewalk code generator runs quickly How good was the code? xIDENT <a,ARP,4> IDENT <b,ARP,8> loadI 4 ⇒ r5 loadAO rarp,r5 ⇒ r6 loadI 8 ⇒ r7 loadAO rarp,r7 ⇒ r8 mult r6,r8 ⇒ r9 loadAI rarp,4 ⇒ r5 loadAI rarp,8 ⇒ r6 mult r5,r6 ⇒ r7 Tree Treewalk Code Desired Code Pretty easy to fix. See 1st digression in Ch. 7The Big Picture Need pattern matching techniques • Must produce good code (some metric for good ) • Must run quickly A treewalk code generator runs quickly How good was the code? xIDENT <a,ARP,4> NUMBER <2> loadI 4 ⇒ r5 loadAO rarp,r5 ⇒ r6 loadI 2 ⇒ r7 mult r6,r7 ⇒ r8 loadAI rarp,4 ⇒ r5 multI r5,2 ⇒ r7 Tree Treewalk Code Desired CodeThe Big Picture Need pattern matching techniques • Must produce good code (some metric for good ) • Must run quickly A treewalk code generator runs quickly How good was the code? xIDENT <a,ARP,4> NUMBER <2> loadI 4 ⇒ r5 loadAO rarp,r5 ⇒ r6 loadI 2 ⇒ r7 mult r6,r7 ⇒ r8 loadAI rarp,4 ⇒ r5 multI r5,2 ⇒ r7 Tree Treewalk Code Desired Code Must combine these This is a nonlocal problemThe Big Picture Need pattern matching techniques • Must produce good code (some metric for good ) • Must run quickly A treewalk code generator runs quickly How good was the code? xIDENT <c,@G,4> IDENT <d,@H,4> loadI @G ⇒ r5 loadI 4 ⇒ r6 loadAO r5,r6 ⇒ r7 loadI
View Full Document