DOC PREVIEW
UD CISC 672 - Instruction Selection and Scheduling and Phase IV

This preview shows page 1-2-15-16-17-32-33 out of 33 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Instruction Selection and Scheduling and Phase IVStructure of a Compiler A compiler is a lot of fast stuff followed by some hard problems → The hard stuff is mostly in optimization and code generation → For superscalars, its allocation & scheduling that count Instruction Selection Register Allocation Instruction Scheduling Scanner Parser Analysis & Optimization O(n log n) O(n) O(n) NP-Complete NP-Complete Either fast or NP-Complete words IR IR asm asm asm ∞ regs ∞ regs k regsStructure of a Compiler For the rest of CISC672, we assume the following model • Selection is fairly simple (problem of the 1980s) • Allocation & scheduling are complex • Operation placement is not yet critical (unified register set) What about the IR ? • Low-level, RISC-like IR called ILOC • Has “enough” registers • ILOC was designed for this stuff {Branches, compares, & labels Memory tags Hierarchy of loads & stores Provision for multiple ops/cycle Instruction Selection Instruction Scheduling Register Allocation Analysis & Optimization IR asm asm ∞ regs ∞ regs ∞ regs ∞ regs k regs asm IRDefinitions Instruction selection • Mapping IR into assembly code • Assumes a fixed storage mapping & code shape • Combining operations, using address modes Instruction scheduling • Reordering operations to hide latencies • Assumes a fixed program (set of operations) • Changes demand for registers Register allocation • Deciding which values will reside in registers • Changes the storage mapping, may add false sharing • Concerns about placement of data & memory operations These 3 problems are tightly coupled.The Big Picture How hard are these problems? Instruction selection • Can make locally optimal choices, with automated tool • Global optimality is (undoubtedly) NP-Complete Instruction scheduling • Single basic block ⇒ heuristics work well • General problem, with control flow ⇒ NP-Complete Register allocation • Single basic block, no spilling, & 1 register size ⇒ linear time • Whole procedure is NP-CompleteThe Big Picture Conventional wisdom says that we lose little by solving these problems independently Instruction selection • Use some form of pattern matching • Assume enough registers or target “important” values Instruction scheduling • Within a block, list scheduling is “close” to optimal • Across blocks, build framework to apply list scheduling Register allocation • Start from virtual registers & map “enough” into k • With targeting, focus on good priority heuristic This slide is full of “fuzzy” terms Optimal for > 85% of blocksThe Problem Writing a compiler is a lot of work • Would like to reuse components whenever possible • Would like to automate construction of components • Front end construction is largely automated • Middle is largely hand crafted • (Parts of ) back end can be automated Front End Back End Middle End Infrastructure Today’s lecture: Automating Instruction SelectionDefinitions Instruction selection • Mapping IR into assembly code • Assumes a fixed storage mapping & code shape • Combining operations, using address modes Instruction scheduling • Reordering operations to hide latencies • Assumes a fixed program (set of operations) • Changes demand for registers Register allocation • Deciding which values will reside in registers • Changes the storage mapping, may add false sharing • Concerns about placement of data & memory operationsThe Problem Modern computers (still) have many ways to do anything Consider register-to-register copy in ILOC • Obvious operation is i2i ri ⇒ rj • Many others exist • Human would ignore all of these • Algorithm must look at all of them & find low-cost encoding → Take context into account (busy functional unit?) addI ri,0 ⇒ rj subI ri,0 ⇒ rj lshiftI ri,0 ⇒ rj multI ri,1 ⇒ rj divI ri,1 ⇒ rj!rshiftI ri,0 ⇒ rj!orI ri,0 ⇒ rj!xorI ri,0 ⇒ rj … and others …The Goal Want to automate generation of instruction selectors Front End Back End Middle End Infrastructure Tables Pattern Matching Engine Back-end Generator Machine description Description-based retargetingThe Big Picture Need pattern matching techniques • Must produce good code (some metric for good ) • Must run quickly A treewalk code generator runs quickly How good was the code? xIDENT <a,ARP,4> IDENT <b,ARP,8> loadI 4 ⇒ r5 loadAO rarp,r5 ⇒ r6 loadI 8 ⇒ r7 loadAO rarp,r7 ⇒ r8 mult r6,r8 ⇒ r9 loadAI rarp,4 ⇒ r5 loadAI rarp,8 ⇒ r6 mult r5,r6 ⇒ r7 Tree Treewalk Code Desired CodeThe Big Picture Need pattern matching techniques • Must produce good code (some metric for good ) • Must run quickly A treewalk code generator runs quickly How good was the code? xIDENT <a,ARP,4> IDENT <b,ARP,8> loadI 4 ⇒ r5 loadAO rarp,r5 ⇒ r6 loadI 8 ⇒ r7 loadAO rarp,r7 ⇒ r8 mult r6,r8 ⇒ r9 loadAI rarp,4 ⇒ r5 loadAI rarp,8 ⇒ r6 mult r5,r6 ⇒ r7 Tree Treewalk Code Desired Code Pretty easy to fix. See 1st digression in Ch. 7The Big Picture Need pattern matching techniques • Must produce good code (some metric for good ) • Must run quickly A treewalk code generator runs quickly How good was the code? xIDENT <a,ARP,4> NUMBER <2> loadI 4 ⇒ r5 loadAO rarp,r5 ⇒ r6 loadI 2 ⇒ r7 mult r6,r7 ⇒ r8 loadAI rarp,4 ⇒ r5 multI r5,2 ⇒ r7 Tree Treewalk Code Desired CodeThe Big Picture Need pattern matching techniques • Must produce good code (some metric for good ) • Must run quickly A treewalk code generator runs quickly How good was the code? xIDENT <a,ARP,4> NUMBER <2> loadI 4 ⇒ r5 loadAO rarp,r5 ⇒ r6 loadI 2 ⇒ r7 mult r6,r7 ⇒ r8 loadAI rarp,4 ⇒ r5 multI r5,2 ⇒ r7 Tree Treewalk Code Desired Code Must combine these This is a nonlocal problemThe Big Picture Need pattern matching techniques • Must produce good code (some metric for good ) • Must run quickly A treewalk code generator runs quickly How good was the code? xIDENT <c,@G,4> IDENT <d,@H,4> loadI @G ⇒ r5 loadI 4 ⇒ r6 loadAO r5,r6 ⇒ r7 loadI


View Full Document

UD CISC 672 - Instruction Selection and Scheduling and Phase IV

Documents in this Course
Syllabus

Syllabus

18 pages

Load more
Download Instruction Selection and Scheduling and Phase IV
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Instruction Selection and Scheduling and Phase IV and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Instruction Selection and Scheduling and Phase IV 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?