Intermediate RepresentationsIntermediate Representations • Front end - produces an intermediate representation (IR) • Middle end - transforms the IR into an equivalent IR that runs more efficiently • Back end - transforms the IR into native code • IR encodes the compiler’s knowledge of the program • Middle end usually consists of several passes Front End Middle End Back End IR IR Source Code Target CodeIntermediate Representations • Decisions in IR design affect the speed and efficiency of the compiler • Some important IR properties → Ease of generation → Ease of manipulation → Procedure size → Freedom of expression → Level of abstraction • The importance of different properties varies between compilers → Selecting an appropriate IR for a compiler is criticalTypes of Intermediate Representations Three major categories • Graphically oriented → Heavily used in source-to-source translators → Tend to be large • Linear → Pseudo-code for an abstract machine → Level of abstraction varies → Simple, compact data structures → Easier to rearrange • Hybrid → Combination of graphs and linear code Examples: Trees, DAGs Examples: 3 address code Stack machine code Example: Control-flow graphLevel of Abstraction • The level of detail exposed in an IR influences the profitability and feasibility of different optimizations. • Two different representations of an array reference: subscript A i j loadI 1 => r1!sub rj, r1 => r2!loadI 10 => r3!mult r2, r3 => r4!sub ri, r1 => r5!add r4, r5 => r6!loadI @A => r7!Add r7, r6 => r8!load r8 => rAij!High level AST: Good for memory disambiguation Low level linear code: Good for address calculationLevel of Abstraction • Structural IRs are usually considered high-level • Linear IRs are usually considered low-level • Not necessarily true: + * 10 j 1 - j 1 - + @A load Low level AST loadArray A,i,j!High level linear codeAbstract Syntax Tree An abstract syntax tree is the procedure’s parse tree with the nodes for most non-terminal nodes removed x - 2 * y • Can use linearized form of the tree → Easier to manipulate than pointers x 2 y * - in postfix form - * 2 y x in prefix form - x 2 y *Directed Acyclic Graph A directed acyclic graph (DAG) is an AST with a unique node for each value • Makes sharing explicit • Encodes redundancy x 2 y * - ← z / ← w z ← x - 2 * y w ← x / 2 Same expression twice means that the compiler might arrange to evaluate it just once!Stack Machine Code Originally used for stack-based computers, now Java • Example: x - 2 * y becomes Advantages • Compact form • Introduced names are implicit, not explicit • Simple to generate and execute code Useful where code is transmitted over slow communication links (the net ) push x!push 2!push y!multiply!subtract!Implicit names take up no space, where explicit ones do!Three Address Code Several different representations of three address code • In general, three address code has statements of the form: x ← y op z With 1 operator (op ) and, at most, 3 names (x, y, & z) Example: z ← x - 2 * y becomes Advantages: • Resembles many machines • Introduces a new set of names • Compact form t ← 2 * y!z ← x - t!*Three Address Code: Quadruples Naïve representation of three address code • Table of k * 4 small integers • Simple record structure • Easy to reorder • Explicit names load 1 Y loadi 2 2 mult 3 2 1 load 4 X sub 5 4 2 load r1, y!loadI r2, 2!mult r3, r2, r1!load r4, x!sub r5, r4, r3!RISC assembly code Quadruples The original FORTRAN compiler used “quads”Three Address Code: Triples • Index used as implicit name • 25% less space consumed than quads • Much harder to reorder load y loadI 2 mult (1) (2) load x sub (4) (3) (1) (2) (3) (4) (5) Implicit names take no space!Three Address Code: Indirect Triples • List first triple in each statement • Implicit name space • Uses more space than triples, but easier to reorder • Major tradeoff between quads and triples is compactness versus ease of manipulation → In the past compile-time space was critical → Today, speed may be more important load y loadI 2 mult (100) (101) load x sub (103) (102) (100) (101) (102) (103) (104) (100) (105)Static Single Assignment Form • The main idea: each name defined exactly once • Introduce φ-functions to make it work Strengths of SSA-form • Sharper analysis • φ-functions give hints about placement • (sometimes) faster algorithms Original x ← …!y ← …!while (x < k)! x ← x + 1! y ← y + x!SSA-form x0 ← …! y0 ← …! if (x0 > k) goto next!loop: x1 ← φ(x0,x2) ! y1 ← φ(y0,y2)! x2 ← x1 + 1 ! y2 ← y1 + x2! if (x2 < k) goto loop!next: … !Two Address Code • Allows statements of the form x ← x op y Has 1 operator (op ) and, at most, 2 names (x and y) Example: z ← x - 2 * y becomes • Can be very compact Problems • Machines no longer rely on destructive operations • Difficult name space → Destructive operations make reuse hard → Good model for machines with destructive ops (PDP-11) t1 ← 2!t2 ← load y!t2 ← t2 * t1!z ← load x!z ← z - t2!Control-flow Graph Models the transfer of control in the procedure • Nodes in the graph are basic blocks → Can be represented with quads or any other linear representation • Edges in the graph represent control flow Example if (x = y) a ← 2 b ← 5 a ← 3 b ← 4 c ← a * b Basic blocks — Maximal length sequences of straight-line codeUsing Multiple Representations • Repeatedly lower the level of the intermediate representation → Each intermediate representation is suited towards certain optimizations • Example: the Open64 compiler → WHIRL intermediate format ♦ Consists of 5 different IRs that are progressively more detailed Front End Middle End Back End IR 1 IR 3 Source Code Target Code Middle End IR 2Memory Models Two major models • Register-to-register model → Keep all values that can legally be stored in a register in registers → Ignore machine limitations on number of registers →
View Full Document