CSc 453 — Compilers and Systems Software13 : Intermediate Code IChristian CollbergDepartment of Computer ScienceUniversity of [email protected]° 2009 Christian CollbergOctober 18, 20091Introduction2 Compiler PhasesWe are here!ASTasmSemanticAnalyserInterm.Code GenIRMachineCode GenIRASTLexertokenssourceerrorsParsererrorsOptimizeerrors3 Intermediate Representations• Some compilers use the AST as the only intermediate representation. Optimizations (code improve-ments) are performed directly on the AST, and machine code is generated directly from the AST.• The AST is OK for machine-independent optimizations, such as inlining (replacing a procedure callwith the called procedure’s code).• The AST is a bit too high-level for machine code generation and machine-dependent optimizations.14 Intermediate Representations. . .• For this reason, some compilers generate a lower level (simpler, closer to machine code) representationfrom the AST. This representation is used during code generation and code optimization.5 Intermediate CodeAdvantages of:1. Fitting many front-ends to many back-ends,2. Different development teams for front- and back-end,3. Debugging is simplified,4. Portable optimization.Requirements:1. Architecture independent,2. Language independent,3. Easy to generate,4. Easy to optimize,5. Easy to produce machine code from.6 Mix-and-Match CompilersINTERMEDIATE REPRESENTATIONPascalPascal68k−compilerAdaMips−compiler Mips−compilerENDRONTFAda Pascal Modula−2 C++BACKSparcMips68000IBM/370END7 UNCOL• A representation which is both architecture and language independent is known as an UNCOL, aUniversal Compiler Oriented Language.• UNCOL is the holy grail of compiler design – many have search for it, but no-one has found it.Problems:1. Programming language semantics differ from one language to another,2. Machine architectures differ.28 Intermediate Code. . .• There are several different types of intermediate representations:1. Tree-Based.2. Graph-Based.3. Tuple-Based.4. Linear representations.• All representations contain the same information. Some are easier to generate, some are easy togenerate simple machine code from, some are easy to generate good code from.• IR — Intermediate Representation.9Postfix Notation10 Postfix Notation*assignba 2 a 2+*Infix: b := (a ∗ 2) + (a ∗ 2)Postfix: b a 2 * a 2 * + :=• Postfix notation is a parenthesis free notation for arithmetic expression. It is essentially a linearizedrepresentation of an abstract syntax tree.• In postfix notation an operator appears after its operands.• Very simple to generate, very compact, easy to generate straight-forward machine code from, difficultto generate good machine code from.11Tree & DAG Representations12 Tree & DAG Representation• Trees make good intermediate representations. We can represent the program as a sequence of ex-pression trees. Each assignment, procedure call, or jump becomes one individual tree in the forest.• Common Subexpression Elimination (CSE): Even if the same (sub-) expression appears more thanonce in a procedure, we should only compute its value once, and save the result for future reference.• One way of doing this is to build a graph representation, rather than a tree. In the following slides wesee how the expression a ∗ 2 gets two subtrees in the tree representation and one subtree in the DAGrepresentation.313 Tree & DAG Representation. . .b := (a ∗ 2) + (a ∗ 2)*assignba 2 a 2+*Linearized TreeNrOp Arg1Arg21 ident a2 int 23 mul 1 24 ident a5 int 26 mul 4 57 add 3 68 ident b9 assign 8 714 Tree & DAG Representation. . .b := (a ∗ 2) + (a ∗ 2)*assignb +a 2Linearized DagNrOp Arg1Arg21 ident a2 int 23 mul 1 24 add 3 35 ident b6 assign 5 415 Sequence of Expression TreesX := 20;WHILE X < 10 DOX := X-1;A[X] := 10;IF X = 4 THENX := X - 2;ENDIF;ENDDO;Y := X + 5;XX:=20gotoB3:=XY+5gotoEXIT>=10B4X:=X−1X[]A X:=10X4B6<>:=X−2gotoB2EXIT:B4:B3:B5:B6:B2:16 Building DAGs. . .• Repeatedly add subtrees to build DAG. Only add subtrees not already in DAG. Store subtrees in ahash table. This is the value-number algorithm.4•For every insertionof a subtree, check if(X OP Y ) ∈ DAG.X YOPPROCEDURE InsertNode (OP : Operator; L, R : Node) : Node;BEGINV := hashfunc (OP, L, R);N := HashTab.Lookup (V , OP, L, R);IF N = NullNode THENN := NewNode (OP, L, R);HashTab.Insert (V , N );END;RETURN N ;END InsertNode;17 Building DAGs – Example34625A+BAB+AB134625A+BAB(A+B)*CC+ABC*134625+ABC*+(A+B)*C+(A+B)A+BAB(A+B)*CC118 Building DAGs• From an expression/expression tree such as the one on the left we might generate the machine code(for some fictitious architecture) on the right:a ∗ (b + c)41235+a*cbLOAD b, r0LOAD c, r1ADD r0, r1, r2LOAD a, r3MUL r2, r3, r4• Can we generate better code from a DAG than a tree?19 Building DAGs. . .Example Expression:[(a + b) ∗ c + {(a + b) + e} ∗ (e + f)] ∗ [(a + b) ∗ c]5Tree Representation:++a bc*+e f+a b+e*+a bc**20 Building DAGs. . .[(a + b) ∗ c + {(a + b) + e} ∗ (e + f)] ∗ [(a + b) ∗ c]DAG Representation:e*++a bc* *+ +f21 Building DAGs. . .• Generating machine code from the tree yields 21 instructions.Code from TreeLOAD a, r0 ; aLOAD b, r1 ; bADD r0, r1, r2 ; a + bLOAD c, r0 ; cMUL r0, r2, r3 ; (a + b) ∗ cLOAD a, r0 ; aLOAD b, r1 ; bADD r0, r1, r2 ; a + bLOAD e, r0 ; eADD r2, r0, r4 ; (a + b) + eLOAD f, r0 ; fLOAD e, r1 ; eADD r0, r1, r0 ; f + eMUL r4, r0, r4 ; {(a + b) + e}∗; (e + f)ADD r3, r4, r4; Code for (a + b) ∗ c into r3MUL r4, r3, r022 Building DAGs. . .• Generating machine code from the DAG yields only 12 instructions.Code from DAGLOAD a, r0 ; aLOAD b, r1 ; bADD r0, r1, r2; a + bLOAD c, r0 ; cMUL r0, r2, r3 ; (a + b) ∗ cLOAD e, r4 ; eADD r4, r2, r1 ; (a + b) + eLOAD f, r0 ; fADD r0, r4, r0 ; f + eMUL r1, r0, r0ADD r0, r3, r0MUL r0, r3, r0623Tuple Codes24 Three-Address Code• Another common representation is three-address code. It is akin to assembly code, but uses aninfinite number of temporaries (registers) to store the results of operations.• There are three common realizations of three-address code: quadruples, triples and indirect triples.Types of 3-Addr Statements:x := y op z Binary arithmetic or logical operation. Example: Mul, And.x := op yUnary arithmetic, conversion, or logical operation. Example: Abs, UnaryMinus, Float.x := yCopy statement.25 Three-Address Code. . .goto LUnconditional jump.if x relop y goto LConditional jump. relop is one of <,>,<=, etc. If x relop
View Full Document