Unformatted text preview:

Untitled6.035 Project 3: Unoptimized Code Generation Jason Ansel MIT - CSAILQuiz Monday ● 50 minute quiz Monday ● Covers everything up to yesterdays lecture ● Lexical Analysis (REs, DFAs, NFAs) ● Syntax Analysis (CFGs, top-down parsing, buttom-up parsing) ● Semantics Analysis (type checking, type systems, attribute grammars) ● Questions similar to miniquizs, but a bit harderProject 3 Roadmap ● Design and Checkpoint – Due Monday March 10th – Checkpoint ● Document of your proposed design (email me) ● Create a tarball of what you have ● If you get codegen to work, no effect ● If you have problems at end, we will be very harsh if you haven’tdone much work by the checkpoint ● Group meeting – not mandatory, meet with me if you want. ● Final Implementation and Report – Due on March 16thCourse Machines ● Meet tyner.csail.mit.edu and silver.csail.mit.edu! ● Two AMD64 machines – dual processor – dual core per processor – 8 gigs of RAM ● Use them for running your compiled assembly code. – User: le0X, password in le0X-pass in group dir ● Can access files over ssh: – git clone athena.dialup.mit.edu:/mit/6.035/group/... Athena is MIT's UNIX-based computing environment. OCW does not provide access to it.Unoptimized Code Generation ● Translate all the instructions in the intermediate representation to assembly language – Allocate space for the variables. ● Globals ● Arrays – Adhere to calling conventions – Short circuiting – Runtime checksLow-level IR design choices ● Classes to use – Same has high IR? (with restrictions) – Newly added classes? – Mix? ● Level of the low IR – How close to assembly? ● Alternate representations? – Single Static Assignment – Infinite register machine (DirectX, etc) – Stack-based machine (Java bytecode, etc)Math ops ●● High Level: ● Temporaries: In place: a=1*2+3*4 t1 = 1 * 2 t1 = 2 (movq) b=a*a+1 t2 = 3 * 4 t1*= 1 (imul) t2 = 4 (movq) a = t1 + t2 t2*= 3 (imul) t3 = a * a a = t2 (mov) b = t3 + 1 a+= t1 (add) t3 = a (mov) t3*= a (imul) b = 1 (movq) b+=t3 (add)Variables / Temporaries ● Names (input) become... ● Descriptors (high IR) ● Intermediate allocation – Everything on the stack? ● Later optimize by moving to registers – Everything in a register? ● “Spill” excess to the stack – Other techniques... ● Final allocation (fixed registers + stack) ● Register allocation is hard! – Start simpleControl flow ● Must eventually become labels and jumps if (a) { foo } else { bar } ● Becomes: cmp $0, a jne l1 bar jmp l2 l1: foo l2:x86-64 (AMD64) ● Stack values are 64-bit (8-byte) ● Values in decaf are 32-bit or 1-bit ● For this phase, we are not optimizing for space ● Use 64-bits (quadword) for ints and bools. ● Use instructions that operate on 64-bit values for stack and mem operations, e.g. mov ● Arithmetic instructions have 32-bit operands, add,sub, etcRegisters Register Purpose Saved across calls %rax temp register; return value No %rbx callee-saved Yes %rcx used to pass 4th argument to functions No %rdx used to pass 3rd argument to function No %rsp stack pointer Yes %rbp callee-saved; base pointer Yes %rsi used to pass 2nd argument to function No %rdi used to pass 1st argument to functions No %r8 used to pass 5th argument to functions No %r9 used to pass 6th argument to functions No %r10-r11 temporary No %r12-r15 callee-saved registers YesASM Instructions ● Check out the x86-64 Architecture guide. ● Remember that we are using AT&T assembler syntax (gcc) ● Usually, operator op1 op2 – op2 = op1 operator op2 ● $x denotes immediate integer (base 10) value x ● %r?? is a register ● You can use names of globals directlyASM Instructions ● Some caveats: – mov instructions sometimes need a suffix if the assembler cannot resolve the data size – For example when you move an immediate into memory: movq $1, -8(%rbp)Registers ● Instructions allow only limited memoryoperations – add -4(%rbp), -8(%rbp) – mov -4(%rbp), %r10 add %r10, -8(%rbp) ● Important for performance – limited in number ● Special registers – %rbp base pointer – %rsp stack pointer Memory Registers ALU ControlAllocating Read-Only Data ● All Read-Only data in the textsegment ● Integers – use immediates ● Strings – use the .string macro .section .rodata .msg: .string "Five: %d\n" .section .text .globl main main: enter $0, $0 mov $.msg, %rdi mov $5, %rsi mov $0, %rax call printf leave retGlobal Variables ● Allocation: Use the assembler's .comm directive ● Use name or ● Use PC relative addressing – %rip is the current instructionaddress – X(%rip) will add the offset from the current instruction location to the space for x in the datasegment to %rip – Creates easily re-locatablebinaries … .section .text .globl mainmain: enter $0, $0 mov mov mov call leave $.msg, %rdi x, %rsi $0, %rax printf ret .comm x, 8, 8 .comm name, size, alignment The .comm directive allocates storage in thedata section. The storage is referenced by theidentifier name. Size is measured in bytes andmust be a positive integer. Name cannot be predefined. Alignment is optional. If alignment is specified, the address of name is aligned to a multiple of alignmentGlobal Variables ● Allocation: Use the assembler's .comm directive ● Use name or ● Use PC relative addressing – %rip is the current instructionaddress – X(%rip) will add the offset from the current instruction location to the space for x in the datasegment to %rip – Creates easily re-locatablebinaries … .section .text .globl mainmain: enter $0, $0 mov mov mov call leave $.msg, %rdi x(%rip), %rsi $0, %rax printf ret .comm x, 8, 8 .comm name, size, alignment The .comm directive allocates storage in thedata section. The storage is referenced by theidentifier name. Size is measured in bytes andmust be a positive integer. Name cannot be predefined. Alignment is optional. If alignment is specified, the address of name is aligned to a multiple of alignmentAddressing Modes ● (%reg) is the memory location pointed to by the value in %reg ● movq $5, -8(%rbp)What about Arrays ● What code would you write for? ex: a[4] = 5; … mov $5, %r10 mov $4, %r11 ??? … .comm a, 8 * 10, 8Array Addressing ● The data segment grows toward larger addresses. ● How to access an array element? ● We want something like – base + offset * type_size


View Full Document

MIT 6 035 - Unoptimized Code Generation

Download Unoptimized Code Generation
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Unoptimized Code Generation and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Unoptimized Code Generation 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?