Untitled6.035 Project 3: Unoptimized Code Generation Jason Ansel MIT - CSAILQuiz Monday ● 50 minute quiz Monday ● Covers everything up to yesterdays lecture ● Lexical Analysis (REs, DFAs, NFAs) ● Syntax Analysis (CFGs, top-down parsing, buttom-up parsing) ● Semantics Analysis (type checking, type systems, attribute grammars) ● Questions similar to miniquizs, but a bit harderProject 3 Roadmap ● Design and Checkpoint – Due Monday March 10th – Checkpoint ● Document of your proposed design (email me) ● Create a tarball of what you have ● If you get codegen to work, no effect ● If you have problems at end, we will be very harsh if you haven’tdone much work by the checkpoint ● Group meeting – not mandatory, meet with me if you want. ● Final Implementation and Report – Due on March 16thCourse Machines ● Meet tyner.csail.mit.edu and silver.csail.mit.edu! ● Two AMD64 machines – dual processor – dual core per processor – 8 gigs of RAM ● Use them for running your compiled assembly code. – User: le0X, password in le0X-pass in group dir ● Can access files over ssh: – git clone athena.dialup.mit.edu:/mit/6.035/group/... Athena is MIT's UNIX-based computing environment. OCW does not provide access to it.Unoptimized Code Generation ● Translate all the instructions in the intermediate representation to assembly language – Allocate space for the variables. ● Globals ● Arrays – Adhere to calling conventions – Short circuiting – Runtime checksLow-level IR design choices ● Classes to use – Same has high IR? (with restrictions) – Newly added classes? – Mix? ● Level of the low IR – How close to assembly? ● Alternate representations? – Single Static Assignment – Infinite register machine (DirectX, etc) – Stack-based machine (Java bytecode, etc)Math ops ●● High Level: ● Temporaries: In place: a=1*2+3*4 t1 = 1 * 2 t1 = 2 (movq) b=a*a+1 t2 = 3 * 4 t1*= 1 (imul) t2 = 4 (movq) a = t1 + t2 t2*= 3 (imul) t3 = a * a a = t2 (mov) b = t3 + 1 a+= t1 (add) t3 = a (mov) t3*= a (imul) b = 1 (movq) b+=t3 (add)Variables / Temporaries ● Names (input) become... ● Descriptors (high IR) ● Intermediate allocation – Everything on the stack? ● Later optimize by moving to registers – Everything in a register? ● “Spill” excess to the stack – Other techniques... ● Final allocation (fixed registers + stack) ● Register allocation is hard! – Start simpleControl flow ● Must eventually become labels and jumps if (a) { foo } else { bar } ● Becomes: cmp $0, a jne l1 bar jmp l2 l1: foo l2:x86-64 (AMD64) ● Stack values are 64-bit (8-byte) ● Values in decaf are 32-bit or 1-bit ● For this phase, we are not optimizing for space ● Use 64-bits (quadword) for ints and bools. ● Use instructions that operate on 64-bit values for stack and mem operations, e.g. mov ● Arithmetic instructions have 32-bit operands, add,sub, etcRegisters Register Purpose Saved across calls %rax temp register; return value No %rbx callee-saved Yes %rcx used to pass 4th argument to functions No %rdx used to pass 3rd argument to function No %rsp stack pointer Yes %rbp callee-saved; base pointer Yes %rsi used to pass 2nd argument to function No %rdi used to pass 1st argument to functions No %r8 used to pass 5th argument to functions No %r9 used to pass 6th argument to functions No %r10-r11 temporary No %r12-r15 callee-saved registers YesASM Instructions ● Check out the x86-64 Architecture guide. ● Remember that we are using AT&T assembler syntax (gcc) ● Usually, operator op1 op2 – op2 = op1 operator op2 ● $x denotes immediate integer (base 10) value x ● %r?? is a register ● You can use names of globals directlyASM Instructions ● Some caveats: – mov instructions sometimes need a suffix if the assembler cannot resolve the data size – For example when you move an immediate into memory: movq $1, -8(%rbp)Registers ● Instructions allow only limited memoryoperations – add -4(%rbp), -8(%rbp) – mov -4(%rbp), %r10 add %r10, -8(%rbp) ● Important for performance – limited in number ● Special registers – %rbp base pointer – %rsp stack pointer Memory Registers ALU ControlAllocating Read-Only Data ● All Read-Only data in the textsegment ● Integers – use immediates ● Strings – use the .string macro .section .rodata .msg: .string "Five: %d\n" .section .text .globl main main: enter $0, $0 mov $.msg, %rdi mov $5, %rsi mov $0, %rax call printf leave retGlobal Variables ● Allocation: Use the assembler's .comm directive ● Use name or ● Use PC relative addressing – %rip is the current instructionaddress – X(%rip) will add the offset from the current instruction location to the space for x in the datasegment to %rip – Creates easily re-locatablebinaries … .section .text .globl mainmain: enter $0, $0 mov mov mov call leave $.msg, %rdi x, %rsi $0, %rax printf ret .comm x, 8, 8 .comm name, size, alignment The .comm directive allocates storage in thedata section. The storage is referenced by theidentifier name. Size is measured in bytes andmust be a positive integer. Name cannot be predefined. Alignment is optional. If alignment is specified, the address of name is aligned to a multiple of alignmentGlobal Variables ● Allocation: Use the assembler's .comm directive ● Use name or ● Use PC relative addressing – %rip is the current instructionaddress – X(%rip) will add the offset from the current instruction location to the space for x in the datasegment to %rip – Creates easily re-locatablebinaries … .section .text .globl mainmain: enter $0, $0 mov mov mov call leave $.msg, %rdi x(%rip), %rsi $0, %rax printf ret .comm x, 8, 8 .comm name, size, alignment The .comm directive allocates storage in thedata section. The storage is referenced by theidentifier name. Size is measured in bytes andmust be a positive integer. Name cannot be predefined. Alignment is optional. If alignment is specified, the address of name is aligned to a multiple of alignmentAddressing Modes ● (%reg) is the memory location pointed to by the value in %reg ● movq $5, -8(%rbp)What about Arrays ● What code would you write for? ex: a[4] = 5; … mov $5, %r10 mov $4, %r11 ??? … .comm a, 8 * 10, 8Array Addressing ● The data segment grows toward larger addresses. ● How to access an array element? ● We want something like – base + offset * type_size
View Full Document