Language Processors COMS W4115 Prof Stephen A Edwards Fall 2003 Columbia University Department of Computer Science Interpreter Source Program Input Interpreter Output Compiler Source Program Compiler Input Executable Program Output Bytecode Interpreter Source Program Compiler Bytecode Input Bytecode Interpreter Output Just in time Compiler Source Program Compiler Bytecode Just in time Compiler Input Output Machine Code Language Speeds Compared Language C Ocaml SML C SML Common Lisp Scheme Ocaml Java Pike Forth Lua Python Perl Ruby Eiffel Mercury Awk Haskell Lisp Icon Tcl Javascript Scheme Forth Erlang Awk Emacs Lisp Scheme PHP Bash bytecodes Impl gcc ocaml mlton g smlnj cmucl bigloo ocamlb java pike gforth lua python perl ruby se mercury mawk ghc rep icon tcl njs guile bigforth erlang gawk xemacs stalin php bash native code http www bagley org doug shootout JIT Threaded code Separate Compilation foo c bar c C compiler cc foo s bar s printf o fopen o malloc o Assembler as Archiver ar foo o bar o libc a Linker ld foo An Executable Preprocessor Massages the input before the compiler sees it Macro expansion File inclusion Conditional compilation The C Preprocessor include stdio h define min x y x y x y ifdef DEFINE BAZ int baz endif void foo int a 1 int b 2 int c c min a b cc E example c gives extern int printf char many more declarations from stdio h void foo int a 1 int b 2 int c c a b a b Compiling a Simple Program int gcd int a int b while a b if a b a b else b a return a What the Compiler Sees int gcd int a int b while a b if a b a b else b a return a i n t sp g c d i n t sp a sp i n t sp b nl nl sp sp w h i l e sp a sp sp b sp nl sp sp sp sp i f sp a sp sp b sp a sp sp b nl sp sp sp sp e l s e sp b sp sp a nl sp sp nl sp sp r e t u r n sp a nl nl Text file is a sequence of characters Lexical Analysis Gives Tokens int gcd int a int b while a b if a b a b else b a return a int gcd int a int while a b b a b else return a b if b a A stream of tokens Whitespace comments removed a Parsing Gives an AST func int gcd args arg int seq arg a int int gcd int a int b while a b if a b a b else b a return a while b return if a b a a b a b Abstract syntax tree built from parsing rules b a Semantic Analysis Resolves Symbols func int gcd args arg int Symbol Table int a a seq arg int while b return if a b a a b a b int b Types checked references to symbols resolved b a Translation into 3 Address Code L0 sne seq btrue sl seq btrue sub jmp L4 sub L5 jmp L1 ret 1 0 0 3 2 2 a L5 b L0 a a 1 L1 b 3 L4 a b 0 while a b a 0 if a b b a b int gcd int a int b while a b if a b a b else b a return a b a b a Idealized assembly language w infinite registers Generation of 80386 Assembly gcd L8 L5 L3 pushl movl movl movl cmpl je jle subl jmp subl jmp leave ret ebp esp ebp 8 ebp eax 12 ebp edx edx eax L3 L5 edx eax L8 eax edx L8 Save FP Load a from stack Load b from stack while a b if a b a b b a Restore SP BP
View Full Document
Unlocking...