CS 11 Ocaml track: lecture 6Problem (1)Problem (2)Problem (3)Problem (4)Interpreters and compilers (1)Interpreters and compilers (2)Building a compilerStages of a compilerStages of a typical interpreterStages of a Scheme interpreterStages of a simple interpreterLexer (1)Lexer (2)Lexer (3)Lexer (4)Regular expressions (1)Regular expressions (2)Regular expressions (3)Regular expressions (4)Lexer generators (1)Lexer generators (2)Lexer generators (3)Parsing Scheme (1)Parsing Scheme (2)S-expressions (1)S-expressions (2)AtomsS-expressions (3)S-expressions (4)S-expressions in Ocaml (1)S-expressions in Ocaml (2)S-expr version of our programParser generators (1)Parser generators (2)Parser generators (3)Context-free grammars (1)Context-free grammars (2)Context-free grammars (3)Abstract Syntax Trees (1)Abstract Syntax Trees (2)Abstract Syntax Trees (3)Abstract Syntax Trees (4)Abstract Syntax Trees (5)Abstract Syntax Trees (6)Abstract Syntax Trees (7)Abstract Syntax Trees (8)AST version of our programExample -- calculator languageparser.mly file, part 1Slide 51Slide 52Slide 53Slide 54parser.mly file, part 2Slide 56Slide 57Slide 58Slide 59Slide 60How parsing worksShifting and reducing (1)Shifting and reducing (2)Parsing actionsExample: 2 + 2 = 4Notelexer.mll, part 1lexer.mll, part 2Slide 69Slide 70Strategy for writing lab 5 (1)Strategy for writing lab 5 (2)CS 11 Ocaml track: lecture 6Today:Writing a computer languageParser generatorslexers (ocamllex)parsers (ocamlyacc)Abstract syntax treesProblem (1)We want to implement a computer languageVery complex problemhowever, much similarity between implementations of different languagesWe can take advantage of thisProblem (2)We will implement a (very, very) simplified version of the Scheme programming languagehopefully familiar from CS 1We call our version of Scheme "bogoscheme"truth in advertisingfile names end in .bsProblem (3)Here is the Scheme program we want to be able to run:(define factorial (lambda (n) (if (= n 0) 1 (* n (factorial (- n 1))))))(print (factorial 10))Result should be 3628800Problem (4)Two basic ways to implement languages:interpretercompilerDifference?Interpreters and compilers (1)A compiler takes a program (as a file or files of text) and generates a machine-language executable file (or files) from itAn interpreter takes a program (as a file or files of text), converts it to some internal representation, and executes it immediatelyInterpreters and compilers (2)Some language processors are intermediate between interpreters and compilerse.g. JavaCompiler converts source code to Java bytecodeInterpreter interprets bytecodeJIT compiler can compile all the way to machine language "on the fly"Building a compilerCompilers have many stagesStart with source codeusually in text filesGo through a number of transformationsEventually output machine codewhich can be executed directlyStages of a compilerTypical compiler stages:Lexer: converts input file (considered as a string) into a sequence of tokensParser: converts sequence of tokens into an "abstract syntax tree" (AST)[many other stages]Code generator: emits machine language codeStages of a typical interpreterTypical interpreter stages:Lexer: converts input file (considered as a string) into a sequence of tokensParser: converts sequence of tokens into an "abstract syntax tree" (AST)Evaluator: evaluates AST directlyStages of a Scheme interpreterSimilar to typical interpreter, with one extra stageLexer: converts input file (considered as a string) into a sequence of tokensParser: converts sequence of tokens into sequence of "S-expressions"Convert S-expressions into ASTEvaluator: evaluates AST directlyThis is subject of labs 5 and 6Stages of a simple interpreterLexer: converts input file (considered as a string) into a sequence of tokensParser: recognizes sequences of tokens and executes them directlyOnly useful for very simple languages e.g. calculatorsWill show an example laterLexer (1)A "lexer" ("lexical analyzer") is the first stage of interpretation or compilationTakes raw source code and recognizes syntactically meaningful "tokens"Also throws out unnecessary stuffcommentswhitespaceLexer (2)What is a token?Simple literal data valuesintegers, booleans, etc.Punctuatione.g. left, right parenthesesIdentifiersnames of functions, variablesKeywords, operators (if any)we won't need thisLexer (3)Input:(define x 10) ; set x to 10Possible output of lexerTOK_LPAREN, TOK_ID("define"), TOK_ID("x"),TOK_INT(10), TOK_RPARENNOTE: whitespace and comments are thrown awaySequence of tokens will be handed off to parserLexer (4)How do we recognize tokens?Could write long, boring string-recognizing program by handMore modern approach: use regular expressions which match tokens of interestEach token type gets its own regular expression (also called a regexp)Regular expressions (1)Regexps are a way to identify a class of stringsSimplest regexps:"foo" recognizes literal string "foo" onlyOr fixed characters:'(' recognizes left parenthesis onlyOr an arbitrary character:_ matches any single characterOr the end-of-file character:EOF matches end-of-file characterRegular expressions (2)Regexps can also match multiples of other regexps:regexp * matches zero or more occurrences of regexp"foo" * matches "", "foo", "foofoo", ...regexp + matches one or more occurrences of regexpregexp? matches zero or one occurrence of regexpRegular expressions (3)Regexps can also match a sequence of smaller regexps:regexp1 regexp2 matches regexp1 followed by regexp2Can also match any character in a set:['a' 'b' 'c'] match any of 'a' 'b' 'c'['a' - 'z'] match any char from 'a' to 'z'[^'a' 'b' 'c'] match any char except 'a', 'b', 'c'Regular expressions (4)"Or patterns":regexp1 | regexp2 matches either regexp1 or regexp2Some other regexp patterns as wellSee ocamllex manual for full listNOTE: regexp syntax varies between language implementationsthough you only need to know Ocaml versionLexer generators (1)Writing lexers by hand is boringAlso easily automatedModern approach: describe lexer in a high-level specification in a special fileuse a special program to convert this lexer into code for the language
View Full Document