CS 11 Ocaml track: lecture 6Problem (1)Problem (2)Problem (3)Problem (4)Interpreters and compilers (1)Interpreters and compilers (2)Building a compilerStages of a compilerStages of a typical interpreterStages of a Scheme interpreterStages of a simple interpreterLexer (1)Lexer (2)Lexer (3)Lexer (4)Regular expressions (1)Regular expressions (2)Regular expressions (3)Regular expressions (4)Lexer generators (1)Lexer generators (2)Lexer generators (3)Parsing Scheme (1)Parsing Scheme (2)S-expressions (1)S-expressions (2)AtomsS-expressions (3)S-expressions (4)S-expressions in Ocaml (1)S-expressions in Ocaml (2)S-expr version of our programParser generators (1)Parser generators (2)Parser generators (3)Context-free grammars (1)Context-free grammars (2)Context-free grammars (3)Abstract Syntax Trees (1)Abstract Syntax Trees (2)Abstract Syntax Trees (3)Abstract Syntax Trees (4)Abstract Syntax Trees (5)Abstract Syntax Trees (6)Abstract Syntax Trees (7)Abstract Syntax Trees (8)AST version of our programExample -- calculator languageparser.mly file, part 1parser.mly file, part 1parser.mly file, part 1parser.mly file, part 1parser.mly file, part 1parser.mly file, part 2parser.mly file, part 2parser.mly file, part 2parser.mly file, part 2parser.mly file, part 2parser.mly file, part 2How parsing worksShifting and reducing (1)Shifting and reducing (2)Parsing actionsExample: 2 + 2 = 4Notelexer.mll, part 1lexer.mll, part 2lexer.mll, part 2lexer.mll, part 2Strategy for writing lab 5 (1)Strategy for writing lab 5 (2)CS 11 Ocaml track: lecture 6 Today: Writing a computer language Parser generators lexers (ocamllex) parsers (ocamlyacc) Abstract syntax treesProblem (1) We want to implement a computer language Very complex problem however, much similarity between implementations of different languages We can take advantage of thisProblem (2) We will implement a (very, very) simplified version of the Scheme programming language hopefully familiar from CS 1 We call our version of Scheme "bogoscheme" truth in advertising file names end in .bsProblem (3) Here is the Scheme program we want to be able to run:(define factorial(lambda (n)(if (= n 0)1(* n (factorial (- n 1))))))(print (factorial 10)) Result should be 3628800Problem (4) Two basic ways to implement languages: interpreter compiler Difference?Interpreters and compilers (1) A compiler takes a program (as a file or files of text) and generates a machine-language executable file (or files) from it An interpreter takes a program (as a file or files of text), converts it to some internal representation, and executes it immediatelyInterpreters and compilers (2) Some language processors are intermediate between interpreters and compilerse.g.Java Compiler converts source code to Java bytecode Interpreter interprets bytecode JIT compiler can compile all the way to machine language "on the fly"Building a compiler Compilers have many stages Start with source code usually in text files Go through a number of transformations Eventually output machine code which can be executed directlyStages of a compiler Typical compiler stages: Lexer: converts input file (considered as a string) into a sequence of tokens Parser: converts sequence of tokens into an "abstract syntax tree" (AST) [many other stages] Code generator: emits machine language codeStages of a typical interpreter Typical interpreter stages: Lexer: converts input file (considered as a string) into a sequence of tokens Parser: converts sequence of tokens into an "abstract syntax tree" (AST) Evaluator: evaluates AST directlyStages of a Scheme interpreter Similar to typical interpreter, with one extra stage Lexer: converts input file (considered as a string) into a sequence of tokens Parser: converts sequence of tokens into sequence of "S-expressions" Convert S-expressions into AST Evaluator: evaluates AST directly This is subject of labs 5 and 6Stages of a simple interpreter Lexer: converts input file (considered as a string) into a sequence of tokens Parser: recognizes sequences of tokens and executes them directly Only useful for very simple languages e.g.calculators Will show an example laterLexer (1) A "lexer" ("lexical analyzer") is the first stage of interpretation or compilation Takes raw source code and recognizes syntactically meaningful "tokens" Also throws out unnecessary stuff comments whitespaceLexer (2) What is a token? Simple literal data values integers, booleans, etc. Punctuatione.g.left, right parentheses Identifiers names of functions, variables Keywords, operators (if any) we won't need thisLexer (3) Input:(define x 10) ; set x to 10 Possible output of lexerTOK_LPAREN, TOK_ID("define"), TOK_ID("x"),TOK_INT(10), TOK_RPAREN NOTE: whitespace and comments are thrown away Sequence of tokens will be handed off to parserLexer (4) How do we recognize tokens? Could write long, boring string-recognizing program by hand More modern approach: use regular expressionswhich match tokens of interest Each token type gets its own regular expression (also called a regexp)Regular expressions (1) Regexps are a way to identify a class of strings Simplest regexps:"foo" Æ recognizes literal string "foo" only Or fixed characters:'(' Æ recognizes left parenthesis only Or an arbitrary character:_ Æ matches any single character Or the end-of-file character:EOF Æ matches end-of-file characterRegular expressions (2) Regexps can also match multiples of other regexps:regexp * Æ matches zero or more occurrences of regexp"foo" * Æ matches "", "foo", "foofoo", ...regexp + Æ matches one or more occurrences of regexpregexp? Æ matches zero or one occurrence of regexpRegular expressions (3) Regexps can also match a sequence of smaller regexps:regexp1 regexp2 Æ matches regexp1 followed by regexp2 Can also match any character in a set:['a' 'b' 'c'] Æ match any of 'a' 'b' 'c'['a' - 'z'] Æ match any char from 'a' to 'z'[^'a' 'b' 'c'] Æ match any char except 'a', 'b', 'c'Regular expressions (4) "Or patterns":regexp1 | regexp2 Æ matches either regexp1 or regexp2 Some other regexp patterns as well See ocamllex manual for full list NOTE: regexp syntax varies between language implementations though you only need to know Ocaml versionLexer generators (1)
View Full Document