Berkeley COMPSCI 164 - Lecture 2: Lexical Analysis

Lecture 2: Lexical Analysis

Last modified: Mon Feb 23 14:35:34 2009 CS164: Lecture #2

Administrivia
• Newsgroup appears to be functioning, now managed by CSUA. Visit news.csua.berkeley.edu
• Lecture page also has readings. Try to read once before lecture.
• Log into your class account ASAP (I still have account forms).
• Start forming teams:
  – Choose team name (letters, digits, underscores only, starting with capital letter)
  – Email me ([email protected]) name of team, and class logins of members (also mail changes).
• Good time to start learning Python (manuals online).

Review: Front End Compiler Structure
Source code → Lexical Analysis → Tokens → Parsing → AST → Semantic Analysis → Decorated AST
(We are here: Lexical Analysis.)
• A sequence of translations that each:
  – Filter out errors
  – Remove or put aside extraneous information
  – Make data more conveniently accessible.
• Strategy: find tools that partially automate this procedure.
• For lexical analysis: convert description that uses patterns (extended regular expressions) into program.

Tokens
• Token consists of syntactic category (like “noun” or “adjective”) plus semantic information (like a particular name).
• Parsing (the “customer”) only needs syntactic category:
  – “Joe went to the store” and “Harry went to the beach” have same grammatical structure.
• For programming, semantic information might be text of identifier or numeral.
• Example from Notes:

    if (i == j)
        z = 0;  /* No work needed */
    else
        z = 1;

  ⇒

    IF, LPAR, ID("i"), EQUALS, ID("j"), RPAR, ID("z"),
    ASSIGN, INTLIT("0"), SEMI, ELSE, ID("z"), ASSIGN,
    INTLIT("1"), SEMI

Classical Regular Expressions
• Regular expressions denote formal languages, which are sets of strings (of symbols from some alphabet).
• Appropriate since internal structure not all that complex yet.
• Expression R denotes language L(R):
  – L(ε) = L("") = {""}.
  – If c is a character, L(c) = {"c"}.
  – If R1, R2 are r.e.s, L(R1 R2) = {x1 x2 | x1 ∈ L(R1), x2 ∈ L(R2)}.
  – L(R1|R2) = L(R1) ∪ L(R2).
  – L(R*) = L(ε) ∪ L(R) ∪ L(R R) ∪ ···.
  – L((R)) = L(R).
• Precedence is ‘*’ (highest), concatenation, union (lowest). Parentheses also provide grouping.

Abbreviations
• Character lists, such as [abcf-mxy] in Java, Perl, or Python.
• Negative character lists, such as [^aeiou].
• Character classes such as . (dot), \d, \s in Java, Perl, Python.
• L(R+) = L(R R*).
• L(R?) = L(ε|R).

Extensions
• “Capture” parenthesized expressions:
  – After m = re.match(r'\s*(\d+)\s*,\s*(\d+)\s*', '12,34'), have m.group(1) == '12', m.group(2) == '34'.
• Lazy vs. greedy quantifiers:
  – re.match(r'(\d+).*', '1234ab') makes group(1) match '1234'.
  – re.match(r'(\d+?).*', '1234ab') makes group(1) match '1'.
• Boundaries:
  – re.search(r'(^abc|qef)', L) matches abc only at beginning of string, and qef anywhere.
  – re.search(r'(?m)(^abc|qef)', L) matches abc only at beginning of string or of any line.
  – re.search(r'rowr(?=baz)', L) matches an instance of ‘rowr’, but only if ‘baz’ follows (does not match baz).
  – re.search(r'(?<=rowr)baz', L) matches an instance of ‘baz’, but only if immediately preceded by ‘rowr’ (does not match rowr).
• Non-linear patterns: re.search(r'(\S+),\1', L) matches a word followed by the same word after a comma.

Review of Sample Programs
SL/1 “language”:
    + - * / = ; , ( ) < > >= <= -->
    if def else fi while
    identifiers
    decimal numerals
Comments start with # and go to end of line.
(Review of programs in Chapter 2 of Course Notes.)

Problems
• Decimal numerals in C, Java.
• All numerals in C, Java.
• Floating-point numerals.
• Identifiers in C, Java.
• Identifiers in Ada.
• Comments in C++, Java.
• XHTML markups.
• Python bracketing.

Some Problem Solutions
• Decimal numerals in C, Java: 0|[1-9][0-9]*
• All numerals in C, Java: [1-9][0-9]*|0[xX][0-9a-fA-F]+|0[0-7]*
• Floating-point numerals: (\d+\.\d*|\d*\.\d+)([eE][-+]?\d+)?|[0-9]+[eE][-+…
• Identifiers in C, Java (ASCII only, no dollar signs): [a-zA-Z_][a-zA-Z_0-9]*
• Identifiers in Ada: [a-zA-Z]([a-zA-Z0-9]|_[a-zA-Z0-9])*
• Comments in C++, Java: //.*|/\*([^*]|\*[^/])*\*+/
  or, using some extended features: //.*|/\*(.|\n)*?\*/
• Python bracketing: Nothing much you can do here, except to note blanks at the beginnings of lines and to do some programming in the actions.
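As a rough illustration of how patterns like these turn into a lexer, here is a minimal Python sketch that reproduces the token stream from the Tokens slide. It is not the course's tool or the Course Notes' program. The names TOKEN_SPEC, KEYWORDS, MASTER and tokenize are hypothetical, and the sketch assumes that identifiers may contain underscores and that keywords are recognized by looking up identifier text in a table.

```python
import re

# Hypothetical token table for this sketch. Category names follow the
# Tokens slide; the patterns themselves are assumptions.
KEYWORDS = {"if": "IF", "else": "ELSE"}

TOKEN_SPEC = [
    ("SKIP",   r"\s+|//.*|/\*(?:.|\n)*?\*/"),  # whitespace and comments
    ("INTLIT", r"0|[1-9][0-9]*"),              # decimal numerals
    ("ID",     r"[a-zA-Z_][a-zA-Z_0-9]*"),     # identifiers and keywords
    ("EQUALS", r"=="),                         # listed before ASSIGN on purpose
    ("ASSIGN", r"="),
    ("LPAR",   r"\("),
    ("RPAR",   r"\)"),
    ("SEMI",   r";"),
]

# One alternation with a named group per category. Python tries the
# alternatives left to right, so the order of TOKEN_SPEC matters.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return (category, semantic_value) pairs; the value is None for
    tokens that carry no semantic information."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue                      # discard blanks and comments
        if kind == "ID" and lexeme in KEYWORDS:
            tokens.append((KEYWORDS[lexeme], None))
        elif kind in ("ID", "INTLIT"):
            tokens.append((kind, lexeme))  # keep the text as semantic info
        else:
            tokens.append((kind, None))
    return tokens


if __name__ == "__main__":
    src = "if (i == j)\n    z = 0;  /* No work needed */\nelse\n    z = 1;\n"
    for category, value in tokenize(src):
        print(category if value is None else f'{category}("{value}")')
```

Because Python's alternation takes the first alternative that matches rather than the longest one, the sketch lists == before = and treats keywords as identifiers first; a generated lexer would normally apply a longest-match rule instead.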

