DOC PREVIEW
Berkeley COMPSCI 164 - Regular Expressions

This preview shows page 1-2-3-4-5-38-39-40-41-42-43-76-77-78-79-80 out of 80 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 80 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 80 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 80 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 80 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 80 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 80 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 80 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 80 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 80 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 80 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 80 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 80 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 80 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 80 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 80 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 80 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 80 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1 Lecture 10 Regular Expressions regular expressions are a small language implemented with automata or backtracking Ras Bodik Shaon Barman Thibaud Hottelier Hack Your Language! CS164: Introduction to Programming Languages and Compilers, Spring 2012 UC BerkeleyAnnouncement The CS164 Parser Contest. Part 1: speed of your HW4 parser Prize: fun, fame, and extra credit (for speed of your HW4) Submit your parser for speed evaluation even before deadline. You will receive your current standing in class. Part 2: Speed and code readability of your PA4-6 Prize: fun and fame and a gift certificate. Measured at the end of the semester. Fa10 winner 5x faster than staff parser. Can you beat that? 2Today is a no-laptop Thursday If you are bored, here’s a puzzle: Write a regex that tests whether a number is prime. Hint: \n matches the string matched by the nth regex group Ex: regex c(aa|bb)\1 matches strings caaaa and cbbbb the prime tester must be a regex, not a regular expression! the latter do not support \n 3Where are we in CS164? We have now built enough tools (coroutines, parser) to go after our mission: build small languages so that programmers can become more productive. When building small languages, we program under the abstraction: we implement abstractions for others to use. Today, we’ll learn what’s under the abstraction of regular expressions. 4Next three lectures: small languages What small languages have we already seen? unit calculator, regexes, grammars, queues, jQuery, Prolog well, Prolog is not so small; it’s a rather general-purpose langauge Today: regular expressions the best-studied small language We will look at three questions. How to – integrate regular expressions with a host language – implement regular expressions efficiently – design composable, clear semantics (REs vs. regexes) 5Sample Uses of Regular Expressions and of string processing in generalSample string processing problems Web scraping and rewriting (HW1) also, find and “linkify” mailing addresses Cucumber, a Ruby testing framework with “NLP” When I go to "Accounts" Then I should see link "My Savings" Lexical analysis in interpreters and compilers float x=3.14 --> FLOATTYPE, ID("x"), EQ, FLOATLIT(3.14) Config tools include “file name manipulation” languages ${dir}\foo --> "c:\\MyDir\\foo" Editors, search and replace `quoted text’ D`Souza --> `quoted text’ D’Souza 71. Web scraping and rewriting Rewrite a web page using GreaseMonkey scripting. The idea: web pages are more readable when you view the print-friendly, ad-free pages. Automate this! Approach: Your script will rewrite links on a page to point to the print-friendly version of target page. How: When user clicks on a link, fetch (but do not display) the target page; use a regex to find in the target HTML text the (best-guess) link to the print-friendly page; rewrite the link to point to that page; follow the rewritten link to display the friendly page. 82. Cucumber: a Ruby testing framework A sample Cucumber test file: Scenario: Test the banking web service Given I log in as "bonnie" with password "clyde" When I go to "Accounts" Then I should see a link "Our Robbery Savings" When I follow this link Then I the value of "Interest" should be "$1,024.00" Meaning of this test: "Given" makes the script go to login to the web site. "When" clause clicks on the link Accounts. "Then" clause tests that resulting page contains a link to given account. We could implement a similar tool with GreaseMonkey: regexes can be used to parse the command. 93. Lexical analysis in a compiler/interpreter Input: a program function timedCount() { // my function document.getElementById('txt').value=c; } Output: a sequence of tokens FUNCTION, ID("timedCount"), LPAR, RPAR, LCUR, ID("document"), DOT, ID("getElementById"), LPAR, STRING("txt"), RPAR, DOT, ID("value"), ASGN, ID("c"), SEMI, RCUR REs facilitate concise description of tokens 10Notes on lexical analysis The lexer partitions the input into lexemes Lexemes are mapped to tokens The stream of tokens is fed to the parser Some tokens are associated with their lexemes Whitespace and comments are typically skipped 11Another challenge question The lexical analyzer produces a list of tokens without consulting with the parser i.e., tokenizing is syntax insensitive Q: Give a JavaScript scenario where the tokenizing depends on the context of the parser. i.e., lexer cannot tokenize input w/out feedback from parser 124. File name processing languages In shell scripts and IDEs, command line args can refer to variables such as workspace_loc: This is translated into program arguments: args = { "--inputDir", "c:\\wspace\\MyDirectory", "--outputDir", ".\\temp dir" } Must escape \ and quotes during translation Tricky design. Eclipse designed this substitution language wrong: their escaping rules prevent you from expressing some values that you may want to pass into the program. 135. Search for strings in text editors Imagine you want to search for names containing a ' and correct them. Examples: D'Souza --> D’Souza D`Souza --> D’Souza The challenge: your replacement should not change quotes used in quotations: `quoted text’ Again, this can be solved conveniently with REs 14Useful string processing operations Accept: the whole string match Does the entire string s match a pattern r? Match: a prefix match Does some prefix of s match a pattern r? Search: find a substring Does a substring of s match a pattern r? Tokenize: Lexical analysis Partition s into lexemes, each accepted by a pattern r Extract: as match and search but extract substrings Regex r indicates, with ( ), which substrings to extract Replace: replace substrings found with a new string 15Discovering automata and REs a revisionist historyProgramming model of a small language Design of a small (domain-specific) language starts with the choice of a suitable programming model Why is a new model may be needed? Because procedural abstraction (a procedure-based library) cannot always hide the plumbing (implementation details) For string-based processing, automata and regular expressions are usually the right programming model regexes are hard to hide under a procedure library, although HW2 showed how to do it with coroutines. Let’s use string processing as a case study on how to discover the programming model for small languages 17Let's write a lexer (a.k.a.


View Full Document

Berkeley COMPSCI 164 - Regular Expressions

Documents in this Course
Lecture 8

Lecture 8

40 pages

Load more
Download Regular Expressions
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Regular Expressions and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Regular Expressions 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?