DOC PREVIEW
Princeton COS 226 - Regular Expressions

This preview shows page 1-2-3 out of 9 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Robert Sedgewick and Kevin Wayne • Copyright © 2005 • http://www.Princeton.EDU/~cos226Regular ExpressionsReference: Chapter 7.1-7.4, Introduction to Computer Science, R. Sedgewick and K. Wayne.2Pattern MatchingString search. Search for given string in a large text file.Regular expression.! Natural and compact way to express multiple text patterns.! Quintessential programmer's tool.Ex. Fragile X syndrome is a common cause of mental retardation.! Human genome contains triplet repeats of CGG or AGG, starting withGCG and ending with CTG.! Number of repeats is variable, and correlated with syndrome.! Use regular expression to specify pattern: GCG(CGG|AGG)*CTG.3Pattern Matching ApplicationsTest if a string matches some pattern.! Process natural language.! Scan for virus signatures.! Search for information using Google.! Access information in digital libraries.! Retrieve information from Lexis/Nexis.! Search-and-replace in a word processors.! Filter text (spam, NetNanny, Carnivore, malware).! Validate data-entry fields (dates, email, URL, credit card).! Search for markers in human genome using PROSITE patterns.Parse text files.! Compile a Java program.! Crawl and index the Web.! Read in data stored in ad hoc input file format.! Automatically create Java documentation from Javadoc comments.4Regular Expressions: Basic OperationsRegular expression. Notation to specify a set of strings.every other stringaabaabaabaabConcatenationevery other stringaaaababaaba(a|b)aabParentheses(ab)*aab*aaa | baab.u.u.u.Regular ExpressionaaabbbaaababababaabababaaaabbbaClosureUnionWildcardOperationevery other stringaabaabsuccubustumultuouscumulusjugulumNoYes5Regular Expressions: ExamplesRegular expression. Notation is surprisingly expressive.bbbbaabbbaabbbaaabbbaababbaaa* | (a*ba*ba*ba*)*multiple of three b’s111111111403982772100023498701234.*0....fifth to last digit is 0subspacesubspeciesraspberrycrispbread.*spb.*contains the trigraph spbgcgcggcggcggcggctggcgcaggctggcgctggcgcggctggcgcggaggctggcg(cgg|agg)*ctgfragile X syndrome indicatorRegular Expression NoYes6Generalized Regular ExpressionsGeneralized regular expressions.! Additional operations typically added for convenience.! Ex: [a-e]+ is shorthand for (a|b|c|d|e)(a|b|c|d|e)*.111111111166-54-11108540-132119072-5541[0-9]{5}-[0-9]{4}Exactly kdecaderhythm[^aeiou]{6}NegationscamelCase4illegalwordCapitalized[A-Za-z][a-z]*Character classesadebcdeabcdeabcbcdea(bc)+deOne or moreRegular ExpressionOperation NoYes7Regular Expressions in JavaValidity checking. Is input in the set described by the re?public class Validate { public static void main(String[] args) { String re = args[0]; String input = args[1]; System.out.println(input.matches(re)); }}% java Validate "..oo..oo." bloodroottrue% java Validate "[$_A-Za-z][$_A-Za-z0-9]*" ident123true% java Validate "[a-z]+@([a-z]+\.)+(edu|com)" [email protected]% java Validate "[0-9]{3}-[0-9]{2}-[0-9]{4}" 166-11-4433truelegal Java identifiervalid email address (simplified)Social Security numberneed help solving crosswords?8Regular Expressions in Other LanguagesBroadly applicable programmer's tool.! Many languages support extended regular expressions.! Built into grep, awk, emacs, Perl, PHP, Python, JavaScript.PERL. Practical Extraction and Report Language.print all lines containing NEWLINE whichoccurs in any file with a .java extensiongrep NEWLINE */*.javaegrep '^[qwertyuiop]*[zxcvbnm]*$' dict.txt | egrep '...........'replace all occurrences of fromwith to in the file input.txtperl -p -i -e 's|from|to|g' input.txtperl -n -e 'print if /^[A-Za-z][a-z]*$/' dict.txtdo for each line9Regular Expression CaveatsWriting a RE is like writing a program.! Need to learn syntax.! Can be easier to write than read."Sometimes you have a programming problemand it seems like the best solution is to useregular expressions; now you have two problems."10Perl RE for Valid RFC822 Email Addresses(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[


View Full Document

Princeton COS 226 - Regular Expressions

Documents in this Course
QUICKSORT

QUICKSORT

14 pages

QUICKSORT

QUICKSORT

55 pages

STRINGS

STRINGS

69 pages

Lecture

Lecture

4 pages

STRINGS

STRINGS

18 pages

Hashing

Hashing

7 pages

MERGESORT

MERGESORT

14 pages

Quicksort

Quicksort

12 pages

Strings

Strings

10 pages

Overview

Overview

15 pages

Hashing

Hashing

13 pages

Mergesort

Mergesort

15 pages

Tries

Tries

13 pages

Final

Final

12 pages

Final

Final

12 pages

Mergesort

Mergesort

13 pages

Final

Final

10 pages

Load more
Download Regular Expressions
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Regular Expressions and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Regular Expressions 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?