DOC PREVIEW
Princeton COS 226 - REGULAR EXPRESSIONS

This preview shows page 1-2-3-4 out of 13 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Algorithms, 4th Edition·Robert Sedgewick and Kevin Wayne·Copyright © 2002–2011·April 23, 2012 5:10:27 AMAlgorithmsFOUR T H EDIT IONR O B E R T S EDG EWICK K EVIN W A Y N E5.4 REGULAR EXPRESSIONS‣regular expressions‣REs and NFAs‣NFA simulation‣NFA construction‣applications2‣regular expressions‣NFAs‣NFA simulation‣NFA construction‣applications3Pattern matchingSubstring search. Find a single string in text.Pattern matching. Find one of a specified set of strings in text.Ex. [genomics] •Fragile X syndrome is a common cause of mental retardation.•Human genome contains triplet repeats of CGG or AGG,bracketed by GCG at the beginning and CTG at the end.•Number of repeats is variable, and correlated with syndrome.patterntextGCG(CGG|AGG)*CTGGCGGCGTGTGTGCGAGAGAGTGGGTTTAAAGCTGGCGCGGAGGCGGCTGGCGCGGAGGCTG4Pattern matching: applicationsTest if a string matches some pattern.•Process natural language.•Scan for virus signatures.•Specify a programming language.•Access information in digital libraries. •Search genome using PROSITE patterns.•Filter text (spam, NetNanny, Carnivore, malware).•Validate data-entry fields (dates, email, URL, credit card)....Parse text files.•Compile a Java program.•Crawl and index the Web.•Read in data stored in ad hoc input file format.•Create Java documentation from Javadoc comments....5Regular expressionsA regular expression is a notation to specify a (possibly infinite) set of strings.a “language”operationexample REmatchesdoes not matchconcatenationAABAABAABAABevery other stringorAA | BAABAABAABevery other stringclosureAB*AAAABBBBBBBBAABABABAparenthesesA(A|B)AABAAAABABAABevery other stringparentheses(AB)*AAABABABABABAAAABBA6Regular expression shortcutsAdditional operations are often added for convenience.Ex. [A-E]+ is shorthand for (A|B|C|D|E)(A|B|C|D|E)*operationexample REmatchesdoes not matchwildcard.U.U.U.CUMULUSJUGULUMSUCCUBUSTUMULTUOUScharacter class[A-Za-z][a-z]*wordCapitalizedcamelCase4illegalat least 1A(BC)+DEABCDEABCBCDEADEBCDEexactly k[0-9]{5}-[0-9]{4}08540-132119072-5541111111111166-54-111complement[^AEIOU]{6}RHYTHMDECADE7Regular expression examplesNotation is surprisingly expressiveand plays a well-understood role in the theory of computation.regular expressionmatchesdoes not match.*SPB.*(substring search)RASPBERRYCRISPBREADSUBSPACESUBSPECIES[0-9]{3}-[0-9]{2}-[0-9]{4}(Social Security numbers)166-11-4433166-45-111111-555555558675309[a-z]+@([a-z]+\.)+(edu|com)(email addresses)[email protected]@princeton.eduspam@nowhere[$_A-Za-z][$_A-Za-z0-9]*(Java identifiers)ident3PatternMatcher3aident#38Illegally screening a job candidate[First name of a candidate]! and pre/2 [last name of a candidate] w/7 bush or gore or republican! or democrat! or charg! or accus! or criticiz! or blam! or defend! or iran contra or clinton or spotted owl or florida recount or sex! or controvers! or racis! or fraud! or investigat! or bankrupt! or layoff! or downsiz! or PNTR or NAFTA or outsourc! or indict! or enron or kerry or iraq or wmd! or arrest! or intox! or fired or sex! or racis! or intox! or slur! or arrest! or fired or controvers! or abortion! or gay! or homosexual! or gun! or firearm!Nexis search string used by Monica Goodling in DOJ to vet potential Bush administration appointees“ [First name]! and pre/2 [last name] w/7 bush or gore or republican! or democrat! or charg! or accus! or criticiz! or blam! or defend! or iran contra or clinton or spotted owl or florida recount or sex! or controvers! or racis! or fraud! or investigat! or bankrupt! or layoff! or downsiz! or PNTR or NAFTA or outsourc! or indict! or enron or kerry or iraq or wmd! or arrest! or intox! or fired or sex! or racis! or intox! or slur! or arrest! or fired or controvers! or abortion! or gay! or homosexual! or gun! or firearm! ” — LexisNexis search string used by Monica Goodling to illegally screen candidates for DOJ positionshttp://www.justice.gov/oig/special/s0807/final.pdf9Can the average web surfer learn to use REs?Google. Supports * for full word wildcard and | for union.10Regular expressions to the rescuehttp://xkcd.com/20811Can the average programmer learn to use REs?Perl RE for valid RFC822 email addresses(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[


View Full Document

Princeton COS 226 - REGULAR EXPRESSIONS

Documents in this Course
QUICKSORT

QUICKSORT

14 pages

QUICKSORT

QUICKSORT

55 pages

STRINGS

STRINGS

69 pages

Lecture

Lecture

4 pages

STRINGS

STRINGS

18 pages

Hashing

Hashing

7 pages

MERGESORT

MERGESORT

14 pages

Quicksort

Quicksort

12 pages

Strings

Strings

10 pages

Overview

Overview

15 pages

Hashing

Hashing

13 pages

Mergesort

Mergesort

15 pages

Tries

Tries

13 pages

Final

Final

12 pages

Final

Final

12 pages

Mergesort

Mergesort

13 pages

Final

Final

10 pages

Load more
Download REGULAR EXPRESSIONS
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view REGULAR EXPRESSIONS and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view REGULAR EXPRESSIONS 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?