Unformatted text preview:

1CMSC 330: Organization of Programming LanguagesTheory of Regular ExpressionsCMSC 330 2The Theory Behind r.e.’s• That’s it for the basics of Ruby– If you need other material for your project, come to office hours or check out the documentation• Next up: How do r.e.’s really work?– Mixture of a very practical tool (string matching with r.e.’s) and some nice theory– A great computer science resultCMSC 330 3A Few Questions about Regular Expressions• What does a regular expression represent?– Just a set of strings• What are the basic components of r.e.'s?– E.g., we saw that e+ is the same as ee*• How are r.e.'s implemented?– We’ll see how to build a structure to parse r.e.’sCMSC 330 4Definition: Alphabet• An alphabet is a finite set of symbols– Usually denoted • Example alphabets:– Binary:– Decimal:– Alphanumeric: = {0,1} = {0,1,2,3,4,5,6,7,8,9} = {0-9,a-z,A-Z}CMSC 330 5Definition: String•A string is a finite sequence of symbols from –  is the empty string ("" in Ruby)–|s|is the length of string s•|Hello|= 5, || = 0– Note: Ø is the empty set (with 0 elements); Ø {  }• Example strings:– 0101 ∈  = {0,1} (binary)– 0101 ∈  = decimal– 0101 ∈  = alphanumericCMSC 330 6Definition: Concatenation• Concatenation is indicated by juxtaposition–If s1= super and s2= hero, then s1s2= superhero– Sometimes also written s1·s2– For any string s, we have s = s= s–You can concatenate strings from different alphabets, then the new alphabet is the union of the originals: •If s1= super ∈ 1= {s,u,p,e,r} and s2= hero ∈ 2= {h,e,r,o}, then s1s2= superhero ∈ 3= {e,h,o,p,r,s,u}2CMSC 330 7Definition: Language•A language is a set of strings over an alphabet• Example: The set of phone numbers over the alphabet  = {0, 1, 2, 3, 4, 5, 6, 7, 9, (, ), -}– Give an example element of this language– Are all strings over the alphabet in the language?– Is there a Ruby regular expression for this language?• Example: The set of all strings over – Often written */\(\d{3,3}\) \d{3,3}-\d{4,4}/No(123) 456-7890CMSC 330 8Languages (cont’d)• Example: The set of strings of length 0 over the alphabet  = {a, b, c}– {s | s * and |s| = 0} = {}  Ø• Example: The set of all valid Ruby programs– Is there a Ruby regular expression for this language?• Can r.e.'s represent all possible languages?– The answer turns out to be no!– The languages represented by regular expressions are called, appropriately, the regular languagesNo. Matching brackets so they are balanced is impossible.{ { { } } } or {3}3 or, in general, {n }nCMSC 330 9Operations on Languages• Let  be an alphabet and let L, L1, L2be languages over • Concatenation L1L2is defined as–L1L2= {xy | x  L1and y  L2}–Example: L1= {“hi”, “bye”}, L2= {“1”, “2”}•L1L2= {“hi1”, “hi2”, “bye1”, “bye2”}• Union is defined as–L1L2= { x | x  L1or x  L2}–Example: L1= {“hi”, “bye”}, L2= {“1”, “2”}•L1L2= {“hi”, “bye”, “1”, “2”}CMSC 330 10Operations on Languages (cont’d)• Define Lninductively as–L0= {}–Ln= LLn-1for n > 0• In other words,–L1= LL0= L{} = L–L2= LL1= LL–L3= LL2= LLL– ...CMSC 330 11Examples of Ln• Let L = {a, b, c}• Then–L0= {}–L1= {a, b, c}–L2= {aa, ab, ac, ba, bb, bc, ca, cb, cc}CMSC 330 12Operations on Languages (cont’d)• Kleene closure is defined asL* = i [0..]Li• In other words...L* is the language (set of all strings) formed by concatenating together zero or more strings from L3CMSC 330 13Definition of Regexps• Given an alphabet , the regular expressionsover  are defined inductively as– ...{}each element {}ØØdenotes languageregular expressionCMSC 330 14Definition of Regexps (cont’d)• Let A and B be regular expressions denoting languages LAand LB, respectively• There are no other regular expressions over • We use ()’s as needed for groupingLA*A*LALB(A|B)LALBABdenotes languageregular expressionCMSC 330 15The Language Denoted by an r.e.• For a regular expression e, we will write [[e]] to mean the language denoted by e– [[a]] = {a}– [[(a|b)]] = {a, b}•If s[[re]], we say that re accepts, describes, or recognizes s.CMSC 330 16Example 1• All strings over  = {a, b, c} such that all the a’sare first, the b’s are next, and the c’s last–Example: aaabbbbccc but not abcb• Regexp:– This is a valid regexp because:•ais a regexp ([[a]] = {a})•a*is a regexp ([[a*]] = {, a, aa, ...})• Similarly for b* and c*•So a*b*c* is a regular expression(Remember that we need to check this way because regular expressions are defined inductively.)a*b*c*CMSC 330 17Which Strings Does a*b*c* Recognize?aabbbccYes; aa [[a*]], bbb [[b*]], and cc [[c*]], so entire string is in [[a*b*c*]]abbYes, abb = abb, and  [[c*]]acYesYesaacbcNoabcdNo -- outside the languageCMSC 330 18Example 2• All strings over  = {a, b, c}• Regexp:• Other regular expressions for the same language?– (c|b|a)*–(a*|b*|c*)*– (a*b*c*)*– ((a|b|c)*|abc)–etc.(a|b|c)*4CMSC 330 19Example 3• All whole numbers containing the substring 330• Regular expression:• What if we want to get rid of leading 0’s?• ( (1|...|9)(0|1|...|9)*330(0|1|...|9)* | 330(0|1|...|9)* )• Any other solutions?• Challenge: What about all whole numbers notcontaining the substring 330?– Is it recognized by a regexp?(0|1|...|9)*330(0|1|...|9)*Yes. We’ll seehow to find itlater…CMSC 330 20Example 4• What is the English description for the language that (10|0)*(10|1)* denotes?– (10|0)*•0may appear anywhere•1must always be followed by 0– (10|1)*•1may appear anywhere•0must always be preceded by 1– Put together, all strings of 0’s and 1’s where every pair of adjacent 0’s precedes any pair of adjacent 1’sCMSC 330 21What Strings are in (10|0)*(10|1)* ?00101000 110111101First part in [[(10|0)*]]Second part in [[(10|1)*]]Notice that 0010 also in [[(10|0)*]]But remainder of string is not in [[(10|1)*]]0010101Yes101Yes011001NoCMSC 330 22Example 5• What language does this regular expression recognize?–((1|)(0|1|...|9) | (2(0|1|2|3)) ) : (0|1|...|5)(0|1|...|9)• All valid times written in 24-hour format– 10:17– 23:59–0:45–8:30CMSC 330 23Two More Examples• (000|00|1)*– Any string of


View Full Document

UMD CMSC 330 - Theory of Regular Expressions

Documents in this Course
Exam #1

Exam #1

6 pages

Quiz #1

Quiz #1

2 pages

Midterm 2

Midterm 2

12 pages

Exam #2

Exam #2

7 pages

Ocaml

Ocaml

7 pages

Parsing

Parsing

38 pages

Threads

Threads

12 pages

Ruby

Ruby

7 pages

Quiz #3

Quiz #3

2 pages

Threads

Threads

7 pages

Quiz #4

Quiz #4

2 pages

Exam #2

Exam #2

6 pages

Exam #1

Exam #1

6 pages

Threads

Threads

34 pages

Quiz #4

Quiz #4

2 pages

Threads

Threads

26 pages

Exam #2

Exam #2

9 pages

Exam #2

Exam #2

6 pages

Load more
Download Theory of Regular Expressions
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Theory of Regular Expressions and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Theory of Regular Expressions 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?