Unformatted text preview:

Regular ExpressionsGreg PlaxtonTheory in Programming Practice, Fall 2005Department of Computer ScienceUniversity of Texas at AustinWhat is a Regular Expression?• A regular expression defines a (possibly infinite) set of strings over agiven alphabet• Analogous to an arithmetic expression– The symbols of the alphabet are analogous to the numerical constantsin an arithmetic expression– Instead of arithmetic operators such as addition, multiplication, andexponentiation, the operators are concatenation, union, and closureTheory in Programming Practice, Plaxton, Fall 2005Regular Expressions: Syntax• The symbols ∅ (empty set), ² (empty string), and any symbol of thealphabet are regular expressions• For any regular expressions p and q, (pq) (concatenation) and (p | q)(union) are regular expressions• For any regular expression p, p∗(Kleene closure) is a regular expressionTheory in Programming Practice, Plaxton, Fall 2005Regular Expressions: Semantics• The regular expression ∅ corresponds to the empty set of strings• The regular expression ² corresponds to the set of strings {²}• For any symbol a in the alphabet, the regular expression a correspondsto the set of strings {a}• For any regular expressions p and q with corresponding sets of stringsX and Y , the regular expression (pq) (resp., (p | q)) denotes the set ofstrings {xy | x ∈ X ∧ y ∈ Y } (resp., X ∪ Y )• For any regular expression p with corresponding set of strings X, theregular expression p∗denotes the set of strings{x1x2· · · xk| k ≥ 0 ∧ h∀i : 1 ≤ i ≤ k : xi∈ Xi}Theory in Programming Practice, Plaxton, Fall 2005Regular Expressions: Parenthesization• When writing a regular expression, we generally try to omit as manyparentheses as possible without altering the meaning of the expression• Where parentheses are omitted, Kleene closure has the highest bindingpower, then concatenation, then union– Parentheses may be omitted whenever this convention yields theintended parenthesization• Note that concatenation and union are associative– These facts often enable us to drop parentheses, e.g., we can writeabc instead of ((ab)c)Theory in Programming Practice, Plaxton, Fall 2005A Remark on Kleene Closure• One can think of Kleene closure as follows:p∗= ² | p | pp | ppp | . . .• The RHS above is not a regular expression because it has an infinitenumber of terms– It is straightforward to prove by induction that every regularexpression has a finite length• The motivation for introducing the Kleene closure operator is to makethe above RHS into a regular expressionTheory in Programming Practice, Plaxton, Fall 2005Regular Expressions: Examples• What is the set of strings corresponding to the regular expressiona | bc∗d?• It is often convenient to introduce identifiers to stand for certain regularexpressions and then to use these identifiers as a shorthand for buildingup more complex regular expressions– PosDigit = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9– Digit = 0 | PosDigit– Natural = 0 | PosDigit Digit∗• The set of strings over the lowercase English alphabet containing allfive vowels in order corresponds to the regular expression(Letter∗)a(Letter∗)e(Letter∗)i(Letter∗)o(Letter∗)u(Letter∗)whereLetter = a | b | c | . . . | zTheory in Programming Practice, Plaxton, Fall 2005A More Elaborate Example• For any binary string x, let f (x) denote the nonnegative integercorresponding to x– Example: If x = 00110, then f(x) = 6• Problem: Construct a regular expression corresponding to the set of allbinary strings x such that f(x) is a multiple of 3– We first inductively define the sets B0, B1, and B2of all binarystrings x such that f(x) is congruent to 0, 1, and 2, respectively,modulo 3– We then deduce a regular expression for B0Theory in Programming Practice, Plaxton, Fall 2005Inductive Definition of Sets B0, B1, and B2(0) The empty string belongs to B0(1) For any binary string x in B0, x0 belongs to B0and x1 belongs to B1(2) For any binary string x in B1, x0 belongs to B2and x1 belongs to B0(3) For any binary string x in B2, x0 belongs to B1and x1 belongs to B2Theory in Programming Practice, Plaxton, Fall 2005Characterization of B2in Terms of B1• By (2) and (3), any binary string in B2is either of the form x0 wherex belongs to B1, or is of the form x1 where x belongs to B2• It follows that B2consists of all binary strings of the form x01∗wherex belongs to B1Theory in Programming Practice, Plaxton, Fall 2005Characterization of B1in terms of B0• By (1), (3), and the preceding characterization of B2, any binary stringin B1is either of the form x1 where x belongs to B0, or is of the formx01∗0 where x belongs to B1• It follows that B1consists of all binary strings of the form x1(01∗0)∗where x belongs to B0Theory in Programming Practice, Plaxton, Fall 2005Deducing a Regular Expression for B0• By (0), (1), (2), and the preceding characterization of B1, the set B0consists of the empty string, all binary strings of the form x0 where xbelongs to B0, and all binary strings of the form x1(01∗0)∗1 where xbelongs to B0• It follows that B0consists of all binary strings of the form(0 | 1(01∗0)∗1)∗Theory in Programming Practice, Plaxton, Fall 2005Remark: Alternative View of the Preceding Example• The binary strings in B0may be viewed as being generated by thegrammarS −→ B0B0−→ ² | B00 | B11B1−→ B01 | B20B2−→ B10 | B21• As we have seen, the above grammar generates a regular language• Not all grammars generate regular languagesTheory in Programming Practice, Plaxton, Fall


View Full Document

UT CS 337 - Regular Expressions

Documents in this Course
Load more
Download Regular Expressions
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Regular Expressions and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Regular Expressions 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?