DOC PREVIEW
UT CS 341 - Regular Languages and Finite State Machines

This preview shows page 1-2-3-4 out of 13 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Supplementary Materials Regular Languages and Finite State Machines 1 Regular Languages and Finite State Machines 1 Regular Languages The first class of languages that we will consider is the regular languages. As we'll see later, there are several quite different ways in which regular languages can be defined, but we'll start with one simple technique, regular expressions. A regular expression is an expression (string) whose "value" is a language. Just as 3 + 4 and 14/2 are two arithmetic expressions whose values are equal, so are (a ∪ b)* and a* ∪ (a ∪ b)*b(a ∪ b)* two regular expressions whose values are equal. We will use regular expressions to denote languages just as we use arithmetic expressions to denote numbers. Just as there are some numbers, like Π, that cannot be expressed by arithmetic expressions of integers, so too there are some languages, like anbn, that cannot be expressed by regular expressions. In fact, we will define the class of regular languages to be precisely those that can be described with regular expressions. Let's continue with the analogy between arithmetic expressions and regular expressions. The syntax of arithmetic expressions is defined recursively: 1. Any numeral in {0, 1,2, ...} is an arithmetic expression. 2. If α and β are expressions, so is (α + β). 3. If α and β are expressions, so is (α * β). 4. If α and β are expressions, so is (α - β). 5. If α and β are expressions, so is (α/β). 6. If α is an expression, so is -α. These operators that we've just defined have associated with them a set of precedence rules, so we can write -3 + 4*5 instead of (-3 + (4*5)). Now let's return to regular expressions. The syntax of regular expressions is also defined recursively: 1. ∅ and each member of Σ is a regular expression. 2. If α , β are regular expressions, then so is αβ 3. If α , β are regular expressions, then so is α∪β. 4. If α is a regular expression, then so is α*. 5. If α is a regular expression, then so is (α). 6. Nothing else is a regular expression. Similarly there are precedence rules for regular expressions, so we can write a* ∪ bc instead of (a* ∪ (bc)). Note that * binds more tightly than does concatenation, which binds more tightly than ∪. In both cases (arithmetic expressions and regular expressions) there is a distinction between the expression and its value. 5 + 7 is not the same expression as 3 * 4, even though they both have the same value. (You might imagine the expressions being in quotation marks: "5 + 7" is clearly not the same as "3 * 4". Similarly, "John's a good guy" is a different sentence from "John is nice", even though they both have the same meaning, more or less.) The rules that determine the value of an arithmetic expression are quite familiar to us, so much so that we are usually not consciously aware of them. But regular expressions are less familiar, so we will explicitly specify the rules that define the values of regular expressions. We do this by recursion on the structure of the expression. Just remember that the regular expression itself is a syntactic object made of parentheses and other symbols, whereas its value is a language. We define L() to be the function that maps regular expressions to their values. We might analogously define L() for arithmetic expressions and say that L(5 + 7) = 12 = L(3 * 4). L() is an example of a semantic interpretation function, i. e., it is a function that maps a string in some language to its meaning. In our case, of course, the language from which the arguments to L() will be drawn is the language of regular expressions as defined above. L() for regular expressions is defined as follows: 1. L(∅) = ∅ and L(a) = {a} for each a ∈ Σ 2. If α, β are regular expressions, thenSupplementary Materials Regular Languages and Finite State Machines 2 L((αβ)) = L(α) L(β) = all strings that can be formed by concatenating to some string from α some string from β. Note that if either α or β is ∅, then its language is ∅, so there is nothing to concatenate and the result is ∅. 3. If α, β are regular expressions, then L((α∪β)) = L(α) ∪ L(β) 4. If α is a regular expression, then L(α*) = L(α)* 5. L( (α) ) = L(α) So, for example, let's find the meaning of the regular expression (a ∪ b)*b: L((a ∪ b)*b) = L((a ∪ b)*) L(b) = L(a ∪ b)* L(b) = (L(a) ∪ L(b))* L(b) = ({a} ∪ {b})* {b} = {a, b}* {b} which is just the set of all strings ending in b. Another example is L(((a ∪ b)(a ∪ b))a(a ∪ b)*) = {xay: x and y are strings of a's and b's and |x| = 2}. The distinction between an expression and its meaning is somewhat pedantic, but you should try to understand it. We will usually not actually write L() because it is generally clear from context whether we mean the regular expression or the language denoted by it. For example, a ∈ (a ∪ b)* is technically meaningless since (a ∪ b)* is a regular expression, not a set. Nonetheless, we use it as a reasonable abbreviation for a ∈ L((a ∪ b)*), just as we write 3 + 4 = 4 + 3 to mean that the values of "3 + 4" and "4 + 3", not the expressions themselves, are identical. Here are some useful facts about regular expressions and the languages they describe: • (a ∪ b)* = (a*b*)* = (b*a*)* = set of all strings composed exclusively of a's and b's (including the empty string) • (a ∪ b)c = (ac ∪ bc) Concatenation distributes over union • c(a ∪ b) = (ca ∪ cb) " • a* ∪ b* ≠ (a ∪ b)* The right-hand expression denotes a set containing strings of mixed a's and b's, while the left- hand expression does not. • (ab)* ≠ a*b* In the right-hand expression, all a's must precede all b's. That's not true for the left-hand expression. • a* ∪ ∅* = a* ∪ ε = a* There is an algebra of regular expressions, but it is rather complex and not worth the effort to learn it. Therefore, we will rely primarily on our knowledge of what the expressions mean to determine the equivalence (or non-equivalence) or regular expressions. We are now in a position to state formally our definition of the class of regular languages: Given an alphabet Σ, the set of regular languages


View Full Document

UT CS 341 - Regular Languages and Finite State Machines

Documents in this Course
Load more
Download Regular Languages and Finite State Machines
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Regular Languages and Finite State Machines and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Regular Languages and Finite State Machines 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?