Strings and LanguagesA string is simply the concatenation of several letters in analphabet. Typically, we will define an alphabet as a set . So,for example, we could have = {a,b}. Then, a string over this alphabet , would be any “word”formed with only the letters or characters a and b. There is nolimit on the length of a string. It must simply be a non negativeinteger. This means there is a string of length of length 0. This is knownas the empty string. The empty string is typically denoted by .In particular, if you concatenate the empty string with anyother string, you get back that string. It seems silly to have anempty string, but it will help out in certain situations. Makesure you do not confuse the empty string with the empty set.Also, recognize that can never be a letter of an alphabet.In the book, they define non-empty strings in the followingmanner :n is a string of length n over the alphabet , and is defined asbelow:1) 1 = .2) n+1 = {xy | x, yn}And any string is simply a subset of n, where n is a positiveinteger.Using the book’s definition, we can define the following twosets:+ = n=1 to n, or in English, + is the set of all strings ofpositive length over an alphabet .* = n=0 to n, or in English, * is the set of all strings of overan alphabet .Now, that we have these definitions, we can define a language.A language L over an alphabet is any subset of *. If a stringw L, then we say that the string w belongs in the language L.Otherwise we say w does not belong in L. (For example, horseEnglish, but gjtysihg English, to the best of my knowledge.)In this class, we will focus on a specific type of language:regular languages.In particular, a language is a regular one if and only if it can beexpressed as a regular expression. You can think of a regularexpression as a mold. For example, when you are trying to finda file on your computer, but don’t know the EXACT name ofthe file, you may enter Eric*.doc because you know that youare searching for a file whose name starts with Eric and is aWord document. In this situation, the string with a star in it is a mold applyingto an infinite number of strings – namely those that start withEric and end in .doc. It may seem easy to see if a string fits amold, but there’s more than meets the eye here.Consider the mold *a*a. When trying to mold the stringbanana, you may match the first star to the b, and the secondstar to the first n, and conclude that the string doesn’t fit themold. BUT, this is not the case because we can match thesecond star to nan instead of just an n. Thus, if there is ANYway for a string to match a mold, then that string is said to bedescribed or specified by that mold.Regular expressions are nothing but molds. To determine if aparticular string is an element of the language described by aregular expression, you must see if there exists a way to makethat string fit to the mold given by the regular expression. Hereare the rules for forming a regular expression :R is a regular expression if R is1. a for some a,2. ,3. 4. (R1 R2), where R1 and R2 are regular expressions5. (R1 R2), where R1 and R2 are regular expressions6. (R*), where R is a regular expressionEach of these needs some explaining. The first three are the“building blocks” or the atoms of regular expressions. Only thecorrect character ‘fits’ #1, while only an empty string (a stringof length 0) fits #2, and absolutely nothing fits #3.These three alone are quite boring and useless. However, wecan create more complex expressions using rules 4, 5 and 6. Todecide if a string ‘fits’ to the mold of (R1 R2), you need tocheck if it fits to the mold of R1 OR if it fits to the mold of R2. Ifthe answer to either question is yes, then the string is a part ofthe language described by (R1 R2).To decide if a string fits the mold of (R1 R2) is a bit morecomplicated. Here is the rule:Let w be a string. If there exists a way to ‘split up’ w such thatw=xy, (the concatenation of strings x and y) where x is in thelanguage described by R1 and y is in the language described byR2, then w is in the language described by (R1 R2). Onceagain, just because one way to ‘split up’ the string doesn’twork, so to speak, doesn’t mean that the string is NOT in thelanguage. Take this example:Consider the language described by the regular expression:(a baa) (bba bb).The string baabb is in the language in spite of the fact that thestring b does NOT match (a baa) and the string aabb doesnot match (bba bb).Finally, the most difficult rule of all is the Kleene star(*) rule. Astring w is in the language described by R* if w = x1 x2... xnfor some non-negative integer n, where xi is a string in thelanguage described by R for 1 i n, for some set of xis. Thedifficulty here is that for a string to be in the language, theremust simply be some way to ‘break up’ the string into smallerpieces that are all described by the mold R. There are WAYTOO MANY ways to split up a string for us to really checkthem all. Thus, we must be careful before we claim that aparticular string is NOT in the language described by someregular expression, especially ones with *s in them.So, for example, here are some regular languages:1. Any finite language: You can form each string in thelanguage by concatenating elements from together, and youcan put these all together with , to form the correspondingregular expression2. {(ab)n | n 0}, the corresponding regular expression is (ab)*.(This language contains the strings , ab, abab, ababab, etc.)3. The regular expression for the set of all languages that end inbb is (a b)*bb.One thing to keep in mind is that we will be using letters tostand for BOTH elements of the alphabet AND languagesthemselves. I will try to stick to the convention that a lowercaseletter signifies an element of an alphabet where as anUPPERCASE letter signifies an entire language.You can essentially treat regular expressions identical toregular languages. The basic difference is that a languageshould be written out in a set
View Full Document