DOC PREVIEW
UCF COT 3100 - Strings and Languages

This preview shows page 1-2 out of 7 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 7 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 7 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 7 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Strings and LanguagesA string is simply the concatenation of several letters in analphabet. Typically, we will define an alphabet as a set . So,for example, we could have  = {a,b}. Then, a string over this alphabet , would be any “word”formed with only the letters or characters a and b. There is nolimit on the length of a string. It must simply be a non negativeinteger. This means there is a string of length of length 0. This is knownas the empty string. The empty string is typically denoted by .In particular, if you concatenate the empty string with anyother string, you get back that string. It seems silly to have anempty string, but it will help out in certain situations. Makesure you do not confuse the empty string with the empty set.Also, recognize that  can never be a letter of an alphabet.In the book, they define non-empty strings in the followingmanner :n is a string of length n over the alphabet , and is defined asbelow:1) 1 = .2) n+1 = {xy | x, yn}And any string is simply a subset of n, where n is a positiveinteger.Using the book’s definition, we can define the following twosets:+ = n=1 to  n, or in English, + is the set of all strings ofpositive length over an alphabet .* = n=0 to  n, or in English, * is the set of all strings of overan alphabet .Now, that we have these definitions, we can define a language.A language L over an alphabet  is any subset of *. If a stringw  L, then we say that the string w belongs in the language L.Otherwise we say w does not belong in L. (For example, horseEnglish, but gjtysihg English, to the best of my knowledge.)In this class, we will focus on a specific type of language:regular languages.In particular, a language is a regular one if and only if it can beexpressed as a regular expression. You can think of a regularexpression as a mold. For example, when you are trying to finda file on your computer, but don’t know the EXACT name ofthe file, you may enter Eric*.doc because you know that youare searching for a file whose name starts with Eric and is aWord document. In this situation, the string with a star in it is a mold applyingto an infinite number of strings – namely those that start withEric and end in .doc. It may seem easy to see if a string fits amold, but there’s more than meets the eye here.Consider the mold *a*a. When trying to mold the stringbanana, you may match the first star to the b, and the secondstar to the first n, and conclude that the string doesn’t fit themold. BUT, this is not the case because we can match thesecond star to nan instead of just an n. Thus, if there is ANYway for a string to match a mold, then that string is said to bedescribed or specified by that mold.Regular expressions are nothing but molds. To determine if aparticular string is an element of the language described by aregular expression, you must see if there exists a way to makethat string fit to the mold given by the regular expression. Hereare the rules for forming a regular expression :R is a regular expression if R is1. a for some a,2. ,3. 4. (R1  R2), where R1 and R2 are regular expressions5. (R1  R2), where R1 and R2 are regular expressions6. (R*), where R is a regular expressionEach of these needs some explaining. The first three are the“building blocks” or the atoms of regular expressions. Only thecorrect character ‘fits’ #1, while only an empty string (a stringof length 0) fits #2, and absolutely nothing fits #3.These three alone are quite boring and useless. However, wecan create more complex expressions using rules 4, 5 and 6. Todecide if a string ‘fits’ to the mold of (R1  R2), you need tocheck if it fits to the mold of R1 OR if it fits to the mold of R2. Ifthe answer to either question is yes, then the string is a part ofthe language described by (R1  R2).To decide if a string fits the mold of (R1  R2) is a bit morecomplicated. Here is the rule:Let w be a string. If there exists a way to ‘split up’ w such thatw=xy, (the concatenation of strings x and y) where x is in thelanguage described by R1 and y is in the language described byR2, then w is in the language described by (R1  R2). Onceagain, just because one way to ‘split up’ the string doesn’twork, so to speak, doesn’t mean that the string is NOT in thelanguage. Take this example:Consider the language described by the regular expression:(a  baa)  (bba  bb).The string baabb is in the language in spite of the fact that thestring b does NOT match (a  baa) and the string aabb doesnot match (bba  bb).Finally, the most difficult rule of all is the Kleene star(*) rule. Astring w is in the language described by R* if w = x1 x2... xnfor some non-negative integer n, where xi is a string in thelanguage described by R for 1  i  n, for some set of xis. Thedifficulty here is that for a string to be in the language, theremust simply be some way to ‘break up’ the string into smallerpieces that are all described by the mold R. There are WAYTOO MANY ways to split up a string for us to really checkthem all. Thus, we must be careful before we claim that aparticular string is NOT in the language described by someregular expression, especially ones with *s in them.So, for example, here are some regular languages:1. Any finite language: You can form each string in thelanguage by concatenating elements from  together, and youcan put these all together with , to form the correspondingregular expression2. {(ab)n | n  0}, the corresponding regular expression is (ab)*.(This language contains the strings , ab, abab, ababab, etc.)3. The regular expression for the set of all languages that end inbb is (a  b)*bb.One thing to keep in mind is that we will be using letters tostand for BOTH elements of the alphabet AND languagesthemselves. I will try to stick to the convention that a lowercaseletter signifies an element of an alphabet where as anUPPERCASE letter signifies an entire language.You can essentially treat regular expressions identical toregular languages. The basic difference is that a languageshould be written out in a set


View Full Document

UCF COT 3100 - Strings and Languages

Documents in this Course
Lab 1

Lab 1

11 pages

Functions

Functions

10 pages

Load more
Download Strings and Languages
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Strings and Languages and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Strings and Languages 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?