UW-Madison CS 536 - Lecture 08

© CS 536 Spring 2008

Definition of Regular Expressions

Using catenation, alternation, and Kleene closure, we can define regular expressions as follows:

• ∅ is a regular expression denoting the empty set (the set containing no strings). ∅ is rarely used, but is included for completeness.
• λ is a regular expression denoting the set that contains only the empty string. This set is not the same as the empty set, because it contains one element.
• A string s is a regular expression denoting a set containing the single string s.
• If A and B are regular expressions, then A | B, AB, and A* are also regular expressions, denoting the alternation, catenation, and Kleene closure of the corresponding regular sets.

Each regular expression denotes a set of strings (a regular set). Any finite set of strings can be represented by a regular expression of the form (s1 | s2 | … | sk). Thus the reserved words of ANSI C can be defined as (auto | break | case | …).

The following additional operations are useful. They are not strictly necessary, because their effect can be obtained using alternation, catenation, and Kleene closure:

• P+ denotes all strings consisting of one or more strings in P catenated together: P* = (P+ | λ) and P+ = PP*. For example, (0 | 1)+ is the set of all strings containing one or more bits.
• If A is a set of characters, Not(A) denotes (Σ − A); that is, all characters in Σ not included in A. Since Not(A) can never be larger than Σ and Σ is finite, Not(A) must also be finite, and is therefore regular. Not(A) does not contain λ since λ is not a character (it is a zero-length string). For example, Not(Eol) is the set of all characters excluding Eol (the end-of-line character, '\n' in Java or C).
• It is possible to extend Not to strings, rather than just Σ. That is, if S is a set of strings, we define S̄ to be (Σ* − S); the set of all strings except those in S.
  Though S̄ (the complement of S) is usually infinite, it is also regular if S is.
• If k is a constant, the set A^k represents all strings formed by catenating k (possibly different) strings from A. That is, A^k = (AAA…) (k copies). Thus (0 | 1)^32 is the set of all bit strings exactly 32 bits long.

Examples

Let D be the ten single digits and let L be the set of all 52 letters. Then

• A Java or C++ single-line comment that begins with // and ends with Eol can be defined as:
    Comment = // Not(Eol)* Eol
• A fixed decimal literal (e.g., 12.345) can be defined as:
    Lit = D+ . D+
• An optionally signed integer literal can be defined as:
    IntLiteral = ( '+' | − | λ ) D+
  (Why the quotes on the plus?)
• A comment delimited by ## markers, which allows single #'s within the comment body:
    Comment2 = ## ((# | λ) Not(#))* ##

All finite sets and many infinite sets are regular. But not all infinite sets are regular. Consider the set of balanced brackets of the form [ [ […] ] ]. This set is defined formally as { [^m ]^m | m ≥ 1 }. This set is known not to be regular. Any regular expression that tries to define it either does not get all balanced nestings or it includes extra, unwanted strings.

Finite Automata and Scanners

A finite automaton (FA) can be used to recognize the tokens specified by a regular expression. FAs are simple, idealized computers that recognize strings belonging to regular sets. An FA consists of:

• A finite set of states
• A set of transitions (or moves) from one state to another, labeled with characters in Σ
• A special state called the start state
• A subset of the states called the accepting, or final, states

These four components of a finite automaton are often represented graphically.

Finite automata (the plural of automaton is automata) are represented graphically using transition diagrams. We start at the start state.
If the next input character matches the label on a transition from the current state, we go to the state it points to. If no move is possible, we stop. If we finish in an accepting state, the sequence of characters read forms a valid token; otherwise, we have not seen a valid token.

[Figure: legend for transition diagrams, showing the symbols for a state, a transition, the start state, and an accepting state.]

[Figure: transition diagram with transitions labeled a, b, c, c, a.]

In this diagram, the valid tokens are the strings described by the regular expression (a b (c)+)+.

Deterministic Finite Automata

As an abbreviation, a transition may be labeled with more than one character (for example, Not(c)). The transition may be taken if the current input character matches any of the characters labeling the transition.

If an FA always has a unique transition (for a given state and character), the FA is deterministic (that is, a deterministic FA, or DFA). Deterministic finite automata are easy to program and often drive a scanner.

If there are transitions to more than one state for some character, then the FA is nondeterministic (that is, an NFA).

A DFA is conveniently represented in a computer by a transition table. A transition table, T, is a two-dimensional array indexed by a DFA state and a vocabulary symbol.

Table entries are either a DFA state or an error flag (often represented as a blank table entry). If we are in state s, and read character c, then T[s,c] will be the next state we visit, or T[s,c] will contain an error marker indicating that c cannot extend the current token. For example, the regular expression

    // Not(Eol)* Eol

which defines a Java or C++ single-line comment, might be translated into a DFA and its corresponding transition table.

A complete transition table contains one column for each character. To save space, table compression may be used.
Only non-error entries are explicitly represented in the table, using hashing, indirection, or linked structures.

[Figure: transition diagram for // Not(Eol)* Eol, with states 1 through 4; labels recovered from the figure: /, /, Not(Eol), Eol, eof.]

Transition table for // Not(Eol)* Eol (blank entries are errors):

State   /   Eol   a   b   …
1       2
2       3
3       3   4     3   3   3
4

All regular expressions can be translated into DFAs that accept (as valid tokens) the strings defined by the regular expressions. This translation can be done manually by a programmer or automatically using a scanner generator.

A DFA can be coded in:

• Table-driven form
• Explicit control form

In the table-driven form, the transition table that defines a DFA's actions is explicitly represented in a run-time table that is "interpreted" by a driver program.

In the explicit control form, the transition table that defines a DFA's actions appears implicitly as the control logic of the program.

For example, suppose CurrentChar is the current input character. End of file is represented …
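
As a hedged illustration of the table-driven form (a sketch, not the lecture's own code), the Python fragment below stores the transition table for // Not(Eol)* Eol as a dictionary holding only the non-error entries, and a small driver "interprets" it. The names T, next_state, and scan_comment, the "other" key standing for the remaining characters in Not(Eol), and the choice of '\n' for Eol are all assumptions made for this example.

```python
EOL = "\n"                      # assumption: Eol is the newline character
START, ACCEPTING = 1, {4}       # state numbers follow the 4-state table

# Sparse transition table: only non-error entries are stored (a simple
# form of table compression using hashing). Keys are (state, symbol);
# "other" is a hypothetical catch-all for characters in Not(Eol) that
# have no explicit entry of their own.
T = {
    (1, "/"): 2,
    (2, "/"): 3,
    (3, EOL): 4,
    (3, "other"): 3,
}


def next_state(state, ch):
    """Return T[state, ch], or None for an error (blank) entry."""
    if (state, ch) in T:
        return T[(state, ch)]
    if ch != EOL:
        return T.get((state, "other"))
    return None


def scan_comment(text):
    """Drive the DFA over text, starting in the start state.

    Stops when no move is possible. Returns the characters read if the
    DFA finished in an accepting state, otherwise None.
    """
    state, length = START, 0
    for ch in text:
        nxt = next_state(state, ch)
        if nxt is None:         # no move is possible: stop
            break
        state, length = nxt, length + 1
    return text[:length] if state in ACCEPTING else None
```

For instance, scan_comment("// hi\nx = 1") consumes the comment through its closing Eol and stops at x, because state 4 has no outgoing transitions. Keeping the table as data means a different token can be recognized by swapping in a different T without touching the driver.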

