Unformatted text preview:

CMSC 723 / LING 645: Intro to Computational LinguisticsRegular Expressions and Finite State AutomataRegular ExpressionsSlide 4Slide 5Slide 6Disjunction, Grouping, PrecedencePerl CommandsWriting correct expressionsA more complex exampleAdvanced operatorsSubstitutions and MemoryEliza [Weizenbaum, 1966]Eliza-style regular expressionsFinite-state AutomataFinite-state Automata (Machines)Input TapeSlide 18Slide 19State-transition TablesD-RECOGNIZEAdding a failing stateAdding an “all else” arcLanguages and AutomataSlide 25Using NFSAs to accept stringsUsing NFSAsReadings for next timeCMSC 723 / LING 645: Intro to Computational LinguisticsSeptember 8, 2004: MonzRegular Expressions and Finite State Automata (J&M 2) Prof. Bonnie J. DorrDr. Christof MonzTA: Adam LeeRegular Expressions and Finite State AutomataREs: Language for specifying text stringsSearch for document containing a string–Searching for “woodchuck” •Finite-state automata (FSA)(singular: automaton)•How much wood would a woodchuck chuck if a woodchuck would chuck wood?–Searching for “woodchucks” with an optional final “s”Regular ExpressionsBasic regular expression patternsPerl-based syntax (slightly different from other notations for regular expressions)Disjunctions /[wW]oodchuck/Regular ExpressionsRanges [A-Z]Negations [^Ss]Regular ExpressionsOptional characters ? ,* and +–? (0 or 1) •/colou?r/  color or colour–* (0 or more)•/oo*h!/  oh! or Ooh! or Ooooh!*+Stephen Cole Kleene –+ (1 or more) •/o+h!/  oh! or Ooh! or Ooooh!Wild cards .- /beg.n/  begin or began or begunRegular ExpressionsAnchors ^ and $–/^[A-Z]/  “Ramallah, Palestine”–/^[^A-Z]/  “¿verdad?” “really?”–/\.$/  “It is over.”–/.$/  ?Boundaries \b and \B–/\bon\b/  “on my way” “Monday”–/\Bon\b/  “automaton” Disjunction |–/yours|mine/  “it is either yours or mine”Disjunction, Grouping, PrecedenceColumn 1 Column 2 Column 3 …How do we express this?/Column [0-9]+ *//(Column [0-9]+ +)*/Precedence–Parenthesis ()–Counters * + ? {}–Sequences and anchors the ^my end$–Disjunction |REs are greedy!Perl CommandsWhile ($line=<STDIN>){if ($line =~ /the/){print “MATCH: $line”;}}Writing correct expressionsExercise: Write a Perl regular expression to match the English article “the”:/the//[tT]he//\b[tT]he\b//[^a-zA-Z][tT]he[^a-zA-Z]//(^|[^a-zA-Z])[tT]he[^a-zA-Z]/A more complex exampleExercise: Write a regular expression that will match “any PC with more than 500MHz and 32 Gb of disk space for less than $1000”:/$[0-9]+//$[0-9]+\.[0-9][0-9]//\b$[0-9]+(\.[0-9][0-9])?\b//\b$[0-9][0-9]?[0-9]?(\.[0-9][0-9])?\b//\b[0-9]+ *([MG]Hz|[Mm]egahertz| [Gg]igahertz)\b//\b[0-9]+ *(Mb|[Mm]egabytes?)\b//\b[0-9](\.[0-9]+) *(Gb|[Gg]igabytes?)\b/Advanced operatorsshould be _Substitutions and MemorySubstitutionss/colour/color/s/colour/color/gs/([Cc]olour)/$1olor//the (.*)er they were, the $1er they will be//the (.*)er they (.*), the $1er they $2/Substitute as many times as possible!Case insensitive matchings/colour/color/iMemory ($1, $2, etc. refer back to matches)Eliza [Weizenbaum, 1966]User: Men are all alikeELIZA: IN WHAT WAYUser: They’re always bugging us about something or otherELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE?User: Well, my boyfriend made me come hereELIZA: YOUR BOYFRIEND MADE YOU COME HEREUser: He says I’m depressed much of the timeELIZA: I AM SORRY TO HEAR THAT YOU ARE DEPRESSEDEliza-style regular expressionss/.* YOU ARE (depressed|sad) .*/I AM SORRY TO HEAR YOU ARE \1/s/.* YOU ARE (depressed|sad) .*/WHY DO YOU THINK YOU ARE \1/s/.* all .*/IN WHAT WAY/s/.* always .*/CAN YOU THINK OF A SPECIFIC EXAMPLE/Step 1: replace first person with second person referencess/\bI(’m| am)\b /YOU ARE/gs/\bmy\b /YOUR/gS/\bmine\b /YOURS/gStep 2: use additional regular expressions to generate repliesStep 3: use scores to rank possible transformationsFinite-state AutomataFinite-state automata (FSA)Regular languagesRegular expressionsFinite-state Automata (Machines)/^baa+!$/q0q1q2q3q4b a a !astatetransitionfinalstate baa! baaa! baaaa! baaaaa! ...Input Tapea b a ! bq001 2 3 4b a a!aREJECTInput Tapeb a a aq0q1q2q3q3q4!01 2 3 4b a a!aACCEPTFinite-state AutomataQ: a finite set of N states –Q = {q0, q1, q2, q3, q4}: a finite input alphabet of symbols = {a, b, !}q0: the start stateF: the set of final states–F = {q4}(q,i): transition function –Given state q and input symbol i, return new state q'(q3,!)  q4State-transition TablesInputState b a !0 1 Ø Ø1 Ø 2 Ø2 Ø 3 Ø3 Ø 3 44: Ø Ø ØD-RECOGNIZEfunction D-RECOGNIZE (tape, machine) returns accept or reject index  Beginning of tape current-state  Initial state of machine loop if End of input has been reached then if current-state is an accept state then return accept else return reject elsif transition-table [current-state, tape[index]] is empty then return reject else current-state  transition-table [current-state, tape[index]] index  index + 1endAdding a failing stateq0q1q2q3q4b a a !aqFa!b!b !bba!Adding an “all else” arcq0q1q2q3q4b a a !aqF====Languages and AutomataCan use FSA as a generator as well as a recognizerFormal language L: defined by machine M that both generates and recognizes all and only the strings of that language. –L(M) = {baa!, baaa!, baaaa!, …}Regular languages vs. non-regular languagesLanguages and AutomataDeterministic vs. Non-deterministic FSAsEpsilon () transitionsUsing NFSAs to accept stringsBackup: add markers at choice points, then possibly revisit unexplored arcs at marked choice point.Look-ahead: look ahead in inputParallelism: look at alternatives in parallelUsing NFSAsInputState b a !0 1 Ø Ø Ø1 Ø 2 Ø Ø2 Ø 2,3 Ø Ø3 Ø Ø 4 Ø4: Ø Ø Ø ØReadings for next timeJ&M Chapter


View Full Document

UMD CMSC 723 - Intro to Computational Linguistics

Documents in this Course
Lecture 9

Lecture 9

12 pages

Smoothing

Smoothing

15 pages

Load more
Download Intro to Computational Linguistics
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Intro to Computational Linguistics and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Intro to Computational Linguistics 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?