Villanova CSC 9010 - Lecture 2


CSC 9010 Natural Language Processing
Lecture 2: Regular Expressions, Finite State Automata
Paula Matuszek, Mary-Angela Papalaskari
01/14/19 CSC 9010 - NLP - Regex, Finite State Automata
Presentation slides adapted from Jim Martin’s course: http://www.cs.colorado.edu/~martin/csci5832.html

Slide titles:
Regular Expressions and Text Searching
Example
Two kinds of Errors
Two Antagonistic Goals
Finite State Automata
Slide 7
More examples:
Another FSA for the same language:
Formally Specifying a FSA
Dollars and Cents
Recognition
Turing’s way of Visualizing Recognition
Slide 14
D-Recognize
Key Points
Slide 17
Recognition as Search
Generative Formalisms
Slide 20
Review
Three Views
Defining Languages with Productions
Non-Determinism
Non-Determinism cont.
Are Non-deterministic FSA more powerful?
Non-Deterministic Recognition
Slide 28 through Slide 38
ND-Recognize Code
Infinite Search
Why Bother?
Compositional Machines
Union
Concatenation
Negation
Intersection

Slide 2: Regular Expressions and Text Searching
• Everybody does it
  – Emacs, vi, perl, grep, etc.

Slide 3: Example
• Find me all instances of the word “the” in a text.
  – /the/
  – /[tT]he/
  – /\b[tT]he\b/

Slide 4: Two kinds of Errors
• Matching strings that we should not have matched (there, then, other)
  – False positives
• Not matching things that we should have matched (The)
  – False negatives

Slide 5: Two Antagonistic Goals
• Accuracy
  – (minimize false positives)
• Coverage
  – (minimize false negatives)

Slide 6: Finite State Automata
• Idealized machines for processing regular expressions
• Example: /baa+!/

Slide 7: Finite State Automata
• Idealized machines for processing regular expressions
• Example: /baa+!/
[Diagram: FSA for /baa+!/, with the initial state and the accept state labeled]
• 5 states
• 5 transitions
• alphabet?

Slide 8: More examples:

Slide 9: Another FSA for the same language:

Slide 10: Formally Specifying a FSA
  – The set of states: Q
  – A finite alphabet: Σ
  – A start state
  – A set of accept/final states
  – A transition function that maps Q × Σ to Q

Slide 11: Dollars and Cents

Slide 12: Recognition
• Recognition is the process of determining if a string should be accepted by a machine
• Or… it’s the process of determining if a string is in the language we’re defining with the machine
• Or… it’s the process of determining if a regular expression matches a string

Slide 13: Turing’s way of Visualizing Recognition

Slide 14: Recognition
• Begin in the start state
• Examine current input
• Consult the table
• Go to a new state and update the tape pointer.
• When you run out of tape:
  • if in accepting state, accept input
  • else reject input

Slide 15: D-Recognize

Slide 16: Key Points
• Deterministic means that at each point in processing there is always one unique thing to do (no choices).
• D-recognize is a simple table-driven interpreter
• The algorithm is universal for all unambiguous languages.
  – To change the machine, you change the table.

Slide 17: Key Points
• Crudely therefore… matching strings with regular expressions is a matter of
  – translating the expression into a machine (table) and
  – passing the table to an interpreter

Slide 18: Recognition as Search
• You can view this algorithm as a degenerate kind of state-space search.
• States are pairings of tape positions and state numbers.
• Operators are compiled into the table
• Goal state is a pairing with the end of tape position and a final accept state
• It’s degenerate because?

Slide 19: Generative Formalisms
• Formal Languages are sets of strings composed of symbols from a finite set of symbols.
• Finite-state automata define formal languages (without having to enumerate all the strings in the language)
• The term Generative is based on the view that you can run the machine as a generator to get strings from the language.

Slide 20: Generative Formalisms
• FSAs can be viewed from two perspectives:
  – Acceptors that can tell you if a string is in the language
  – Generators to produce all and only the strings in the language

Slide 21: Review
• Regular expressions are just a compact textual representation of FSAs
• Recognition is the process of determining if a string/input is in the language defined by some machine.
  – Recognition is straightforward with deterministic machines.

Slide 22: Three Views
• Three equivalent formal ways to look at what we’re up to (not including tables)
  – Regular Expressions
  – Regular Languages
  – Finite State Automata

Slide 23: Defining Languages with Productions
S → b a a A
A → a A
A → !

S → NP VP
NP → PrNoun
NP → Det Noun
Det → a | the
Noun → cat | dog | book
PrNoun → samantha | elmer | fido
VP → IVerb | TVerb NP
IVerb → ran | slept | ate
TVerb → hit | kissed | ate

Regular?
Regular language

Slide 24: Non-Determinism
Compare:

Slide 25: Non-Determinism cont.
• Epsilon transitions (ε):
  – Note: these transitions do not examine or advance the tape during recognition

Slide 26: Are Non-deterministic FSA more powerful?
NO:
• Non-deterministic machines can be converted to deterministic ones with a fairly simple construction
• One way to do recognition with a non-deterministic machine is to turn it into a deterministic one.

Slide 27: Non-Deterministic Recognition
• In a ND FSA there exists at least one path through the
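
The three patterns from the Example slide, and the false positives and false negatives they produce, can be tried out with a short sketch. It uses Python's standard `re` module; the sample sentence and the helper name `count_matches` are illustrative assumptions, not part of the slides.

```python
import re

# Hypothetical sample text containing "the", "The", and the false-positive
# words named on the slide (other, then, there).
TEXT = "The other day, then, the cat sat there. The end."

def count_matches(pattern: str, text: str) -> int:
    """Count non-overlapping matches of pattern in text."""
    return len(re.findall(pattern, text))

# /the/ misses "The" (false negatives) and hits other, then, there (false positives).
loose = count_matches(r"the", TEXT)

# /[tT]he/ fixes the false negatives but still has the false positives.
either_case = count_matches(r"[tT]he", TEXT)

# /\b[tT]he\b/ adds word boundaries, so only the whole word matches.
whole_word = count_matches(r"\b[tT]he\b", TEXT)

print(loose, either_case, whole_word)
```

Each refinement trades one kind of error for less of the other, which is the tension the "Two Antagonistic Goals" slide describes.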
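
The recognition steps on Slide 14 and the idea of a table-driven interpreter (Slides 16 and 17) can be sketched as follows. This is a minimal illustration, not the D-Recognize code from the slides, which is not included in this preview. The state numbering 0 to 4 and the names `BAA_TABLE` and `d_recognize` are assumptions chosen to match the "5 states, 5 transitions" machine for /baa+!/.

```python
# Transition table for /baa+!/ as a dict from (state, symbol) to next state.
# Assumed numbering: 0 is the start state, 4 is the only accept state.
BAA_TABLE = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # self-loop: any number of further a's
    (3, "!"): 4,
}

def d_recognize(tape: str, table: dict, start: int, accept: set) -> bool:
    """Deterministic recognition: follow the table one symbol at a time."""
    state = start                             # begin in the start state
    for symbol in tape:                       # examine current input
        nxt = table.get((state, symbol))      # consult the table
        if nxt is None:
            return False                      # no transition defined: reject
        state = nxt                           # go to the new state, advance the tape
    return state in accept                    # out of tape: accept only in an accept state

print(d_recognize("baaa!", BAA_TABLE, 0, {4}))
print(d_recognize("ba!", BAA_TABLE, 0, {4}))
```

To recognize a different language, only the table (and the start and accept states) would change; the interpreter stays the same.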
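
Slide 26 notes that a non-deterministic machine can be turned into a deterministic one. One way to see why is to track the set of all states the machine could be in after each symbol, which is the idea behind the subset construction. The sketch below is an assumption-laden illustration and is not the ND-Recognize code listed in the slide titles: the non-deterministic table `ND_BAA`, in which state 2 has two choices on `a`, and the function name `recognize_by_state_sets` are hypothetical, and epsilon transitions are left out for brevity.

```python
# Hypothetical non-deterministic machine for /baa+!/:
# from state 2, reading "a" may either stay in 2 or move on to 3.
ND_BAA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}

def recognize_by_state_sets(tape: str, delta: dict, start: int, accept: set) -> bool:
    """Simulate an NFA (without epsilon moves) by tracking all possible states."""
    current = {start}
    for symbol in tape:
        # Union of every state reachable from any current state on this symbol.
        current = {t for s in current for t in delta.get((s, symbol), ())}
        if not current:
            return False          # every path has died: reject early
    # Accept if at least one surviving path ended in an accept state.
    return bool(current & accept)

print(recognize_by_state_sets("baaa!", ND_BAA, 0, {4}))
```

Because each step maps one set of states to exactly one other set, the simulation itself is deterministic, even though the underlying machine is not.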

