Unformatted text preview:

Natural Language Processing Lecture 2 8 29 2013 Jim Martin Today Review Finish last time Finite state methods 9 17 13 Speech and Language Processing Jurafsky and Martin 2 1 Categories of Knowledge Phonology Morphology Syntax Semantics Pragmatics Discourse Morphological Processing 9 17 13 Each kind of knowledge has associated with it an encapsulated set of processes that make use of it Interfaces are defined that allow the various levels to communicate This often leads to a pipeline architecture Syntactic Analysis Semantic Interpretation Context Speech and Language Processing Jurafsky and Martin 3 Ambiguity Ambiguity is a fundamental problem of computational linguistics Resolving ambiguity is a crucial goal 9 17 13 Speech and Language Processing Jurafsky and Martin 4 2 Ambiguity Find at least 5 meanings of this sentence w I made her duck 9 17 13 Speech and Language Processing Jurafsky and Martin 5 Ambiguity Find at least 5 meanings of this sentence w I made her duck 9 17 13 I cooked waterfowl for her benefit to eat I cooked waterfowl belonging to her I created the ceramic duck she owns I caused her to quickly lower her upper body I waved my magic wand and turned her into undifferentiated waterfowl Speech and Language Processing Jurafsky and Martin 6 3 Sources of Ambiguity I caused her to quickly lower her head or body w Lexical category part of speech duck can be a noun or verb a verb in this case I cooked waterfowl belonging to her w Lexical category her can be a possessive of her or dative for her pronoun I made the ceramic duck statue she owns w Lexical Semantics make can mean create or cook and about 100 other things as well 9 17 13 Speech and Language Processing Jurafsky and Martin 7 Ambiguity is Pervasive Syntax Make can be w Transitive verb has a noun direct object I cooked waterfowl belonging to her w Ditransitive verb has 2 noun objects I made her into undifferentiated waterfowl w Action di transitive verb has a direct object and another verb I caused her to move her body 9 17 13 Speech and Language Processing Jurafsky and Martin 8 4 Problem Remember our pipeline Morphological Processing 9 17 13 Syntactic Analysis Semantic Interpretation Context Speech and Language Processing Jurafsky and Martin 9 Problem Morphological Processing 9 17 13 Semantic Semantic Semantic Interpretation Semantic Interpretation Semantic Interpretation Semantic Interpretation Syntactic Semantic Interpretation Syntactic Semantic Interpretation Syntactic Analysis Semantic Interpretation Syntactic Analysis Semantic Interpretation Syntactic Analysis Semantic Interpretation Syntactic Analysis Semantic Interpretation Syntactic Analysis Semantic Interpretation Analysis Semantic Interpretation Analysis Semantic Interpretation Semantic Interpretation Semantic Interpretation Semantic Interpretation Interpretation Interpretation Speech and Language Processing Jurafsky and Martin 10 5 Algorithms Many of the algorithms that we ll study will turn out to be transducers algorithms that take one kind of structure as input and output another Unfortunately ambiguity makes this process difficult This leads us to employ algorithms of various sorts that are designed to manage ambiguity Speech and Language Processing Jurafsky and Martin 9 17 13 11 Paradigms In particular w State space search To manage the problem of making choices during processing when we lack the information needed to make the right choice w Dynamic programming To avoid having to redo work during the course of a state space search CKY Earley Minimum Edit Distance Viterbi Baum Welch w Classifiers Machine learning based classifiers that are trained to make decisions based on features extracted from the local context Used to decide among ambiguous choices and then move on hoping that the right choice was made 9 17 13 Speech and Language Processing Jurafsky and Martin 12 6 State Space Search States represent pairings of partially processed inputs with partially constructed representations Goals are inputs paired with completed representations that satisfy some criteria As with most interesting problems the spaces are normally too large to exhaustively explore w We need heuristics to guide the search w Criteria to trim the space 9 17 13 Speech and Language Processing Jurafsky and Martin 13 Dynamic Programming Don t do the same work over and over Avoid this by building and making use of solutions to sub problems that must be invariant across all parts of the space 9 17 13 Speech and Language Processing Jurafsky and Martin 14 7 Break Rest of today is Chapter 2 We ll be doing Chapter 3 over the next few lectures 9 17 13 Speech and Language Processing Jurafsky and Martin 15 Admin Questions 9 17 13 Speech and Language Processing Jurafsky and Martin 16 8 Regular Expressions and Text Searching Regular expressions are a compact textual representation of a set of strings that constitute a language w In the simplest case regular expressions describe regular languages Here a language means a set of strings given some alphabet Extremely versatile and widely used technology w Emacs vi perl grep etc 9 17 13 Speech and Language Processing Jurafsky and Martin 17 Example Find all the instances of the word the in a text w the w tT he w b tT he b 9 17 13 Speech and Language Processing Jurafsky and Martin 18 9 Errors The process we just went through was based on two fixing kinds of errors w Matching strings that we should not have matched there then other False positives Type I w Not matching things that we should have matched The False negatives Type II 9 17 13 Speech and Language Processing Jurafsky and Martin 19 Errors We ll be telling the same story with respect to evaluation for many tasks Reducing the error rate for an application often involves two antagonistic efforts w Increasing accuracy or precision minimizing false positives w Increasing coverage or recall minimizing false negatives 9 17 13 Speech and Language Processing Jurafsky and Martin 20 10 3 Formalisms Recall that I said that regular expressions describe languages sets of strings Turns out that there are 3 formalisms for capturing such languages each with their own motivation and history w Regular expressions Compact textual strings Perfect for specifying patterns in programs or command lines w Finite state automata Graphs w Regular grammars Rules 9 17 13 Speech and Language Processing Jurafsky and Martin 21 3 Formalisms These three approaches are all equivalent in terms of their ability to capture regular


View Full Document

CU-Boulder CSCI 5832 - Finite-state methods

Loading Unlocking...
Login

Join to view Finite-state methods and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Finite-state methods and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?