Natural Language Processing
Lecture 2, 8/29/2013
Jim Martin

01/14/19, Speech and Language Processing, Jurafsky and Martin

Today
- Review
- Finish last time
- Finite-state methods

Categories of Knowledge
- Phonology
- Morphology
- Syntax
- Semantics
- Pragmatics
- Discourse

Each kind of knowledge has associated with it an encapsulated set of processes that make use of it. Interfaces are defined that allow the various levels to communicate. This often leads to a pipeline architecture.

[Figure: pipeline of Morphological Processing, Syntactic Analysis, Semantic Interpretation, Context]

Ambiguity
- Ambiguity is a fundamental problem of computational linguistics.
- Resolving ambiguity is a crucial goal.

Ambiguity
Find at least 5 meanings of this sentence:
  I made her duck
- I cooked waterfowl for her benefit (to eat)
- I cooked waterfowl belonging to her
- I created the (ceramic) duck she owns
- I caused her to quickly lower her upper body
- I waved my magic wand and turned her into undifferentiated waterfowl

Sources of Ambiguity
- I caused her to quickly lower her head or body
  - Lexical category (part of speech): "duck" can be a noun or a verb (a verb in this case)
- I cooked waterfowl belonging to her
  - Lexical category: "her" can be a possessive ("of her") or dative ("for her") pronoun
- I made the (ceramic) duck statue she owns
  - Lexical semantics: "make" can mean "create" or "cook", and about 100 other things as well

Ambiguity is Pervasive
Syntax: "make" can be
- Transitive (verb has a noun direct object): I cooked waterfowl belonging to her
- Ditransitive (verb has 2 noun objects): I made her (into) undifferentiated waterfowl
- Action di-transitive (verb has a direct object and another verb): I caused her to move her body

Problem
Remember our pipeline...

[Figure: pipeline of Morphological Processing, Syntactic Analysis, Semantic Interpretation, Context]

Problem

[Figure: the same pipeline, with Morphological Processing followed by many stacked copies of Syntactic Analysis and Semantic Interpretation]

Algorithms
- Many of the algorithms that we'll study will turn out to be transducers: algorithms that take one kind of structure as input and output another.
- Unfortunately, ambiguity makes this process difficult. This leads us to employ algorithms of various sorts that are designed to manage ambiguity.

Paradigms
In particular:
- State-space search
  - To manage the problem of making choices during processing when we lack the information needed to make the right choice
- Dynamic programming
  - To avoid having to redo work during the course of a state-space search
  - CKY, Earley, Minimum Edit Distance, Viterbi, Baum-Welch
- Classifiers
  - Machine-learning-based classifiers that are trained to make decisions based on features extracted from the local context
  - Used to decide among ambiguous choices and then move on, hoping that the right choice was made

State Space Search
- States represent pairings of partially processed inputs with partially constructed representations.
- Goals are inputs paired with
  completed representations that satisfy some criteria.
- As with most interesting problems, the spaces are normally too large to explore exhaustively.
  - We need heuristics to guide the search
  - Criteria to trim the space

Dynamic Programming
- Don't do the same work over and over.
- Avoid this by building and making use of solutions to subproblems that must be invariant across all parts of the space.

Break
- Rest of today is Chapter 2
- We'll be doing Chapter 3 over the next few lectures

Admin
- Questions

Regular Expressions and Text Searching
- Regular expressions are a compact textual representation of a set of strings that constitute a language.
  - In the simplest case, regular expressions describe regular languages
  - Here, a language means a set of strings given some alphabet
- Extremely versatile and widely used technology
  - Emacs, vi, perl, grep, etc.

Example
Find all the instances of the word "the" in a text.
  /the/
  /[tT]he/
  /\b[tT]he\b/

Errors
The process we just went through was based on fixing two kinds of errors:
- Matching strings that we should not have matched (there, then, other)
  - False positives (Type I)
- Not matching things that we should have matched (The)
  - False negatives (Type II)

Errors
- We'll be telling the same story with respect to evaluation for many tasks. Reducing the error rate for an application often involves two antagonistic efforts:
  - Increasing accuracy, or precision (minimizing false positives)
  - Increasing coverage, or recall (minimizing false negatives)

3 Formalisms
- Recall that I said that regular expressions
  describe languages (sets of strings).
- Turns out that there are 3 formalisms for capturing such languages, each with their own motivation and history:
  - Regular expressions: compact textual strings, perfect for specifying patterns in programs or command lines
  - Finite state automata: graphs
  - Regular grammars: rules

3 Formalisms
- These three approaches are all equivalent in terms of their ability to capture regular languages. But, as we'll see, they do inspire
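
Going back to the earlier search for the word "the", the three successive patterns can be tried out directly. What follows is a minimal Python sketch, not part of the original slides: the sample sentence and the helper name find_matches are made up for illustration, and Python's re module stands in for the tools listed earlier (Emacs, vi, perl, grep).

```python
import re

# Hypothetical sample text, chosen to contain "The", "the",
# and the distractors "other", "there", "then".
text = "The other day the cat sat there, then left."


def find_matches(pattern, text):
    """Return every substring of text matched by the regular expression."""
    return re.findall(pattern, text)


# The three patterns from the example, from loosest to tightest.
patterns = [r"the", r"[tT]he", r"\b[tT]he\b"]

for pattern in patterns:
    print(pattern, find_matches(pattern, text))
```

On this sample, the first pattern misses the capitalized "The" (a false negative) and also fires inside "other", "there", and "then" (false positives). The second pattern fixes the false negative, and the word-boundary version removes the false positives.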