CSCI 5832 Natural Language Processing
Lecture 10
Jim Martin
01 13 19 CSCI 5832 Spring 2007

Today 2/20

  Review
  POS Tagging
  HMMs and Viterbi
  Break
  Syntax and Context-free grammars

Review

  Parts of Speech: basic syntactic/morphological categories that words belong to.
  Part-of-Speech tagging: assigning parts of speech to all the words in a sentence.

Probabilities

  We want the best set of tags for a sequence of words (a sentence).
  W is a sequence of words.
  T is a sequence of tags.

  $$\arg\max_{T} P(T \mid W) = \arg\max_{T} P(W \mid T)\, P(T)$$

  (This is Bayes' rule; P(W) is the same for every T, so it drops out of the arg max.)

So

  We start with

  $$\arg\max_{T} P(T \mid W) = \arg\max_{T} P(W \mid T)\, P(T)$$

  and get

  $$\arg\max_{T} \prod_{i=1}^{n} P(w_i \mid t_i) \; P(t_1) \prod_{i=2}^{n} P(t_i \mid t_{i-1})$$

HMMs

  This is an HMM:

  $$\arg\max_{T} \prod_{i=1}^{n} P(w_i \mid t_i) \; P(t_1) \prod_{i=2}^{n} P(t_i \mid t_{i-1})$$

  The states in the model are the tags, and the observations are the words.
  The state-to-state transitions are driven by the bigram statistics.
  The observed words are based solely on the state that you're in.

State Transitions

  [Figure: state-transition diagram over the tags Noun, Verb, Det, and Aux; one value, 0.5, is legible.]

State Transitions and Observations

  [Figure: the same diagram with observed words attached to the states. Words shown: bark, dog, cat, run, bite, the, a, that, can, will, did; bark appears twice. The value 0.5 is again legible.]

The State Space

  [Figure, repeated on three consecutive slides: a trellis for the sentence "The dog can run", with one column per word, each column holding the states Det, Noun, Aux, and Verb, plus two states marked s, presumably sentence start and end.]

Viterbi

  Efficiently return the most likely path.
  Sweep through the columns, multiplying the probabilities of one row times the transition probabilities to the next row times the appropriate observation probabilities.
  And store the MAX.

Viterbi

  [Figure: not recoverable from the extracted text.]

Break

  Changing the schedule a bit.
  We're going to move on to Chapters 11 and 12 starting today.
  We'll then go back to cover relevant aspects of Chapter 6.
  Next quiz will cover 5, 6, 11, 12, and 13.

Talks

  CS Colloquium, Thursday 3:30
    Fernando Pereira (Penn)
    Learning to Analyze Sequences
    Basically Chapter 6 on steroids

  ICS Colloquium, Friday noon
    Christer Samuelsson
    A Computational Linguist on Wall Street
    How to use HMMs to do market prediction
    Using Chapter 6 to make gobs of money

Syntax

  By syntax (or grammar) I mean the kind of implicit knowledge of your native language that you had mastered by the time you were 2 or 3 years old, without explicit instruction.
  Not the kind of stuff you were later taught in school.

Syntax

  Why should you care?
    Grammar checkers
    Question answering
    Information extraction
    Machine translation

Search

  On Friday, PARC is announcing a deal that underscores that strategy. It is licensing a broad portfolio of patents and technology to a well-financed start-up with an ambitious and potentially lucrative goal: to build a search engine that could some day rival Google. The start-up, Powerset, is licensing PARC's natural language technology, the art of making computers understand and process languages like English. Powerset hopes the technology will be the basis of a new search engine that allows users to type queries in plain English, rather than using keywords.

Search

  "For a lot of things, keyword search works well," said Barney Pell, chief executive of Powerset. "But I think we are going to look back in 10 years and say, remember when we used to search using keywords."

Search

  In a November interview, Marissa Mayer, Google's vice president for search and user experience, said: "Natural language is really hard. I don't think it will happen in the next five years."

Search

  "My general feeling about natural-language processing in search is that I'm a bit of a skeptic, in the sense that even the best systems, and I include there the systems from PARC, make many mistakes," said Mr. Pereira of the University of Pennsylvania.

Context-Free Grammars

  Capture constituency and ordering.
  Ordering is easy: what are the rules that govern the ordering of words and bigger units in the language?
  What's constituency? How words group into units, and how the various kinds of units behave.

CFG Examples

  S → NP VP
  NP → Det NOMINAL
  NOMINAL → Noun
  VP → Verb
  Det → a
  Noun → flight
  Verb → left

CFGs

  S → NP VP

  This says that there are units called S, NP, and VP in this language,
  and that an S consists of an NP followed immediately by a VP.
  It doesn't say that that's the only kind of S.
  Nor does it say that this is the only place that NPs and VPs occur.

Generativity

  As with FSAs and FSTs, you can view these rules as either analysis or synthesis machines:
    Generate strings in the language
    Reject strings not in the language
    Impose structures (trees) on strings in the language

Derivations

  A derivation is a sequence of rules applied to a string that accounts for that string:
    Covers all the elements in the string
    Covers only the elements in the string

Derivations as Trees

  [Figure: not recoverable from the extracted text.]

Parsing

  Parsing is the process of taking a string and a grammar and returning one or more parse trees for that string.
  It is completely analogous to running a finite-state transducer with a tape.
  It's just more powerful. Remember, this means that there are languages we can capture with CFGs that we can't capture with finite-state methods.

Other Options

  Regular languages (expressions): too weak
  Context-sensitive or Turing-equivalent: too powerful (maybe)

Context

  The notion of context in CFGs has nothing to do with the ordinary meaning of the word context in language.
  All it really means is that the nonterminal …
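
Sketch: the CFG Examples grammar as a machine

  The toy grammar from the CFG Examples slide is small enough to run directly, which makes the Generativity slide's three views concrete: generate strings, reject strings, and impose trees. What follows is a minimal sketch, not code from the course. The dictionary layout and the function names (expand, generate, accepts, parse_trees) are assumptions made for illustration, and the brute-force enumeration only terminates because this grammar has no recursive rules.

```python
from itertools import product

# Rules copied from the "CFG Examples" slide.
# The dict-of-lists encoding is an assumption of this sketch.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
}


def expand(symbol):
    """Yield every (tree, words) pair derivable from `symbol`.

    Only safe for non-recursive grammars such as the one above;
    a recursive rule would make this enumeration run forever.
    """
    if symbol not in GRAMMAR:  # a terminal: the word itself
        yield symbol, [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every way of expanding each right-hand-side symbol.
        for parts in product(*(list(expand(s)) for s in rhs)):
            tree = (symbol, [subtree for subtree, _ in parts])
            words = [w for _, ws in parts for w in ws]
            yield tree, words


def generate(start="S"):
    """Synthesis: list every string in the language."""
    return [" ".join(words) for _, words in expand(start)]


def accepts(sentence, start="S"):
    """Analysis: accept strings in the language, reject the rest."""
    return sentence in generate(start)


def parse_trees(sentence, start="S"):
    """Impose structure: return every tree whose leaves spell `sentence`."""
    return [tree for tree, words in expand(start)
            if " ".join(words) == sentence]


if __name__ == "__main__":
    print(generate())                   # the only sentence: 'a flight left'
    print(accepts("flight a left"))     # False: wrong ordering
    print(parse_trees("a flight left"))
```

  Enumerating the whole language is only workable for a grammar this small; it is meant to show the three views side by side, not to stand in for a real parser.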

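Sketch: the Viterbi sweep

  Going back to the Viterbi slide from the first half of the lecture: the sweep it describes (move column by column through the state space, multiply by transition and observation probabilities, store the MAX) can be sketched as below. This is a minimal illustration, not the course's implementation. The tag and word inventory follows the State Transitions and Observations slide, but every probability in START, TRANS, and EMIT is made up for this example, and the names viterbi, best, and back are assumptions.

```python
# Tags from the "State Transitions" slide.
TAGS = ["Det", "Noun", "Aux", "Verb"]

# HYPOTHETICAL probabilities, invented for this sketch only.
START = {"Det": 0.8, "Noun": 0.2}                 # P(t_1)
TRANS = {                                         # P(t_i | t_{i-1})
    "Det": {"Noun": 1.0},
    "Noun": {"Aux": 0.5, "Verb": 0.5},
    "Aux": {"Verb": 1.0},
    "Verb": {"Det": 0.5, "Noun": 0.5},
}
EMIT = {                                          # P(w_i | t_i)
    "Det": {"the": 0.5, "a": 0.4, "that": 0.1},
    "Noun": {"dog": 0.45, "cat": 0.45, "bark": 0.1},
    "Aux": {"can": 0.5, "will": 0.3, "did": 0.2},
    "Verb": {"run": 0.4, "bark": 0.3, "bite": 0.3},
}


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return (most likely tag sequence, its probability) for `words`."""
    # best[i][t]: probability of the best path that ends in tag t at word i.
    best = [{t: start_p.get(t, 0.0) * emit_p[t].get(words[0], 0.0)
             for t in tags}]
    # back[i][t]: the previous tag on that best path.
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Store the MAX over all ways of arriving in tag t.
            prev, score = max(
                ((p, best[i - 1][p] * trans_p[p].get(t, 0.0)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score * emit_p[t].get(words[i], 0.0)
            back[i][t] = prev
    # Pick the best final state, then follow the back-pointers.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, best[-1][last]


if __name__ == "__main__":
    path, prob = viterbi("the dog can run".split(), TAGS, START, TRANS, EMIT)
    print(path)  # ['Det', 'Noun', 'Aux', 'Verb']
```

  With these made-up numbers, "bark" after "the" comes out as a Noun even though the Verb state emits it with higher probability, because the Det-to-Verb transition has probability zero. Longer inputs would normally use log probabilities to avoid underflow; the sketch multiplies raw probabilities to stay close to the wording on the slide.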

CU-Boulder CSCI 5832 - Lecture 10
