Columbia COMS W4705 - Part of Speech Tagging




Slide titles:
• CS4705 Part of Speech tagging
• HW questions?
• Smoothing
• Smoothing is like Robin Hood: Steal from the rich and give to the poor (in probability mass)
• Smoothing Methods
• Garden path sentences
• What is a word class?
• POS examples
• POS Tagging: Definition
• What is POS tagging good for?
• Open and closed class words
• Open class words
• Slide Number 13
• How do we decide which words go in which classes?
• Closed Class Words
• POS tagging: Choosing a tagset
• Penn TreeBank POS Tag set
• Using the UPenn tagset
• POS Tagging
• How do we assign POS tags to words in a sentence?
• How hard is POS tagging? Measuring ambiguity
• Potential Sources of Disambiguation
• 3 methods for POS tagging
• Rule-based tagging
• Start with a dictionary
• Use the dictionary to assign every possible tag
• Write rules to eliminate tags
• Sample ENGTWOL Lexicon
• Stage 1 of ENGTWOL Tagging
• Stage 2 of ENGTWOL Tagging
• Transformation-Based Tagging (Brill Tagging)
• Transformation-Based Tagging
• TBL Rule Application
• TBL: Rule Learning
• TBL: The Tagging Algorithm
• TBL: Rule Learning (cont.)
• Templates for TBL
• Comparison of two approaches
• Summary

CS4705 Part of Speech tagging
9/17/2009
Some slides adapted from: Dan Jurafsky, Julia Hirschberg, Jim Martin

HW questions?
• Training files, question samples
  ◦ /home/cs4705/corpora/wsj
  ◦ /home/cs4705/corpora/wsj/wsj_2300questions.txt
  ◦ CVN: will post on the CVN web site this afternoon
• Question and answer templates
• Not expected to use tools that we haven’t gone over (e.g., named entity recognition)
    ▪ Must allow paraphrases for indices
    ▪ But company names will be provided in question exactly as they appear in article
• Any other questions?

Smoothing
• Words follow a Zipfian distribution
  ◦ Small number of words occur very frequently
  ◦ A large number are seen only once
  ◦ Zipf’s law: a word’s frequency is approximately inversely proportional to its rank in the word distribution list
• Zero probabilities on one bigram cause a zero probability on the entire sentence

Smoothing is like Robin Hood: Steal from the rich and give to the poor (in probability mass)
(Slide from Dan Klein)

Smoothing Methods
• Add-one smoothing (easy, but inaccurate)
  ◦ Add 1 to every word count (Note: this is type)
  ◦ Increment normalization factor by vocabulary size: N (tokens) + V (types): $p_i^* = \frac{c_i + 1}{N + V}$
• Backoff models
  ◦ When a count for an n-gram is 0, back off to the count for the (n-1)-gram
  ◦ These can be weighted
• Class-based smoothing
  ◦ For certain types of n-grams, back off to the count of its syntactic class
  ◦ E.g., Count ProperNouns in place of names (e.g., Obama)
• Good-Turing
  ◦ Re-estimate amount of probability mass for zero (or low count) n-grams by looking at n-grams with higher counts
  ◦ Estimate $c^* = (c + 1)\frac{N_{c+1}}{N_c}$, where $N_c$ is the number of n-grams that occur exactly $c$ times

Garden path sentences
• The old dog the footsteps of the young.
• The cotton clothing is made of grows in Mississippi.
• The horse raced past the barn fell.

What is a word class?
• Words that somehow ‘behave’ alike:
  ◦ Appear in similar contexts
  ◦ Perform similar functions in sentences
  ◦ Undergo similar transformations
• 9 (or so) traditional parts of speech
  ◦ Noun, verb, adjective, preposition, adverb, article, interjection, pronoun, conjunction

POS examples
• N     noun          chair, bandwidth, pacing
• V     verb          study, debate, munch
• ADJ   adjective     purple, tall, ridiculous
• ADV   adverb        unfortunately, slowly
• P     preposition   of, by, to
• PRO   pronoun       I, me, mine
• DET   determiner    the, a, that, those

POS Tagging: Definition
• The process of assigning a part-of-speech or lexical class marker to each word in a corpus:
[Figure: WORDS (the koala put the keys on the table) aligned with TAGS (N, V, P, DET)]

What is POS tagging good for?
• Is the first step of a vast number of Comp Ling tasks
• Speech synthesis:
  ◦ How to pronounce “lead”?
  ◦ INsult / inSULT
  ◦ OBject / obJECT
  ◦ OVERflow / overFLOW
  ◦ DIScount / disCOUNT
  ◦ CONtent / conTENT
• Parsing
  ◦ Need to know if a word is an N or V before you can parse
• Word prediction in speech recognition
  ◦ Possessive pronouns (my, your, her) likely to be followed by nouns
  ◦ Personal pronouns (I, you, he) likely to be followed by verbs
• Machine Translation

Open and closed class words
• Closed class: a relatively fixed membership
  ◦ Prepositions: of, in, by, …
  ◦ Auxiliaries: may, can, will, had, been, …
  ◦ Pronouns: I, you, she, mine, his, them, …
  ◦ Usually function words (short common words which play a role in grammar)
• Open class: new ones can be created all the time
  ◦ English has 4: Nouns, Verbs, Adjectives, Adverbs
  ◦ Many languages have all 4, but not all!
  ◦ In Lakhota and possibly Chinese, what English treats as adjectives act more like verbs.

Open class words
• Nouns
  ◦ Proper nouns (Columbia University, New York City, Arthi Ramachandran, Metropolitan Transit Center). English capitalizes these.
  ◦ Common nouns (the rest). German capitalizes these.
  ◦ Count nouns and mass nouns
    ▪ Count: have plurals, get counted: goat/goats, one goat, two goats
    ▪ Mass: don’t get counted (fish, salt, communism) (*two fishes)
• Adverbs: tend to modify things
  ◦ Unfortunately, John walked home extremely slowly yesterday
  ◦ Directional/locative adverbs (here, home, downhill)
  ◦ Degree adverbs (extremely, very, somewhat)
  ◦ Manner adverbs (slowly, slinkily, delicately)
• Verbs:
  ◦ In English, have morphological affixes (eat/eats/eaten)
  ◦ Actions (walk, ate) and states (be, exude)

Slide Number 13
• Many subclasses, e.g.
  ◦ eats/V ⇒ eat/VB, eat/VBP, eats/VBZ, ate/VBD, eaten/VBN, eating/VBG, ...
  ◦ Reflect morphological form & syntactic function

How do we decide which words go in which classes?
• Nouns denote people, places and things and can be preceded by articles? But…
    My typing is very bad.
    *The Mary loves John.
• Verbs are used to refer to actions, processes, states
  ◦ But some are closed class and some are open
    I will have emailed everyone by noon.
• Adverbs modify actions
  ◦ Is Monday a temporal adverb or a noun? Some others?

Closed Class Words
• Idiosyncratic
• Closed class words (Prep, Det, Pron, Conj, Aux, Part, Num) are easier, since we can enumerate them… but
  ◦ Part vs. Prep
    ▪ George eats up his dinner / George eats his dinner up.
    ▪ George eats up the street / *George eats the street up.
  ◦ Articles come in 2 flavors: definite (the) and indefinite (a, an)

POS tagging: Choosing a tagset
• To do POS tagging, need to choose a standard set of tags to work with
• Could pick very coarse tagsets
  ◦ N, V, Adj, Adv.
• Brown Corpus (Francis & Kucera ‘82), 1M words, 87 tags
• Penn Treebank: hand-annotated corpus of Wall Street Journal, 1M words, 45-46 tags
  ◦ Commonly used
  ◦ set is finer grained
• Even more fine-grained tagsets exist

Penn TreeBank POS Tag set
[Table of the Penn Treebank tag set; not reproduced in this text]

Using the UPenn tagset
• The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
• Prepositions and subordinating conjunctions marked IN (“although/IN I/PRP..”)
• Except the preposition/complementizer “to” is just marked “TO”.

POS Tagging
• Words often have more than one POS: back
  ◦ The back door = JJ
  ◦ On my back = NN
  ◦ Win the voters back = RB
  ◦ Promised to back the bill = VB
• The POS tagging
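
The add-one estimate from the Smoothing Methods slide, p_i* = (c_i + 1) / (N + V), can be sketched in a few lines. This is a minimal illustration rather than the course's reference code: the function name `add_one_probs`, the toy sentence, and the unseen word "dog" are all assumptions made for the example.

```python
from collections import Counter

def add_one_probs(tokens, vocab):
    """Add-one estimate p_i* = (c_i + 1) / (N + V) for every word type in vocab."""
    counts = Counter(tokens)
    n_tokens = len(tokens)   # N: number of tokens
    n_types = len(vocab)     # V: number of types
    return {w: (counts[w] + 1) / (n_tokens + n_types) for w in vocab}

# Toy data (hypothetical): "dog" is in the vocabulary but never observed.
tokens = "the koala put the keys on the table".split()
vocab = set(tokens) | {"dog"}
probs = add_one_probs(tokens, vocab)

print(probs["the"])   # seen 3 times: (3 + 1) / (8 + 7)
print(probs["dog"])   # unseen:       (0 + 1) / (8 + 7)
```

The unseen word receives a nonzero probability, and the estimates over the vocabulary still sum to 1, which is the "steal from the rich and give to the poor" effect in miniature.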
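
The backoff idea from the Smoothing Methods slide (fall back to the (n-1)-gram when an n-gram count is 0, optionally weighted) might look like the sketch below. It is deliberately simplified: `backoff_prob` and the fixed weight `alpha=0.4` are assumptions for illustration, and the scores are not renormalized into a true probability distribution, which a full backoff model would have to do.

```python
from collections import Counter

def backoff_prob(prev, word, bigrams, unigrams, alpha=0.4):
    """Use the bigram relative frequency if the bigram was seen;
    otherwise back off to a weighted unigram relative frequency."""
    if bigrams[(prev, word)] > 0:
        return bigrams[(prev, word)] / unigrams[prev]
    # alpha is a hypothetical fixed back-off weight, not a value from the slides.
    return alpha * unigrams[word] / sum(unigrams.values())

tokens = "the koala put the keys on the table".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

seen = backoff_prob("the", "koala", bigrams, unigrams)     # bigram observed
unseen = backoff_prob("koala", "keys", bigrams, unigrams)  # bigram count is 0
print(seen, unseen)
```

The unseen bigram no longer zeroes out a whole sentence, at the cost of a score that depends on the chosen weight.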
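
The Good-Turing re-estimate c* = (c + 1) N_{c+1} / N_c can be computed directly from a table of counts. The sketch below is an assumption-laden illustration: the function names and toy data are invented here, and leaving a count unchanged when N_{c+1} is 0 is a simplification (practical implementations smooth the N_c values first). The unseen-mass formula N_1 / N is the standard Good-Turing result and is not stated on the slide.

```python
from collections import Counter

def good_turing_counts(counts):
    """Map each observed count c to c* = (c + 1) * N_{c+1} / N_c."""
    freq_of_freq = Counter(counts.values())  # N_c: number of types seen exactly c times
    adjusted = {}
    for c, n_c in freq_of_freq.items():
        n_next = freq_of_freq.get(c + 1, 0)
        # Simplification: if no type has count c + 1, keep the raw count.
        adjusted[c] = (c + 1) * n_next / n_c if n_next else float(c)
    return adjusted

def unseen_mass(counts):
    """Probability mass reserved for unseen events: N_1 / N."""
    n_total = sum(counts.values())
    n_1 = sum(1 for c in counts.values() if c == 1)
    return n_1 / n_total

# Toy data (hypothetical): a occurs 3 times, b and c twice, d, e and f once.
counts = Counter("a a a b b c c d e f".split())
adjusted = good_turing_counts(counts)
mass = unseen_mass(counts)
print(adjusted, mass)
```

Here N_1 = 3, N_2 = 2 and N_3 = 1, so singletons are discounted to 2 · 2/3 and doubletons to 3 · 1/2, freeing mass for events never seen in training.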
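
The word/TAG notation shown on the "Using the UPenn tagset" slide is easy to read programmatically. The helper below, `parse_tagged`, is a hypothetical name introduced for this sketch; it splits each token on its last slash so that the final "./." token is handled like any other.

```python
def parse_tagged(sentence):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        # rpartition splits on the LAST slash, so "./." becomes (".", ".").
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs

tagged = ("The/DT grand/JJ jury/NN commented/VBD on/IN a/DT "
          "number/NN of/IN other/JJ topics/NNS ./.")
pairs = parse_tagged(tagged)
for word, tag in pairs:
    print(f"{word}\t{tag}")
```

Collecting the set of tags seen for each word across a corpus parsed this way is one simple way to measure the kind of ambiguity illustrated by "back".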

