Villanova CSC 9010 - Part of Speech (POS) Tagging


Part of Speech (POS) Tagging
Sources (and Resources)
Word Classes and Part-of-Speech Tagging
Parts of Speech
POS examples
Definition of POS Tagging
POS Tagging example
What does Tagging do?
Significance of Parts of Speech
Word Classes
Open and closed class words
Open Class Words
Closed Class Words
Prepositions from CELEX
English Single-Word Particles
Pronouns in CELEX
Conjunctions
Auxiliaries
POS Tagging: Choosing a Tagset
Some of the best-known Tagsets
The Brown Corpus
Penn Treebank
Tag Set Example: Penn Treebank
Example of Penn Treebank Tagging of Brown Corpus Sentence
POS Tagging
Word Class Ambiguity (in the Brown Corpus)
Part-of-Speech Tagging
Rule-Based Tagging
Start With a Dictionary
Assign All Possible Tags
Write rules to eliminate tags
Sample ENGTWOL Lexicon
Stochastic Tagging
Stochastic Tagging (cont.)
HMM Tagger
Conditional Probability
Conditional Probabilities cont.
Bayes' Theorem
Probabilities
Slide 40
Tag Sequence: P(T)
N-Grams
P(T): Bigram Example
Counts
What about P(W|T)
P(W|T)
So…
HMMs
An Example
Performance
Transformation-Based Tagging (Brill Tagging)
Transformation-Based Tagging (cont.)
TBL Rule Application
TBL: The Rule-Learning Algorithm
TBL: Rule Learning (cont.)
Templates for TBL
TBL: Problems
Tagging Unknown Words
Evaluating performance
Test set
So What's "Good"?
Training and test sets
Computing % correct
Training and Test sets

CSC 9010: Special Topics, Natural Language Processing. Spring, 2005. Matuszek & Papalaskari

Part of Speech (POS) Tagging
CSC 9010: Special Topics. Natural Language Processing.
Paula Matuszek, Mary-Angela Papalaskari
Spring, 2005

Sources (and Resources)
• Some slides adapted from
  – Dorr, www.umiacs.umd.edu/~christof/courses/cmsc723-fall04
  – Jurafsky, www.stanford.edu/class/linguist238
  – McCoy, www.cis.udel.edu/~mccoy/courses/cisc882.03f
• With some additional examples and ideas from
  – Martin: www.cs.colorado.edu/~martin/csci5832.html
  – Hearst: www.sims.berkeley.edu/courses/is290-2/f04/resources.html
  – Litman: www.cs.pitt.edu/~litman/courses/cs2731f03/cs2731.html
  – Rich: www.cs.utexas.edu/users/ear/cs378NLP
• You may find some or all of these useful resources throughout the course.

Word Classes and Part-of-Speech Tagging
• What is POS tagging?
• Why do we need POS?
• Word Classes
• Rule-based Tagging
• Stochastic Tagging
• Transformation-Based Tagging
• Tagging Unknown Words
• Evaluating POS Taggers

Parts of Speech
• 8 traditional parts of speech (more or less)
  – Noun, verb, adjective, preposition, adverb, article, pronoun, conjunction.
  – This idea has been around for over 2000 years (Dionysius Thrax of Alexandria, c. 100 B.C.)
  – Called: parts-of-speech, lexical category, word classes, morphological classes, lexical tags, POS
  – Actual categories vary by language, by reason for tagging, by who you ask!

POS examples
• N     noun          chair, bandwidth, pacing
• V     verb          study, debate, munch
• ADJ   adjective     purple, tall, ridiculous
• ADV   adverb        unfortunately, slowly
• P     preposition   of, by, to
• PRO   pronoun       I, me, mine
• DET   determiner    the, a, that, those
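
As a rough illustration of the POS examples slide, the coarse tags and their sample words can be held in a small lookup table. This is only a sketch: the dictionary layout and the helper name `tags_for` are assumptions made for illustration and do not come from the slides. The words and tags themselves are copied from the slide.

```python
# Coarse tag inventory copied from the "POS examples" slide.
# The data layout and the helper below are illustrative assumptions.
POS_EXAMPLES = {
    "N": ("noun", ["chair", "bandwidth", "pacing"]),
    "V": ("verb", ["study", "debate", "munch"]),
    "ADJ": ("adjective", ["purple", "tall", "ridiculous"]),
    "ADV": ("adverb", ["unfortunately", "slowly"]),
    "P": ("preposition", ["of", "by", "to"]),
    "PRO": ("pronoun", ["I", "me", "mine"]),
    "DET": ("determiner", ["the", "a", "that", "those"]),
}


def tags_for(word):
    """Return, in sorted order, every tag whose example list contains `word`."""
    return sorted(
        tag for tag, (_name, words) in POS_EXAMPLES.items() if word in words
    )


if __name__ == "__main__":
    for w in ["munch", "those", "koala"]:
        # Words that are not on the slide simply get an empty list.
        print(w, tags_for(w))
```

A table like this only covers the words listed in it, which is one reason the later slides move on to tagsets, dictionaries, and statistical taggers.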

Definition of POS Tagging
“The process of assigning a part-of-speech or other lexical class marker to each word in a corpus” (Jurafsky and Martin)
[Figure: WORDS (the, girl, kissed, the, boy, on, the, cheek) linked to TAGS (N, V, P, DET)]

POS Tagging example
WORD    tag
the     DET
koala   N
put     V
the     DET
keys    N
on      P
the     DET
table   N
Modified from Diane Litman's version of Steve Bird's notes

What does Tagging do?
1. Collapses Distinctions
  • Lexical identity may be discarded
  • e.g. all personal pronouns tagged with PRP
2. Introduces Distinctions
  • Ambiguities may be removed
  • e.g. deal tagged with NN or VB
  • e.g. deal tagged with DEAL1 or DEAL2
3. Helps classification and prediction
Modified from Diane Litman's version of Steve Bird's notes

Significance of Parts of Speech
• A word’s POS tells us a lot about the word and its neighbors:
  – Limits the range of meanings (deal), pronunciation (OBject vs. obJECT) or both (wind)
  – Helps in stemming
  – Limits the range of following words for Speech Recognition
  – Can help select nouns from a document for IR
  – Basis for partial parsing (chunked parsing)
  – Parsers can build trees directly on the POS tags instead of maintaining a lexicon

Word Classes
• What are we trying to classify words into?
• Classes based on
  – Syntactic properties. What can precede/follow.
  – Morphological properties. What affixes they take.
  – Not primarily by semantic coherence (Conjunction Junction notwithstanding!)
• Broad "grammar" categories are familiar
• NLP uses much richer "tagsets"
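
The koala tagging example, and two of the uses mentioned on the slides above (collapsing distinctions and selecting nouns for IR), can be sketched with a toy lexicon lookup. This is a minimal illustration under stated assumptions, not one of the taggers the course describes: the function names, the `UNK` tag for unknown words, and the one-tag-per-word lexicon are all hypothetical simplifications. Real words are often ambiguous, as the "deal" example shows.

```python
# Toy lexicon built from the koala example slide. One tag per word is a
# simplifying assumption; real taggers must resolve ambiguous words.
LEXICON = {
    "the": "DET",
    "koala": "N",
    "put": "V",
    "keys": "N",
    "on": "P",
    "table": "N",
}


def tag(words, lexicon=LEXICON, unknown="UNK"):
    """Pair each word with its tag; words outside the lexicon get `unknown`.

    The "UNK" placeholder is an assumption made for this sketch.
    """
    return [(w, lexicon.get(w.lower(), unknown)) for w in words]


def collapse(tagged):
    """Discard lexical identity and keep only the tag sequence."""
    return [t for _word, t in tagged]


def select(tagged, wanted):
    """Keep the words whose tag is in `wanted`, e.g. nouns for IR."""
    return [w for w, t in tagged if t in wanted]


sentence = "the koala put the keys on the table".split()
tagged = tag(sentence)

if __name__ == "__main__":
    print(tagged)
    print(collapse(tagged))
    print(select(tagged, {"N"}))
```

Calling `select(tagged, {"N"})` keeps only `koala`, `keys`, and `table`, which is the kind of noun selection the IR bullet refers to.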

Open and closed class words
• Two major categories of classes:
  – Closed class: a relatively fixed membership
    • Prepositions: of, in, by, …
    • Auxiliaries: may, can, will, had, been, …
    • Pronouns: I, you, she, mine, his, them, …
    • Usually function words (short common words which play a role in grammar)
  – Open class: new ones can be created all the time
    • English has 4: Nouns, Verbs, Adjectives, Adverbs
    • Many languages have all 4, but not all!

Open Class Words
• Every known human language has nouns and verbs
• Nouns: people, places, things
  – Classes of nouns
    • proper vs. common
    • count vs. mass
• Verbs: actions and processes
• Adjectives: properties, qualities
• Adverbs: hodgepodge!
  – Unfortunately, John walked home extremely slowly yesterday

Closed Class Words
• Idiosyncratic. Differ more from language to language.
• Language strongly resists additions
• Examples:
  – prepositions:

