
Part-of-Speech Tagging
CMSC 723: Computational Linguistics I, Session #4
Jimmy Lin
The iSchool, University of Maryland
Wednesday, September 23, 2009

Source: Calvin and Hobbes

Today's Agenda
- What are parts of speech (POS)?
- What is POS tagging?
- Methods for automatic POS tagging
  - Rule-based POS tagging
  - Transformation-based learning for POS tagging
- Along the way…
  - Evaluation
  - Supervised machine learning

Parts of Speech
- "Equivalence class" of linguistic entities
  - "Categories" or "types" of words
- Study dates back to the ancient Greeks
  - Dionysius Thrax of Alexandria (c. 100 BC)
  - 8 parts of speech: noun, verb, pronoun, preposition, adverb, conjunction, participle, article
  - Remarkably enduring list!

How do we define POS?
- By meaning
  - Verbs are actions
  - Adjectives are properties
  - Nouns are things
- By the syntactic environment
  - What occurs nearby?
  - What does it act as?
- By what morphological processes affect it
  - What affixes does it take?
- Combination of the above

Parts of Speech
- Open class
  - Impossible to completely enumerate
  - New words continuously being invented, borrowed, etc.
- Closed class
  - Closed, fixed membership
  - Reasonably easy to enumerate
  - Generally, short function words that "structure" sentences

Open Class POS
- Four major open classes in English
  - Nouns
  - Verbs
  - Adjectives
  - Adverbs
- All languages have nouns and verbs... but may not have the other two

Nouns
- Open class
  - New inventions all the time: muggle, webinar, ...
- Semantics:
  - Generally, words for people, places, things
  - But not always (bandwidth, energy, ...)
- Syntactic environment:
  - Occurring with determiners
  - Pluralizable, possessivizable
- Other characteristics:
  - Mass vs. count nouns

Verbs
- Open class
  - New inventions all the time: google, tweet, ...
- Semantics:
  - Generally, denote actions, processes, etc.
- Syntactic environment:
  - Intransitive, transitive, ditransitive
  - Alternations
- Other characteristics:
  - Main vs. auxiliary verbs
  - Gerunds (verbs behaving like nouns)
  - Participles (verbs behaving like adjectives)

Adjectives and Adverbs
- Adjectives
  - Generally modify nouns, e.g., tall girl
- Adverbs
  - A semantic and formal potpourri…
  - Sometimes modify verbs, e.g., sang beautifully
  - Sometimes modify adjectives, e.g., extremely hot

Closed Class POS
- Prepositions
  - In English, occurring before noun phrases
  - Specifying some type of relation (spatial, temporal, …)
  - Examples: on the shelf, before noon
- Particles
  - Resemble a preposition, but used with a verb ("phrasal verbs")
  - Examples: find out, turn over, go on

Particle vs. Prepositions
He came by the office in a hurry (by = preposition)
He came by his fortune honestly (by = particle)
We ran up the phone bill (up = particle)
We ran up the small hill (up = preposition)
He lived down the block (down = preposition)
He never lived down the nicknames (down = particle)

More Closed Class POS
- Determiners
  - Establish reference for a noun
  - Examples: a, an, the (articles), that, this, many, such, …
- Pronouns
  - Refer to person or entities: he, she, it
  - Possessive pronouns: his, her, its
  - Wh-pronouns: what, who

Closed Class POS: Conjunctions
- Coordinating conjunctions
  - Join two elements of "equal status"
  - Examples: cats and dogs, salad or soup
- Subordinating conjunctions
  - Join two elements of "unequal status"
  - Examples: We'll leave after you finish eating. While I was waiting in line, I saw my friend.
  - Complementizers are a special case: I think that you should finish your assignment

Lest you think it's an Anglo-centric world,
It's time to visit ...
The (Linguistic) Twilight Zone

Digression: The (Linguistic) Twilight Zone
Perhaps, not so strange…
Turkish
uygarlaştıramadıklarımızdanmışsınızcasına → uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına
behaving as if you are among those whom we could not cause to become civilized
Chinese
No verb/adjective distinction!
漂亮: beautiful/to be beautiful

Digression: The (Linguistic) Twilight Zone
Tzeltal (Mayan language spoken in Chiapas)
Only 3000 root forms in the vocabulary
The verb 'EAT' has eight variations:
General : TUN
Bananas and soft stuff : LO'
Beans and crunchy stuff : K'UX
Tortillas and bread : WE'
Meat and Chilies : TI'
Sugarcane : TZ'U
Liquids : UCH'

Digression: The (Linguistic) Twilight Zone
Riau Indonesian/Malay
No Articles
No Tense Marking
3rd person pronouns neutral to both gender and number
No features distinguishing verbs from nouns

Digression: The (Linguistic) Twilight Zone
Riau Indonesian/Malay
Ayam (chicken) Makan (eat)
The chicken is eating
The chicken ate
The chicken will eat
The chicken is being eaten
Where the chicken is eating
How the chicken is eating
Somebody is eating the chicken
The chicken that is eating

Back to regularly scheduled programming…

POS Tagging: What's the task?
- Process of assigning part-of-speech tags to words
- But what tags are we going to assign?
  - Coarse grained: noun, verb, adjective, adverb, …
  - Fine grained: {proper, common} noun
  - Even finer-grained: {proper, common} noun ±animate
- Important issues to remember
  - Choice of tags encodes certain distinctions/non-distinctions
  - Tagsets will differ across languages!
- For English, Penn Treebank is the most common tagset

Penn Treebank Tagset: 45 Tags

Penn Treebank Tagset: Choices
- Example:
  - The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
- Distinctions and non-distinctions
  - Prepositions and subordinating conjunctions are tagged "IN" ("Although/IN I/PRP..")
  - Except the preposition/complementizer "to" is tagged "TO"
Don't think this is correct? Doesn't make sense?
Often, must suspend linguistic intuition and defer to the annotation guidelines!

Why do POS tagging?
- One of the most basic NLP tasks
  - Nicely illustrates principles of statistical NLP
- Useful for higher-level analysis
  - Needed for syntactic analysis
  - Needed for semantic analysis
- Sample applications
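
To make the word/TAG notation from the Penn Treebank example concrete, here is a minimal Python sketch that reads such a string into (word, tag) pairs and then collapses the fine-grained tags into coarse classes. The function names `parse_tagged` and `coarsen` and the `COARSE` mapping are hypothetical, introduced only for illustration; the mapping covers just the tags that appear in the slide's example and is not part of the Penn Treebank guidelines.

```python
# Tagged sentence in word/TAG form, taken from the Penn Treebank example.
TAGGED = ("The/DT grand/JJ jury/NN commented/VBD on/IN a/DT "
          "number/NN of/IN other/JJ topics/NNS ./.")


def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs.

    Splitting on the LAST slash keeps tokens such as './.' intact:
    the word is '.' and the tag is '.'.
    """
    pairs = []
    for token in text.split():
        word, tag = token.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs


# Hypothetical coarse-grained mapping (an assumption for illustration).
# Note that "IN" cannot be split into preposition vs. subordinating
# conjunction: the tagset itself does not encode that distinction.
COARSE = {
    "NN": "noun",
    "NNS": "noun",
    "VBD": "verb",
    "JJ": "adjective",
    "DT": "determiner",
    "IN": "preposition/subordinating conjunction",
    "TO": "to",
    ".": "punctuation",
}


def coarsen(pairs):
    """Map each fine-grained tag to a coarse class; unknown tags become 'other'."""
    return [(word, COARSE.get(tag, "other")) for word, tag in pairs]


if __name__ == "__main__":
    for word, tag in coarsen(parse_tagged(TAGGED)):
        print(f"{word}\t{tag}")
```

Collapsing NN and NNS into one "noun" class loses the singular/plural distinction, which is one small instance of how the choice of tags encodes certain distinctions and not others.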

