Columbia COMS W4705 - Word Classes and Part-of-Speech Tagging

Word Classes & Part-of-Speech Tagging
Martin Jansche
recycling Prof. Hirschberg’s slides
CS 4705
2004-10-04

What is a word class?

• Words that somehow ‘behave’ alike:
  – Appear in similar contexts,
  – Perform similar functions in sentences,
  – Undergo similar morphological changes.
• Why do we want to identify them?
  – Pronunciation (poor person’s homograph disambiguation)
  – Stemming (poor person’s morphological analysis)
  – Semantics (poor person’s word sense disambiguation)
  – Richer language models (back off to part of speech)
  – Parsing (almost-parsing, supertagging)

In a nutshell

A small number of categories from traditional grammar:
• Nouns (agreement, humanism)
• Verbs (agree, certify)
• Adjectives (insidious, oily)
• Adverbs (highly, seldom)
• Prepositions (from)
• Determiners (the, every)
• Conjunctions (and, if, but)
• Particles (up, off in certain contexts)

Nouns

• Appear in similar contexts:
  – after determiners (the N, every N, no N)
  – modified by adjectives (an oily N)
  – modified by relative clauses (the N [I saw yesterday])
• Perform similar functions in sentences:
  – noun phrases appear as arguments of predicates
  – free nominal modifiers (Friday, no way)
• Undergo similar morphological changes:
  – plural inflection (agreement-s)
  – derivation (fish-y, noon-ish, item-ize, mis-adventure, e-card)

Verbs

• Appear in similar contexts:
  – to V
  – modified by adverbs, prepositional phrases (to V warmly, to V in the shower)
• Perform similar functions in sentences:
  – predicate of a clause
  – arguments in certain constructions (heard him V)
• Undergo similar morphological changes:
  – inflection for subject agreement (agree-s)
  – past tense and past participle (agree-(e)d)
  – derivation (dis-agree, agree-ment, communicat(e)-ion)

Adjectives

• Appear in similar contexts:
  – modifying nouns (every Adj student)
  – modified by certain degree adverbs (a very Adj person)
  – after a form of be (this room is Adj)
• Perform similar functions in sentences:
  – with be as a predicate
  – arguments in certain constructions (consider him Adj)
• Undergo similar morphological changes:
  – inflection for degree (oil(y)i-er, oil(y)i-est)
  – derivation (un-clean, oil(y)i-ness, the poor/meek/oily)

Adverbs

• Appear in similar contexts:
  – modifying verbs (to Adv go where no-one has gone before)
  – modifying adjectives (a(n) Adv qualified applicant)
  – modifying adverbs (a(n) Adv highly qualified applicant)
  – modifying determiners (hardly any, almost all)
  – modifying prepositions (he drove Adv into a brick wall)
• Perform similar functions in sentences:
  – predicate modifiers
• Undergo similar morphological changes:
  – (not very systematic)

Prepositions

• Appear in similar contexts:
  – before noun phrases (drove Prep the ocean)
• Perform similar functions in sentences (when combined with a noun phrase):
  – as arguments of verbs (accuse somebody of something, charge somebody with something)
  – as optional modifiers indicating time, location, manner (eat lunch before/after/during the meeting)
• Undergo similar morphological changes:
  – none (except for category blending, like nearest)

Determiners

• Appear in similar contexts:
  – before nouns (plus nominal modifiers) (each/every/a/the/this/that/no honest businessman, few/most/all/the/these honest businessmen)
• Perform similar functions in sentences:
  – form noun phrases
• Undergo similar morphological changes:
  – none

Conjunctions

• Appear in similar contexts:
  – before sentences (Smith was worried Conj the government was out to get him)
• Perform similar functions in sentences (Conjunction + sentence):
  – arguments in certain constructions
• Undergo similar morphological changes:
  – none

Particles

Catch-all category. Contains hard-to-classify items, e.g. in multi-word verb forms (give up, cave in), or fixed constructions (the more you buy the more you save).

But things are not as simple

• Pronouns and proper names occur in (approximately) the same contexts as noun phrases, hence need to be tagged like noun phrases.
• Most nouns usually require determiners, but some cannot (easily) be used with determiners (garlic).
• All words can be nouns when quoted – no ifs, ands or buts.
• Verbs contain a closed subclass of so-called auxiliary verbs, which have idiosyncratic negations (aren’t, cannot), irregular paradigms, and can invert with their subjects.
• Adverbs appear to be a heterogeneous category, since they can modify verbs, adjectives, determiners etc.

Sentence with part-of-speech tags

From the Brown corpus (Francis & Kucera, 1964–1979):

The Fulton County Grand Jury said Friday an investigation of Atlanta’s recent primary election produced no evidence that any irregularities took place.

With tags (slightly simplified):

The/AT Fulton/NP County/NP Grand/NP Jury/NP said/VBD Friday/NP an/AT investigation/NN of/IN Atlanta/NP s/PPL recent/JJ primary/JJ election/NN produced/VBD no/AT evidence/NN that/CS any/DTI irregularities/NNS took/VBD place/NN ./.

What gets tagged?

• White space delimited strings?
  Need to decide how to tag grand in grand jury.
• More abstract tokens?
  Need to decide how to tag ’s in Atlanta ’s.
• Do not underestimate tokenization issues.

What tags get assigned?

• Tags should characterize the local syntactic function of a word in its context.
• The Brown Corpus has 80+ tags
• The Penn Treebank (PTB) has 40+ tags
• Differences in tag inventories:
  – granularity
  – treatment of special words (to, no, there, ...)
  – presence or absence of information about internal structure of a word

Don’t mix categories

According to the tagset of the Penn Treebank, words fall into the following classes:
• nouns, verbs, etc.,
• those that are to,
• foreign words.

For example, perestroika is tagged as a foreign word, though it patterns with the proper nouns; laissez-faire is tagged as a foreign word, but it behaves like an adjective.

The problem is that part-of-speech and foreignness are orthogonal dimensions.

Make useful distinctions

The Brown Corpus tagset distinguishes the verbs be, have, do in addition to the closed class of auxiliaries and the open class of non-auxiliary verbs. These distinctions are not useful, since they can easily be recovered.

The Penn Treebank always tags to as TO, which doesn’t reveal anything about whether it
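
To make the word/TAG notation of the tagged Brown corpus sentence above concrete, here is a minimal Python sketch that splits such a string into (word, tag) pairs and counts how often each tag occurs. It is an illustration only, not part of the original slides: the names TAGGED, parse_tagged and tag_counts are hypothetical, and the sketch assumes tokens are separated by white space and that the tag follows the last slash in each token.

```python
from collections import Counter

# The simplified Brown-corpus tagging from the slide, copied verbatim.
TAGGED = (
    "The/AT Fulton/NP County/NP Grand/NP Jury/NP said/VBD "
    "Friday/NP an/AT investigation/NN of/IN Atlanta/NP s/PPL "
    "recent/JJ primary/JJ election/NN produced/VBD no/AT "
    "evidence/NN that/CS any/DTI irregularities/NNS took/VBD "
    "place/NN ./."
)


def parse_tagged(text):
    """Split white-space-delimited 'word/TAG' tokens into (word, tag) pairs.

    Assumption: the tag is whatever follows the LAST slash, so a word
    that itself contains a slash keeps it.
    """
    pairs = []
    for token in text.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


pairs = parse_tagged(TAGGED)

# Frequency of each tag in this one sentence.
tag_counts = Counter(tag for _, tag in pairs)

for tag, count in tag_counts.most_common():
    print(f"{tag}\t{count}")
```

Splitting on white space treats each word of Grand Jury as its own token, which is exactly the kind of tokenization decision the slides warn about; a different tokenizer would give different counts.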

