UT CS 378 - Introduction to NLP Tools




Slide 1: Introduction to NLP Tools
09/23/2003

Slide 2: Motivation
• Machine Translation
  – From English to French
• What's needed?

Slide 3: Motivation Cont'd (1)
• Syntactic parser
• Part-Of-Speech Tagger
  – Example: NP -> adj noun
• Morphological Analyzer
  – Example: "tools" -> "tool"; "Who is he?" -> "Who is he ?"
• Semantic Analyzer
  – Word sense disambiguation ("wash dishes")
  – Choose the correct translation

Slide 4: Motivation Cont'd (2)
• Lexicons
  – Information about the word: How many senses? What are the possible translations of the word?
• Corpus
  – Useful for learning a tool
  – Useful for evaluation

Slide 5: Outline
• Lexicons
• Text corpora
• Morphological tools
• Part-Of-Speech (POS) taggers
• Syntactic parsers
• Semantic knowledge bases and semantic parser
• Speech tools

Slide 6: Lexicons
• Definition
  – A repository for words
• Lexicons in LDC (Linguistic Data Consortium)
  – Creating and sharing linguistic resources: data, tools and standards
• CELEX
• WordNet

Slide 7: CELEX
• Dutch Center for Lexical Information
• Lexical databases of English, Dutch and German
• 21,000 nouns, 8,000 adjectives and 6,000 verbs
• English:
  – English Orthography, Lemmas
  – English Phonology, Lemmas
  – English Morphology, Lemmas
  – English Syntax, Lemmas
  – English Frequency, Lemmas
  – English Orthography, Wordforms
  – English Phonology, Wordforms
  – English Morphology, Wordforms
  – English Frequency, Wordforms
  – English Corpus Types
  – English Frequency, Syllables

Slide 8: WordNet
• A database of lexical relations
• Inspired by current psycholinguistic theories of human lexical memory
• Synset: a set of synonyms, representing one underlying lexical concept
  – Example:
    • fool {chump, fish, fool, gull, mark, patsy, fall guy, sucker, schlemiel, shlemiel, soft touch, mug}
• Relations link the synsets: hypernym, Has-Member, Member-Of, Antonym, etc.

Slide 9: WordNet Cont'd
• Example

    pu-erh.cs.utexas.edu$ wn bike -partn

    Part Meronyms of noun bike

    2 senses of bike

    Sense 1
    motorcycle, bike
        HAS PART: mudguard, splashguard

    Sense 2
    bicycle, bike, wheel
        HAS PART: bicycle seat, saddle
        HAS PART: bicycle wheel
        HAS PART: chain
        HAS PART: coaster brake
        HAS PART: handlebar
        HAS PART: mudguard, splashguard
        HAS PART: pedal, treadle, foot lever
        HAS PART: sprocket, sprocket wheel

• Example

    pu-erh.cs.utexas.edu$ wn bike

    Information available for noun bike
        -hypen           Hypernyms
        -hypon, -treen   Hyponyms & Hyponym Tree
        -synsn           Synonyms (ordered by frequency)
        -partn           Has Part Meronyms
        -meron           All Meronyms
        -famln           Familiarity & Polysemy Count
        -coorn           Coordinate Sisters
        -simsn           Synonyms (grouped by similarity of meaning)
        -hmern           Hierarchical Meronyms
        -grepn           List of Compound Words
        -over            Overview of Senses

    Information available for verb bike
        -hypev           Hypernyms
        -hypov, -treev   Hyponyms & Hyponym Tree
        -synsv           Synonyms (ordered by frequency)
        -famlv           Familiarity & Polysemy Count
        -framv           Verb Frames
        -simsv           Synonyms (grouped by similarity of meaning)
        -grepv           List of Compound Words
        -over            Overview of Senses

Slide 10: Corpus
• Definition
  – Collections of text and speech
• LDC
• Penn Treebank
• DSO
• Hansard

Slide 11: Some of the Top Corpora from LDC
• TIPSTER
  – Information Retrieval, Data Extraction datasets
  – TIPSTER project, TREC project
• TIMIT Acoustic-Phonetic Continuous Speech Corpus
  – A corpus of read speech designed to provide speech data for the acquisition of acoustic-phonetic knowledge
  – Useful for the development and evaluation of automatic speech recognition systems
• ECI (European Corpus Initiative Multilingual Corpus): multilingual electronic text corpus
• NTIMIT
  – A phonetically balanced, continuous-speech, telephone-bandwidth speech database

Slide 12: Penn Treebank
• A collection of corpora
• Tagged with POS, syntactic roles, predicate/argument structure, dysfluency annotation
• How are they made?
  – Hand correction of the output of an errorful automatic process
• 3 million words
  – 1 million words tagged with predicate/argument structure for extracting semantic knowledge

Slide 13: Penn Treebank Cont'd
• Corpora
  – Wall Street Journal
  – ATIS (Air Travel Information System)
  – Brown Corpus
  – IBM Manual Sentences
  – Library of America Texts: Mark Twain, Henry Adams, Herman Melville ...
  – MUC-3 Messages
• Example:

    ( (S (NP-SBJ Rally 's)
         (VP operates and franchises
             (NP (NP (QP about 160) fast-food restaurants)
                 (PP-LOC throughout (NP the U.S))))

    Seeking/VBG to/TO block/VB
    [ the/DT investors/NNS ]
    from/IN buying/VBG
    [ more/JJR shares/NNS ]
    ./.

Slide 14: DSO
• Word Sense Corpus
  – Contains sentences in which about 192,800 word occurrences have been tagged with WordNet senses
  – Taken from the Brown corpus and the Wall Street Journal corpus
  – 121 nouns and 70 verbs

Slide 15: Hansard
• Official records (Hansards) of the 36th Canadian Parliament, in both English and French
• 1.3 million pairs of aligned sentences of English and French
  – Example
    • Comme il est 14 h 30, la Chambre s'ajourne jusqu'à lundi prochain, à 11 heures, conformément au paragraphe 24(1) du Règlement.
    • It being 2.30 p.m., the House stands adjourned until Monday next at 11 a.m., pursuant to Standing Order 24(1).
• Useful for Machine Translation

Slide 16: Morphological Tools
• PC-KIMMO
  – A two-level morphological parser
• Porter Stemmer
• Penn Treebank Tokenizer
  – Separates a document into words
  – "dog?" -> "dog ?"

Slide 17: Porter Stemmer
• Simple algorithm that uses a set of cascaded rewrite rules
  – Example
    • ATIONAL -> ATE (relational -> relate)
• Stem:
  – The main morpheme of the word, supplying the main meaning
• Fast
• Used very widely in Information Retrieval
  – Run the stemmer on keywords and on the words in the documents

Slide 18: Part-Of-Speech (POS) Taggers
• Part-Of-Speech: noun, verb, pronoun, etc.
• Brill's Tagger
• HMM Tagger
• MXPOST

Slide 19: Brill's Tagger
• Transformation-Based Learning (TBL) tagger
• /projects/nlp/brill-pos-tagger
• First labels every word with its most-likely tag
• Then uses learned TBL rules to correct mistakes
  – Example:
    • Change NN to VB when the previous tag is
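
The two morphological examples in the slides above ("dog?" -> "dog ?" and ATIONAL -> ATE) can be illustrated with a minimal Python sketch. It is not the Penn Treebank Tokenizer or the Porter Stemmer themselves: `split_punctuation` and `apply_suffix_rules` are hypothetical helper names, the tokenizer only splits punctuation from words, and the stemmer shows a single rule step without the measure condition and the further cascaded steps of the full Porter algorithm. Only the ATIONAL -> ATE rule comes from the slides; the other two rules are assumed additions taken from the published Porter rule set.

```python
import re

# Suffix rewrite rules as (suffix, replacement) pairs.
# ATIONAL -> ATE is the rule shown in the slides; the other two are
# assumptions borrowed from the published Porter algorithm.
SUFFIX_RULES = [
    ("ational", "ate"),
    ("ization", "ize"),
    ("fulness", "ful"),
]


def split_punctuation(text):
    """Split text into word tokens and single punctuation tokens.

    Simplified stand-in for a tokenizer: it only separates punctuation
    from words and does not handle contractions or abbreviations.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def apply_suffix_rules(word, rules=SUFFIX_RULES):
    """Apply the first matching suffix rule to a lowercased word.

    Simplification: the real Porter stemmer also checks a 'measure'
    condition on the remaining stem and runs several rule steps in
    sequence; this sketch applies one step unconditionally.
    """
    lower = word.lower()
    for suffix, replacement in rules:
        if lower.endswith(suffix) and len(lower) > len(suffix):
            return lower[: len(lower) - len(suffix)] + replacement
    return lower


if __name__ == "__main__":
    print(" ".join(split_punctuation("Who is he?")))  # Who is he ?
    print(apply_suffix_rules("relational"))           # relate
```

Because the measure condition is omitted, this sketch over-stems short words (for example, it would rewrite "rational" to "rate"), which the full algorithm avoids.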

