TAMU CSCE 420 - slide13

This preview shows page 1-2-3 out of 10 pages.


Overview
• Natural language processing (a)
• Introduction
• Properties
• Syntax
• Semantics: Case; Conceptual dependency
(a) Material mostly drawn from Gordon Novak’s AI lectures: http://www.cs.utexas.edu/users/novak/

Natural Language Processing
A form of communication.
• Intentional exchange of information through the production and perception of signs.
• Signs are drawn from a system of conventional signs.
• Allows the use of information learned or observed by others.
• Natural language is an example.
* Study of signs is a complex discipline in its own right, called semiotics.

NLP and AI
NLP is a classical AI problem:
• Minimal input data
• Knowledge based
• Reference to context
• Local ambiguity
• Global constraints (on interpretation)
• Capturing the infinite: finite system for understanding an infinite set of sentences

Areas of NLP
• Text understanding
• Speech recognition
• Language generation (written or speech)
• Machine translation (e.g. Babel Fish at Altavista).

Why Study NL?
• Theoretical: (1) understand how language is structured; (2) understand the mental mechanisms necessary to support language use, e.g., memory.
• Practical: (1) easier human-computer interaction; (2) machine translation (www, globalization, ...); (3) computer-computer interaction (future).

Efficiency of Natural Language
• Serial in nature: limited bandwidth.
• Information theoretic concerns (bits per symbol).
• Only say things that may not be known to the listener.
• Zipf’s law: frequently used words are short!
  - Example: mom, dad, eat, ...
• Often used long words tend to get abbreviated:
  - Fax, Cell, ASAP, PC, ...

Characteristics of NLP
• Ambiguity: multiple interpretations
  One morning I shot an elephant in my pajamas. How he got in my pajamas I’ll never know.
• Incompleteness: only a bare outline is given
  I was late for work today. My car wouldn’t start.
  The battery was dead.

Major Challenges
• Lexical ambiguity:
  - The pitcher broke his arm.
  - The pitcher broke.
• Grammatical ambiguity:
  - I saw the man on the hill with the telescope.
• Anaphora: words that refer to others.
  - John loaned Bill his bike.
• Semantics: understanding the meaning.
  - Need a vast amount of world knowledge.

Approaches
• Formal approaches: parsing, reasoning, etc.
• Statistical approaches: data mining, text mining, usage of surrounding context (word co-occurrence statistics: N-grams).

Speech Act and Understanding
• Speech act: actions that allow production of language.
• Examples: query, inform, request, acknowledge, promise, etc.
• Characteristics: informative, declarative, etc.
• Communicating agents’ task: to understand speech acts.

Fundamentals
• Formal languages: strings of symbols (terminals).
• Grammar (Syntax): finite set of rules that specifies a language (legal ways of ordering the string).
• Semantics: meaning of the string of terminals.
• Pragmatics: the meaning of the string within the context it is currently being used (need knowledge of the world and the social context of language).
(part)

Grammar
• Phrase structure: phrases are substrings, and can come in different categories.
• Phrase categories:
  – Sentence (S)
  – Noun phrase (NP)
  – Verb phrase (VP)
  – Prepositional phrase (PP)
  – ...
• Terminal vs. nonterminal: words are terminals (leaves), and symbols S, NP, VP, etc. are nonterminals (internal nodes in a parse tree).
• Rewrite rule: <S> → <NP> <VP>

Languages: Generative Capacity
Chomsky’s four classes of grammatical formalisms:
• Regular grammar: <NonTerm> → Term <NonTerm> (Equivalent to finite state machines.)
• Context-free grammar: <NonTerm> → ... (Equivalent to push-down automata.)
• Context-sensitive grammar: number of symbols on the left-hand side ≤ number of symbols on the right-hand side.
• Recursively enumerable: both sides of the rewrite rule can have any number of terminals/nonterminals.
  (Equivalent to Turing machines.)

Steps in Communication
• Intention: thought
• Generation: form sentence to utter
• Synthesis: utter the sentence
• Perception: hear the utterance
• Analysis:
  – Parse
  – Semantic interpretation: infer meaning of the parse tree
  – Pragmatic interpretation: infer meaning in reference to the current context.
• Disambiguation (this or that) and Incorporation (believe it or not)

Parsing: Grammar
<SYMBOL>: nonterminal
WORD: terminal.
<S> --> <NP> <VP>
<NP> --> <ART> <ADJ> <NOUN>
<NP> --> <ART> <NOUN>
<NP> --> <ART> <NOUN> <PP>
<VP> --> <VERB> <NP>
<VP> --> <VERB> <NP> <PP>
<PP> --> <PREP> <NP>
<ART> --> A | AN | THE
<NOUN> --> BOY | DOG | LEG | PORCH
<ADJ> --> BIG
<VERB> --> BIT
<PREP> --> ON

Language Generation
• Start with the sentence symbol, <S>.
• Repeat until no nonterminal symbols remain: (1) Choose a nonterminal symbol in the current string; (2) Choose a production that begins with that nonterminal; (3) Replace the nonterminal by the right-hand side of the production.
<S>
<NP> <VP>
<ART> <NOUN> <VP>
THE <NOUN> <VP>
THE DOG <VP>
THE DOG <VERB> <NP>
THE DOG <VERB> <ART> <NOUN>
THE DOG <VERB> THE <NOUN>
THE DOG BIT THE <NOUN>
THE DOG BIT THE BOY

Parsing
Inverse of generation.

Parsing Techniques
• Top-down: Start with the symbol <S> and hope to produce the string. (Very inefficient.)
• Bottom-up: Reduce phrases using production rules.
• Chart parser: Eliminates the redundant work in reparsing by saving partial results in a chart.
• Augmented transition networks: (1) arbitrary tests added to arcs; (2) structure-building actions added to arcs (save state, etc.); (3) phrase names on arcs (can name subroutines).
  (Equivalent to a Turing machine.)

Problem: Ambiguity
• Different parse trees can be generated depending on the order in which you pick the rewrite rules.
• Lexical ambiguity and grammatical ambiguity.

Foreign Languages
Grammars are quite different from English:
• Word ordering
• Number and gender agreement
• Tense
• Familiar, formal, honorific forms
Number of languages: several thousand (Native American: > 1,000; Africa ∼ 1,000; New Guinea ∼ 700; India > 150; Russia ∼ 100, etc.).

Semantics
• Selecting correct word sense meanings.
• Removing ambiguity: choosing interpretations that “make sense” when many interpretations are syntactically possible.
  - John saw my dog driving to work this morning.
• Resolving pronoun references.
  - Bill wanted John’s bike. He stole it.
• Resolving other references.
  - ... a ladder ... A man is 10 ft from the top.
  - A bridge is supported at each end.

Case Theory
Fillmore [Charles Fillmore, The case for case,
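
Sketch: generation and top-down parsing
The generation loop on the Language Generation slide and the top-down strategy on the Parsing Techniques slide can be tried out on the toy grammar from the Parsing: Grammar slide. The Python below is only a minimal sketch under stated assumptions, not code from the lecture: the names GRAMMAR, generate, parse, recognize and count_parses are made up for illustration, generation always rewrites the leftmost nonterminal (the slide allows any), and the parser is a plain backtracking one that works here only because the grammar has no left recursion.

```python
import random

# Toy grammar from the "Parsing: Grammar" slide.
# Keys are nonterminals; each value lists the alternative right-hand sides.
GRAMMAR = {
    "<S>": [["<NP>", "<VP>"]],
    "<NP>": [["<ART>", "<ADJ>", "<NOUN>"],
             ["<ART>", "<NOUN>"],
             ["<ART>", "<NOUN>", "<PP>"]],
    "<VP>": [["<VERB>", "<NP>"],
             ["<VERB>", "<NP>", "<PP>"]],
    "<PP>": [["<PREP>", "<NP>"]],
    "<ART>": [["A"], ["AN"], ["THE"]],
    "<NOUN>": [["BOY"], ["DOG"], ["LEG"], ["PORCH"]],
    "<ADJ>": [["BIG"]],
    "<VERB>": [["BIT"]],
    "<PREP>": [["ON"]],
}


def generate(rng, max_steps=1000):
    """Rewrite the leftmost nonterminal until only terminals remain."""
    symbols = ["<S>"]
    for _ in range(max_steps):
        idx = next((i for i, s in enumerate(symbols) if s in GRAMMAR), None)
        if idx is None:                      # no nonterminals left: done
            return symbols
        # Replace the nonterminal by a randomly chosen right-hand side.
        symbols[idx:idx + 1] = rng.choice(GRAMMAR[symbols[idx]])
    raise RuntimeError("derivation did not terminate within max_steps")


def parse(symbol, words, pos):
    """Yield each position reachable after deriving `symbol` from words[pos:].

    One position is yielded per distinct derivation, so duplicates signal
    ambiguity.
    """
    if symbol not in GRAMMAR:                # terminal: must match next word
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for rhs in GRAMMAR[symbol]:              # top-down: try every production
        ends = [pos]
        for part in rhs:
            ends = [e2 for e in ends for e2 in parse(part, words, e)]
        yield from ends


def count_parses(sentence):
    """Number of distinct parse trees that cover the whole sentence."""
    words = sentence.upper().split()
    return sum(1 for end in parse("<S>", words, 0) if end == len(words))


def recognize(sentence):
    """True if the grammar derives the sentence."""
    return count_parses(sentence) > 0
```

With these definitions, recognize("THE DOG BIT THE BOY") is True, recognize("THE DOG BIT") is False, and count_parses("THE DOG BIT THE BOY ON THE PORCH") is 2, because ON THE PORCH can attach either to the verb phrase or to THE BOY. That is the grammatical ambiguity described on the Problem: Ambiguity slide. A chart parser would avoid the repeated work this backtracking version does on shared substrings.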

