Overview
• Natural language processing [a]
• Introduction
• Properties
• Syntax
• Semantics: Case; Conceptual dependency
[a] Material mostly drawn from Gordon Novak’s AI lectures, http://www.cs.utexas.edu/users/novak/.

Natural Language Processing
A form of communication.
• Intentional exchange of information through the production and perception of signs.
• Signs are drawn from a system of conventional signs.
• Allows the use of information learned or observed by others.
• Natural language is an example.
* Study of signs is a complex discipline in its own right, called semiotics.

NLP and AI
NLP is a classical AI problem:
• Minimal input data
• Knowledge based
• Reference to context
• Local ambiguity
• Global constraints (on interpretation)
• Capturing the infinite: finite system for understanding an infinite set of sentences

Areas of NLP
• Text understanding
• Speech recognition
• Language generation (written or speech)
• Machine translation (e.g. Babel Fish at Altavista).

Why Study NL?
• Theoretical: (1) understand how language is structured; (2) understand the mental mechanisms necessary to support language use, e.g., memory.
• Practical: (1) easier human-computer interaction; (2) machine translation (www, globalization, ...); (3) computer-computer interaction (future).

Efficiency of Natural Language
• Serial in nature: limited bandwidth.
• Information theoretic concerns (bits per symbol).
• Only say things that may not be known to the listener.
• Zipf’s law: frequently used words are short!
  - Example: mom, dad, eat, ...
• Often used long words tend to get abbreviated:
  - Fax, Cell, ASAP, PC, ...

Characteristics of NLP
• Ambiguity: multiple interpretations
    One morning I shot an elephant in my pajamas.
    How he got in my pajamas I’ll never know.
• Incompleteness: only a bare outline is given
    I was late for work today. My car wouldn’t start.
    The battery was dead.

Major Challenges
• Lexical ambiguity:
  - The pitcher broke his arm.
  - The pitcher broke.
• Grammatical ambiguity:
  - I saw the man on the hill with the telescope.
• Anaphora: words that refer to others.
  - John loaned Bill his bike.
• Semantics: understanding the meaning.
  - Need a vast amount of world knowledge.

Approaches
• Formal approaches: parsing, reasoning, etc.
• Statistical approaches: data mining, text mining, usage of surrounding context (word co-occurrence statistics: N-grams).

Speech Act and Understanding
• Speech act: actions that allow production of language.
• Examples: query, inform, request, acknowledge, promise, etc.
• Characteristics: informative, declarative, etc.
• Communicating agents’ task: to understand speech acts.

Fundamentals
• Formal languages: strings of symbols (terminals).
• Grammar (Syntax): finite set of rules that specifies a language (legal ways of ordering the string).
• Semantics: meaning of the string of terminals.
• Pragmatics: the meaning of the string within the context it is currently being used (need knowledge of the world and the social context of language).

Grammar
• Phrase structure: phrases are substrings, and can come in different categories.
• Phrase categories:
  – Sentence (S)
  – Noun phrase (NP)
  – Verb phrase (VP)
  – Prepositional phrase (PP)
  – ...
• Terminal vs. nonterminal: words are terminals (leaves), and symbols S, NP, VP, etc. are nonterminals (internal nodes in a parse tree).
• Rewrite rule: <S> → <NP> <VP>

Languages: Generative Capacity
Chomsky’s four classes of grammatical formalisms:
• Regular grammar: <NonTerm> → Term <NonTerm> (Equivalent to finite state machines.)
• Context-free grammar: <NonTerm> → ... (Equivalent to push-down automata.)
• Context-sensitive grammar: the number of symbols on the left-hand side ≤ the number of symbols on the right-hand side.
• Recursively enumerable: both sides of the rewrite rule can have any number of terminals/nonterminals.
  (Equivalent to Turing machines.)

Steps in Communication
• Intention: thought
• Generation: form sentence to utter
• Synthesis: utter the sentence
• Perception: hear the utterance
• Analysis:
  – Parse
  – Semantic interpretation: infer meaning of the parse tree
  – Pragmatic interpretation: infer meaning in reference to the current context.
• Disambiguation (this or that) and Incorporation (believe it or not)

Parsing: Grammar
<SYMBOL>: nonterminal
WORD: terminal.
<S> --> <NP> <VP>
<NP> --> <ART> <ADJ> <NOUN>
<NP> --> <ART> <NOUN>
<NP> --> <ART> <NOUN> <PP>
<VP> --> <VERB> <NP>
<VP> --> <VERB> <NP> <PP>
<PP> --> <PREP> <NP>
<ART> --> A | AN | THE
<NOUN> --> BOY | DOG | LEG | PORCH
<ADJ> --> BIG
<VERB> --> BIT
<PREP> --> ON

Language Generation
• Start with the sentence symbol, <S>.
• Repeat until no nonterminal symbols remain: (1) Choose a nonterminal symbol in the current string; (2) Choose a production that begins with that nonterminal; (3) Replace the nonterminal by the right-hand side of the production.
<S>
<NP> <VP>
<ART> <NOUN> <VP>
THE <NOUN> <VP>
THE DOG <VP>
THE DOG <VERB> <NP>
THE DOG <VERB> <ART> <NOUN>
THE DOG <VERB> THE <NOUN>
THE DOG BIT THE <NOUN>
THE DOG BIT THE BOY

Parsing
Inverse of generation.

Parsing Techniques
• Top-down: Start with the symbol <S> and hope to produce the string. (Very inefficient.)
• Bottom-up: Reduce phrases using production rules.
• Chart parser: Eliminates the redundant work in re-parsing by saving partial results in a chart.
• Augmented transition networks: (1) arbitrary tests added to arcs; (2) structure-building actions added to arcs (save state, etc.); (3) phrase names on arcs (can name subroutines).
  (Equivalent to a Turing machine.)

Problem: Ambiguity
• Different parse trees can be generated depending on the order in which the rewrite rules are picked.
• Lexical ambiguity and grammatical ambiguity.

Foreign Languages
Grammars are quite different from English:
• Word ordering
• Number and gender agreement
• Tense
• Familiar, formal, honorific forms
Number of languages: several thousand (Native American: > 1,000; Africa ∼ 1,000; New Guinea ∼ 700; India > 150; Russia ∼ 100, etc.).

Semantics
• Selecting correct word sense meanings.
• Removing ambiguity: choosing interpretations that “make sense” when many interpretations are syntactically possible.
  - John saw my dog driving to work this morning.
• Resolving pronoun references.
  - Bill wanted John’s bike. He stole it.
• Resolving other references.
  - ... a ladder ... A man is 10 ft from the top.
  - A bridge is supported at each end.

Case Theory
Fillmore [Charles Fillmore, The case for case,
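The grammar on the "Parsing: Grammar" slide and the procedure on the "Language Generation" slide can be sketched in Python as follows. This is a minimal sketch and not part of the original lecture material. The names GRAMMAR, generate, parses, and recognize are hypothetical. Always rewriting the leftmost nonterminal and picking productions at random are assumptions, since the slide only says "choose". The recognizer is a plain top-down search with backtracking, which terminates here only because this particular grammar has no left recursion.

```python
import random

# Grammar from the "Parsing: Grammar" slide. Each nonterminal maps to a list
# of alternative right-hand sides; any symbol that is not a key is a terminal.
GRAMMAR = {
    "<S>": [["<NP>", "<VP>"]],
    "<NP>": [["<ART>", "<ADJ>", "<NOUN>"],
             ["<ART>", "<NOUN>"],
             ["<ART>", "<NOUN>", "<PP>"]],
    "<VP>": [["<VERB>", "<NP>"], ["<VERB>", "<NP>", "<PP>"]],
    "<PP>": [["<PREP>", "<NP>"]],
    "<ART>": [["A"], ["AN"], ["THE"]],
    "<NOUN>": [["BOY"], ["DOG"], ["LEG"], ["PORCH"]],
    "<ADJ>": [["BIG"]],
    "<VERB>": [["BIT"]],
    "<PREP>": [["ON"]],
}


def generate(rng, start="<S>"):
    """Rewrite nonterminals until none remain; return every intermediate string."""
    current = [start]
    steps = [list(current)]
    while True:
        positions = [i for i, sym in enumerate(current) if sym in GRAMMAR]
        if not positions:
            return steps
        i = positions[0]                       # assumption: leftmost nonterminal
        rhs = rng.choice(GRAMMAR[current[i]])  # assumption: random production
        current[i:i + 1] = rhs                 # replace it by the right-hand side
        steps.append(list(current))


def parses(symbols, words):
    """Top-down check: can the symbol list be rewritten into exactly `words`?"""
    if not symbols:
        return not words
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Try each production for the leftmost nonterminal (backtracking).
        return any(parses(rhs + rest, words) for rhs in GRAMMAR[first])
    # Terminal: it must match the next input word.
    return bool(words) and words[0] == first and parses(rest, words[1:])


def recognize(sentence):
    """Return True if the sentence can be derived from <S>."""
    return parses(["<S>"], sentence.upper().split())


if __name__ == "__main__":
    for step in generate(random.Random(0)):
        print(" ".join(step))
    print(recognize("the dog bit the boy"))
```

Running the script prints one random derivation, one rewriting step per line, followed by the result of recognizing THE DOG BIT THE BOY. The recognizer only answers yes or no; building the parse tree, or avoiding repeated work as a chart parser does, would need additional bookkeeping.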