CSC 9010- Natural Language Processing
Paula Matuszek and Mary-Angela Papalaskari
Villanova University
Spring 2005

CSC 9010- Natural Language Processing - Introduction

Natural Language Processing
•speech recognition
•natural language understanding
•computational linguistics
•psycholinguistics
•information extraction
•information retrieval
•inference
•natural language generation
•speech synthesis
•language evolution

Applied NLP
•Machine translation
•Spelling/grammar correction
•Information retrieval
•Data mining
•Document classification
•Question answering, conversational agents

Natural Language Understanding
[Diagram: sound waves → acoustic/phonetic → morphological/syntactic → semantic/pragmatic → internal representation]

Natural Language Understanding
Sounds, Symbols, Sense
[Diagram: sound waves → acoustic/phonetic → morphological/syntactic → semantic/pragmatic → internal representation]

Where are the words?
•“How to recognize speech, not to wreck a nice beach”
•“The cat scares all the birds away”
•“The cat’s cares are few”
- pauses in speech bear little relation to word breaks
+ intonation offers additional clues to meaning

Dissecting words/sentences
•“The dealer sold the merchant a dog”
•“I saw the Golden Gate Bridge flying into San Francisco”
•Word creation:
  –establish
  –establishment: the Church of England as the official state church
  –disestablishment
  –antidisestablishment
  –antidisestablishmentarian
  –antidisestablishmentarianism: a political philosophy that is opposed to the separation of church and state

What does it mean?
•“I saw Pathfinder on Mars with a telescope”
•“Pathfinder photographed Mars”
•“The Pathfinder photograph from Ford has arrived”
•“When a Pathfinder fords a river it sometimes mars its paint job.”

What does it mean?
•“Jack went to the store. He found the milk in aisle 3. He paid for it and left.”
•“Surcharge for white orders.”
•“Q: Did you read the report? A: I read Bob’s email.”

Human Languages
•You know ~50,000 words of your primary language, each with several meanings
•A six-year-old knows ~13,000 words
•In the first 16 years we learn 1 word every 90 minutes of waking time
•Mental grammar generates sentences; virtually every sentence is novel
•3-year-olds already have 90% of grammar
•~6000 human languages – none of them simple!
Adapted from Martin Nowak 2000 – Evolutionary biology of language – Phil. Trans. Royal Society London

Human Spoken language
•Most complicated mechanical motion of the human body
  –Movements must be accurate to within mm
  –synchronized within hundredths of a second
•We can understand up to 50 phonemes/sec (normal speech is 10-15 phonemes/sec)
  –but if a sound is repeated 20 times/sec we hear a continuous buzz!
•All aspects of language processing are involved and manage to keep pace
Adapted from Martin Nowak 2000 – Evolutionary biology of language – Phil. Trans. Royal Society London

Let’s talk!
The Natural History Museum (UK) – picture library
http://piclib.nhm.ac.uk/piclib/www/comp.php?img=87493&frm=med&search=homunculus
This model shows what a man's body would look like if each part grew in proportion to the area of the cortex of the brain concerned with its movement.

Controversial questions concerning human language
•Language organ
•Universal grammar
•A single dramatic mutation or gradual adaptation?

Why Language is Hard
•NLP is AI-complete
•Abstract concepts are difficult to represent
•LOTS of possible relationships among concepts
•Many ways to represent similar concepts
•Tens or hundreds of thousands of features/dimensions

Why Language is Easy
•Highly redundant
•Many relatively crude methods provide fairly good results

What will it take?
•models of computation (state machines)
•formal grammars
•knowledge representation
•search algorithms
•dynamic programming
•logic
•machine learning
•probability theory

History of NLP
•Prehistory (1940s, 1950s)
  –automata theory, formal language theory, Markov processes (Turing, McCulloch & Pitts, Chomsky)
  –information theory and probabilistic algorithms (Shannon)
  –Turing test – can machines think?
•Early work:
  –symbolic approach
    •generative syntax, e.g. Transformations and Discourse Analysis Project (TDAP, Harris)
    •AI – pattern matching, logic-based, special-purpose systems
      –Eliza Rogerian therapist http://www.manifestation.com/neurotoys/eliza.php3
  –stochastic
    •Bayesian methods
•Early successes: $$$$ grants!
  –by 1966 the US government had spent 20 million on machine translation alone
•Critics:
  –Bar-Hillel – “no way to disambiguate without deep understanding”
  –Pierce NSF 1966 report: “no way to justify work in terms of