Villanova CSC 9010 - Natural Language Processing


CSC 9010 - Natural Language Processing
Paula Matuszek and Mary-Angela Papalaskari
Villanova University
Spring 2005

CSC 9010 - Natural Language Processing - Introduction

Natural Language Processing
• speech recognition
• natural language understanding
• computational linguistics
• psycholinguistics
• information extraction
• information retrieval
• inference
• natural language generation
• speech synthesis
• language evolution

Applied NLP
• Machine translation
• Spelling/grammar correction
• Information retrieval
• Data mining
• Document classification
• Question answering, conversational agents

Natural Language Understanding
[Figure: sound waves → acoustic/phonetic → morphological/syntactic → semantic/pragmatic → internal representation; a second version of the diagram labels the stages Sounds, Symbols, Sense]

Where are the words?
• "How to recognize speech, not to wreck a nice beach"
• "The cat scares all the birds away"
• "The cat's cares are few"
- pauses in speech bear little relation to word breaks
+ intonation offers additional clues to meaning

Dissecting words/sentences
• "The dealer sold the merchant a dog"
• "I saw the Golden Gate Bridge flying into San Francisco"
• Word creation:
    establish
    establishment: the Church of England as the official state church
    disestablishment
    antidisestablishment
    antidisestablishmentarian
    antidisestablishmentarianism: a political philosophy that is opposed to the separation of church and state

What does it mean?
• "I saw Pathfinder on Mars with a telescope"
• "Pathfinder photographed Mars"
• "The Pathfinder photograph from Ford has arrived"
• "When a Pathfinder fords a river it sometimes mars its paint job."

What does it mean? (continued)
• "Jack went to the store. He found the milk in aisle 3. He paid for it and left."
• "Surcharge for white orders."
• Q: "Did you read the report?"
  A: "I read Bob's email."

Human Languages
• You know ~50,000 words of your primary language, each with several meanings
• A six-year-old knows ~13,000 words
• For the first 16 years, we learn 1 word every 90 minutes of waking time
• Mental grammar generates sentences: virtually every sentence is novel
• 3-year-olds already have 90% of grammar
• ~6,000 human languages, none of them simple!
Adapted from Martin Nowak 2000, "Evolutionary biology of language," Phil. Trans. Royal Society London

Human Spoken Language
• Most complicated mechanical motion of the human body
  – Movements must be accurate to within millimeters
  – and synchronized to within hundredths of a second
• We can understand up to 50 phonemes/sec (normal speech is 10-15 phonemes/sec)
  – but if a sound is repeated 20 times/sec, we hear a continuous buzz!
• All aspects of language processing are involved and manage to keep pace
Adapted from Martin Nowak 2000, "Evolutionary biology of language," Phil. Trans. Royal Society London

Let's talk!
[Image: "This model shows what a man's body would look like if each part grew in proportion to the area of the cortex of the brain concerned with its movement."]
Image source: The Natural History Museum (UK) picture library, http://piclib.nhm.ac.uk/piclib/www/comp.php?img=87493&frm=med&search=homunculus

Controversial questions concerning human language
• Language organ
• Universal grammar
• A single dramatic mutation or gradual adaptation?

Why Language is Hard
• NLP is AI-complete
• Abstract concepts are difficult to represent
• LOTS of possible relationships among concepts
• Many ways to represent similar concepts
• Tens or hundreds of thousands of features/dimensions

Why Language is Easy
• Highly redundant
• Many relatively crude methods provide fairly good results

What will it take?
• models of computation (state machines)
• formal grammars
• knowledge representation
• search algorithms
• dynamic programming
• logic
• machine learning
• probability theory

History of NLP
• Prehistory (1940s, 1950s)
  – automata theory, formal language theory, Markov processes (Turing, McCulloch & Pitts, Chomsky)
  – information theory and probabilistic algorithms (Shannon)
  – Turing test: can machines think?
• Early work:
  – symbolic approach
    • generative syntax, e.g., the Transformations and Discourse Analysis Project (TDAP, Harris)
    • AI: pattern matching, logic-based, special-purpose systems
      – Eliza, the Rogerian therapist: http://www.manifestation.com/neurotoys/eliza.php3
  – stochastic approach
    • Bayesian methods
• Early successes → $$$$ grants! By 1966, the US government had spent $20 million on machine translation alone.
• Critics:
  – Bar-Hillel: "no way to disambiguate without deep understanding"
  – Pierce NSF 1966 report: "no way to justify work in terms of

