CMSC 723 / LING 723: Computational Linguistics I
September 3, 2008
Bonnie Dorr

Overview, History, Goals, Problems (J&M 1)

CL vs NLP
- Why "Computational Linguistics (CL)" rather than "Natural Language Processing (NLP)"?
  - Computational Linguistics: computers dealing with language; modeling what people do
  - Natural Language Processing: applications on the computer side
- Why "natural"?
  - Refers to the language spoken by people, e.g. English, Japanese, Swahili, as opposed to artificial languages like C, Java, etc.

Relation of CL to Other Disciplines
[Figure: CL at the center, linked to the following fields]
- Artificial Intelligence (AI): notions of representation, search, etc.
- Machine Learning: particularly probabilistic or statistical ML techniques
- Human Computer Interaction (HCI)
- Electrical Engineering (EE): Optical Character Recognition
- Linguistics: Syntax, Semantics, etc.
- Psychology
- Philosophy of Language, Formal Logic
- Theory of Computation
- Information Retrieval

Where does it fit in the CS taxonomy?
[Figure: taxonomy tree rooted at Computers. Node labels: SWE; HCI; Databases; Robotics; ML; Artificial Intelligence; Alg Thy; NA; Logic; Sys networks; Natural Language Processing; Information Retrieval; Machine Translation; Search; Language Analysis; Semantics; Parsing. Adapted from Rada Mihalcea, 2007.]

A Sampling of "Other Disciplines"
- Linguistics: formal grammars, abstract characterization of what is to be learned
- Computer Science: algorithms for efficient learning or online deployment of these systems in automata
- Engineering: stochastic techniques for characterizing regular patterns for learning and ambiguity resolution
- Psychology: insights into what linguistic constructions are easy or difficult for people to learn or to use

History
1940-1950s
- Development of formal language theory (Chomsky, Kleene, Backus)
  - Formal characterization of classes of grammar (context-free, regular)
  - Association with relevant automata
- Probability theory: language understanding as decoding through a noisy channel (Shannon)
  - Use of information-theoretic concepts like entropy to measure success of language models

1957-1983: Symbolic vs. Stochastic
- Symbolic
  - Use of formal grammars as basis for natural language
    processing and learning systems (Chomsky, Harris)
  - Use of logic and logic-based programming for characterizing syntactic or semantic inference (Kaplan, Kay, Pereira)
  - First toy natural language understanding and generation systems (Woods, Minsky, Schank, Winograd, Colmerauer)
  - Discourse Processing: role of intention, focus (Grosz, Sidner, Hobbs)
- Stochastic Modeling
  - Probabilistic methods for early speech recognition, OCR (Bledsoe and Browning, Jelinek, Black, Mercer)

1983-1993: Return of Empiricism
- Use of stochastic techniques for part-of-speech tagging, parsing, word sense disambiguation, etc.
- Comparison of stochastic, symbolic, more or less powerful models for language understanding and learning tasks

1993-1999
- Advances in software and hardware create NLP needs for information retrieval (web), machine translation, spelling and grammar checking, speech recognition and synthesis
- Stochastic and symbolic methods combine for real-world applications

The Rise of Machine Learning: 2000-2007
- Large amounts of spoken and written material now widely available (LDC, etc.)
- Increased focus on learning has led to more serious interplay with the statistical ML community
- Unsupervised learning techniques on the rise, in part brought about by the difficulty of producing reliably annotated corpora

Language and Intelligence: Turing Test
- Turing test: machine, human, and human judge
- Judge asks questions of computer and human
  - Machine's job is to act like a human; human's job is to convince the judge that he's not the machine
  - Machine judged "intelligent" if it can fool the judge
- Judgement of "intelligence" linked to appropriate answers to questions from the system

ELIZA
- Remarkably simple "Rogerian Psychologist"
- Uses pattern matching to carry on a limited form of conversation
- Seems to "pass the Turing Test" (McCorduck, 1979, pp. 225-226)
- Eliza demo: http://www.lpa.co.uk/pws_dem4.htm

What's involved in an "intelligent" Answer?
Analysis: decomposition of the signal (spoken or written) eventually into meaningful units. This involves:

Speech/Character Recognition
- Decomposition into words,
  segmentation of words into appropriate phones or letters
- Requires knowledge of phonological patterns:
    I'm enormously proud
    I mean to make you proud

Morphological Analysis
- Inflectional
    duck + s = [N duck] + [plural s]
    duck + s = [V duck] + [3rd person s]
- Derivational
    kind, kindness
- Spelling changes
    drop, dropping
    hide, hiding

Syntactic Analysis
- Associate constituent structure with string
- Prepare for semantic interpretation
[Figure: two analyses of "I watched the terrapin"]
    [S [NP I] [VP [V watched] [NP [Det the] [N terrapin]]]]
    OR
    watch: Subject = I; Object = terrapin (Det = the)

Semantics
- A way of representing meaning
- Abstracts away from syntactic structure
- Example: First-Order Logic: watch(I, terrapin)
- Can be: I watched the terrapin, or The terrapin was watched by me
- Real language is complex: Who did I watch?

Lexical Semantics
    The Terrapin is who I watched
    Watch the Terrapin is what I do best
    Terrapin is what I watched the
- Predicate: watch; Watcher: I; Watchee: Terrapin

Compositional Semantics
- Association of parts of a proposition with semantic roles
[Figure: Proposition with Experiencer: I (1st pers sg); Predicate: Be (perc); pred: saw; patient: the Terrapin]
- Scoping: Every man loves a woman

Word-Governed Semantics
- Any verb can add "able" to form an adjective
    I taught the class. The class is teachable.
    I rejected the idea. The idea is rejectable.
- Association of particular words with specific semantic forms
    John: masculine
    The boys: masculine, plural, human

Pragmatics
- Real-world knowledge, speaker intention, goal of utterance
- Related to sociology
- Example 1:
    Could you turn in your assignments now? (command)
    Could you finish the homework? (question, command)
- Example 2:
    I couldn't decide how to catch the crook. Then I decided to spy on the crook with binoculars.
    To my surprise, I found out he had them too. Then I knew to just follow the crook with binoculars.
    [the crook [with binoculars]]
    [the crook] [with binoculars]

Discourse Analysis
- Discourse: how propositions fit together in a conversation; multi-sentence processing
- Pronoun reference:
    The professor told the student to finish the assignment. He was pretty aggravated at how long it was
    taking to pass it in.
- Multiple reference to same entity:
    George W. Bush, president of the U.S.
- Relation between sentences:
    John hit the man. He had stolen his bicycle.

NLP Pipeline
[Figure: speech -> Phonetic Analysis; text -> OCR/Tokenization; then Morphological analysis -> Syntactic analysis -> Semantic Interpretation -> Discourse Processing]

Relation to Machine Translation

    analysis (input)           generation (output)
    Morphological analysis     Morphological synthesis
    Syntactic analysis         Syntactic realization
    Semantic Interpretation    Lexical selection
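
The first two text stages of the pipeline above, tokenization and morphological analysis, can be sketched in a few lines of Python. This is a minimal illustration rather than a real analyzer: the lexicon, the function names tokenize and analyze, and the three suffix rules are all assumptions made for this sketch, chosen only to cover the morphology examples from earlier in the lecture (duck + s, dropping, hiding, kindness).

```python
import re

# Hypothetical toy lexicon: stem -> possible parts of speech (illustration only).
LEXICON = {
    "duck": {"N", "V"},
    "drop": {"V"},
    "hide": {"V"},
    "kind": {"A"},
    "watch": {"N", "V"},
}

# Reading of the inflectional suffix -s for each part of speech.
S_FEATURE = {"N": "plural", "V": "3rd person sg"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def analyze(word):
    """Return every (stem, part of speech, feature) analysis the toy rules allow."""
    analyses = []

    # Bare stems found directly in the lexicon.
    for pos in sorted(LEXICON.get(word, ())):
        analyses.append((word, pos, "base"))

    # Inflectional -s: plural on nouns, 3rd person singular on verbs.
    if word.endswith("s"):
        stem = word[:-1]
        for pos in sorted(LEXICON.get(stem, ())):
            if pos in S_FEATURE:
                analyses.append((stem, pos, S_FEATURE[pos]))

    # -ing with spelling changes: restore a dropped "e" (hiding -> hide)
    # or undo consonant doubling (dropping -> drop).
    if word.endswith("ing"):
        base = word[:-3]
        candidates = [base, base + "e"]
        if len(base) >= 2 and base[-1] == base[-2]:
            candidates.append(base[:-1])
        for stem in candidates:
            if "V" in LEXICON.get(stem, ()):
                analyses.append((stem, "V", "-ing"))

    # Derivational -ness: adjective -> noun (kind -> kindness).
    if word.endswith("ness"):
        stem = word[:-4]
        if "A" in LEXICON.get(stem, ()):
            analyses.append((stem, "A", "-ness"))

    return analyses


if __name__ == "__main__":
    for token in tokenize("Ducks hiding kindness"):
        print(token, analyze(token))
```

Calling analyze("ducks") returns both the noun-plural and the verb-3rd-person reading. A morphological analyzer on its own cannot choose between them; that ambiguity is passed on to syntactic analysis and the later stages.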