Computational Linguistics
  James Pustejovsky
  Brandeis University
  Boston Computational Linguistics Olympiad Team
  Fall 2007

What is Computational Linguistics?
  Computational Linguistics is the computational analysis of natural languages.
    Process information contained in natural language.
  Can machines understand human language?
    Define "understand".
    Understanding is the ultimate goal. However, one doesn't need to fully understand to be useful.

Goals of this Lecture
  Learn about the problems and possibilities of natural language analysis:
    What are the major issues?
    What are the major solutions?
  At the end you should:
    Agree that language is subtle and interesting!
    Know about some of the algorithms.
    Know how difficult it can be.

It's 2007, but we're not anywhere close to realizing the dream (or nightmare) of 2001
  Dave Bowman: "Open the pod bay doors."
  Dave Bowman: "Open the pod bay doors, please, HAL."
  HAL 9000: "I'm sorry, Dave. I'm afraid I can't do that."

Why is NLP difficult?
  Computers are not brains.
    There is evidence that much of language understanding is built into the human brain.
  Computers do not socialize.
    Much of language is about communicating with people.
  Key problems:
    Representation of meaning
    Language presupposes knowledge about the world
    Language only reflects the surface of meaning
    Language presupposes communication between people

Hidden Structure
  English plural pronunciation:
    toy + s    -> toyz       add "z"
    book + s   -> books      add "s"
    church + s -> churchiz   add "iz"
    box + s    -> boxiz      add "iz"
    sheep + s  -> sheep      add nothing
  What about new words?
    Bach + s -> boxs (why not boxiz?)

Language subtleties
  Adjective order and placement:
    A big black dog
    A big black scary dog
    A big scary dog
    A scary big dog
    A black big dog
  Antonyms: which sizes go together?
    Big and little
    Big and small
    Large and small
    Large and little

World Knowledge is subtle
  He arrived at the lecture.
  He chuckled at the lecture.
  He arrived drunk.
  He chuckled drunk.
  He chuckled his way through the lecture.
  He arrived his way through the lecture.

Words are ambiguous (have multiple meanings)
  I know that.
  I know that block.
  I know that blocks the sun.
  I know that block blocks the sun.

Headline Ambiguity
  Iraqi Head Seeks Arms
  Juvenile Court to Try Shooting Defendant
  Teacher Strikes Idle Kids
  Kids Make Nutritious Snacks
  British Left Waffles on Falkland Islands
  Red Tape Holds Up New Bridges
  Bush Wins on Budget, but More Lies Ahead
  Hospitals are Sued by 7 Foot Doctors
  Ban on nude dancing on Governor's desk
  Local high school dropouts cut in half

The Role of Memorization
  Children learn words quickly.
    As many as 9 words/day.
    Often only need one exposure to associate meaning with word.
    Can make mistakes, e.g., overgeneralization: "I goed to the store."
    Exactly how they do this is still under study.

The Role of Memorization
  Dogs can do word association too!
    Rico, a border collie in Germany:
    Knows the names of each of 100 toys.
    Can retrieve items called out to him with over 90% accuracy.
    Can also learn and remember the names of unfamiliar toys after just one encounter, putting him on a par with a three-year-old child.
  http://www.nature.com/news/2004/040607/pf/040607-8_pf.html

But there is too much to memorize!
  establish
  establishment (the Church of England as the official state church)
  disestablishment
  antidisestablishment
  antidisestablishmentarian
  antidisestablishmentarianism (a political philosophy that is opposed to the separation of church and state)

Rules and Memorization
  Current thinking in psycholinguistics is that we use a combination of rules and memorization.
    However, this is very controversial.
  Mechanism:
    If there is an applicable rule, apply it.
    However, if there is a memorized version, that takes precedence. (Important for irregular words.)
      Artists paint "still lifes", not "still lives".
      Past tense of think -> thought; blink -> blinked.
  This is a simplification; for more on this, see Pinker's "Words and Rules" and "The Language Instinct".

Representation of Meaning
  I know that block blocks the sun.
    How do we represent the meanings of "block"?
    How do we represent "I know"?
    How does that differ from "I know that"?
    Who is "I"?
    How do we indicate that we are talking about earth's sun vs. some other planet's sun?
    When did this take place?
    What if I move the block?
    What if I move my viewpoint?
    How do we represent this?

How to tackle these problems?
  The field was stuck for quite some time.
  A new approach started around 1990.
    Well, not really new, but the first time around, in the 50's, they didn't have the text, disk space, or GHz.
  Main idea: combine memorizing and rules.
  How to do it:
    Get large text collections (corpora).
    Compute statistics over the words in those collections.
  Surprisingly effective.
    Even better now with the Web.

Corpus-based Example: Pre-Nominal Adjective Ordering
  Important for translation and generation.
  Examples:
    big fat Greek wedding
    fat Greek big wedding
  Some approaches try to characterize this as semantic rules, e.g.: age, color, value, dimension.
  Data-intensive approaches:
    Assume adjective ordering is independent of the noun they modify.
    Compare how often you see "a b" vs. "b a".
  Keller & Lapata, "The Web as Baseline", HLT-NAACL '04

Corpus-based Example: Pre-Nominal Adjective Ordering
  Data-intensive approaches:
    Compare how often you see "a b" vs. "b a".
    What happens when you encounter an unseen pair?
      Shaw and Hatzivassiloglou '99 use transitive closures.
      Malouf '00 uses a back-off bigram model:
        P(<a,b> | {a,b}) vs. P(<b,a> | {a,b})
        He also uses morphological analysis, semantic similarity calculations, and positional probabilities.
    Keller and Lapata '04 use just the very simple algorithm.
      But they use the web as their training set.
      Gets 90% accuracy on 1000 sequences.
      As good as or better than the complex algorithms.

Real World Applications of NLP
  Spelling Suggestions/Corrections
  Grammar Checking
  Synonym Generation
  Information Extraction
  Text Categorization
  Automated Customer Service
  Speech Recognition (limited)
  Machine Translation
  In the near future:
    Question Answering
    Improving Web Search Engine results
    Automated Metadata Assignment
    Online Dialogs

Synonym Generation

Levels of Language
  Sound Structure (Phonetics and Phonology)
    The sounds of speech and their production.
    The systematic way that sounds are differently realized in different environments.
  Word Structure (Morphology)
    From "morphos" = shape (not transform, as in "morph").
    Analyzes how words are formed from minimal units of meaning; also derivational rules.
      dog + s = dogs; eat, eats, ate
  Phrase Structure (Syntax)
    From the Greek "syntaxis", arrange together.
    Describes grammatical arrangements of words into hierarchical structure.
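
The plural pronunciation pattern on the Hidden Structure slide depends on the final sound of the word, not its spelling. The sketch below is one way to write that rule down, under stated assumptions: the sound labels are informal, the FINAL_SOUND lexicon is a hypothetical hand-built table (a real system would consult a pronunciation dictionary), and labelling the last sound of "Bach" as a voiceless non-sibilant "x" is an assumption made for illustration.

```python
# Informal sound classes (labels are illustrative, not a phonetic standard).
SIBILANTS = {"s", "z", "sh", "ch", "zh", "j"}
VOICELESS = {"p", "t", "k", "f", "th", "x"}

# Hypothetical mini-lexicon: the final SOUND of each word, not its last letter.
# "box" is pronounced "boks", so its final sound is "s".
FINAL_SOUND = {"toy": "oy", "book": "k", "church": "ch", "box": "s", "Bach": "x"}

# Memorized exceptions take precedence over the rule.
IRREGULAR_PLURALS = {"sheep": ""}


def plural_suffix(word):
    """Return the pronounced plural ending: 'z', 's', 'iz', or '' (nothing)."""
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word]
    final = FINAL_SOUND[word]
    if final in SIBILANTS:
        return "iz"   # church -> churchiz, box -> boxiz
    if final in VOICELESS:
        return "s"    # book -> books
    return "z"        # toy -> toyz (voiced final sound)


for w in ["toy", "book", "church", "box", "sheep", "Bach"]:
    print(w, "+", repr(plural_suffix(w)))
```

Under these assumptions the sketch gives "Bach" the ending "s" rather than "iz", because its final sound is not a sibilant even though the plural sounds much like "box" plus "s".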
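
The mechanism on the Rules and Memorization slide (apply the rule unless a memorized form exists, in which case the memorized form wins) can be sketched for the past tense as follows. This is a minimal illustration, not a model from the psycholinguistics literature: the function name, the IRREGULAR_PAST table, and the "go"/"went" entry are assumptions added here, and the regular rule ignores spelling details such as consonant doubling.

```python
# Hypothetical store of memorized irregular forms.
IRREGULAR_PAST = {"think": "thought", "go": "went"}


def past_tense(verb, memorized=IRREGULAR_PAST):
    """Memorized form takes precedence; otherwise apply the regular -ed rule."""
    if verb in memorized:
        return memorized[verb]
    if verb.endswith("e"):
        return verb + "d"
    return verb + "ed"


print(past_tense("think"))             # memorized form wins
print(past_tense("blink"))             # no memorized form, so the rule applies
print(past_tense("go", memorized={}))  # with nothing memorized, the rule overgeneralizes
```

Calling the function with an empty memorized table reproduces the child's overgeneralization from the earlier slide: the rule alone yields "goed".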
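
The "very simple algorithm" for pre-nominal adjective ordering, comparing how often "a b" is seen against "b a", might look like the sketch below. It is not Keller and Lapata's implementation: they used web counts, whereas this example counts adjacent word pairs in a tiny made-up corpus (TOY_CORPUS, like the function names, is hypothetical). An unseen pair returns None, which is the case the back-off and transitive-closure methods are meant to handle.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count adjacent word pairs across all sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def preferred_order(a, b, counts):
    """Return the more frequent ordering of a and b, or None on a tie or unseen pair."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    if ab == ba:
        return None
    return (a, b) if ab > ba else (b, a)


# Made-up corpus for illustration only.
TOY_CORPUS = [
    "a big black dog barked",
    "the big black car stopped",
    "a black big dog barked",
    "my big fat greek wedding",
]

counts = bigram_counts(TOY_CORPUS)
print(preferred_order("black", "big", counts))
print(preferred_order("fat", "big", counts))
print(preferred_order("scary", "big", counts))
```

Following the slide's assumption that ordering is independent of the noun, the counts ignore which noun comes after the adjectives.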


Brandeis CS 101A - Computational Linguistics
