Stanford CS 224N - Natural Language Processing Lecture


(CS224n is in this interstice; cartoon from xkcd.com)

Natural Language Processing
CS224N/Ling284
Christopher Manning
Lecture 1

Course logistics in brief
- Instructor: Christopher Manning
- TAs: Spence Green (head TA), Bharath Bhat, Milind Ganjoo, and Sebastian Schuster
- Time: MW 11:00-12:15, Skilling Aud
- The work is mainly big programming assignments
- Programming language: mainly (or all) Java
- Other information: see the class webpage, http://cs224n.stanford.edu or, right now, http://www.stanford.edu/class/cs224n
- Handouts: online

This class
- Assumes you come with some skills: some basic linear algebra, probability, and statistics; decent programming skills
- But not everyone has the same skills, so it assumes some ability to learn missing knowledge
- Teaches key theory and methods for statistical NLP: MT, information extraction, parsing, semantics, etc.
- Learn techniques which can be used in practical, robust systems that can (partly) understand human language
- But it's something like an "AI Systems" class:
  - A lot of it is hands-on, problem-based learning
  - Often practical issues are as important as theoretical niceties
  - We often combine a bunch of ideas

Goals of the field of NLP
- Computers would be a lot more useful if they could handle our email, do our library research, chat to us ...
- But they are fazed by natural human languages
- Or at least their programmers are: most people just avoid the problem and get into menus and radio buttons, or XML, or the so-called semantic web, or ...
- But someone has to work on the hard problems!
- How can we tell computers about language? (Or help them to learn it as kids do?)

Natural language: the earliest UI
Dave Bowman: Open the pod bay doors, HAL.
HAL: I'm sorry, Dave. I'm afraid I can't do that.
(cf. also false Maria in Metropolis, 1926)

What/where is NLP?
- Goals can be very far-reaching:
  - True text understanding and interpretation
  - Real-time participation in spoken dialogs
  - High-quality machine translation
- Or very down-to-earth:
  - Finding the price of products on the web
  - Analyzing reading level or authorship statistically
  - Sentiment detection about products or stocks
  - Extracting names, facts, or relations from documents
- These days, the latter predominate (as NLP becomes increasingly possible, it becomes increasingly engineering-oriented; this is also related to changes in approach in AI/NLP in general)

Commercial world

The hidden structure of language
- We're going beneath the surface
- Not just string processing
- Not just keyword matching in a search engine
  - This is the move that Google has been increasingly engaged in in recent years: moving from matching keywords to satisfying user needs
- Not just converting a sound stream to a string of words (like Nuance/Google speech recognition)
- We want to recover and manipulate at least some aspects of language structure and meaning

Is the problem just cycles?
Bill Gates, Remarks to Gartner Symposium, October 6, 1997:
"Applications always become more demanding. Until the computer can speak to you in perfect English and understand everything you say to it and learn in the same way that an assistant would learn, until it has the power to do that, we need all the cycles. We need to be optimized to do the best we can. Right now linguistics are right on the edge of what the processor can do. As we get another factor of two, then speech will start to be on the edge of what it can do."

Why NLP is difficult: Newspaper headlines
1. Minister Accused Of Having 8 Wives In Jail
2. Juvenile Court to Try Shooting Defendant
3. Teacher Strikes Idle Kids
4. Miners refuse to work after death
5. Local High School Dropouts Cut in Half
6. Red Tape Holds Up New Bridges
7. Clinton Wins on Budget, but More Lies Ahead
8. Hospitals Are Sued by 7 Foot Doctors
9. Police: Crack Found in Man's Buttocks

Why is natural language understanding difficult?
"Fed raises interest rates 0.5% in effort to control inflation"
(NYT headline from better economic times, 17 May 2000)

Language: still the ultimate UI
Where is A Bug's Life playing in Mountain View?
A Bug's Life is playing at the Century 16 Theater.
When is it playing there?
It's playing at 2pm, 5pm, and 8pm.
OK. I'd like 1 adult and 2 children for the first show. How much would that cost?
- But we need domain knowledge, discourse knowledge, world knowledge, linguistic knowledge

Why is natural language computing hard?
- Natural language is:
  - highly ambiguous at all levels
  - complex and subtle use of context to convey meaning
  - fuzzy, probabilistic
  - involves reasoning about the world
  - a key part of people interacting with other people (a social system): persuading, insulting, and amusing them
- But NLP can also be surprisingly easy sometimes: rough text features can often do half the job

Making progress on this problem
- The task is difficult! What tools do we need?
  - Knowledge about language
  - Knowledge about the world
  - A way to combine knowledge sources
- The answer that's been getting traction: probabilistic models built from language data
  - P("maison" -> "house") high
  - P("L'avocat général" -> "the general avocado") low
- Some computer scientists think this is a new "A.I." or "machine learning" idea
- But really it's an older idea that was taken from the electrical engineers

Where do we head?
Look at subproblems, approaches, and applications at different levels:
- Statistical machine translation
- Statistical NLP: classification and sequence models (part-of-speech tagging, named entity recognition, information extraction)
- Syntactic probabilistic parsing
- Building semantic representations from text; QA
Unfortunately left out:
- natural language generation, phonology, morphology, speech, dialogue systems, more on natural language understanding
- There are other classes for some: cs224u/s

Machine Translation
"The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport."
- The classic acid test for natural language processing
- Requires capabilities in both interpretation and generation
- About $26 billion spent annually on human translation
- Scott Klemmer: "I learned a surprising fact at our research group lunch today. Google Sketchup releases a version every 18 months, and the primary difficulty of releasing more often is not the difficulty of producing software, but the cost of internationalizing the user manuals!"
(Many slides from Kevin Knight at ISI)

Statistical Solution: Parallel Texts
- Rosetta Stone: Hieroglyphs, Demotic, Greek

Statistical Solution: Parallel Texts
- Instruction Manuals
- Hong Kong/Macao Legislation
- Canadian Parliament Hansards
- United Nations
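
The idea of learning translation probabilities such as P("maison" -> "house") from parallel text can be sketched in a few lines. The example below is only an illustration, not the course's own code: it assumes a tiny made-up three-sentence French/English corpus (`TOY_CORPUS`), and it uses IBM Model 1 style expectation-maximization, a standard technique that these slides do not name, with no NULL word and no smoothing. The function name `train_model1` is hypothetical.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus (invented for illustration):
# each pair is (French tokens, English tokens).
TOY_CORPUS = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
    ("la maison bleue".split(), "the blue house".split()),
]


def train_model1(pairs, iterations=20):
    """Estimate t[(f, e)] = P(f | e) with IBM Model 1 style EM.

    Simplifications (assumptions of this sketch): no NULL word,
    no smoothing, fixed number of iterations.
    """
    french_vocab = {f for fs, _ in pairs for f in fs}
    # Start from a uniform distribution over French words.
    t = defaultdict(lambda: 1.0 / len(french_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned
        # E-step: spread each French word over the English words
        # in the same sentence pair, in proportion to the current t.
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts per English word.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t


t = train_model1(TOY_CORPUS)
for f in ("maison", "la", "bleue"):
    print(f"P({f} | house) = {t[(f, 'house')]:.3f}")
```

Even though "la" and "maison" both co-occur with "house" in every sentence pair that contains it, the second pair lets "la" be explained by "the", so the probability mass for "house" shifts toward "maison". Real systems train on millions of sentence pairs, such as the Hansards mentioned above.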

