Natural Language Processing
Lecture 1, 8/27/2013
CSCI 5832

Speech and Language Processing, Jurafsky and Martin (9/17/13)

Natural Language Processing
We're going to study what goes into getting computers to perform useful and interesting tasks involving human language.

Natural Language Processing
More specifically, it's about the algorithms that we use to process language, the formal basis for those algorithms, and the facts about human language that allow those algorithms to work.

Why Should You Care?
Three trends:
1. An enormous amount of information is now available in machine-readable form as natural language text (newspapers, web pages, medical records, financial filings, etc.).
2. Conversational agents are becoming an important form of human-computer communication.
3. Much of human-human interaction is now mediated by computers via social media.

Applications
Let's take a quick look at three important application areas:
- Text analytics
- Question answering
- Machine translation

Text Analytics
Data mining of weblogs, microblogs, discussion forums, message boards, user groups, and other forms of user-generated media:
- Product marketing information
- Political opinion tracking
- Social network analysis
- Buzz analysis (what's hot, what topics are people talking about right now)

Text Analytics
[two figure slides]

Question Answering
Traditional information retrieval provides documents/resources that give users what they need to satisfy their information needs.
Question answering, on the other hand, directly provides an answer to information needs posed as questions.

Web Q&A
[figure]

Watson
[figure]

Machine Translation
The automatic translation of texts between languages is one of the oldest non-numerical applications in Computer Science.
In the past 10 years or so, MT has gone from a niche academic curiosity to a robust commercial industry.

Google Translate
[two figure slides]

How?
All of these applications operate by exploiting underlying regularities inherent in human languages, sometimes in complex ways, sometimes in pretty trivial ways.
- Language structure
- Formal models
- Practical applications

Major Class Topics
1. Words
2. Syntax
3. Meaning
4. Texts
5. Applications exploiting each

Applications
First, what makes an application a language processing application (as opposed to any other piece of software)?
- An application that requires the use of knowledge about the structure of human language
Example: Is Unix wc (word count) an example of a language processing application?

Applications
Word count
- When it counts words: Yes. To count words you need to know what a word is. That's knowledge of language. Note that the definition of "word" embodied in wc doesn't work for Chinese or other languages that don't delimit words with spaces.
- When it counts lines and bytes: No. Lines and bytes are computer artifacts, not linguistic entities.

Administrative Stuff
Waitlist
Web page
- www.cs.colorado.edu/~martin/csci5832
Reasonable preparation
Requirements

CAETE
For remote students:
- Don't fall behind on the lectures. We're covering a lot of material in a short period of time.
- You have a standing 1-week delay on assignment deadlines and on the midterm dates.
All students can access the class lectures via
- cuengineeringonline.colorado.edu

Web Page
The course web page can be found at www.cs.colorado.edu/~martin/csci5832.
It will have the syllabus, lecture notes, assignments, announcements, etc.
You should check the News tab periodically for new stuff. I'll be using this in preference to email.

Mailing List
There is an automatically generated mailing list. Mail goes to your colorado.edu email address.
- I can't alter it, so don't ask me to send your mail to gmail, yahoo, work, or whatever.
- You can set up a forward yourself.

Preparation
Ability to program
Basic algorithm and data structure analysis
Ability to write well in English
Familiarity with linguistics, psychology, and philosophy
Some exposure to logic
Exposure to basic concepts in probability

Requirements
Readings
- Speech and Language Processing by Jurafsky and Martin, 2nd ed., Prentice Hall, 2009
- A few conference or journal papers
Around 5 assignments
- Mainly programming and written problem sets
2 midterms
Final (comprehensive, sort of) exam on Tuesday, December 17, from 4:30 to 7:00
- Don't leave Boulder before the final.

Programming
Most of the programming will be done in Python.
- It's free and works on Windows, Macs, and Linux.
- It's easy to install.
- Easy to learn.

Programming
Go to www.python.org to get started.
The default installation comes with an editor called IDLE. It's a serviceable development environment.
Python mode in Emacs is pretty good. It's what I use, but I'm a dinosaur.
If you like Eclipse, use that.

Grading
- Assignments: 30%
- Midterms: 30%
- Final: 30%
- Participation: 10%

Caveat
NLP has a distinct AI aspect to it.
- We're often dealing with ill-defined problems.
- We don't often come up with exact solutions/algorithms. That is, we're dealing with algorithms that don't work.
- To make progress we need to have
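
The earlier wc example shows concretely how an algorithm can embody a definition of "word" that only works for some languages. Below is a minimal sketch of that point in Python 3. It is not how wc is actually implemented: the helper names count_words, count_lines, and count_bytes are hypothetical, the two sample sentences are made up for illustration, and "word" is assumed to mean "whitespace-delimited token".

```python
def count_words(text):
    """Count whitespace-delimited tokens (an assumed definition of 'word')."""
    return len(text.split())


def count_lines(text):
    """Count newline characters; no linguistic knowledge is needed."""
    return text.count("\n")


def count_bytes(text, encoding="utf-8"):
    """Count bytes in the encoded text; again a purely computational notion."""
    return len(text.encode(encoding))


# Illustrative samples (not from the lecture).
english = "We study human language\n"
chinese = "我们研究人类语言\n"  # roughly the same sentence, written without spaces

for sample in (english, chinese):
    print(count_lines(sample), count_words(sample), count_bytes(sample))
```

The line and byte counts are meaningful for both samples, but the Chinese sentence comes back as a single "word" because nothing in it is separated by whitespace. Counting words there would require a word segmenter, which is exactly the kind of knowledge about language that the slides use to define a language processing application.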