
Natural Language Processing
Lecture 1
8/27/2013

01/14/19 Speech and Language Processing - Jurafsky and Martin

CSCI 5832 Natural Language Processing
We're going to study what goes into getting computers to perform useful and interesting tasks involving human language.

Natural Language Processing
More specifically, it's about the algorithms that we use to process language, the formal basis for those algorithms, and the facts about human language that allow those algorithms to work.

Why Should You Care?
Three trends:
1. An enormous amount of information is now available in machine-readable form as natural language text (newspapers, web pages, medical records, financial filings, etc.).
2. Conversational agents are becoming an important form of human-computer communication.
3. Much of human-human interaction is now mediated by computers via social media.

Applications
Let's take a quick look at three important application areas:
- Text analytics
- Question answering
- Machine translation

Text Analytics
Data mining of weblogs, microblogs, discussion forums, message boards, user groups, and other forms of user-generated media:
- Product marketing information
- Political opinion tracking
- Social network analysis
- Buzz analysis (what's hot, what topics are people talking about right now)

Question Answering
Traditional information retrieval provides documents/resources that provide users with what they need to satisfy their information needs.
Question answering, on the other hand, directly provides an answer to information needs posed as questions.

Web Q&A

Watson

Machine Translation
The automatic translation of texts between languages is one of the oldest non-numerical applications in Computer Science.
In the past 10 years or so, MT has gone from a niche academic curiosity to a robust commercial industry.

Google Translate

How?
All of these applications operate by exploiting underlying regularities inherent in human languages, sometimes in complex ways, sometimes in pretty trivial ways.
Diagram labels: Language structure, Formal models, Practical applications

Major Class Topics
1. Words
2. Syntax
3. Meaning
4. Texts
5. Applications exploiting each

Applications
First, what makes an application a language processing application (as opposed to any other piece of software)?
An application that requires the use of knowledge about the structure of human language.
Example: Is Unix wc (word count) an example of a language processing application?

Applications: Word count
When it counts words: Yes.
- To count words you need to know what a word is. That's knowledge of language.
- Note that the definition of "word" embodied in wc doesn't work for Chinese or other languages that don't delimit words with spaces.
When it counts lines and bytes: No.
- Lines and bytes are computer artifacts, not linguistic entities.

Administrative Stuff
- Waitlist
- Web page: www.cs.colorado.edu/~martin/csci5832
- Reasonable preparation
- Requirements

CAETE
For remote students:
- Don't fall behind on the lectures. We're covering a lot of material in a short period of time.
- You have a standing 1-week delay on assignment deadlines and on the midterm dates.
- All students can access the class lectures via cuengineeringonline.colorado.edu.

Web Page
The course web page can be found at www.cs.colorado.edu/~martin/csci5832.
It will have the syllabus, lecture notes, assignments, announcements, etc.
You should check the News tab periodically for new stuff. I'll be using this in preference to email.

Mailing List
There is an automatically generated mailing list.
Mail goes to your colorado.edu email address. I can't alter it, so don't ask me to send your mail to gmail, yahoo, work, or whatever.
You can set up a forward yourself.

Preparation
- Ability to program
- Basic algorithm and data structure analysis
- Some exposure to logic
- Exposure to basic concepts in probability
- Familiarity with linguistics, psychology, and philosophy
- Ability to write well in English

Requirements
- Readings: Speech and Language Processing by Jurafsky and Martin, 2ed, Prentice-Hall 2009, plus a few conference or journal papers
- Around 5 assignments, mainly programming and written problem sets
- 2 midterms
- Final (comprehensive, sort of) exam on Tuesday, December 17, from 4:30 to 7:00. Don't leave Boulder before the final.

Programming
Most of the programming will be done in Python.
- It's free and works on Windows, Macs, and Linux.
- It's easy to install.
- Easy to learn.

Programming
Go to www.python.org to get started.
The default installation comes with an editor called IDLE. It's a serviceable development environment.
Python mode in Emacs is pretty good. It's what I use, but I'm a dinosaur.
If you like Eclipse, use that.

Grading
- Assignments: 30%
- Midterms: 30%
- Final: 30%
- Participation: 10%

Caveat
NLP has a distinct AI aspect to it.
We're often dealing with ill-defined problems.
We don't often come up with exact solutions/algorithms. That is, we're dealing with algorithms that don't work.
To make progress we need to have concrete metrics that tell us how well we're doing or at
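
The wc discussion earlier in the lecture can be made concrete with a small Python sketch. It is not the actual wc implementation, only an approximation that assumes a "word" is a maximal run of non-whitespace characters. The function names and the sample sentences are made up for illustration.

```python
def count_words(text: str) -> int:
    """Approximate wc -w: count maximal runs of non-whitespace characters."""
    # str.split() with no arguments splits on any run of whitespace,
    # so the notion of "word" here is purely "stuff between spaces".
    return len(text.split())


def count_lines(text: str) -> int:
    """Approximate wc -l: count newline characters."""
    return text.count("\n")


def count_bytes(text: str) -> int:
    """Approximate wc -c: count bytes, assuming UTF-8 encoding."""
    return len(text.encode("utf-8"))


# Hypothetical sample inputs, not taken from the lecture.
english = "To count words you need to know what a word is"
# Roughly "I like natural language processing", written without spaces.
chinese = "我喜欢自然语言处理"

print(count_words(english))  # 11
print(count_words(chinese))  # 1, although the sentence has several words
print(count_bytes(chinese))  # 27 (9 characters, 3 bytes each in UTF-8)
```

The whitespace rule gives a sensible answer for the English sentence and reports a single "word" for the Chinese one, which is the sense in which counting words needs knowledge of language while counting lines and bytes does not.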



CU-Boulder CSCI 5832 - Lecture 1
