CS 585: Natural Language Processing
Fall 2004
Final Project

One-page proposal due: November 4
Progress report due: November 18
Source code due: December 2
Final report and class presentation due: December 7

Project guidelines based on those of Chris Manning

1 Introduction

The final project is an opportunity for you to work on a larger NLP system on a topic of your choice.

The projects will be judged on clarity in defining the problem to be investigated, the methods used, thoroughness in considering and justifying your design decisions, and quality of your write-up, including your testing of the system and reporting of results. You will not be penalized if your system performs poorly, provided your initial design decisions weren’t obviously unjustifiable and you have made reasonable attempts to analyze why it failed and to examine how the system might be improved.

The final project can be a group project. Indeed, by working as a group, you can attempt something larger and more interesting. However, the amount of work should be appropriately scaled to the size of the group, and you should include a brief statement on the responsibilities of the different members of the team. Team members will normally get the same grade, but I reserve the right to differentiate in egregious cases. In general we would like group sizes of 2; if you are considering a bigger group, you need to talk to me. Solo projects are, of course, allowed.

You are free (and, where appropriate, encouraged) to make use of existing code and systems as part of your project, but you should make sure their use is properly acknowledged, and make clear what additional value your project is adding.

The first deadline (November 4th, midnight) is to submit a project proposal. This will be graded for thoughtfulness, and for its strength in addressing the following six questions:

1. What is the problem or task that you propose to solve?
2. What is interesting about this problem from an NLP perspective?
3. What technical method or approach will you use?
4. On what data will you run your system?
5. How will you evaluate the performance of your system?
6. What NLP-related difficulties and challenges do you anticipate?

This is to encourage you to get organized, and it is also a chance for further dialog between you and the instructor. I can give you extra references, and also feedback on whether we think the scope of the project is too small or too big.

2 Data

A fairly large amount of natural language data of various sorts is available at UMass. This includes collections from major publishers such as the Linguistic Data Consortium (http://www.ldc.upenn.edu/), and some smaller collections, such as text categorization and information extraction training and test sets. Most of this data is in English, but there is also some in major foreign languages.

In particular, you might want to consider the following data sources. The last three bullets are lists of additional sources, and may be especially interesting.

• Penn Treebank (parse trees and POS)
• Brown Corpus (parse trees and POS)
• NetTalk (text to speech)
• ACE (named entity IE from newswire, and relations)
• CoNLL (named entity IE from newswire)
• Stanford pointers: http://nlp.stanford.edu/links/statnlp.html
• CMU: http://www.cs.cmu.edu/~TextLearning
• ISI IE archive: http://www.isi.edu/~muslea/RISE

Size and Grading

A size recommendation is difficult to define, but roughly you should be aiming for each member of the team to do as much work as on two of the homeworks. You should aim to do something that is interesting, not just an exercise in programming. This may be only an extension of a previous homework assignment, or an implementation of an existing method (some extensions of your own would be nice).
There should be a clear focus in terms of what you hope to achieve, or hope to show.

You will be graded on the deliverables for all four due dates above: proposal, progress report, source code, oral presentation and report, with more emphasis on the later deliverables. In the first two deliverables, I’m looking for evidence of clear thinking, and good facility with the concepts we’ve learned in class, in your answers to the six questions named above.

Your project write-up should be adequate, but doesn’t need to scale linearly in size with the size of the group. One person might want to write 5-6 pages. A three-person project may well find that a 10-page write-up is quite sufficient. Think of the write-up as something like a small conference paper, focused on NLP research questions and achievements, though you may want to include a bit more detail on methods used, examples, etc. The quality of your write-up is important.

It’s hard to define exactly what the write-up should cover, because it depends on the project, but generally, I’m looking again for answers to the six questions above, also including

• a technical description of the method you used,
• discussion of the linguistic assumptions of the model and their validity,
• a clear presentation of the data, experimental setup, and the experimental results,
• your analysis of those results,
• discussion of alternatives or things you tried to improve performance, and how they fared.

I’m happy to help give you direction in each of these aspects, so please actively communicate with me about your progress.

You should make all submissions by email to Gary and me.

In your in-class presentation (both for the proposal and the final project) you should help make your points using slide transparencies, either the plastic or PowerPoint variety.

3 Project Ideas

Some of you are still having difficulty settling on a project topic. You are encouraged to think up your own, but I am realizing that some of you may need more concrete suggestions.

A good source of ideas could be recent NLP conferences. You can find many NLP conference papers available online at http://acl.ldc.upenn.edu.

• Write a system for some task in natural language clustering, such as finding related web pages, or learning part of speech categories from raw data, for which you might look at (Schütze 1993, Schütze 1995). Or, one could attempt to use clusters to improve the quality of a language model, or to predict what objects a verb takes.
• An information extraction system. This could aim to extract named entities (such as person names, organizations, etc.) from a certain type of text (newspaper reports, biology articles, etc.). This may involve extensions to your HMM homework. You could try to extract information about seminar


UMass Amherst CS 585 - Natural Language Processing
