CS276B
Text Retrieval and Mining
Winter 2005

Project Practicum 2

Plan for today
- General discussion of your proposals
- Sample project overview (what you have to turn in on Tuesday)
- More tools you might want to use
- More examples of past projects

General feedback on proposals
- We need more specifics on what exactly you’re planning to build.
  - Vagueness was fine for the proposals, but it’s not appropriate for your overview.
  - Avoid discussion of “possible applications”: your overview is a commitment to develop a fleshed-out, polished application.
- Be ambitious but realistic. It’s okay if at some future point you realize that you don’t have time to implement every feature described in your overview; but your final product should not deviate too far from the scope of your overview.
- Measurement criteria are essential
  - Creating a cool application is great but not sufficient; you also need a predetermined standard for evaluating the success or failure of your work.
  - Some kind of scientific numerical analysis of your system’s performance in comparison to a baseline or rival system:
    - precision/recall
    - user satisfaction ratings
    - correlation or mean squared error (if you’re predicting values)
    - processing time, main memory requirements, disk space
- Remember: a successful project doesn’t have to achieve great performance!
  - Of course it’s better to get good results…
  - But there can be significant value in trying something interesting and finding that it doesn’t work very well.
  - So don’t be afraid to explore an idea that isn’t guaranteed to pan out, as long as there’s reason to believe that it might.

Project overview: Suggested structure
- Title
- Group members
- Abstract (one short paragraph)
- Topic(s) investigated
- Relevant prior work (paper citations, actual systems)
- Delineation of group member responsibilities
- Data sources
- Technologies (programming languages, software, etc.)
- Existing tools leveraged
- Implementation details
- Submission calendar:
  - Block 1
  - Block 2
  - Block 3 (final product)

Sample project overview (idealized; not my actual proposal!)

MovieThing: A web-based collaborative filtering movie recommendation system

Group: Louis Eisenberg (CS coterm) and Joe User (CS senior)

Abstract: I will conduct an online experiment by building a website on which registered users can provide ratings for popular movies using a graphical interface. Once I have collected ratings from a substantial number of users, I will generate movie recommendations, assigning each user randomly to one of a handful of distinct recommendation algorithms. I will then solicit feedback from the users on the quality of the recommendations and use that feedback to perform a qualitative analysis of the relative accuracy of the different algorithms.

Topics investigated: collaborative filtering, recommendation systems

Relevant prior work:
- MovieLens (U. of Minn.)
- Jester (UC-Berkeley)
- CF research papers: http://jamesthornton.com/cf/
- Empirical Analysis of Predictive Algorithms for Collaborative Filtering: http://research.microsoft.com/users/breese/cfalgs.html
- More research papers…

Group member responsibilities:
- Louis: set up database, JDBC and utility code, JavaScript sliders, evaluation code
- Joe: AWS code, JSP and servlet front-end code, literature review
- Both: fill movie table, design CF algorithms, recruit subjects, write final paper

Data sources:
- Movie data (title, actors, genres, etc.) from IMDB and Amazon
- Movie ratings supplied by my users
- Amazon product similarity data

Technologies: servlets/JSP, JavaScript, MySQL

Existing tools leveraged: Amazon Web Services

Implementation details:
- Website will display movies in tabular format with ability to search/filter by title, genre, actors, etc.
- Users rate movies by dragging sliders.
- Algorithms:
  - Amazon: use product similarity to generate predicted ratings based on weighted averages using user’s ratings and movies considered “similar” to those the user has rated
  - Standard: predicted ratings are weighted averages using user’s Pearson correlation to other users and the ratings of the other users
  - General deviation: emphasize movies for which user has an unusual opinion by introducing additional term into covariance calculation (which factors into user similarity weight)
  - Personal deviation: emphasize movies about which user feels strongly by cubing covariance terms.
  - Both deviations: combine tweaks of general and personal.
- Evaluation:
  - Overall ratings of quality of recommendation lists
  - Correlation between predicted and actual ratings for recommended movies that user has already seen

Submission calendar:
- Block 1:
  - movies table is fully populated
  - website is live and accepting ratings
- Block 2:
  - sufficient users and ratings have been collected
  - Amazon similarity data has been retrieved
  - recommendation algorithms are functional
- Block 3:
  - users have received recommendations and provided feedback
  - final paper includes analysis of algorithms’ relative performance

Notes on sample project overview
- Your overview should be more extensive than this sample…
  - More specific implementation details, particularly in regard to algorithms
  - More specific goals for each block/milestone
  - Contingency plans for slight modifications to your project if you encounter obstacles?

More tools

MALLET
A Machine Learning for Language Toolkit
http://mallet.cs.umass.edu/
“an integrated collection of Java code useful for statistical natural language processing, document classification, clustering, information extraction, and other machine learning applications to text”
- Minimally documented but has lots of stuff:
  - Building feature vectors
  - Various classification methods (Naïve Bayes, max-ent, boosting, winnowing)
  - Evaluation: precision, recall, F1, etc.
  - N-grams
  - Selecting features using information gain
- They have some examples of front-end code

MinorThird
http://minorthird.sourceforge.net/
“a collection of Java classes for storing text, annotating text, and learning to extract entities and categorize text”
- Documentation seems to be pretty good: comprehensive Javadocs, tutorial, FAQ…
- Has the concept of “spans” (sequences of words) that can be extracted and classified based on


Stanford CS 276B - Project Practicum 2
