Columbia COMS W4705 - MOVIE REVIEW CLASSIFICATION

HOMEWORK 2
MOVIE REVIEW CLASSIFICATION
CS 4705: NATURAL LANGUAGE PROCESSING
DUE: OCT 29, 2010 BY 11:59PM

Table of Contents
Problem Defined
The Data
The Classification Tasks
   1. 4-Star Rating Classifier Task
   2. Binary (Positive/Negative) Rating Classifier Task
   3. Reviewer Classifier Task
Testing Machine Learning Toolkit
Grading
   Functionality (25pts)
   Results (35pts)
   Write-up (25pts)
   Documentation (5pts)
   Coding Practices (5pts)
   README File (5pts)
Submission
Academic Integrity
FAQ

Do you read the reviews before seeing a movie? Have you ever changed your mind about watching a movie because you read bad reviews? Imagine that you read several movie reviews with star ratings given by different reviewers. Now you read an anonymous new review of Avatar in the newspaper that has no stars. Just from the review itself, can you guess who wrote it and how many stars they would give?

This is what this homework assignment is about: you will run a number of machine learning experiments on a set of movie reviews. The tasks involve classifying movie reviews into a 4-star rating; classifying the same reviews into a binary (Positive/Negative) rating; and identifying the reviewer who wrote the review. You will turn in a total of five classifiers for this assignment.

THE DATA
You will be given an annotated data corpus to train your classifiers on. The corpus contains 5006 reviews, one review per line. Each review/line consists of 4 fields: a review ID; a reviewer ID (A, B, C, or D); a star rating (1 to 4 / worst to best); and the text of the review itself. This corpus can be found at: /home/cs4705/HW2/movie-corpus.txt.
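As an illustration only, reading the corpus into records with the four fields might be sketched as follows. The sketch assumes each line wraps the fields in <id>, <reviewer>, <star>, and <review> tags in that order, and that the file is UTF-8 encoded; the names Review, parse_line, and load_corpus are hypothetical and not part of the assignment.

```python
import re
from typing import List, NamedTuple, Optional


class Review(NamedTuple):
    """One corpus record (hypothetical container, not defined by the handout)."""
    review_id: int
    reviewer: str
    stars: int
    text: str


# Assumed layout: the four tagged fields appear in this order on a single line.
LINE_PATTERN = re.compile(
    r"<id>\s*(\d+)\s*</id>\s*"
    r"<reviewer>\s*([A-D])\s*</reviewer>\s*"
    r"<star>\s*([1-4])\s*</star>\s*"
    r"<review>(.*)</review>",
    re.DOTALL,
)


def parse_line(line: str) -> Optional[Review]:
    """Parse one corpus line; return None if it does not match the assumed layout."""
    match = LINE_PATTERN.search(line)
    if match is None:
        return None
    review_id, reviewer, stars, text = match.groups()
    return Review(int(review_id), reviewer, int(stars), text.strip())


def load_corpus(path: str) -> List[Review]:
    """Read every parsable line of the corpus file into a list of Review records."""
    reviews = []
    # The encoding is an assumption; undecodable bytes are replaced, not fatal.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            review = parse_line(line)
            if review is not None:
                reviews.append(review)
    return reviews
```

Calling load_corpus("/home/cs4705/HW2/movie-corpus.txt") would then give a list whose entries expose .reviewer, .stars, and .text for feature extraction. Lines that do not match the assumed layout are skipped silently, so it may be worth comparing the number of parsed records against the stated corpus size of 5006.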
The corpus will look like:

<id>1</id><reviewer>A</reviewer><star>3</star><review>It is an interesting comedy…</review>
<id>2</id><reviewer>C</reviewer><star>1</star><review>I am so glad that I saw it… </review>
<id>3</id><reviewer>B</reviewer><star>2</star><review>I like the story, but…</review>

THE CLASSIFICATION TASKS
You will need to classify movie reviews in the following separate experiments:

1. 4-STAR RATING CLASSIFIER TASK
In this task, you will be given a test set of movie reviews that you have not seen before (not in the training corpus). You should build a classifier to assign ratings to these reviews, using a 4-star rating scheme (1 to 4 / worst to best). There will be two test sets for this classification task:
   A. In the first test set, the reviews were written by the same four reviewers (A, B, C and D) who wrote the reviews in the given training corpus.
   B. In the second test set, the reviews were written by other reviewers, whose reviews do not appear in the training corpus.
You must submit two classifiers for this classification task.
Note: Even if you decide to use the same classifier for both tasks A and B, you must still submit two classifiers and appropriate documentation for each.

2. BINARY (POSITIVE/NEGATIVE) RATING CLASSIFIER TASK
In this task, you are required to build a classifier that will simply classify a movie review as either positive or negative. For training, you should collapse the 3 and 4 star ratings to form the “positive” class and the 1 and 2 star ratings to form the “negative” class. As in the 4-star classification in Task 1, there will be two tests for this binary classification, one on reviewers seen in the training data and one on unseen reviewers. Again, you must submit two classifiers for this task.
Note: Again, if you decide to use the same classifier, you must still submit two classifiers, even though they are identical.
Hints: There are two strategies that you might adopt:
   A. You might train classifiers for the Positive/Negative rating separately.
   B. You might simply use your 4-star classifiers to classify reviews and then transform the results into a binary classification. Note that this approach may not give you the same results as (A).
Whichever strategy you adopt,

