Stanford CS 224 - Picking the Fresh from the Rotten

CS224N/Ling237 Final Project
Eric Yeh, [email protected]

Picking the Fresh from the Rotten: Quote and Sentiment Extraction from Rotten Tomatoes Movie Reviews

1. Abstract

Being able to quickly determine whether a movie is worth watching is one of the major appeals of the movie review website Rotten Tomatoes (www.rottentomatoes.com). The site gathers reviews from known movie critics, with each critic's review assigned a binary rating expressing the review's sentiment, fresh (positive) or rotten (negative), and a quote, the single sentence or phrase from the review that best exemplifies that rating. This form of capsule summarization lets users discern the consensus of a group of critics at a glance. A system was constructed to perform the "Rotten Tomatoes Task" of identifying the quote and rating from an unlabeled review. A maximum entropy classifier was trained on a corpus of reviews from Rotten Tomatoes that had their ratings and accompanying quotes identified. Lexical and positional features were used. In addition, the notion of regional coherence, from text-segmentation work, was applied to create features that would help identify potential quotes from a review. A discussion of the nature of this "Rotten Tomatoes Task" is given, as well as future avenues of investigation, including alternative evaluation methods.

2. Overview

The ability to quickly determine the overall sentiment of a document is a topic that has been garnering much interest in recent years, especially with the increase in the number of reviews online. The average user does not have the time to read through each and every product review on the web to arrive at an informed decision about whether or not to make a purchase.
It is this ability to quickly survey the consensus of a group of trusted critics [1] that makes the movie review website Rotten Tomatoes (www.rottentomatoes.com) popular.

[1] For the criteria, please see http://www.rottentomatoes.com/pages/critics.

For a given movie listed on Rotten Tomatoes, its review page consists of what are essentially very small capsule summaries of each critic's review of that movie. Each of these summaries consists of a rating and a quote. The rating is binary: it is either fresh (positive) or rotten (negative). The reason for this strict rating scheme is simple: ultimately the user needs to make a "go" or "no-go" decision on whether or not to spend $10+ to see the movie (i.e., one does not pay $6 out of a $10 admission for a movie rated 3/5 stars). The quote is the sentence or phrase, chosen by the Rotten Tomatoes editorial staff, that is considered to be the most representative of the rating assigned to the review (Figure 1).

"Fails on so many levels that it is difficult to even know where to start."
"A flawed masterpiece."
"You'll be rooting for these people to get slaughtered out of sheer boredom."

Figure 1: Example quotes from Rotten Tomatoes.

Currently, this "Rotten Tomatoes Task" of reading, classifying, and extracting quotes is performed entirely by the human editorial staff. This is a labor-intensive task, and any form of usable automation would be a boon for consumers looking to scan product reviews, as well as for the editors who have to do this sort of thing on a weekly basis.

Movie reviews have certain characteristics that set them apart from other types of reviews. Word types that would normally be associated with sentiment, e.g., "good," "bad," "sucks," are oftentimes used in sentences that describe plot elements, which are not indicative of the reviewer's rating.
Worse yet, certain reviewers tend to throw in other seemingly opinionated elements that have no immediate and obvious bearing on how they felt about a movie; examples include descriptions of their moody journey to the film festival on a bus, or how an eating disorder is destroying their personal life. Thus, a review can be broken down into subjective sentences that are expressive of the reviewer's sentiment about the movie, and objective sentences that do not have any direct or obvious bearing on or support of that sentiment. The Rotten Tomatoes assigned quote can be thought of as the most subjective sentence of the review, as it is intended to capture the reviewer's sentiment.

Note that the notion of subjectivity in this case does not imply anything about the polarity of that sentiment: a sentence is considered subjective as long as it is indicative, and could be representative, of the reviewer's opinion. It follows that an objective sentence can contain adjectives and other descriptors that at first glance appear to express an opinion, but that ultimately have no bearing on the review's sentiment. For example, the following statement from a review of Madagascar would seem to be positive in nature: "The animation is smooth and stylish, with blocky, exaggerated shaping that brings to mind a 3-D version of 'Batman: The Animated Series.'" However, it served to describe the movie's animation style and production, and ultimately was not reflective of the negative rating assigned to the review. As Pang et al. (2002) have shown, subjective and objective language does not easily fit the sets of adjectives and word types that one would normally associate with them. Combined with the fact that many reviews are written in a style that makes it hard to figure out whether the reviewer liked the film or not (e.g., movie reviews from "The Village Voice"), this makes movie reviews a challenging domain to work in.

Previous work that approached the "Rotten Tomatoes Task" split it into two problems: determining the "fresh"/"rotten" polarity of a review, and choosing a quote that best expresses that review. Review rating classification has generally been good, with accuracies around 80-90%, despite the difficulties inherent in the nature of movie reviews (Pang and Lee, 2004).

Determining which sentence from a review should be used as the quote (or the basis of the quote) has turned out to be a trickier problem. Studies that addressed quote extraction have treated it as a sentence-level classification problem (Beineke et al., 2004; Fingal et al., 2004). However, as those studies have noted, and as is quickly apparent to anyone who has actually had to do this task manually, most reviews contain multiple candidate sentences that each can serve equally well as the Rotten

