Stanford CS 224N - Sentiment Categorization of Restaurant Reviews - D2343625



Course: CS 224N - Natural Language Processing

Pages 10



Star Quality: Sentiment Categorization of Restaurant Reviews

Amir Ghazvinian [email protected]

We consider the problem of classifying reviews by overall sentiment. Rather than predict a simple positive or negative evaluation on the part of the review’s author, we employ methods to determine the author’s numerical rating on a multi-point scale ranging from one to five stars. We evaluate our approach using restaurant review data from OpenTable.com, a service for finding restaurants and making reservations online. We find that by using maximum entropy classification with a carefully selected feature set and a sentiment model, we can achieve greater than 60% precision in predicting the ratings for these reviews. Surprisingly, this result is comparable to the precision of humans performing the same task.

1 Introduction

Given the immense amount of subjective content on the web today, such as movie, restaurant, and product reviews, we can derive a great deal of benefit from being able to interpret the opinions and sentiment put forth in these reviews. For example, sentiment analysis can be used to provide succinct review summaries to readers or to generate automatic recommendations for customers. Additionally, more nuanced analyses can help identify the reasons behind reviewers’ opinions and can be used in question answering tasks related to opinion.

Here we focus on the task of sentiment categorization, which takes a segment of unlabeled text and attempts to classify the text according to overall sentiment. In this project, we apply natural language processing techniques to classify a set of restaurant reviews based on the number of stars that each review received.
More specifically:

• We develop a maximum entropy classifier to categorize each review from 1 star to 5 stars
• We implement a set of features that we believe to be relevant to the sentiment expressed in reviews and analyze their effect on performance, providing insights into what works and why sentiment categorization can be so difficult
• We analyze how a review’s conformance to a particular language model can be affected by the sentiment of the review
• We experiment with different linguistically motivated models of sentiment expression, again using the results to improve the performance of our classifier
• We examine the effects of part-of-speech tagging on our ability to predict sentiment.

2 Background and Related Work

Initial work in the field of sentiment categorization treated the problem as one of binary classification [2, 7, 12, 13]. Turney (2002) used a semantic orientation algorithm to classify reviews based on the numbers of positively oriented and negatively oriented phrases in each review. Pang et al. (2002) used machine learning tools such as Naïve Bayes, Maximum Entropy, and Support Vector Machine (SVM) classifiers to classify movie reviews using a number of simple textual features. They found that unigram presence features gave the best results, with little or no improvement gained through the additional feature sets they explored. These methods demonstrate that machine learning algorithms can be used successfully for the task of sentiment classification, at least on a binary scale. We too implement a classifier and begin with an exploration of simple features, then turn to more complex, linguistically motivated features to further improve the performance of our sentiment categorization system.

In other work, Pang and Lee (2004) have shown that performance on the sentiment classification task can be improved by identifying objective sentences and removing them from reviews, leaving only subjective sentences by which to classify the review [6].
This method is particularly apt for movie reviews, as their study showed, where we must be careful not to confuse feelings toward plot or characters (“I hate the villain”) with feelings about the movie as a whole (“I hate this movie”). However, problems such as this one appear to be less common in the domain of restaurant reviews. Additionally, because most reviews in our data set are relatively short, we probably would not benefit from discarding part of what little data we have for each review. Although we do not use the data-trimming methods described in that paper, we keep its lesson in mind: not all content in a review is equally relevant for predicting sentiment.

Finally, several papers in the realm of sentiment categorization deal specifically with the task of predicting ratings on a multi-point scale [5, 9]. These works make use of the notion that a 5-star review is much closer in value to a 4-star review than it is to a 1-star review.

3 The Dataset

Our dataset consists of 457,023 restaurant reviews from OpenTable.com, a service for finding restaurants and making reservations online. The reviews span 11,067 restaurants. The reviews range in length from 1 to 750 characters, with an average of 248 characters per review. Because the reviews are quite short on average, we need efficient techniques that make ample use of the little available data.

For the purpose of development, we use only a small portion of these reviews when implementing and testing our models. We select 5,000 reviews from the data as a test set and use an additional 15,000 reviews as a training set. For our best model, we experimented with varying amounts of training data to see if we could further improve performance.

As a final note, some reviews in our data set are quite short (i.e., only a few characters or less than a sentence).
Because we want to focus on predicting the sentiment of reviews from their linguistic content, we exclude from both training and testing any reviews that have fewer than 100 characters, which we estimate to be the length of a short sentence.

4 Sentiment Prediction

Here we describe the techniques we employed in order to classify the
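
To make the setup above concrete, the following is a minimal sketch of the 100-character length filter from Section 3 combined with a maximum entropy (multinomial logistic regression) classifier over unigram presence features. It is not the authors' implementation: the function names, the whitespace tokenizer, the plain stochastic gradient training loop, and the hyperparameters are all assumptions chosen for brevity, and the paper's actual feature set and sentiment model are not reproduced here.

```python
import math
from collections import defaultdict

MIN_CHARS = 100  # length cutoff stated in Section 3


def keep_review(text, min_chars=MIN_CHARS):
    """Return True if the review has at least min_chars characters."""
    return len(text) >= min_chars


def unigram_presence(text):
    """Binary bag of words: the set of lowercased whitespace tokens (assumed tokenizer)."""
    return set(text.lower().split())


def class_probs(weights, feats, labels):
    """Softmax over the per-class sums of active feature weights."""
    scores = [sum(weights[(f, y)] for f in feats) for y in labels]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_maxent(data, labels, epochs=50, lr=0.5):
    """Fit weights by stochastic gradient ascent on the log-likelihood.

    data is a list of (feature_set, gold_label) pairs. The optimizer and
    hyperparameters are illustrative assumptions, not taken from the paper.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in data:
            probs = class_probs(weights, feats, labels)
            for y, p in zip(labels, probs):
                # Gradient for a binary feature: observed count minus expected count.
                grad = (1.0 if y == gold else 0.0) - p
                for f in feats:
                    weights[(f, y)] += lr * grad
    return weights


def predict(weights, feats, labels):
    """Return the label with the highest model probability."""
    probs = class_probs(weights, feats, labels)
    return labels[probs.index(max(probs))]
```

In use, one would first drop reviews for which `keep_review` returns False, map each remaining review to `unigram_presence(text)`, call `train_maxent` on the training pairs with `labels = [1, 2, 3, 4, 5]`, and then call `predict` on each test review. Note that this sketch treats the five star ratings as unordered classes, so it does not exploit the observation that a 5-star review is closer to a 4-star review than to a 1-star review.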
