Stanford CS 224n - Extracting Common Sentiments From Reviews


Extracting Common Sentiments From Reviews

Contents:
- Proposed System
- Data
- Classifiers and Features
  - Generation 0: Bag of Words and a Naive Classifier
  - Generation 1: Stems of Words and a Sophisticated Classifier
  - Generation 2: Increasing the Vocabulary Size (N)
  - Generation 3: Integrating Parser Features
  - Generations Summary
  - Informative Features
  - Cross-Domain Applicability
- Clustering
  - Design Considerations
  - Feature Analysis
  - Cluster Quality Optimization
- Final Results and Future Work
- References

Design Considerations (from the Clustering section)

The WEKA library provides us with several clustering algorithms, from simple K-means to EM. As we will discuss in the cluster-quality section below, in addition to clustering, our application requires a method to measure the quality of individual clusters against an application-specific metric, both for choosing an optimal number of clusters and for ranking the resulting clusters. Unfortunately, it proved impractical to get this kind of information from WEKA. We attempted to use a feature of the EM clusterer that optimizes the number of clusters, K, by maximizing the probability of generating the data, but it tended to pick K much too small, resulting in extremely large clusters of mostly unrelated opinions.

Extracting Common Sentiments From Reviews

A large and growing body of user-generated reviews is available on the Internet, from product reviews at sites like Amazon.com to restaurant reviews at sites like Yelp.com. For users making a purchasing or dining decision, the opinions of others can be an important factor. Although some aggregate information -- like average star ratings -- is sometimes available for multiple reviews, in general the only way to get a sense of the overall sentiment among users is by reading through many reviews.
As the number of reviews for a single product or restaurant becomes large (on the order of hundreds or even thousands), it becomes increasingly impractical to read every review.

We view the goal of reading multiple reviews as finding widely held opinions and weighing the positive against the negative, and we wish to automate this sort of task using NLP and machine-learning techniques. The problem can be broken down into three major components: sentence-level sentiment classification; sentiment clustering and ranking; and summarization. Sentiment classification involves labeling every sentence in every review for a particular restaurant as either Subjective-Positive, Subjective-Negative, or Objective. A range of literature exists on this problem. Pang et al. describe the successful use of traditional machine-learning techniques, such as Naive Bayes, for the sentiment classification of entire movie reviews [1]. Hu and Liu propose a system to solve a problem very similar to the one we pose [2]. Instead of simple classification, they approach the problem by first extracting opinion words from each sentence and then predicting the polarity of the sentence from the dominant polarity of its constituents. They grow sets of positive and negative opinion words using seed words in WordNet. Given the success of Pang et al. with simple classification techniques, we plan to take this approach, exploring various feature sets and classifiers.
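The classification stage of the decomposition above can be viewed as a cascade: a subjectivity classifier first filters out objective sentences, and a polarity classifier then splits the remaining subjective ones. A minimal Python sketch of that cascade follows; the two classifier bodies here are hypothetical stand-ins built from tiny hand-picked word lists, not the trained models described in this paper.

```python
# Hypothetical stand-in word lists for illustration only; in the real
# system each stage is a trained classifier (Naive Bayes or SVM).
OPINION_WORDS = {"great", "terrible", "loved", "bland"}
POSITIVE_WORDS = {"great", "loved"}

def is_subjective(sentence):
    # Stand-in for the trained subjectivity classifier.
    return any(w in OPINION_WORDS for w in sentence.lower().split())

def polarity(sentence):
    # Stand-in for the trained polarity classifier.
    words = sentence.lower().split()
    return "Positive" if any(w in POSITIVE_WORDS for w in words) else "Negative"

def label_sentence(sentence):
    # Objective sentences are filtered out before clustering;
    # subjective sentences are split by polarity.
    if not is_subjective(sentence):
        return "Objective"
    return "Subjective-" + polarity(sentence)
```

Only the sentences labeled Subjective-Positive or Subjective-Negative are passed on to the clustering and ranking stages.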
After isolating subjective sentences from objective sentences, we will cluster closely related subjective sentences using a simple K-means algorithm and rank the resulting clusters using a cluster-quality metric that rewards large, cohesive clusters. In the remainder of this paper we present our proposed system, justify various design decisions, and discuss the performance of the final system.

Proposed System

The figure below illustrates the multiple stages of processing that result in a ranked list of closely related opinions expressed in a set of reviews.

Data

We use two sets of data in this project. First, for training our subjectivity and polarity classifiers, we take advantage of publicly available movie-review data [3]. Specifically, we use the "Subjectivity dataset V 1.0" [4] and the "Sentence Polarity dataset V 1.0" [5]. The subjectivity dataset consists of 5,000 objective and 5,000 subjective sentences. The subjective sentences come from Rotten Tomatoes [6] reviews, while the objective sentences come from IMDB [7] plot-summary snippets. The training data are not perfect: plot summaries may contain subjective opinions, and reviews sometimes contain objective information (though the latter is less likely). The polarity dataset consists of 5,331 positive and 5,331 negative sentences, drawn from Rotten Tomatoes "Fresh" and "Rotten" reviews respectively.

The data we wish to extract opinions from were collected from Yelp.com. In total there are 1,731 reviews of 6 restaurants, with each review containing an average of 11 sentences.

Classifiers and Features

We experimented with both Naive Bayes and SVM classifiers, using a number of different features. The following shows how we carefully chose the best combinations through multiple design iterations with intermediate error analysis.

Generation 0: Bag of Words and a Naive Classifier

Our initial approach is to treat each sentence as a bag of words.
We represent a sentence using a vector whose entries correspond to the TF-IDF-weighted frequency of each of the words in the vocabulary. We use TF-IDF weighting to get a better representation of the importance of each word. It also obviates the need for a stopword list, which might otherwise have been detrimental to the performance of the classifier. To keep the dimensionality of the resulting vectors manageable, we limit our vocabulary to the N most frequently occurring words.

Our initial classifier is Naive Bayes (NB). Its performance is decent for subjectivity classification but rather poor for polarity classification. We present our initial results below.

Results | Classifier: NB. Features: Bag of words. Vocabulary size (N): approximately 1000 per class.

Classification Task | Train Accuracy (%) | Accuracy (%) [5-fold CV]
Subjectivity        | 82.08              | 81.65
Polarity            | 63.9655            | 62.9244

Generation 1: Stems of Words and a Sophisticated Classifier

We reason that using stems instead of words should benefit our problem in two ways. First, it will decrease the sparseness in the data, since there are fewer distinct stems than distinct words. Second, it should better capture and group semantic information. To that end we employ the Snowball [8] algorithm and generate bags of stems instead of bags of words. We still use TF-IDF weighting when constructing stem vectors. Moving from words to stems increases subjectivity
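As a concrete illustration of the Generation 0 featurization, the sketch below builds a vocabulary from the N most frequently occurring tokens and maps each sentence to a TF-IDF-weighted vector. It is a minimal pure-Python sketch; the function names (`build_vocab`, `tfidf_vectors`) are invented for illustration, and the specific variant used (raw term frequency, log(D/df) as the IDF term) is an assumption, since this paper does not spell out its exact TF-IDF formula.

```python
import math
from collections import Counter

def build_vocab(sentences, n):
    """Keep the N most frequently occurring tokens across all sentences."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    return [tok for tok, _ in counts.most_common(n)]

def tfidf_vectors(sentences, vocab):
    """Map each sentence to a vector of tf * idf weights over the vocabulary."""
    token_lists = [s.lower().split() for s in sentences]
    # Document frequency: in how many sentences does each token appear?
    df = Counter()
    for toks in token_lists:
        for t in set(toks):
            df[t] += 1
    d = len(sentences)
    # idf(t) = log(D / df(t)): tokens that appear almost everywhere get a
    # weight near zero, which is why no separate stopword list is needed.
    idf = {t: math.log(d / df[t]) for t in vocab if df[t]}
    return [[Counter(toks)[t] * idf.get(t, 0.0) for t in vocab]
            for toks in token_lists]
```

A Naive Bayes or SVM classifier is then trained on these vectors; for Generation 1, each token would simply be passed through a Snowball stemmer before counting.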

