Stanford CS 224 - Study Notes

Pro, Con, and Affinity Tagging of Product Reviews

Todd Sullivan
Department of Computer Science
Stanford University
[email protected]

1 Introduction

As internet use becomes more prevalent, an increasing number of consumers are turning to product reviews to guide their search for the best product. This is especially true for products that the consumer may not have extensive knowledge about, such as cameras and GPS devices. According to a 2006 eTailing Group and JC Williams Consultancy study, 70% of online shoppers examine customer reviews before they buy and 92% find customer reviews to be "very helpful" in making a decision, which makes product reviews an important facet of consumer decision making and an important aspect of retailing and e-commerce.

While plain text reviews are helpful, most consumers do not have enough time to read many reviews about multiple products. As a result, reviews are much more helpful if they contain tags for the pros and cons. Most consumers would agree that reviews are even more helpful if the product's main page, which contains all of its reviews, also shows the aggregate information of these tags.

In this paper, we consider the task of assigning pros, cons, and affinities (more on these later) to product reviews given information such as the review's title, comment text, and rating. We begin in Section 2 with background research in extracting pros and cons from reviews and describe our problem definition in Section 3. We then present our baseline systems that use a bag-of-words approach with a Naïve Bayes classifier (Section 4). In Section 4 we also examine the implications of various preprocessing techniques such as removing punctuation, lowercasing words, and stemming. In Section 5 we present a maximum entropy classifier that shows considerable performance increases over the baseline system.
We move to making joint decisions on tag sets in Section 6, and finish with a brief listing of the many ideas and techniques that we did not have sufficient time to incorporate into our systems.

2 Background

There have been many successful previous works on analyzing reviews. For example, Bo Pang, Lillian Lee, and others have had much success in predicting the rating, or sentiment, expressed in review text [1, 2, 3]. In their work, they found that sentiment classification was by and large more difficult than standard topic-based categorization, with many successful techniques in topic classification, such as incorporating frequency information into n-grams, actually decreasing performance in the sentiment classification setting. Additionally, they found that unigram presence features were the most effective (none of their other features provided consistent performance boosts when unigram presence features were used), and that bigrams were not effective at capturing the context.

Other works, such as that of Soo-Min Kim and Eduard Hovy [4], have explored extracting pros and cons from reviews. In their work, they found that pros and cons occur in both factual and opinion sentences within reviews. Their results hint at the possibility of using different tactics to extract tags based on whether the sentence is determined to be factual or an opinion.

3 Problem Definition

Our setting is slightly different from previous studies. We have obtained product review data for the GPS Device category of Buzzillions.com, which is a product review portal run by PowerReviews. PowerReviews offers customer review solutions to e-commerce businesses. Their system collects reviews from consumers that are verified to have purchased the product from an e-commerce store in their network. In addition to a product rating, title, and comment text, the customer is given the option to input pros, cons, and affinities as tags.
Affinities are summarizations of the type of consumer that the customer is, such as the affinities "Frequent Traveler," "Bad With Directions," and "Technically Savvy" in the GPS Device category. The user can input their own tags for these sections as well as choose from four to ten tags that are either preselected by PowerReviews moderators or are the most popular tags in the category.

Instead of extracting pro and con sentences from the comment text, we use the most frequently entered tags as classes and attempt to classify the existence of each tag given the available information about the review. First, in Sections 4 and 5 we attempt to classify each tag independently of the others. Then in Section 6 we move to making a joint decision given the probabilities obtained from the independent classifications. We use the standard precision, recall, and balanced F-measure as performance metrics.

3.1 Datasets

Our datasets include a total of 3,245 reviews. We randomly split these reviews into training, validation, and test sets with an 80%, 10%, 10% split. For all experiments we performed training on the training set, used the validation set to find optimal parameters, and present results by applying the classifier with the optimal parameters from the validation set to the test set.

We only include tags that occur at least 50 times, which amounts to 19 pros, 9 cons, and 8 affinities. Table 3.1 shows the tags and their frequencies for pros, cons, and affinities. As one can see, pros occur much more frequently than cons. This causes many problems that we will discuss in later sections. Additionally, many of the tags have frequently occurring opposites, such as "Long Battery Life" and "Short Battery Life", and the affinities "Technically Challenged" vs. "Technically Savvy." We leverage these facts in Section 7 when optimizing the tag sets for each review. On the other hand, the distinction between several of the pros is quite vague.
For example, "Reliable" and "Reliable Performance" can be interpreted to be the same attribute, as can all of the pro tags that start with the word "easy." The con "None" is also a misnomer, because the fact that there are no cons is not itself a con.

Table 3.1: Pro, Con, and Affinity Classes

Pros
Frequency   Class
137         Accurate Maps
1,320       Acquires
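
As a rough illustration of the experimental setup in Section 3.1, the sketch below shows one way the 80%/10%/10% split, the 50-occurrence tag threshold, and the per-tag precision, recall, and balanced F-measure could be computed. It is not the code used for our experiments: the function names, the fixed random seed, and the choice to count a tag at most once per review are assumptions made only for illustration.

```python
import random
from collections import Counter


def split_reviews(reviews, seed=0):
    """Shuffle reviews and split them 80% / 10% / 10%.

    The seed is an illustrative assumption so the split is repeatable.
    """
    shuffled = list(reviews)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10   # integer arithmetic avoids float rounding
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # whatever remains, roughly 10%
    return train, validation, test


def frequent_tags(tag_sets, min_count=50):
    """Return the tags that appear in at least min_count reviews.

    Each element of tag_sets is the collection of tags on one review;
    a tag is counted once per review (an assumption).
    """
    counts = Counter(tag for tags in tag_sets for tag in set(tags))
    return {tag for tag, count in counts.items() if count >= min_count}


def precision_recall_f1(gold, predicted):
    """Precision, recall, and balanced F-measure for one binary tag.

    gold and predicted are equal-length sequences of truthy/falsy values
    saying whether the tag is present on each review.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because each tag is classified independently in Sections 4 and 5, a function like precision_recall_f1 would be called once per tag, with the gold labels taken from the tags the reviewer actually entered.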

