Stanford CS 224 - Study Notes - D2598207

School name Stanford University

Course CS 224N Natural Language Processing with Deep Learning

Pages 14

Pro, Con, and Affinity Tagging of Product Reviews

Todd Sullivan
Department of Computer Science
Stanford University
[email protected]

1 Introduction

As internet use becomes more prevalent, an increasing number of consumers are turning to product reviews to guide their search for the best product. This is especially true for products that the consumer may not know much about, such as cameras and GPS devices. According to a 2006 eTailing Group and JC Williams Consultancy study, 70% of online shoppers examine customer reviews before they buy and 92% find customer reviews to be "very helpful" in making a decision, which makes product reviews an important facet of consumer decision making and an important aspect of retailing and e-commerce.

While plain text reviews are helpful, most consumers do not have enough time to read many reviews about multiple products. As a result, reviews are much more helpful if they contain tags for the pros and cons. Most consumers would agree that reviews are even more helpful if the product's main page, which contains all of its reviews, also shows the aggregate information from these tags.

In this paper, we consider the task of assigning pros, cons, and affinities (more on these later) to product reviews given information such as the review's title, comment text, and rating. We begin in Section 2 with background research on extracting pros and cons from reviews and describe our problem definition in Section 3. We then present our baseline systems, which use a bag-of-words approach with a Naïve Bayes classifier (Section 4). In Section 4 we also examine the implications of various preprocessing techniques such as removing punctuation, lowercasing words, and stemming. In Section 5 we present a maximum entropy classifier that shows considerable performance increases over the baseline system.
We move to making joint decisions on tag sets in Section 6, and finish with a brief listing of the many ideas and techniques that we did not have sufficient time to incorporate into our systems.

2 Background

There have been many successful prior works on analyzing reviews. For example, Bo Pang, Lillian Lee, and others have had much success in predicting the rating, or sentiment, expressed in review text [1, 2, 3]. They found that sentiment classification was by and large more difficult than standard topic-based categorization, and that many techniques that succeed in topic classification, such as incorporating frequency information into n-grams, actually decrease performance in the sentiment classification setting. Additionally, they found that unigram presence features were the most effective (none of their other features provided consistent performance boosts when unigram presence features were used), and that bigrams were not effective at capturing context.

Other works, such as that of Soo-Min Kim and Eduard Hovy [4], have explored extracting pros and cons from reviews. They found that pros and cons occur in both factual and opinion sentences within reviews. Their results hint at the possibility of using different tactics to extract tags depending on whether a sentence is determined to be factual or an opinion.

3 Problem Definition

Our setting is slightly different from previous studies. We have obtained product review data for the GPS Device category of Buzzillions.com, which is a product review portal run by PowerReviews. PowerReviews offers customer review solutions to e-commerce businesses. Their system collects reviews from consumers who are verified to have purchased the product from an e-commerce store in their network. In addition to a product rating, title, and comment text, the customer is given the option to input pros, cons, and affinities as tags.
Affinities are summarizations of the type of consumer that the customer is, such as the affinities "Frequent Traveler," "Bad With Directions," and "Technically Savvy" in the GPS Device category. The user can input their own tags for these sections as well as choose from four to ten tags that are either preselected by PowerReviews moderators or are the most popular tags in the category.

Instead of extracting pro and con sentences from the comment text, we use the most frequently inputted tags as classes and attempt to classify the existence of each tag given the available information about the review. First, in Sections 4 and 5 we attempt to classify each tag independently of the others. Then in Section 6 we move to making a joint decision given the probabilities obtained from the independent classifications. We use the standard precision, recall, and balanced F-measure as performance metrics.

3.1 Datasets

Our datasets include a total of 3,245 reviews. We randomly split these reviews into training, validation, and test sets with an 80%/10%/10% split. For all experiments we trained on the training set, used the validation set to find optimal parameters, and report results obtained by applying the classifier, with the parameters that were optimal on the validation set, to the test set. We only include tags that occur at least 50 times, which amounts to 19 pros, 9 cons, and 8 affinities. Table 3.1 shows the tags and their frequencies for pros, cons, and affinities.

As one can see, pros occur much more frequently than cons. This causes many problems that we will discuss in later sections. Additionally, many of the tags have frequently occurring opposites, such as "Long Battery Life" and "Short Battery Life", and the affinities "Technically Challenged" vs. "Technically Savvy." We leverage these facts in Section 7 when optimizing the tag sets for each review. On the other hand, the distinction between several of the pros is quite vague.
For example, "Reliable" and "Reliable Performance" can be interpreted as the same attribute, as can all of the pro tags that start with the word "easy." The con "None" is also a misnomer, because the fact that there are no cons is not itself a con.

Table 3.1: Pro, Con, and Affinity Classes

Pros
Frequency   Class
137         Accurate Maps
1,320       Acquires
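
The data preparation and evaluation described in Section 3 can be illustrated with a minimal Python sketch. It is not the author's code. It assumes each review is a dictionary with hypothetical fields such as "pros" holding a list of tag strings, that the 50-occurrence threshold is counted over the reviews passed in, and that "balanced F-measure" means the harmonic mean of precision and recall (F1). The function names and the seed are likewise illustrative.

```python
import random
from collections import Counter


def split_reviews(reviews, seed=0):
    """Shuffle and split reviews into 80% train, 10% validation, 10% test.

    Integer arithmetic avoids floating-point rounding in the split sizes.
    Any remainder goes to the test set.
    """
    shuffled = list(reviews)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def frequent_tags(reviews, field, min_count=50):
    """Return the set of tags in `field` that appear in at least `min_count` reviews.

    `field` is a hypothetical key such as "pros", "cons", or "affinities".
    A tag is counted at most once per review.
    """
    counts = Counter(tag for review in reviews for tag in set(review[field]))
    return {tag for tag, count in counts.items() if count >= min_count}


def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 for one tag.

    `gold` and `predicted` are parallel sequences of booleans, one per review,
    saying whether the tag is present. Undefined ratios are reported as 0.0.
    """
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because each tag is classified independently in Sections 4 and 5, a function like precision_recall_f1 would be called once per tag. With 3,245 reviews, this particular split rule yields 2,596 training, 324 validation, and 325 test reviews; the paper does not state its exact set sizes.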
