Stanford CS 224 - Study Notes

This preview shows page 1-2-3-26-27-28 out of 28 pages.


CS 224n Final Project: A Semantic, Supervised Classification Approach to Restaurant Reviews

Pavani Vantimitta
[email protected]

Abstract

The rapid growth of e-commerce has made customer reviews an indispensable source of information for both the potential buyer and the seller. Reviews give the buyer a quick means of assessing a product's value and give the seller customer feedback. Product reviews have been mined continuously for the past fifteen years or more to make them a more tractable source for users. A single product can attract hundreds to thousands of reviews, which makes it difficult to form an overall assessment of the product itself. This paper aims to provide an overall rating for a restaurant based on the semantics of the reviews collected for it.

1 Introduction

Product reviews are abundant and freely and easily accessible on the web. The sheer quantity and variety of opinions, however, mean that they do not give a conclusive answer to a user looking to buy a product. Previous work in this area has attempted various directions to address this problem.

Sentiment classification was initially studied as a cognitive linguistic problem. Hearst [1] proposes a metaphoric model to determine the directionality of texts. This directionality of information is achieved by using a manually-constructed. Pang et al. [2] investigate the use of several supervised machine learning methods to semantically classify movie reviews. Unsupervised learning methods have also been used for semantic orientation classification, as in Turney [3], which relies on computing the mutual information between review phrases and the words “excellent” and “poor”. Methods other than machine learning can also be applied to classify reviews. For example, Subasic and Huettner [4] use fuzzy-set techniques to construct a lexicon, which is then used to analyze documents.
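To make the mutual-information idea attributed to Turney [3] above more concrete, here is a minimal sketch. It assumes the semantic orientation of a phrase is the difference between its pointwise mutual information with a positive anchor word and with a negative one, estimated from raw co-occurrence counts. The function name, the argument names, and the default smoothing constant are illustrative assumptions, not taken from this paper.

```python
import math


def semantic_orientation(near_pos, near_neg, count_pos, count_neg, smoothing=0.01):
    """Estimate SO(phrase) = PMI(phrase, pos) - PMI(phrase, neg), in bits.

    near_pos / near_neg: how often the phrase co-occurs with the positive /
    negative anchor word (e.g. "excellent" / "poor").
    count_pos / count_neg: how often each anchor word occurs overall.
    The phrase's own frequency cancels out in the difference, so it is not needed.
    smoothing (an assumed value) avoids division by zero for unseen pairs.
    """
    return math.log2(
        ((near_pos + smoothing) * count_neg) / ((near_neg + smoothing) * count_pos)
    )


# Hypothetical counts: the phrase appears near "excellent" four times as often
# as near "poor", and both anchor words are equally frequent.
score = semantic_orientation(near_pos=8, near_neg=2, count_pos=100, count_neg=100)
print(score > 0)  # a positive score suggests a positive orientation
```

A review could then be scored by averaging this value over its extracted phrases, although that aggregation step is likewise an assumption here.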
Liu et al. [5] build linguistic affect models for six basic emotions by utilizing relationships from the Open Mind Common Sense (OMCS) knowledge base and manually specified ground truth. An affect sensing engine is then built to judge the affect of given passages. Hu and Liu [6] use the adjective synonym sets and antonym sets in WordNet [7] to judge the semantic orientations of adjectives. They extend a seed set of adjectives by searching for synonyms and antonyms in WordNet.

This paper aims to assign a conclusive rating to each restaurant reviewed in the collected data, drawing on a mix of the related work above. Section 2 describes the data collected and the preprocessing methods employed. Section 3 covers the Part-of-Speech tagging used to extract the semantics of a review. Section 4 explains the ways in which a vocabulary of words was created. Section 5 describes the classifiers used to classify the reviews. Section 6 is devoted exclusively to the Maximum Entropy classifier. Each section contains results outlining the performance.

2 Data Collection and Preprocessing

2.1 Data Set

The data set required for this exercise had to be labeled with a rating beforehand. The initial idea was to use Google Base to obtain the reviews as XML files. It turned out that Google Base's API (in this case Java) uses the “snippets” feed, which consists of partial data, so the review text is not complete. This idea therefore had to be dropped. The Google Base search (GUI) pulled up reviews from yelp.com, so retrieving them directly from there seemed a better choice. The three data sets (training, validation and testing) were collected from yelp.com with “Palo Alto” and “Restaurants” as search strings. The web pages retrieved were then crawled using Web-Harvest [8], which is a web data extraction tool.
It leverages well-proven XML and text processing technologies to extract useful data from arbitrary web pages. A configuration file was written to extract the required data fields from the web pages and save them to a text file. The training set consists of 61 businesses, and the validation and test sets of 20 businesses each. The training set has a total of 1071 reviews, the validation set has 341 reviews and the test set has 260 reviews.

<Business> and <Reviews> are two data handles that I created to handle the data sets. Each restaurant reviewed is a <Business>, which consists of one <Reviews> data structure that in turn holds the reviews. Business Categories holds the category under which the cuisine of the restaurant falls. Business Tags are business details written by the customer or reviewer about the facilities available at the restaurant (e.g. wheelchair accessibility or parking). Review Opinions consists of the opinions users have left about the review, i.e. whether a prospective customer read the review and found it useful, cool or funny. These are the three opinions allowed by Yelp.com. Review Text contains the whole review comment left by the user as a single String. Review Text Words contains a list of the words that appear in each review. Review Text Tags is filled with the POS tags later on.

<BUSINESS>             <REVIEWS>
BUSINESS NAME          REVIEW OPINIONS
BUSINESS ADDRESS       REVIEW TEXT
BUSINESS URL           REVIEW RATING
BUSINESS CATEGORIES    REVIEW TEXT WORDS
BUSINESS TAGS          REVIEW TEXT TAGS

Table 2.1 <Business> and <Reviews> data structures with their data items.

2.2 Preprocessing

Some basic preprocessing was applied to remove unnecessary characters introduced by the web extraction. For example, characters like “”. Extra spaces and tabs between words were removed. Some unnecessary data fields were also dropped at this stage. Each punctuation mark was separated from the adjacent word by a space. For example, “doing?” was replaced with “doing ?”. This was done so that the Part-of-Speech tagger used later could retrieve the correct tags without the punctuation marks causing errors.

3 Part-of-Speech Tagging

The Stanford Log-linear Part-Of-Speech Tagger is the POS tagger used to tag the reviews. The tagger is the main source of semantic information about the review. It
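Returning to Section 2, the data handles of Table 2.1 and the preprocessing of Section 2.2 can be sketched as follows. This is only an illustration under stated assumptions: Python is used for brevity, the class, field and function names are hypothetical, and the regular expressions are one plausible reading of "separate punctuation and remove extra spaces and tabs", not the author's actual code.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Any single character that is neither a word character nor whitespace.
_PUNCT = re.compile(r"([^\w\s])")
_WHITESPACE = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Pad each punctuation mark with spaces, then collapse spaces and tabs."""
    spaced = _PUNCT.sub(r" \1 ", text)
    return _WHITESPACE.sub(" ", spaced).strip()


@dataclass
class Review:
    """Hypothetical counterpart of one entry inside <Reviews>."""
    text: str
    rating: float
    opinions: Dict[str, int] = field(default_factory=dict)  # useful / cool / funny
    text_words: List[str] = field(default_factory=list)
    text_tags: List[str] = field(default_factory=list)  # filled by the POS tagger later


@dataclass
class Business:
    """Hypothetical counterpart of <Business>."""
    name: str
    address: str = ""
    url: str = ""
    categories: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    reviews: List[Review] = field(default_factory=list)


def prepare(business: Business) -> None:
    """Clean every review text in place and fill its word list."""
    for review in business.reviews:
        review.text = preprocess(review.text)
        review.text_words = review.text.split()


cafe = Business(name="Example Cafe",
                reviews=[Review(text="How are  you\tdoing?", rating=4.0)])
prepare(cafe)
print(cafe.reviews[0].text_words)  # ['How', 'are', 'you', 'doing', '?']
```

Note that this pattern also splits apostrophes, so a contraction such as "don't" becomes three tokens. Whether the original pipeline treated apostrophes this way is not stated in the text.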

