Stanford CS 224 - Sentiment Analysis - D2203080

School name Stanford University

Course CS 224N Natural Language Processing with Deep Learning

Pages 19

Julie Kane Ahkter (kanej)
Steven Soria (ssoriajr)

Sentiment Analysis: Facebook Status Messages
Final Project CS224N

Abstract

While recent NLP-based sentiment analysis has centered on Twitter and product/service reviews, we believe the nature of Facebook status messages makes it possible to classify their emotion more accurately. Facebook status messages are more succinct than reviews, and they are easier to classify than tweets because their higher character limit allows for better writing and a more accurate portrayal of emotions. We analyze the suitability of various approaches to Facebook status messages by comparing the performance of a Maximum Entropy (“MaxEnt”) classifier, a MaxEnt classifier augmented with Labeled-LDA (“LDA”) data, a MaxEnt classifier augmented with part-of-speech (“POS”) tagging, and a MaxEnt classifier augmented with both LDA and POS data. We evaluate both binary and multi-class sentiment labeling. In both cases, a MaxEnt classifier augmented with POS data performs the best, achieving an average binary classification F1 score of approximately 85% and an average multi-class F1 score of approximately 67%.

Introduction/Background

With the transition from forum- and blog-based Internet communication among users to social networking sites such as Facebook and Twitter, there is a new opportunity for improved information mining via NLP sentiment analysis. Prior success at such analysis has been elusive because of the inherent difficulty of extracting a single sentiment label from long passages, where fluctuations in sentiment over the length of the document make classification difficult. While sentiment analysis of Twitter data has surged in recent years, it too is problematic, for the opposite reason: tweets are limited to 140 characters. Tweets also often use heavy abbreviation and are more likely to be fragmented expressions, making it difficult to use a POS tagger.
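For reference, the F1 scores quoted in the abstract combine precision and recall into one number. The following is a minimal sketch of how a per-class F1 and a class-averaged F1 could be computed. This excerpt does not say how the reported averages were taken, so macro-averaging over classes is an assumption here, and the labels and predictions are made-up illustrations rather than project data.

```python
def f1_for_class(gold, predicted, label):
    """F1 for one label, treating that label as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0  # no correct positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 (an assumed averaging scheme)."""
    labels = sorted(set(gold))
    return sum(f1_for_class(gold, predicted, lab) for lab in labels) / len(labels)


# Hypothetical binary example: four statuses with gold and predicted labels.
gold = ["pos", "pos", "neg", "neg"]
pred = ["pos", "neg", "neg", "neg"]

pos_f1 = f1_for_class(gold, pred, "pos")
neg_f1 = f1_for_class(gold, pred, "neg")
avg_f1 = macro_f1(gold, pred)
```

The same two functions apply unchanged to the multi-class setting, since `macro_f1` simply iterates over whatever labels appear in the gold data.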
This project focuses on classifying the sentiment of Facebook status updates (“status updates”) using binary and multi-class labels. Facebook distinguishes a user’s status update from links to a news article or other source of information, and from comments that respond to another Facebook user. Unlike tweets, status updates can use up to 420 characters. Thus, status updates are more often written in sentence-like structures that can benefit from POS analysis.

Why perform sentiment analysis on Facebook status updates? According to a January 2010 article on InsideFacebook.com, users spent nearly 7 hours per person on Facebook in December 2009, far more than on the other top 10 parent companies on the Internet. From a marketing standpoint, understanding user sentiment as it relates to a topic of interest clearly allows more effective ad targeting. If a user is trending positively or negatively about health care reform, political party ads sympathetic to that user’s viewpoint might be shown. Similarly, a user who tends to write playful status messages might be shown ads for a local comedy club. Beyond the obvious marketing appeal, however, is a more sociologically compelling rationale: modeling the ebb and flow of consumer sentiment across a broad swath of topics, hierarchically encapsulated by user, by locally defined user communities, regionally, nationally, and globally.

To our knowledge, there exists no prior work that focuses exclusively on sentiment analysis of Facebook status messages. Nonetheless, we found the work of (Ramage, Dumais, Liebling) on augmenting a MaxEnt classifier with LDA data particularly insightful.
While their work classifies tweets according to topic rather than sentiment, via an approach that extrapolates the topic from latent topics among the words in a tweet, the prospect that a Labeled LDA could combine the best of supervised learning while still discovering features in an unsupervised fashion seemed a promising way to model the inherent complexities of sentiment. Due to time constraints for this project, we chose to let the LDA tell us the ratio of words more closely associated with a fixed set of pre-labeled sentiments.

The paper by (O’Connor, Balasubramanyan, Routledge, Smith) was the first we read that shared a similar vision of using NLP to provide insight into public opinion by modeling sentiment within various societal clusters. Importantly, however, their work performs binary labeling against a pre-defined corpus of known positive/negative words. Our work aims to learn sentiment non-deterministically, training on nothing more than the sentence and a label for the sentence as a whole.

Data

To collect data, we created and registered a Facebook Connect application called iFeel. Having the app require specific Facebook privileges let us host it from our stanford.edu accounts, making it available to anyone, not just our friends, and allowing us to gather status updates quickly and efficiently. iFeel is a PHP-based app that uses the newer Facebook Graph APIs, which simply use URL-based queries to return result sets such as status updates. Around iFeel, we wrapped an HTML-based front-end that disclosed privacy and disclaimer information such as the retention period, usage, and anonymity of the data, the project scope and purpose, et cetera. At the time of this writing, iFeel lives at http://www.stanford.edu/~ssoriajr, although this is subject to removal at any time. iFeel collects the 25 most recent status updates from a logged-on Facebook user and their friends.
Because Facebook users may have overlapping sets of friends, we performed a unique sort after merging all status updates to eliminate repeated status messages. Over a period of one week, we collected 62,202 unique, raw (unlabeled) Facebook status messages.

After collecting the data, the next step was to preprocess the raw Facebook data, convert it into labeled data, and break it up into training and test sets. For this project, we used the Stanford Classifier v2.0 (the MaxEnt classifier), the Stanford Tagger v3.0 (2010-05-26, English-only) (the POS tagger), and the Stanford Topic Modeling Toolbox v0.2.1 (the Labeled LDA). All of these tools are available from http://www-nlp.stanford.edu/software/. Some massaging of the data between stages was required. Our high-level sequence involved the following:
1) Raw data collection via iFeel
2) Sentiment
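The de-duplication and train/test split described in the Data section can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the excerpt does not give the split ratio, the shuffling method, or any function names, so `unique_sort`, `train_test_split`, the 25% test fraction, and the sample statuses are all hypothetical.

```python
import random


def unique_sort(statuses):
    """Drop exact duplicate messages and return them in sorted order."""
    return sorted(set(statuses))


def train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle reproducibly, then split (ratio and seed are assumptions)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # First n_test shuffled items form the test set; the rest are training data.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical merged feed: overlapping friend lists yield repeated statuses.
raw = ["great day!", "so tired", "great day!", "missing home", "so tired", "go team"]

unique = unique_sort(raw)
train, test = train_test_split(unique)
```

Fixing the random seed keeps the split reproducible, which matters when several classifier variants are compared on the same data.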
