Stanford CS 224n - Sentiment Analysis of User-Generated Twitter Updates using Various Classification Techniques


Sentiment Analysis of User-Generated Twitter Updates using Various Classification Techniques

Ravi Parikh and Matin Movassate

June 4, 2009

1 Introduction

Twitter is a “micro-blogging” social networking website that has a large and rapidly growing user base. Thus, the website provides a rich bank of data in the form of “tweets,” which are short status updates and musings from Twitter’s users that must be written in 140 characters or less. As an increasingly popular platform for conveying opinions and thoughts, it seems natural to mine Twitter for potentially interesting trends regarding prominent topics in the news or popular culture.

A successful sentiment classification model based on the expansive Twitter data could provide unprecedented utility for businesses, political groups and curious Internet users alike. For example, a business could gauge the effectiveness of a recent marketing campaign by aggregating user opinion on Twitter regarding its product. A user saying “I just used [Product A] today and it JUST BLOWS HARD!” would detract from the overall sentiment, whereas a user claiming “[Product A] is my FAVORITE product ever!” would add to it. Similarly, a political lobbyist could gauge the popular opinion of a politician by calculating the sentiment of all tweets containing the politician’s name.

Obviously, this hypothetical application would be exceedingly useful. But to construct it, we would first need to build an accurate sentiment analyzer for tweets, which is what this project aims to achieve. The problem we chose to tackle within natural language processing is therefore to determine the sentiment of a given tweet. That is, given a user-generated status update (which cannot exceed 140 characters), our classification model would determine whether the given tweet reflects positive or negative opinion on the user’s behalf. For instance, the tweet “I’m in florida with Jesse!
i love vacations!” would be positive, whereas the tweet “Setting up an apartment is lame.” would be negative.

Sentiment analysis on Twitter is a significantly different paradigm from past attempts at sentiment analysis through machine learning, providing a dramatically different data set that poses a multitude of interesting challenges. Notable past projects include sentiment classification of movie reviews. Twitter is different in that sentiment is conveyed in one- or two-sentence blurbs rather than paragraphs, leading to fewer ambiguities of the form “This movie has [list of good characteristics over many sentences]. However, it is still not worth seeing.” There are instead numerous other difficulties. Twitter is much more informal and less consistent in terms of language, and users cover a much wider array of topics touching on many facets of their lives than the limited rhetoric of movie reviews (e.g. a movie they just watched, a test they’re studying for, a person they’re hanging out with). Also, sentiment is not always as obvious when discussing human-generated status updates; many tweets are ambiguous even to a human reader as to their sentiment. Finally, a considerably large fraction of tweets convey no sentiment whatsoever, such as those that merely link to a news article, which poses some difficulties in data gathering, training and testing.

In this paper, we apply several common machine learning techniques to this problem, including various forms of a Naive Bayes model and a Maximum Entropy model. We also attempt various optimizations based on error analysis and intuitions that are specific to the unique rhetoric and language of Twitter.

2 Data

We took advantage of a JAR archive called “jtwitter.jar”, which provides a version of the Twitter API specifically for Java. We wrote a small app that pulled tweets from Twitter’s public timeline in real time, and these tweets were then evaluated and hand-tagged for sentiment by us.
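The hand-tagging step just described can be organized roughly as follows. This is a minimal sketch under our own naming (the classes and methods here are illustrative, not taken from the authors' actual code): each pulled tweet receives a human-assigned label, and only tweets judged positive or negative are retained.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the hand-tagging step: tweets pulled from the
// public timeline are kept only if a human tagger assigned them a sentiment.
public class TweetTagger {

    public enum Sentiment { POSITIVE, NEGATIVE, NONE }

    public static class LabeledTweet {
        public final String text;
        public final Sentiment label;

        public LabeledTweet(String text, Sentiment label) {
            this.text = text;
            this.label = label;
        }
    }

    /** Keep only tweets the tagger judged positive or negative. */
    public static List<LabeledTweet> keepSentimentBearing(List<LabeledTweet> tagged) {
        List<LabeledTweet> kept = new ArrayList<>();
        for (LabeledTweet t : tagged) {
            if (t.label != Sentiment.NONE) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

Tweets labeled `NONE` (link-only posts, foreign-language posts, or posts with no demonstrable opinion) are discarded before training and testing.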
We had a total of 370 positive tweets and 370 negative tweets that were used for training and testing. Of these, we randomly chose 100 of each for testing, and the rest of the tweets were used for training. We considered acquiring more data, but both the Naive Bayes models and the MaxEnt classifier were able to achieve high metrics with this small training set.

This data set does indeed seem small, or at least small enough that one might expect it to be insufficient for obtaining good metrics. Data gathering is perhaps the biggest issue that Twitter-based sentiment analysis poses compared to more traditional sentiment analysis problems such as movie or product reviews. Thus, we decided it would not be worth the significant extra time simply for a few more data points. However, as will be evident in the Results section, we were able to achieve excellent accuracy (particularly with the Naive Bayes classifier).

Each tweet is between 1 and 140 characters. We did not bother tagging tweets in foreign languages or tweets with excessive amounts of colloquialisms/misspellings that made them difficult to decipher even for humans. Additionally, many tweets do not convey sentiment; for example, “just tried pesto for the first time” is a tweet that is neither positive nor negative. Even tweets like “got promoted at work today” don’t actually convey sentiment; the author here is not rendering judgment on whether this is a positive development, despite our cultural notions about the nature of promotions. In contrast, a tweet like “got promoted at work today...gonna celebrate tonight!” clearly demonstrates positive sentiment. Tweets where sentiment was not conveyed were not used in training or testing.

Even with a robust Twitter API and a quick method of obtaining tweet classifications for training data purposes, a two-person effort was not enough to generate a truly sizable collection of positive and negative tweets (on the order of 1000 tweets).
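The random train/test split described above (100 tweets of each class held out for testing, the remaining 270 per class used for training) can be sketched as follows. This is an illustration under assumed names, not the authors' code: the split is applied once to each class's tweets.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of the split described above: shuffle one class's tweets and hold
// out a fixed number (100 in the paper) for testing; the rest are training data.
public class TrainTestSplit {

    public static class Split {
        public final List<String> train = new ArrayList<>();
        public final List<String> test = new ArrayList<>();
    }

    public static Split split(List<String> tweets, int numTest, long seed) {
        List<String> shuffled = new ArrayList<>(tweets);
        // Shuffle with a fixed seed so the split is reproducible.
        Collections.shuffle(shuffled, new Random(seed));
        Split s = new Split();
        s.test.addAll(shuffled.subList(0, numTest));
        s.train.addAll(shuffled.subList(numTest, shuffled.size()));
        return s;
    }
}
```

Applied separately to the 370 positive and 370 negative tweets, this yields 100 test tweets and 270 training tweets per class.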
This was because a large majority of tweets either contain only a link, are in a foreign language, or convey no sentiment whatsoever, meaning tweets that conveyed some level of opinion or emotion were surprisingly difficult to encounter. On average, processing 200 tweets yielded roughly 5-10 positive tweets and 5-10 negative tweets, with the rest expressing no demonstrable sentiment. This is an interesting reflection on how Twitter is used as a service, suggesting that users turn to the site more to discuss events objectively than to render judgment.

3 Code Structure

Our source code for testing and implementation

