Stanford CS 224 - Extracting Strong Sentiment Trends from Twitter - D622084

School name Stanford University

Course CS 224N - Natural Language Processing with Deep Learning

Pages 7

Extracting Strong Sentiment Trends from Twitter

Patrick Lai
Computer Science Department
Stanford University
[email protected]

Twitter is a popular real-time microblogging service that allows its users to share short pieces of information known as “tweets” (limited to 140 characters). Users write tweets to express their opinions about various topics pertaining to their daily lives. With a total of 175 million users and 95 million tweets published per day (as of September 2010), Twitter serves as an ideal platform for the analysis and extraction of general public sentiment regarding specific issues.

The measurement of presidential performance is one domain where the analysis and extraction of general public sentiment is a large component. Currently, presidential approval polls are hand-measured by random telephone sampling of a small population. This technique is both time-consuming and costly. Therefore, an automated way of estimating these poll results from easily accessible public data would be immensely useful in reducing the required time and costs.

This project explores an approach to automatically extract large-scale trends in the perception of presidential performance among the general public by analyzing tweets published on Twitter. Specifically, macro trends in strong approval and strong disapproval of presidential performance are extracted from tweets using a simple lexicon-based approach. The extracted sentiments are compared against a hand-measured presidential performance poll to measure correlation and determine whether strong political sentiments regarding presidential performance can be extracted from Twitter. From a natural language processing perspective, this problem is interesting because ...

Related Work

There is a large body of research on using machine learning techniques for sentiment analysis in corpora containing informal language, such as data from social networks and microblogging services. Pang et al. were among the first to apply sentiment analysis to online movie reviews [2]. Their findings showed that machine learning techniques, specifically support vector machines, are quite good at detecting sentiment in movie reviews when compared to human-generated baselines. Research by Go et al. brought sentiment analysis to the Twitter domain by applying similar machine learning techniques to classifying the sentiment of tweets [1]. Their primary contribution was an approach using emoticons as noisy labels during the training process, eliminating the need for hand-labeled data.

More recently, there have been several research projects that apply sentiment analysis to Twitter corpora in order to extract general public opinion regarding political issues. These projects moved away from traditional machine learning techniques and instead employed lexicon-based approaches, which use sentiment lexicons to determine word polarity. It should be noted that many of the sentiment lexicons used in these projects are not tailored towards the type of language used in social media.

Tumasjan et al. showed that Twitter does indeed provide a platform for political deliberation [7]. In addition, using the LIWC sentiment lexicon, they showed that sentiment extraction with word counts produced results that closely match traditional election polls.

The work of O’Connor et al. found that both consumer confidence polls and political sentiment polls correlate with sentiment measures computed using word frequencies in tweets. They used the Subjectivity Lexicon from OpinionFinder to label tweets as containing positive sentiment or negative sentiment and correlated the results to hand-measured polls. Although this project builds on and closely resembles their work, the approach developed here is unique in that it attempts to extract not only sentiment polarity but also sentiment strength (i.e. strong approval vs. strong disapproval). This project also differs from the work of O’Connor et al. in that it explores the implementation of improvements suggested in their work, such as the use of part-of-speech information and emoticons in extracting tweet sentiment, and the use of a sentiment lexicon tailored towards text originating from social media.

Data Sets

This section discusses the data sets used in this project. A collection of tweets is used as the primary corpus for analysis. A sentiment lexicon is used to determine word sentiment (both polarity and strength). Finally, hand-measured presidential approval data is used as a gold standard comparison point to determine the correlation of the approach to widely accepted polls.

Twitter Corpus

A collection of 457,981,476 tweets is used for analysis in this project. This data was collected by polling the Twitter API over a six-month period from July 2009 through December 2009. Each record in this data set contains the user’s Twitter profile address, the time the tweet was published, and the actual tweet body.

Sentiment Lexicon

The Subjectivity Lexicon from OpinionFinder contains a list of 8,221 words (2,718 positive words, 4,912 negative words and 591 neutral words) with their polarity and strength [9]. Additional fields include part-of-speech and whether a word is in stemmed form. Many of the sentiment clues in this lexicon were obtained from a wide variety of formal-language news sources. This is the sentiment lexicon used by O’Connor et al.

Since the Subjectivity Lexicon is not well suited to short and informal text, an alternate sentiment lexicon that is better suited to social media is explored. The SentiStrength lexicon contains a list of 891 words (374 positive words and 517 negative words) with their polarity, strength, and whether a word is in stemmed form [4]. The sentiment clues and stemming rules in this lexicon were obtained from MySpace, a social-networking service with a demographic similar to Twitter's, and are thus better suited for use with informal text.

Presidential Approval

The Daily Presidential Tracking Poll published by Rasmussen Reports provides daily ratings for strong approval and strong disapproval of presidential performance. The poll is hand-measured via telephone surveys of 500 likely voters per night and reported on a three-day rolling average basis.

Text Analysis

A two-step approach was taken to perform sentiment analysis on tweets. The first step was to select tweets about the topic of interest, in this case presidential performance, from the corpus. There are many ways to achieve this, though a simple technique is used here.
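
As a minimal sketch of how a two-step, lexicon-based pipeline of this kind could fit together, the Python below assumes a keyword filter for topic selection, a toy lexicon with polarity and strength fields, a trailing three-day average, and Pearson correlation against poll values. The keyword set, the lexicon entries, and all function names are illustrative assumptions and are not taken from the project's implementation.

```python
import re
from collections import defaultdict
from math import sqrt

# Hypothetical toy lexicon: word -> (polarity, strength). The real lexicons
# (Subjectivity Lexicon, SentiStrength) are far larger and use their own formats.
LEXICON = {
    "love": ("positive", "strong"),
    "great": ("positive", "strong"),
    "good": ("positive", "weak"),
    "hate": ("negative", "strong"),
    "terrible": ("negative", "strong"),
    "bad": ("negative", "weak"),
}

# Assumed topic keywords, for illustration only.
TOPIC_KEYWORDS = {"obama"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_on_topic(text):
    """Step 1 (assumed): keep tweets that mention any topic keyword."""
    return any(token in TOPIC_KEYWORDS for token in tokenize(text))


def strong_counts(text):
    """Step 2 (assumed): count strong positive and strong negative words."""
    pos = neg = 0
    for token in tokenize(text):
        polarity, strength = LEXICON.get(token, (None, None))
        if strength == "strong" and polarity == "positive":
            pos += 1
        elif strength == "strong" and polarity == "negative":
            neg += 1
    return pos, neg


def daily_strong_sentiment(tweets):
    """Aggregate (day, text) pairs into per-day (strong_pos, strong_neg) counts."""
    totals = defaultdict(lambda: [0, 0])
    for day, text in tweets:
        if is_on_topic(text):
            pos, neg = strong_counts(text)
            totals[day][0] += pos
            totals[day][1] += neg
    return {day: tuple(counts) for day, counts in sorted(totals.items())}


def rolling_mean(values, window=3):
    """Trailing rolling average; early entries average the values seen so far."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In use, the per-day strong-positive (or strong-negative) counts from `daily_strong_sentiment` would be smoothed with `rolling_mean` so they are comparable to a poll reported as a three-day rolling average, and then passed to `pearson` together with the poll's daily values. Whether raw counts or some normalized ratio is the right quantity to correlate is a design choice this sketch leaves open.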
