MIT 6 863J - Sentiment Analysis of Movie Review Comments

Sentiment Analysis of Movie Review Comments
6.863 Spring 2009 final project
Kuat [email protected]
ša Misailovi´ [email protected]
17, 2009

Abstract

This paper presents an empirical study of the efficacy of machine learning techniques in classifying text messages by semantic meaning. We use movie review comments from the popular social network Digg as our data set and classify text by subjectivity/objectivity and negative/positive attitude. We propose different approaches to extracting text features, such as the bag-of-words model, using a large movie review corpus, restricting to adjectives and adverbs, handling negations, bounding word frequencies by a threshold, and using WordNet synonym knowledge. We evaluate their effect on the accuracy of four machine learning methods: Naive Bayes, Decision Trees, Maximum-Entropy, and K-Means clustering. We conclude our study with an explanation of the observed trends in accuracy rates and directions for future work.

1 Introduction

The Internet today contains a huge quantity of textual data, which is growing every day. Text is the prevalent data format on the web, since it is easy to generate and publish. What is hard nowadays is not finding useful information but rather extracting it in the proper context from the vast ocean of content. It is now beyond human power and time to sift through it manually; therefore, the research problem of automatically categorizing and organizing data is apparent.

Textual information can be divided into two main domains: facts and opinions. While facts focus on objective data transmission, opinions express the sentiment of their authors. Initially, research mostly focused on the categorization of factual data. Today, we have web search engines which enable search based on keywords that describe the topic of the text. A search for one keyword can return a large number of pages. For example, a Google search for the word “startrek” finds more than 2.3 million pages.
These articles include both objective facts about the movie franchise (e.g. the Wikipedia article) and subjective opinions from users (e.g. reviews from critics).

In recent years, we have witnessed the rise of a large number of websites that enable users to contribute, modify, and grade content. Users have an opportunity to express their personal opinions about specific topics. Examples of such web sites include blogs, forums, product review sites, and social networks.

Opinions can be expressed in different forms. One example is web sites for reviewing products, such as Amazon [1], or movie review sites such as RottenTomatoes [4], which enable rating of products, usually on some fixed scale, as well as leaving personal reviews. These reviews tend to be longer, usually consisting of a few paragraphs of text. With respect to their length and comprehensiveness, they tend to resemble blog messages. Other types of web sites contain predominantly short comments, such as status messages on social networks like Twitter [5] or article reviews on Digg [2]. Additionally, many web sites allow rating the popularity of messages (either binary thumbs up/thumbs down or a finer-grained star rating), which can be related to the opinion expressed by the author.

Sentiment analysis aims to uncover the attitude of the author on a particular topic from the written text. Other terms used to denote this research area include “opinion mining” and “subjectivity detection”. It uses natural language processing and machine learning techniques to find statistical and/or linguistic patterns in the text that reveal attitudes.

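As an illustration of the kind of statistical pattern involved, the following is a minimal sketch of bag-of-words feature extraction with negation handling and a frequency bound, three of the approaches named in the abstract. It is not the authors' implementation: the negation cue list, the `NOT_` prefix convention, the choice to end a negation scope at the next punctuation mark, and the reading of "bounding word frequencies by a threshold" as capping counts at `max_count` are all assumptions made for this example, as are the function names.

```python
import re
from collections import Counter

# Hypothetical cue list and scope rule; the paper's own choices are not shown here.
NEGATIONS = {"not", "no", "never", "cannot"}
CLAUSE_END = {".", ",", ";", "!", "?"}


def tokenize(text):
    """Lowercase the text and split it into word tokens and punctuation marks."""
    return re.findall(r"[a-z']+|[.,;!?]", text.lower())


def negation_features(text):
    """Prefix every word between a negation cue and the next punctuation mark with NOT_."""
    features = []
    negated = False
    for token in tokenize(text):
        if token in CLAUSE_END:
            negated = False          # punctuation closes the negation scope
        elif token in NEGATIONS or token.endswith("n't"):
            negated = True           # the cue itself is kept as a plain feature
            features.append(token)
        else:
            features.append("NOT_" + token if negated else token)
    return features


def bag_of_words(text, max_count=2):
    """Count features, capping each count at max_count (assumed meaning of the threshold)."""
    counts = Counter(negation_features(text))
    return {word: min(count, max_count) for word, count in counts.items()}


if __name__ == "__main__":
    comment = "I did not like this movie, but the cast was good. Good good good!"
    print(bag_of_words(comment))
```

A dictionary of this shape can be passed to most feature-based classifiers; marking negated words separately keeps "like" and "not like" from contributing to the same feature.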
Sentiment analysis has gained popularity in recent years due to its immediate applicability in business environments, such as summarizing feedback from product reviews, discovering collaborative recommendations, or assisting in election campaigns.

The focus of our project is the analysis of sentiment in short web site comments. We expect a short comment to express the author's opinion on a certain topic succinctly and directly. We focus on two important properties of text:

1. subjectivity – whether the style of the sentence is subjective or objective;
2. polarity – whether the author expresses a positive or negative opinion.

We use statistical methods to capture the elements of subjective style and sentence polarity. Statistical analysis is done on the sentence level. We apply machine learning techniques to classify a set of messages.

We are interested in the following questions:

1. To what extent can we extract the subjectivity and polarity from short comments? What are the important features that can be extracted from the raw text that have the greatest influence on the classification?
2. What machine learning techniques are suitable for this purpose? We compare in total four techniques of supervised and unsupervised learning.
3. Are the properties of short messages that are important for sentiment analysis similar to the properties of some existing corpus? We compare our manually annotated corpora to a larger existing corpus.

We present the analysis on manually annotated examples from Digg. We describe the experiments and interpret the results.

2 Methodology

Our method of sentiment analysis is based upon machine learning. We explain what sources of data we used in 2.1, how we selected features in 2.2, and how we performed classification in 2.3.

2.1 Sources

We chose the domain of social web site comment messages. We obtained the comments from articles posted on Digg. Digg [2] is a social networking web site which enables its users to submit links and recommend content from other web sites.
Digg has a voting system which allows users to vote for (+1) or against (-1) posted items and leave comments on posts. The total sum of diggs, that is, the difference between thumbs up votes and thumbs down votes, represents the popularity of the post. Besides popularity, which is assigned by other users, there is no clue about the sentiment of the author of the messages.

We have chosen two relatively popular posts from Digg. Both articles share a theme; they are about reviews of recent blockbuster movies:

1. http://digg.com/movies/Quantum_of_Solace_disappoints : a review of the new James Bond movie “Quantum of Solace” (684 diggs);
2. http://digg.com/movies/Star_Trek_The_best_prequel_ever : a review of the “Star Trek” movie (669 diggs).

We have retrieved all comments from these posts and stored them in the original format in files qos.txt and startrek.txt.

The reason we have chosen movie

