RJ Walsh
CS224N – Final Project
Sentiment Analysis of Stanford Course Reviews

Introduction

The goal of this project is to investigate the sentiment of Stanford course reviews. Reviews are a common element of the modern web, and determining the sentiment of a review offers great potential for analyzing the items being reviewed. In the domain of Stanford courses, this could give professors, students, and administrators the ability to qualify course reviews on a new level, providing everyone with a new way of determining how students feel about the courses they take. This paper investigates various classification methods for this problem.

Data

The data used for this project was a corpus of 4,247 course reviews obtained from CourseRank (http://courserank.stanford.edu). The reviews were obtained as a raw dump from the database, with no additional information attached. Almost all of the reviews were easy to classify, since the primary purpose of a course review is to convey sentiment or information about a course. The data was tagged for sentiment manually by a number of student volunteers. Each review was tagged as one of three classes: positive, negative, or neutral. The volunteers tagged over 1,000 reviews, yielding 540 positive, 255 neutral, and 205 negative reviews. This tagged set was divided into a training set and a test set in an 80/20 split: the training set contained 433 positive, 214 neutral, and 153 negative reviews, and the test set contained 107 positive, 41 neutral, and 52 negative reviews.

One initial observation about this data is small but noteworthy: in general, most reviews tend to be positive or neutral, with only a small percentage being negative. It is satisfying to conclude that, overall, Stanford students enjoy their courses.

Since some reviews were classified as neutral, adding a third class, experiments with each classifier were conducted both with and without these reviews. See the results section for each classifier below for a discussion of how this affected the results.

Related Work

Sentiment analysis is a topic that has been investigated in a variety of forms. Two papers that I used as references were "Movie Review Mining and Summarization" by Zhuang, Jing, and Zhu, and "Thumbs Up? Sentiment Classification using Machine Learning Techniques" by Pang, Lee, and Vaithyanathan. Both papers investigate sentiment classification in the domain of movie reviews, which suggests how to translate the prior work to the domain of Stanford course reviews. Movie reviews often discuss the technical aspects of a movie, ranging from the performance of a specific actor to the technical execution of the film, and both papers attempt to identify these features in order to improve the classification task. I will later discuss my attempts to translate these concepts to the domain of course reviews. For instance, where a particular actor might be a strong indicator of sentiment in a movie review, a professor or TA might be an equally strong indicator of sentiment in a course review. Identifying these indicators of sentiment was the primary goal of my investigation.

Codebase

For the project, I utilized a number of engineering resources. I borrowed heavily from the code I wrote for Programming Assignment 3, modifying the MaxEnt classifier I built for that assignment into a version that classifies entire reviews instead of single words. I also reworked the existing classifier to build a Naïve Bayes classifier smoothed using add-λ smoothing. Finally, I tested the data set using the Stanford Classifier, in order to compare the features I engineered against the generic features of the Stanford Classifier.
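As an illustration of the smoothing step mentioned above, the following is a minimal sketch of an add-λ smoothed class-conditional word probability for a Naïve Bayes classifier. The class name, method signature, and the value of λ are assumptions for illustration, not the project's actual code.

    import java.util.Map;

    public class SmoothedNaiveBayes {
        // Smoothing constant; the value actually used in the project is
        // not stated in the paper, so 0.5 here is only an assumption.
        private static final double LAMBDA = 0.5;

        // P(word | class) with add-lambda smoothing: every count is padded
        // by lambda so unseen words receive nonzero probability mass.
        public static double wordGivenClass(Map<String, Integer> countsForClass,
                                            int totalTokensInClass,
                                            int vocabularySize,
                                            String word) {
            int count = countsForClass.getOrDefault(word, 0);
            return (count + LAMBDA) / (totalTokensInClass + LAMBDA * vocabularySize);
        }
    }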
Text Pre-processing

As discussed in the "Thumbs Up?" paper by Pang et al., there are some pre-processing steps that can improve overall results. The most notable is the handling of NOT. For instance, without this processing step, the classifier assigned the review "Not recommended" to the positive class, because of the high association between "recommended" and the positive class. The presence of "not", however, inverts this sentiment, so the classifier must be trained to treat negated words differently, as sketched below. In addition to pre-processing NOT tokens, removing punctuation also improved the F-score of the classifiers. Though punctuation is often a good indicator of sentiment, it generally cluttered the classification by fragmenting tokens unnecessarily.
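A minimal sketch of this negation handling, following the NOT_ tagging scheme from Pang et al.; the class name, the list of negation triggers, and the punctuation-based scope rule are assumptions for illustration:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Set;

    public class NegationTagger {
        // Illustrative negation triggers; the project's actual list may differ.
        private static final Set<String> NEGATIONS = Set.of("not", "no", "never");

        // Prefix every token after a negation word with "NOT_" until the
        // next punctuation token.
        public static List<String> tagNegations(List<String> tokens) {
            List<String> out = new ArrayList<>();
            boolean inNegationScope = false;
            for (String tok : tokens) {
                if (tok.matches("[.,!?;:]+")) {
                    inNegationScope = false; // punctuation closes the scope
                    out.add(tok);
                } else if (NEGATIONS.contains(tok.toLowerCase())) {
                    inNegationScope = true;  // open a new negation scope
                    out.add(tok);
                } else {
                    out.add(inNegationScope ? "NOT_" + tok : tok);
                }
            }
            return out;
        }
    }

For example, the review "Not recommended" becomes [Not, NOT_recommended], so "recommended" and "NOT_recommended" are trained as distinct tokens with separate weights.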
Maximum Entropy (custom)

Description

This classifier implements the same model used in PA3. The starter code from that assignment was modified to classify sentences to String labels. The major changes occurred in the loadData, transformData, and extractFeatures methods, and ranged from simply removing the List<> structures used to deal with sentences, to reworking the feature set for the domain of course reviews. The classifier itself adheres to the standard form of a maximum entropy classifier. The probability that a review r is assigned a class c is given by

    P(c \mid r) = \frac{\exp\left( \sum_i \lambda_i f_i(r, c) \right)}{\sum_{c'} \exp\left( \sum_i \lambda_i f_i(r, c') \right)}

where the \lambda_i are the weights learned by the classifier, and the f_i are binary feature functions, returning 1 if the review contains the feature and 0 otherwise. The objective function of the classifier is given by

    J(\lambda) = -\sum_j \log P(c_j \mid r_j) + \sum_i \frac{\lambda_i^2}{2\sigma^2}

which the classifier attempts to minimize; minimizing this regularized negative log-likelihood is equivalent to finding the maximum-entropy distribution consistent with the feature constraints observed in the training data.
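As a concrete reading of the probability formula above, the following sketch computes the class distribution for a review from a learned weight table. Because the feature functions are binary, the inner sum reduces to adding up the weights of the features present in the review. The data structures and names are illustrative assumptions, not the actual PA3 interface.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class MaxEntScorer {
        // Computes P(c | review) = exp(sum_i lambda_i f_i(r, c)) / Z
        // for every class c; weightsByClass maps a class label to its
        // feature-weight vector.
        public static Map<String, Double> classProbabilities(
                Map<String, Map<String, Double>> weightsByClass,
                List<String> reviewFeatures) {
            Map<String, Double> probs = new HashMap<>();
            double z = 0.0; // partition function (denominator)
            for (Map.Entry<String, Map<String, Double>> entry : weightsByClass.entrySet()) {
                double sum = 0.0;
                for (String feature : reviewFeatures) {
                    sum += entry.getValue().getOrDefault(feature, 0.0);
                }
                double unnormalized = Math.exp(sum);
                probs.put(entry.getKey(), unnormalized);
                z += unnormalized;
            }
            for (Map.Entry<String, Double> entry : probs.entrySet()) {
                entry.setValue(entry.getValue() / z); // normalize by Z
            }
            return probs;
        }
    }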
Results

    Data Set                 Positive   Negative   Neutral   F1
    Positive/Negative Only   0.8879     0.6538     N/A       0.7531
    All Classes              0.7757     0.4878     0.4615    0.3037

Features

There were a number of features added to the classifier for the task of classifying reviews. These are described below, along with the relative success of each feature on the positive/negative-only data set. Each feature is added on a per-word basis, with two per-sentence features added at the bottom of the table. The incremental effect is calculated by determining the F1-score with only that feature removed; the difference from the full-system F1 of 0.7531 gives the incremental F1. A sketch of this feature extraction appears after the table.

    Feature   Description                                        F1       Incremental F1
    WORD      The word itself (baseline feature)                 0.7531   N/A
    LENGTH    The length of the current word                     0.7133   0.0398
    GRADE     Whether the current word matches a regular         0.7465   0.0066
              expression for a grade, namely [ABCDF][+-]?
    PREFIX    The first three letters of a word                  0.6603   0.0928
    SUFFIX    The last three letters of a word                   0.6994   0.0537
    ALLCAPS   Whether the word is in all capital letters
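The following sketch shows how the per-word features in the table might be emitted inside an extractFeatures method; the feature-string format and class name are assumptions for illustration, not the project's actual code.

    import java.util.ArrayList;
    import java.util.List;

    public class ReviewFeatureExtractor {
        // Per-word feature extraction mirroring the feature table above.
        public static List<String> extractFeatures(String word) {
            List<String> features = new ArrayList<>();
            features.add("WORD-" + word);                       // the word itself
            features.add("LENGTH-" + word.length());            // word length
            if (word.matches("[ABCDF][+-]?")) {
                features.add("GRADE");                          // looks like a letter grade
            }
            if (word.length() >= 3) {
                features.add("PREFIX-" + word.substring(0, 3)); // first three letters
                features.add("SUFFIX-" + word.substring(word.length() - 3)); // last three
            }
            if (!word.isEmpty() && word.equals(word.toUpperCase())
                    && word.chars().anyMatch(Character::isLetter)) {
                features.add("ALLCAPS");                        // written entirely in capitals
            }
            return features;
        }
    }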
