SENTIMENT AND OBJECTIVITY CLASSIFICATION
TRIGRAM WITH KATZ BACKING-OFF AND ABSOLUTE DISCOUNTING; K-NN ALGORITHM WITH DISTINCT DISTANCE FUNCTIONS
Xavier Falco, Rafi Witten, Robin Zhou
CS 224N Final Project

GOAL

The goal for this assignment was to develop a technique for text classification that is robust enough to be applied across a variety of different classification challenges. The task that we planned to train on was sentiment classification (determining whether a review is positive or negative), but we were interested in whether the same techniques that worked on sentiment classification could also work on objectivity classification, i.e. determining whether a sentence describes an objective or a subjective claim. The hope is that we could determine which techniques make specific use of particular properties of the language and which are more general and could be used across arbitrary classification tasks. The ultimate goal was to develop techniques that can train for any sentiment classification task when given a suitable dataset.

TECHNIQUES PROPOSED

Given the goal of trying to create a generic classification algorithm, the question becomes which techniques to implement. Ultimately, the techniques we settled upon were Naïve Bayes, K-nearest neighbors classification, and a generic machine learning technique, SVM. The plan was to try each of them on the first task, sentiment classification, and see whether, after being optimized for that task, they performed satisfactorily on the second task, objectivity classification.

USE OF CORPUS

The corpora that we used were from source [1]. We used a combination of polarity dataset v2.0 and sentence polarity dataset v1.0 for sentiment classification. The polarity dataset is made exclusively of movie reviews that have been hand-labeled. The two datasets differ in that one is still organized by review, while the other is a loose collection of sentiment sentences extracted from reviews. For objectivity classification we used subjectivity dataset v1.0. This second dataset was rather small, only 5000 sentences of each type, but it proved sufficient for our purposes. Again, the dataset has no clear unifying theme and seems to be linked more by whatever the author deemed interesting.

NAÏVE BAYES: MAXIMUM LIKELIHOOD TRIGRAM WITH KATZ BACKING-OFF AND ABSOLUTE DISCOUNTING

ALGORITHM

Naïve Bayes is the method in which we train two different language models, one for each of the two classes at hand. Then, to classify a text, the text's likelihood is determined under the two language models, and the text is classified into the class whose corpus gave it the larger probability. The technique is rather straightforward, but it theoretically produces the optimal result assuming that the a priori probabilities of the two options (or, in general, n options) are equal. To see this:

P(Text, Classification n) = P(Text | Classification n) P(Classification n)

If all of the priors P(Classification n) are equal, then the highest joint probability is obtained by choosing the classification with the highest P(Text | Classification n), i.e. the classification under whose language model the text is most probable. In theory, this framework of the noisy channel model is the best possible, but in practice it is limited by our ability to create accurate language models. Instead, for text classification it is common to switch to a less natural model that is more powerful and is thus able to make better use of the data.
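To make the decision rule concrete, here is a minimal sketch in Python of the classify-by-likelihood scheme described above. The UnigramLM class, the add-one smoothing, and the toy training sentences are illustrative assumptions, not the project's code; the model actually used in this report is the Katz backing-off trigram described in the MODEL section below.

```python
# Minimal sketch of the Naive Bayes decision rule: train one language model
# per class and assign a text to the class whose model gives it the higher
# (log-)likelihood. Equal priors are assumed, as in the argument above.
import math
from collections import Counter


class UnigramLM:
    """Add-one-smoothed unigram language model (stand-in for the Katz trigram)."""

    def __init__(self, texts):
        self.counts = Counter(w for t in texts for w in t.split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # +1 slot for unseen words

    def log_prob(self, text):
        # Log-likelihood of the text under this model.
        return sum(
            math.log((self.counts[w] + 1) / (self.total + self.vocab))
            for w in text.split()
        )


def classify(text, pos_lm, neg_lm):
    # With equal priors, comparing P(Text | class) decides the label.
    return "positive" if pos_lm.log_prob(text) >= neg_lm.log_prob(text) else "negative"


# Hypothetical toy data, only to show the train -> score -> compare flow.
pos_lm = UnigramLM(["a charming and moving film", "funny and heartfelt"])
neg_lm = UnigramLM(["a dull and tedious mess", "poorly written and boring"])
print(classify("a heartfelt and funny film", pos_lm, neg_lm))  # -> "positive"
```

Scores are compared in log space, as in the results table below, since the raw probabilities of whole reviews underflow ordinary floating point.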
Language models, even trigram models, are deeply flawed by their reliance on only the immediately preceding text, and these flaws are what open up the possibility of more generic machine learning techniques surpassing them. Regardless, Naïve Bayes is a very natural way of handling the problem and is a good baseline for a classification model.

MODEL

The model that was used was Katz backing-off mixed with absolute discounting, built on top of a trigram model. Its direct relevance to our discussion of Naïve Bayes is rather minute, except that the performance of Naïve Bayes has a lot to do with how much smoothing is done. The equations that completely define Katz backing-off are pulled from Wikipedia (a reconstruction of the standard formulation is given after the results table below). Because our version of Katz backing-off was based on absolute discounting, the constant d can be thought of as the probability mass subtracted from events based on their count. The constant d was optimized at great length, but ultimately it must be considered to be of little consequence. It was interesting that on the test data set for sentiment classification, d was chosen to be equivalent to subtracting 0.15 from events that occurred once and 0.3 from events that occurred more than once, far less than the standard constants of 0.55 and 0.75 used for creating language models.

RESULTS

The data for sentiment analysis was broken into 3 chunks. The bulk was left to serve as the corpus, but 40 reviews of each type were excluded. Of these, 20 of each type were used for the numerical optimization previously mentioned and 20 of each type were used to get a final estimate. Ultimately, 28/40 were correctly classified on the optimization data set while 35/40 were correctly classified on the validation data set. However, this does not exactly fill us with confidence when we examine the data; the log probabilities for the optimization data (which could also be considered a validation set) are presented below.

Review   "Positive Sentiment" Score   "Negative Sentiment" Score   Correctly Classified
Pos1     -1700                        -1726                        1
Pos2     -2081                        -2166                        1
Pos3     -2045                        -2103                        1
Pos4     -1556                        -1628                        1
Pos5     -1311                        -1298                        0
Pos6     -1665                        -1724                        1
Pos7     -1500                        -1511                        1
Pos8     -1437                        -1442                        1
Pos9     -1315                        -1354                        1
Pos10    -1653                        -1626                        0
Pos11    -1738                        -1737                        0
Pos12    -1293                        -1279                        0
Pos13    -1442                        -1437                        0
Pos14    -1363                        -1413                        1
Pos15    -1605                        -1675                        1
Pos16    -1158                        -1207                        1
Pos17    -1257                        -1301                        1
Pos18    -1889                        -1993                        1
Pos19    -1773                        -1813                        1
Pos20    -1859                        -1774                        0
Neg1     -2300                        -2297                        1
Neg2     -1650                        -1661                        0
Neg3     -2188                        -2073                        1
Neg4     -2151                        -2074                        1
Neg5     -1787                        -1784                        1
Neg6     -2197                        -2084                        1
Neg7     -1789                        -1751                        1
Neg8     -2151                        -2052                        1
Neg9     -2161                        -2127                        1
Neg10    -1736                        -1680                        1
Neg11    -2543                        -2418                        1
Neg12    -2405                        -2358                        1
Neg13    -1884                        -2013                        0
Neg14    -2477                        -2549                        0
Neg15    -2529                        -2533                        0
Neg16    -1761                        -1736                        1
Neg17    -1884                        -1932                        0
Neg18    -2125                        -2150                        0
Neg19    -2071                        -1986                        1
Neg20    -2296                        -2283                        1
Total correctly classified: 28

Although we get 28 correct out of 40, there are many close calls (consider Neg5) that we got right, that these results seem
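For reference, a standard statement of Katz backing-off with absolute discounting for a trigram model, consistent with the description in the MODEL section, is given below. This is a reconstruction of the usual textbook/Wikipedia formulation, not necessarily the exact form implemented in this project; the count-dependent discount d is the constant discussed above.

```latex
% Katz backing-off for a trigram model with absolute discounting.
% d is the (count-dependent) discount constant; \alpha redistributes the
% subtracted probability mass over trigrams that were never observed.
\[
P_{\mathrm{bo}}(w_i \mid w_{i-2}, w_{i-1}) =
\begin{cases}
\dfrac{C(w_{i-2}\,w_{i-1}\,w_i) - d}{C(w_{i-2}\,w_{i-1})}
  & \text{if } C(w_{i-2}\,w_{i-1}\,w_i) > 0, \\[2ex]
\alpha(w_{i-2}\,w_{i-1}) \, P_{\mathrm{bo}}(w_i \mid w_{i-1})
  & \text{otherwise,}
\end{cases}
\]
\[
\alpha(w_{i-2}\,w_{i-1}) =
\frac{\displaystyle 1 - \sum_{w \,:\, C(w_{i-2}\,w_{i-1}\,w) > 0}
      \frac{C(w_{i-2}\,w_{i-1}\,w) - d}{C(w_{i-2}\,w_{i-1})}}
     {\displaystyle \sum_{w \,:\, C(w_{i-2}\,w_{i-1}\,w) = 0}
      P_{\mathrm{bo}}(w \mid w_{i-1})}
\]
```

Read this way, the optimized discounts of 0.15 and 0.3 reported above simply reserve much less probability mass for backed-off (unseen) trigrams than the standard 0.55 and 0.75 constants would.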

