Stanford CS 224 - Literary Style Classification with Deep Linguistic Features

This preview shows page 1-2-3-4 out of 11 pages.



Literary Style Classification with Deep Linguistic Features
Hyung Jin Kim, Minjong Chung, Wonhong Lee

Introduction
People in the same professional area share a similar literary style. Is there any way to classify them automatically?

Basic Idea
3 different professions on Twitter:
- Entertainer (Britney Spears): “I-I-I Wanna Go-o-o by the amazing”
- Politician (Barack Obama): “Instead of subsidizing yesterday’s energy, let’s invest in tomorrow’s.”
- IT Guru (Guy Kawasaki): “Exporting 20,439 photos to try to merge two computers. God help me.”

Approach
We concentrated on extracting as many features as possible from each sentence, and then selecting the most effective features among them.

Classifiers
- Support Vector Machine (SVM)
- Naïve Bayes (NB)

Features
Basic Features
- Binary value for each word => high-dimensional feature space
- Word stemming and stopword removal
- TF-IDF weighting
Syntactic Features
- POS tags
- Using specific classes of POS tags (e.g. only nouns or only verbs)

Features
Manual Features
- Performance is limited when using only automatically collected features
- Features selected manually by human intelligence!

Type of feature               Example
Punctuation marks             “awesome!!!!!”
Capitalization                “EVERYONE”
Dates or years                “Mar 3,1833”
Numbers or rates              “10% growth”
Emoticons                     “cool :)”
Retweet (Twitter-specific)    “RT @mjipeo”

Feature Selection
TF-IDF (Term Frequency – Inverse Document Frequency)
- Usually used as an importance measure of a term in a document
Information Gain
- Measures how much the uncertainty of the labels is reduced if we know a word (or feature)
- IG(Y | X) = H(Y) - H(Y | X)
Chi-square
- Measures the dependency between the feature and the class

Implementation
Caching System
- Processing takes a considerable amount of time
- Devised our own caching system to improve productivity
NLP Libraries
- Stanford POS Tagger
- JWI WordNet package developed by MIT for basic dictionary operations

Result
Performance of Various Feature Extractors on SVM
- SVM with TF-IDF selection: 60%
- SVM with IG selection: 58%
- SVM with Chi-square: 52%

Result
Performance of Manual Feature Extractors on SVM
- SVM with manual selection: 62%

Result
Performance of Classifiers (Naïve Bayes vs. SVM)
- NB without feature selection: 84%
- Random guess (benchmark): 33%

Conclusion
- The Naïve Bayes classifier works surprisingly well without any feature engineering, due to its appropriate independence assumption
- For SVM, selecting proper features is critical to avoid overfitting
- Using only noun features works better than
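
The information-gain criterion from the Feature Selection slide, IG(Y | X) = H(Y) - H(Y | X), can be illustrated with a minimal Python sketch. This is not the authors' implementation (the slides mention Java libraries): the function names, the choice of X as the presence or absence of one word, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, word):
    """IG(Y | X) = H(Y) - H(Y | X), with X = 'word occurs in the document'."""
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without_word = [y for d, y in zip(docs, labels) if word not in d]
    total = len(labels)
    # H(Y | X) is the size-weighted average entropy of the two partitions.
    conditional = 0.0
    for subset in (with_word, without_word):
        if subset:
            conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"energy", "invest"},
    {"energy", "jobs"},
    {"photos", "computers"},
    {"wanna", "go"},
]
labels = ["politician", "politician", "it_guru", "entertainer"]

for w in ("energy", "photos", "unseen"):
    print(w, round(information_gain(docs, labels, w), 4))
```

Ranking words by this score and keeping the top few is one common way to use it for feature selection; a word that never occurs splits nothing and scores zero.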

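
The manual features listed on the Features slide (punctuation runs, capitalization, years, rates, emoticons, retweets) could be detected with simple regular expressions. The sketch below is only one possible reading of that table: the patterns, thresholds, and the `surface_features` name are assumptions for illustration, not the extractors used in the project.

```python
import re

# Assumed patterns; each is a deliberately simple approximation.
REPEATED_PUNCT = re.compile(r"[!?]{2,}")            # e.g. "awesome!!!!!"
ALL_CAPS = re.compile(r"\b[A-Z]{3,}\b")             # e.g. "EVERYONE"
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")    # e.g. "1833"
PERCENT = re.compile(r"\d+(\.\d+)?%")               # e.g. "10%"
EMOTICON = re.compile(r"[:;=][-']?[)(DPp]")         # e.g. ":)"
RETWEET = re.compile(r"\bRT @\w+")                  # e.g. "RT @mjipeo"


def surface_features(text):
    """Return a dict of binary features for one tweet."""
    return {
        "repeated_punct": bool(REPEATED_PUNCT.search(text)),
        "all_caps": bool(ALL_CAPS.search(text)),
        "year": bool(YEAR.search(text)),
        "percent": bool(PERCENT.search(text)),
        "emoticon": bool(EMOTICON.search(text)),
        "retweet": bool(RETWEET.search(text)),
    }


for example in ("awesome!!!!!", "EVERYONE", "Mar 3,1833",
                "10% growth", "cool :)", "RT @mjipeo"):
    active = [name for name, on in surface_features(example).items() if on]
    print(repr(example), active)
```

Each binary value can be appended to the word-level feature vector before it is passed to a classifier.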
