Classifying Movie Scripts by Genre
Alex Blackstock
Matt Spitz
6/9/08

Overview
• Motivation
    • classifying movie scripts may identify box office flops and successes before they're even produced!
• Data
    • freely-available movie scripts (DailyScripts.com, etc)
    • IMDB genres (several labels/movie)
• Tools
    • Lucene
    • MEMM from PA3
    • jBNC (naïve Bayes classifier)
    • Stanford Named Entity Recognizer
    • Stanford Part-Of-Speech Tagger

Processing Scripts

Features
• Non-NLP
    • dialogue shape
    • character information
• NLP
    • POS ratios
    • Named Entity appearances
• Character-Based NLP
    • analyze individual characters
    • exclamations
    • main vs. secondary

Evaluation Metrics
• Example output:
    • Blade II (gold labels: Action, Thriller, Horror)
    • guessed labels: Action, Adventure, Horror, Thriller, ...
• F1 Score
    • per genre
    • weighted-average over all genres
    • # of guesses allowed = # of gold labels
• Partial Credit Score
    • allows for some error
    • # guesses allowed = # of gold labels * 1.5
    • penalized for guesses that are beyond # gold labels, but still get points

Conclusions
• Success!
    • best feature set: basic NLP & POS tagging
    • PC Score: 0.601
    • F1 Score: 0.551
• Classifier comparison (jBNC)
• N-way classification problem
    • 22 genres
    • average of 3.02 genres/datum
• Dataset
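
The F1 Score described under Evaluation Metrics (per genre, weighted-average over all genres, with the number of guesses capped at the number of gold labels) might be computed roughly as in the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the classifier is assumed to return genres ranked best-first, and each genre is assumed to be weighted by how often it appears as a gold label. The slides do not specify the weighting, and they do not give the Partial Credit penalty, so that score is not sketched.

```python
from collections import defaultdict


def truncate_guesses(ranked_guesses, gold_labels):
    """Keep only as many top-ranked guesses as there are gold labels."""
    return ranked_guesses[:len(gold_labels)]


def per_genre_f1(examples):
    """Compute F1 per genre.

    examples: list of (ranked_guesses, gold_labels) pairs, one per script.
    Returns {genre: (f1, support)}, where support is the number of scripts
    carrying that genre as a gold label.
    """
    tp = defaultdict(int)
    fp = defaultdict(int)
    fn = defaultdict(int)
    for ranked, gold in examples:
        gold_set = set(gold)
        guessed = set(truncate_guesses(ranked, gold))
        for genre in guessed & gold_set:
            tp[genre] += 1
        for genre in guessed - gold_set:
            fp[genre] += 1
        for genre in gold_set - guessed:
            fn[genre] += 1

    scores = {}
    for genre in set(tp) | set(fp) | set(fn):
        denom = 2 * tp[genre] + fp[genre] + fn[genre]
        f1 = 2 * tp[genre] / denom if denom else 0.0
        scores[genre] = (f1, tp[genre] + fn[genre])
    return scores


def weighted_f1(scores):
    """Average per-genre F1, weighted by gold-label support (an assumption)."""
    total = sum(support for _, support in scores.values())
    if total == 0:
        return 0.0
    return sum(f1 * support for f1, support in scores.values()) / total


# The Blade II example from the slides: three gold labels, so only the
# first three ranked guesses are scored.
blade_ii = (["Action", "Adventure", "Horror", "Thriller"],
            ["Action", "Thriller", "Horror"])
scores = per_genre_f1([blade_ii])
overall = weighted_f1(scores)
```

With the guess budget capped at three, Thriller falls outside the scored guesses even though the classifier did propose it, which is the kind of near miss the Partial Credit Score's larger budget is meant to reward.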