Outline Generative Models CS4780 Machine Learning Fall 2009 Thorsten Joachims Cornell University Reading Mitchell Chapter 6 9 6 10 Duda Hart Stork Pages 20 27 Bayes Decision Rule Assumption learning task P X Y is know Bayes decision rule Bayes theorem Generative vs discriminative learning Two generative learning algorithms naive Bayes linear discriminant analysis Selecting hypotheses based on training data maximum likelihood maximum a posteriori MAP bayesian Generative vs Discriminative Models Process Generator Generate descriptions according to distribution P X Teacher Assigns a value to each description based on P Y X Question Given instance x how should it be classified to minimize prediction error Bayes Decision Rule Bayes Theorem It is possible to switch conditioning according to the following rule Given any two random variables X and Y it holds that Training Examples Discriminative Model Select classification rules H to consider hypothesis space Find h from H with lowest training error Argument low training error leads to low prediction error Examples SVM decision trees Perceptron Generative Model Select set of distributions to consider for modeling P X Y Find distribution that best matches P X Y on training data Argument if match close enough we can use Bayes Decision rule Examples naive Bayes HMM Na ve Bayes Classifier Multivariate Model for each class Prior probabilities Note that Classification rule 1 Estimating the Parameters of Na ve Bayes Count frequencies in training data n number of training examples n n number of pos neg examples Xi xi y number of times feature Xi takes value xi for examples in class y Xi number of values attribute Xi can take Estimating P Y Fraction of positive negative examples in training data Linear Discriminant Analysis Spherical Gaussian model with unit variance for each class Prior probabilities Classification rule Estimating P X Y Maximum Likelihood Estimate Smoothing with Laplace estimate Often called Rocchio Algorithm in Inform Retrieval Estimating the Parameters of LDA Count frequencies in training data Na ve Bayes Classifier Multinomial Application Text classification training data n number of training examples n n number of positive negative training examples Estimating P Y Fraction of positive negative examples in training data Assumption l words in document Estimating class means Classification Rule Estimating the Parameters of Na ve Bayes Count frequencies in training data n number of training examples n n number of pos neg examples W wi y number of times word wi occurs in examples of class y l l total number of words in pos neg examples V size of vocabulary Estimating P Y Test Collections Estimating P X Y Smoothing with Laplace estimate Reuters 21578 Reuters newswire articles classified by topic 90 categories multi label 9603 training documents 3299 test documents ModApte 27 000 features WebKB Collection WWW pages classified by function e g personal HP project HP 4 categories multi class 4183 training documents 226 test documents 38 000 features Ohsumed MeSH Medical abstracts classified by subject heading 20 categories from disease subtree multi label 10 000 training documents 10 000 test documents 38 000 features 2 Example Reuters Article Multi Label Example Ohsumed Abstract Categories COFFEE CRUDE KENYAN ECONOMY FACES PROBLEMS PRESIDENT SAYS The Kenyan economy is heading for difficult times after a boom last year and the country must tighten its belt to prevent the balance of payments swinging too far into deficit President Daniel Arap Moi said In a speech at the state opening of parliament Moi said high coffee prices and cheap oil in 1986 led to economic growth of five pct compared with 4 1 pct in 1985 The same factors produced a two billion shilling balance of payments surplus and inflation fell to 5 6 pct from 10 7 pct in 1985 he added But both these factors are no longer in our favour As a result we cannot expect an increase in foreign exchange reserves during the year he said Categories Animal Blood Proteins Metabolism DNA Drug Effects Mycotoxins Toxicity Representing Text as Attribute Vectors Multi Class Multi Label Cannot learn multi label rules directly Most classifiers assume that each document is in exactly one class Many classifiers can only learn binary classification rules Most common solution Multi Label Learn one binary classifier for each label Attach all labels for which some classifier says positive Most common solution Multi Class Learn one binary classifier for each label Put example into the class with the highest probability or some approximation thereof Ignore ordering of words Experimental Results Performance Measures Precision Recall Break Even Point Intersection of PR curve with the identity line Macro averaging First compute the measure then compute average Results in average over tasks Micro averaging First average the elements of the contingency table then compute the measure Results in average over each individual classification decision Reuters Newswire 90 categories 9603 training doc 3299 test doc 27000 features WebKB Collection 4 categories 4183 training doc 226 test doc 38000 features Ohsumed MeSH 20 categories 10000 training doc 10000 test doc 38000 features microaveraged precision recall breakeven point 0 100 Naive Bayes Reuters WebKB Ohsumed 72 3 82 0 62 4 Rocchio Algorithm LDA 79 9 74 1 61 5 C4 5 Decision Tree 79 4 79 1 56 7 k Nearest Neighbors 82 6 80 5 63 4 SVM 87 5 90 3 71 6 3 Comparison of Methods for Text Classification Na ve Bayes Rocchio TDIDT k NN LDA C4 5 SVM Simplicity conceptual Efficiency at training Efficiency at prediction Handling many classes Theoretical validity o Prediction accuracy o Stability and robustness 4
View Full Document
Unlocking...