Bayes Classifier, Linear Regression
10-701/15-781 Recitation
January 29, 2008

Parts of the slides are from previous years' recitation and lecture notes, and from Prof. Andrew Moore's data mining tutorials.

Classification and Regression

Classification
- Goal: learn the underlying function f: X (features) -> Y (class, or category)
- e.g. words -> "spam" or "not spam"

Regression
- f: X (features) -> Y (continuous values)
- e.g. GPA -> salary

Supervised Classification

How do we find an unknown function f: X -> Y (features -> class), or equivalently P(Y|X)?

Classifier:
1. Find P(X|Y) and P(Y), and use Bayes rule - generative
2. Find P(Y|X) directly - discriminative

f(x) = argmax_k P(Y = k | X = x)

Classification

Learn P(Y|X).
1. Bayes rule: P(Y|X) = P(X|Y) P(Y) / P(X), which is proportional to P(X|Y) P(Y).
   Learn P(X|Y) and P(Y): a "generative" classifier.
2. Learn P(Y|X) directly: a "discriminative" classifier (to be covered later in class), e.g. logistic regression.

Generative Classifier: Bayes Classifier

Learn P(X|Y) and P(Y), e.g. for an email classification problem:
- 3 classes for Y = {spam, not spam, maybe}
- 10,000 binary features for X = {"Cash", "Rolex", ...}

How many parameters do we have?
- P(Y): 3 - 1 = 2
- P(X|Y): 3 * (2^10000 - 1), since we model a full joint distribution over the 10,000 binary features for each class

Generative learning: Naïve Bayes

Introduce conditional independence:
P(X1, X2 | Y) = P(X1 | Y) P(X2 | Y)

For X = (X1, ..., Xn):
P(Y|X) = P(X|Y) P(Y) / P(X)
       = P(X1|Y) ... P(Xn|Y) P(Y) / P(X)
       = prod_i P(Xi|Y) P(Y) / P(X)

Learn P(X1|Y), ..., P(Xn|Y) and P(Y), instead of learning P(X1, ..., Xn | Y) directly.

Naïve Bayes

Again, 3 classes for Y = {spam, not spam, maybe} and 10,000 binary features for X = {"Cash", "Rolex", ...}. Now, how many parameters?
- P(Y): 3 - 1 = 2
- P(Xi|Y): 3 * 10,000 = 30,000

Fewer parameters means a "simpler" model, which is less likely to overfit.

Full Bayes vs. Naïve Bayes

XOR:

X1  X2  Y
1   0   1
0   1   1
1   1   0
0   0   0

P(Y=1 | (X1,X2) = (0,1)) = ?

Full Bayes:
- P(Y=1) = ?
- P((X1,X2) = (0,1) | Y=1) = ?

Naïve Bayes:
- P(Y=1) = ?
- P((X1,X2) = (0,1) | Y=1) = ?

Regression

Prediction of continuous variables, e.g. I want to predict salaries from GPA. I can regress that...
- Learn the mapping f: X -> Y
- Model is linear in the parameters (+ some noise): linear regression
- Assume Gaussian noise
- Learn the MLE of Θ

f(x) = sum_i Θ_i h_i(x)

1-parameter linear regression

Normal linear regression:
Y = ΘX + ε, where ε ~ N(0, σ²)
or equivalently, Y ~ N(ΘX, σ²)

MLE Θ? MLE σ²?

Multivariate linear regression

What if the inputs are vectors? Write matrices X and Y (n data points, k features for each data point):

X = [ x_11  x_12  ...  x_1k ]      Y = [ y_1 ]
    [ x_21  x_22  ...  x_2k ]          [ y_2 ]
    [ ...               ... ]          [ ... ]
    [ x_n1  x_n2  ...  x_nk ]          [ y_n ]

MLE Θ = (X^T X)^-1 X^T Y

Constant term?

We may expect linear data that does not go through the origin. Trick?

The constant term

Add a feature that is always 1 (a column of ones in X); its coefficient then plays the role of the intercept.

Regression: another example

Assume the following model to fit the data. The model has one unknown parameter θ to be learned from data:
y ~ N(log(...), 1)

A maximum likelihood estimate of θ?
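The multivariate MLE above can be sketched in a few lines of NumPy. This is a minimal illustration, not course code: the toy data (slope 2, intercept 3) is made up, and it uses the constant-term trick of appending a column of ones to the design matrix so the intercept is learned like any other coefficient.

```python
import numpy as np

# Toy noiseless data: y = 2*x + 3 (slope and intercept chosen for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 3.0

# Constant-term trick: a feature that is always 1, whose
# coefficient becomes the intercept.
X = np.column_stack([np.ones_like(x), x])  # n x 2 design matrix

# MLE under the linear-Gaussian model: theta = (X^T X)^{-1} X^T y.
# Solving the normal equations directly is more stable than forming the inverse.
theta = np.linalg.solve(X.T @ X, X.T @ y)

print(theta)  # [intercept, slope] -> approximately [3.0, 2.0]
```

With noiseless data the recovered coefficients match the generating ones exactly; with Gaussian noise added to y, the same formula gives the maximum likelihood fit.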
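The Full Bayes vs. Naïve Bayes XOR questions can be checked with a short sketch (function names are my own; all probabilities are empirical frequencies from the four-row table):

```python
# XOR truth table from the slide: ((x1, x2), y)
data = [((1, 0), 1), ((0, 1), 1), ((1, 1), 0), ((0, 0), 0)]
classes = [0, 1]

def prior(y):
    # P(Y = y) as an empirical frequency
    return sum(1 for _, yy in data if yy == y) / len(data)

def full_likelihood(x, y):
    # Full Bayes: P(X = x | Y = y), frequency of the whole feature vector
    rows = [xx for xx, yy in data if yy == y]
    return rows.count(x) / len(rows)

def naive_likelihood(x, y):
    # Naive Bayes: prod_i P(X_i = x_i | Y = y), each feature independently
    rows = [xx for xx, yy in data if yy == y]
    p = 1.0
    for i, xi in enumerate(x):
        p *= sum(1 for r in rows if r[i] == xi) / len(rows)
    return p

def posterior(x, y, likelihood):
    # Bayes rule: P(Y = y | X = x), normalized over both classes
    joint = {c: likelihood(x, c) * prior(c) for c in classes}
    return joint[y] / sum(joint.values())

x = (0, 1)
print("Full Bayes  P(Y=1 | X=(0,1)) =", posterior(x, 1, full_likelihood))   # 1.0
print("Naive Bayes P(Y=1 | X=(0,1)) =", posterior(x, 1, naive_likelihood))  # 0.5
```

The full model classifies (0, 1) with certainty, while under the independence assumption both classes look equally likely: XOR is exactly the kind of feature interaction Naïve Bayes cannot represent.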