Middle Term Exam
• 02/28 (Thursday), take home, turn in at noon of 02/29 (Friday)

Project
• 03/14 (Phase 1): 10% of training data is available for algorithm development
• 04/04 (Phase 2): full training data and test examples are available
• 04/17 (submission): submit your prediction before 11:59pm Apr. 20 (Wednesday)
• 04/23 and 04/25: project presentations; announce the competition results
• 04/28: project report is due

Logistic Regression
Rong Jin

Logistic Regression
• Generative models often lead to a linear decision boundary
• Linear discriminative model: directly model the linear decision boundary
• w is the parameter to be learned

Logistic Regression
• Model p(y | x; w) = 1 / (1 + exp(-y w·x)) for labels y ∈ {-1, +1}
• Learn the parameter w by Maximum Likelihood Estimation (MLE)
• Given training data {(x_i, y_i)}, maximize the log-likelihood of the labels

Logistic Regression
• Convex objective function, so the optimum found is global
• Solve by gradient descent

Illustration of Gradient Descent
• (figure illustrating gradient-descent steps)

Example: Heart Disease
• Input feature x: age-group id
• Output y: whether the subject has heart disease
  • y = +1: has heart disease
  • y = -1: no heart disease
• Age groups: 1: 25-29, 2: 30-34, 3: 35-39, 4: 40-44, 5: 45-49, 6: 50-54, 7: 55-59, 8: 60-64

Example: Text Categorization
• Learn to classify text into two categories
• Input d: a document, represented by a word histogram
• Output y: +1 for a political document, -1 for a non-political document

Example: Text Categorization
• Training data

Example 2: Text Classification
• Dataset: Reuters-21578
• Classification accuracy:
  • Naïve Bayes: 77%
  • Logistic regression: 88%

Logistic Regression vs. Naïve Bayes
• Both give linear decision boundaries
  • Naïve Bayes: weights follow from the estimated class-conditional probabilities
  • Logistic regression: weights learned by MLE
• Both can be viewed as modeling p(d | y)
  • Naïve Bayes: independence assumption
  • Logistic regression: assumes an exponential-family distribution for p(d | y) (a broad assumption)

Discriminative vs. Generative
Discriminative models: model P(y | x)
• Pros: usually good performance
• Cons: slow convergence; expensive computation; sensitive to noisy data
Generative models: model P(x | y)
• Pros: usually fast convergence; cheap computation; robust to noisy data
• Cons: usually worse performance

Overfitting Problem
• Consider text categorization: what is the weight for a word j that appears in only one training document d_k?
• (figure: objective value by iteration, with and without regularization)

Overfitting Problem
• Overfitting causes a decrease in the classification accuracy on test data

Solution: Regularization
• Regularized log-likelihood: subtract a penalty on the norm of w from the log-likelihood
• Effects of the regularizer:
  • Favors small weights
  • Guarantees a bounded norm of w
  • Guarantees a unique solution

Regularized Logistic Regression
• (figure: classification performance by iteration, with and without regularization)

Regularization as Robust Optimization
• Assume each data point is unknown but bounded in a sphere of a given radius centered at x_i

Sparse Solution by Lasso Regularization
• RCV1 collection: 800K documents, 47K unique words

Sparse Solution by Lasso Regularization
• How to solve the optimization problem?
  • Subgradient descent
  • Minimax

Bayesian Treatment
• Compute the posterior distribution of w
• Laplacian approximation

Bayesian Treatment
• Laplacian approximation (continued)

Multi-class Logistic Regression
• How to extend the logistic regression model to multi-class classification?

Conditional Exponential Model
• Need to learn a weight vector w_s for each class s
• Normalization factor (partition function) makes the class probabilities sum to one

Conditional Exponential Model
• Learn the weights w_s by maximum likelihood estimation
• Any problem?

Modified Conditional Exponential Model
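The slides' equations were not recovered in this extraction. As a rough, self-contained sketch of what the bullets describe — maximum-likelihood logistic regression with labels y ∈ {-1, +1}, fit by gradient ascent on the (optionally L2-regularized) log-likelihood — one might write the following; all function and variable names are illustrative, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, lam=0.0, n_iters=5000):
    """Gradient ascent on the log-likelihood of p(y|x; w) = sigmoid(y * w.x),
    with labels y in {-1, +1}. lam > 0 adds an L2 regularizer (sketch of the
    'Solution: Regularization' slide: it shrinks w toward small weights)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        margins = y * (X @ w)                       # y_i * (w . x_i)
        # gradient of log-likelihood: sum_i y_i x_i sigmoid(-margin_i),
        # minus lam * w for the regularizer
        grad = X.T @ (y * sigmoid(-margins)) - lam * w
        w += lr * grad / n
    return w

def predict(X, w):
    return np.where(X @ w >= 0, 1, -1)

# Toy 1-D data in the spirit of the heart-disease slide: the feature is an
# age-group id (1..8) plus a bias term; labels flip between groups 4 and 5.
X = np.array([[1.0, g] for g in range(1, 9)])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])
w = fit_logistic(X, y)
print(predict(X, w))
```

Because the toy data are separable and the objective is concave, gradient ascent recovers a threshold between groups 4 and 5 and classifies all training points correctly.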
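The conditional exponential model of the multi-class slides, p(s | x) ∝ exp(w_s · x) with a partition function Z(x) as the normalizer, is what is now usually called softmax regression. A minimal sketch of the probability computation (names and numbers are illustrative):

```python
import numpy as np

def softmax_probs(W, x):
    """p(s | x) = exp(w_s . x) / Z(x), where Z(x) sums exp(w_s . x) over
    all classes s (the partition function of the slides)."""
    scores = W @ x                 # one score w_s . x per class s
    scores -= scores.max()         # stabilize before exponentiating
    expw = np.exp(scores)
    return expw / expw.sum()       # divide by Z(x)

# Three classes, two features: one weight vector w_s per row of W.
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
p = softmax_probs(W, np.array([2.0, 0.5]))
print(p.argmax())  # class with the largest score w_s . x  → 0
```

One issue with this parameterization — plausibly what the slide's "Any problem?" points at, and why a "modified" model follows — is that it is overdetermined: adding the same constant vector to every w_s leaves p(s | x) unchanged, so the maximum-likelihood weights are not unique without a constraint or regularizer.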