CS 59000 Statistical Machine Learning
Lecture 13
Alan Qi

Outline
  Review of Fisher’s linear discriminant
  Perceptron
  Generative models for classification
  Conditional classification models:
    Logistic regression
    Probit regression
  Generalized linear models

Distance from a point to the decision surface
  Hint:

Fisher’s Linear Discriminant
  Find a projection onto a line such that samples from different classes are well separated.
  Figures from Srihari, http://www.cedar.buffalo.edu/~srihari/

Solution: Normalization by Scatter

Fisher Linear Discriminant
  Cost function

Within-Class and Between-Class Scatter Matrices

Generalized eigenvalue problem
  Maximize J(v)
  Differentiating J(v) with respect to v

Fisher’s Linear Discriminant Example
  Projection that maximizes mean separation
  FLD projection

Perceptron
  Generalized linear model
  Minimize the perceptron criterion, where M denotes the set of all misclassified patterns
  Stochastic gradient descent

Probabilistic Generative Models

Gaussian Class-Conditional Densities
  Conditional densities of data:
  The posterior distribution for label/class:

Maximum Likelihood Estimation
  Related to Fisher’s linear discriminant

Discrete features
  Naïve Bayes classification:

Probabilistic Discriminative Models
  Instead of modeling the class-conditional densities, model the class posterior directly

Generative vs Conditional Models
  Discussion

Logistic Regression
  Let
  Likelihood function

Maximum Likelihood Estimation
  Note that
  Please derive the gradient after class.

Newton-Raphson Optimization for Linear Regression
  Let H denote the Hessian matrix.
  Newton-Raphson converges in one iteration for linear regression.

Newton-Raphson Optimization for Logistic Regression
  Gradient and Hessian of the error function:

Newton-Raphson Optimization for Logistic Regression
  Iteratively reweighted least squares (IRLS):
  Solving a series of weighted least-squares problems

From generative models to logistic regression
  For Naïve Bayes classification:

Probit Regression
  Probit function:

Labeling Noise Model
  Robust to outliers and labeling errors

Laplace Approximation for Posterior
  Gaussian approximation around the mode:

Illustration of Laplace Approximation

Evidence Approximation

Bayesian Information Criterion
  Approximation of the Laplace approximation:
  More accurate evidence approximation needed

Bayesian Logistic Regression

Generalized Linear Models & Exponential Family
  Generalized linear models:

Generalized Linear Models
  Generalized linear model:
  Activation function:
  Link function:

Canonical Link Function
  If we choose the canonical link function:
  Gradient of the error function reduces
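
The Newton-Raphson / IRLS item in the outline can be illustrated with a minimal sketch. It is not taken from the slides. It assumes the standard cross-entropy error with a logistic sigmoid, for which the gradient is Phi^T (y - t) and the Hessian is Phi^T R Phi with R = diag(y_n (1 - y_n)). The names `sigmoid`, `irls_logistic`, the `ridge` stabilizer, and the synthetic data are all hypothetical choices made for this illustration.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-a))

def irls_logistic(Phi, t, n_iter=20, ridge=1e-8, tol=1e-10):
    """Newton-Raphson (IRLS) sketch for logistic regression.

    Phi : (N, M) design matrix, t : (N,) targets in {0, 1}.
    ridge is a small diagonal term added to the Hessian for numerical
    stability (an assumption of this sketch, not part of the lecture).
    """
    n_features = Phi.shape[1]
    w = np.zeros(n_features)
    for _ in range(n_iter):
        y = sigmoid(Phi @ w)                 # predicted probabilities
        grad = Phi.T @ (y - t)               # gradient of the error function
        r = y * (1.0 - y)                    # diagonal of the weighting matrix R
        H = Phi.T @ (Phi * r[:, None])       # Hessian Phi^T R Phi
        H += ridge * np.eye(n_features)
        step = np.linalg.solve(H, grad)      # Newton step H^{-1} grad
        w = w - step
        if np.linalg.norm(step) < tol:       # stop once the update is negligible
            break
    return w

# Synthetic, non-separable data (assumed for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
Phi = np.column_stack([np.ones_like(x), x])  # bias column plus one feature
t = (rng.random(200) < sigmoid(1.0 + 2.0 * x)).astype(float)

w_hat = irls_logistic(Phi, t)
print("estimated weights:", w_hat)
```

Each Newton step here is equivalent to solving a weighted least-squares problem with weights R and working targets z = Phi w - R^{-1}(y - t), which is where the name IRLS comes from. The weights R depend on the current w, so the problem has to be re-solved at every iteration, unlike linear regression, where a single step suffices.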