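CS 59000 Statistical Machine Learning
Lecture 12
Alan Qi

Outline
Review of Bayes factor and empirical Bayes
Fisher’s linear discriminant
Perceptron
Generative models for classification

Bayesian Model Comparison
Suppose we want to compare models $\{\mathcal{M}_i\}$. Given a training set $\mathcal{D}$, we compute the posterior over models:
$p(\mathcal{M}_i \mid \mathcal{D}) \propto p(\mathcal{M}_i)\, p(\mathcal{D} \mid \mathcal{M}_i)$
Model evidence (also known as marginal likelihood):
$p(\mathcal{D} \mid \mathcal{M}_i) = \int p(\mathcal{D} \mid \mathbf{w}, \mathcal{M}_i)\, p(\mathbf{w} \mid \mathcal{M}_i)\, d\mathbf{w}$
Bayes factor:
$p(\mathcal{D} \mid \mathcal{M}_i) \,/\, p(\mathcal{D} \mid \mathcal{M}_j)$

Evidence and Parameter Posterior
Marginal likelihood and evidence
Parameter posterior distribution and evidence: the evidence is the normalizing constant of the parameter posterior,
$p(\mathbf{w} \mid \mathcal{D}, \mathcal{M}_i) = p(\mathcal{D} \mid \mathbf{w}, \mathcal{M}_i)\, p(\mathbf{w} \mid \mathcal{M}_i) \,/\, p(\mathcal{D} \mid \mathcal{M}_i)$

Crude Evidence Approximation
Assume the posterior distribution is sharply peaked around its mode $w_{\text{MAP}}$ with width $\Delta w_{\text{posterior}}$, and the prior is flat with width $\Delta w_{\text{prior}}$:
$p(\mathcal{D}) \simeq p(\mathcal{D} \mid w_{\text{MAP}})\, \Delta w_{\text{posterior}} / \Delta w_{\text{prior}}$

Evidence penalizes over-complex models
Given M parameters, each with the same ratio of posterior to prior width:
$\ln p(\mathcal{D}) \simeq \ln p(\mathcal{D} \mid \mathbf{w}_{\text{MAP}}) + M \ln(\Delta w_{\text{posterior}} / \Delta w_{\text{prior}})$
The second term is negative and grows in magnitude with M, so maximizing the evidence leads to a natural trade-off between data fitting and model complexity.

Evidence Approximation & Empirical Bayes
Approximate the predictive distribution by fixing the hyperparameters at the values that maximize the marginal likelihood (the evidence), instead of integrating over them.
Known as empirical Bayes or type II maximum likelihood.

Model Evidence and Cross-Validation
Figure: fitting polynomial regression models. Panels: root-mean-square error; model evidence.

Classification Approaches
Discriminant functions: directly assign an input vector to a specific class.
Probabilistic generative models: model the data generation process and use Bayes' rule.
Probabilistic discriminative models: model the posterior class probabilities directly.

Distance from a point to the decision surface
Hint:

Fisher’s Linear Discriminant
Find a projection onto a line such that samples from different classes are well separated.
Figures from Srihari, http://www.cedar.buffalo.edu/~srihari/

A naïve choice of separation measure

Problem of Naïve Separation Criterion

Scatter of Data in Each Class

Solution: Normalization by Scatter

Fisher Linear Discriminant
Cost function:
$J(\mathbf{v}) = \dfrac{\mathbf{v}^T S_B \mathbf{v}}{\mathbf{v}^T S_W \mathbf{v}}$

Within Class and Between Class Scatter Matrices

Generalized eigenvalue problem
Maximize $J(\mathbf{v})$.
Differentiating $J(\mathbf{v})$ with respect to $\mathbf{v}$ and setting the result to zero gives
$S_B \mathbf{v} = \lambda S_W \mathbf{v}$

Fisher’s Linear Discriminant
Example
Figure panels: projection that maximizes mean separation; FLD projection.

Perceptron
Generalized linear model
Minimize the perceptron criterion
$E_P(\mathbf{w}) = -\sum_{n \in \mathcal{M}} \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n)\, t_n$
where $\mathcal{M}$ denotes the set of all misclassified patterns and $t_n \in \{-1, +1\}$.
Stochastic gradient descent:
$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} + \eta\, \boldsymbol{\phi}(\mathbf{x}_n)\, t_n$

Probabilistic Generative Models

Gaussian Class-Conditional Densities
Conditional densities of data:
The posterior distribution for label/class:

Maximum Likelihood Estimation
Let
Likelihood function

Maximum Likelihood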
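
As a rough illustration of two methods from this lecture, the sketch below computes a Fisher discriminant direction and trains a perceptron by stochastic gradient descent with NumPy. It is a minimal sketch under stated assumptions, not the lecture's own implementation: the function names (`class_stats`, `fisher_criterion`, `fisher_direction`, `add_bias`, `perceptron_sgd`), the synthetic data, the learning rate, and the use of raw inputs with an appended bias feature in place of a general feature map are all chosen here for illustration. For two classes it uses the closed-form direction proportional to S_W^{-1}(m2 - m1) rather than a generalized eigen-solver.

```python
import numpy as np


def class_stats(X1, X2):
    """Class means and within-class scatter matrix S_W for two classes."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    return m1, m2, S_W


def fisher_criterion(v, X1, X2):
    """J(v) = (v^T S_B v) / (v^T S_W v) with S_B = (m2 - m1)(m2 - m1)^T."""
    m1, m2, S_W = class_stats(X1, X2)
    return (v @ (m2 - m1)) ** 2 / (v @ S_W @ v)


def fisher_direction(X1, X2):
    """Unit vector maximizing J(v); two-class closed form S_W^{-1}(m2 - m1)."""
    m1, m2, S_W = class_stats(X1, X2)
    v = np.linalg.solve(S_W, m2 - m1)
    return v / np.linalg.norm(v)


def add_bias(X):
    """Append a constant feature so the bias is part of the weight vector."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def perceptron_sgd(X, t, lr=1.0, max_epochs=1000):
    """Minimize the perceptron criterion -sum_{n in M} t_n w^T x_n by SGD.

    X: (N, D) inputs; t: labels in {-1, +1}.
    Stops early once no training pattern is misclassified.
    """
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x_n, t_n in zip(Xb, t):
            if t_n * (w @ x_n) <= 0:   # x_n is in the misclassified set M
                w += lr * t_n * x_n    # step along the negative gradient
                mistakes += 1
        if mistakes == 0:
            break
    return w


rng = np.random.default_rng(0)

# Two correlated Gaussian classes (synthetic, for illustration only).
cov = np.array([[4.0, 3.5], [3.5, 4.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=100)
X2 = rng.multivariate_normal([2.0, 0.0], cov, size=100)

v_fld = fisher_direction(X1, X2)
v_naive = X2.mean(axis=0) - X1.mean(axis=0)   # maximizes mean separation only
v_naive /= np.linalg.norm(v_naive)
J_fld = fisher_criterion(v_fld, X1, X2)
J_naive = fisher_criterion(v_naive, X1, X2)
print("J(FLD) =", J_fld, " J(mean difference) =", J_naive)

# Linearly separable toy data with a margin, so the perceptron converges.
X = rng.uniform(-1.0, 1.0, size=(200, 2))
score = X[:, 0] + 2.0 * X[:, 1] + 0.3
keep = np.abs(score) > 0.2
X, t = X[keep], np.sign(score[keep])
w = perceptron_sgd(X, t)
accuracy = np.mean(np.sign(add_bias(X) @ w) == t)
print("perceptron training accuracy:", accuracy)
```

Because the Fisher direction maximizes J on the sample, its score should be at least that of the mean-difference direction; the gap tends to be large when both classes are elongated along a shared axis, as in this synthetic data. The perceptron data is filtered to leave a margin around the true boundary, which is what guarantees that the update loop terminates; on non-separable data the loop would simply run until `max_epochs`.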