CS 59000 Statistical Machine Learning
Lecture 11
Yuan (Alan) Qi

Outline
• Review of logistic regression, probit regression, and generalized linear models
• Laplace approximation and BIC
• Bayesian logistic regression
• Kernel methods

Probabilistic Discriminative Models
Instead of modeling the class-conditional densities p(\phi | C_k) and class priors p(C_k), model the posterior p(C_k | \phi) directly.

Logistic Regression
Let
  p(C_1 | \phi) = y(\phi) = \sigma(w^T \phi), where \sigma(a) = 1 / (1 + e^{-a}).
Likelihood function for binary targets t_n \in \{0, 1\}:
  p(t | w) = \prod_{n=1}^N y_n^{t_n} (1 - y_n)^{1 - t_n}, with y_n = \sigma(w^T \phi_n).
Maximum likelihood estimation minimizes the cross-entropy error
  E(w) = -\ln p(t | w) = -\sum_{n=1}^N \{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \}.
Note that d\sigma/da = \sigma(1 - \sigma), which gives the simple gradient
  \nabla E(w) = \sum_{n=1}^N (y_n - t_n) \phi_n.

Newton-Raphson Optimization for Logistic Regression
Gradient and Hessian of the error function:
  \nabla E(w) = \Phi^T (y - t),   H = \nabla\nabla E(w) = \Phi^T R \Phi,   R_{nn} = y_n (1 - y_n).
Each Newton-Raphson step is w^{new} = w^{old} - H^{-1} \nabla E(w).

Iterative reweighted least squares (IRLS): because R depends on w, the Newton step can be rewritten as
  w^{new} = (\Phi^T R \Phi)^{-1} \Phi^T R z,   z = \Phi w^{old} - R^{-1} (y - t),
so fitting logistic regression amounts to solving a series of weighted least-squares problems.
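As a concrete illustration of the IRLS update above, here is a minimal NumPy sketch (the function name, data, and small ridge term are mine, added for illustration, not from the lecture); it applies the Newton step in the equivalent form w - H^{-1} \nabla E:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def irls_logistic(Phi, t, n_iters=20, ridge=1e-8):
        # Fit logistic regression by iterative reweighted least squares.
        # Phi: (N, M) design matrix; t: (N,) binary targets in {0, 1}.
        M = Phi.shape[1]
        w = np.zeros(M)
        for _ in range(n_iters):
            y = sigmoid(Phi @ w)              # y_n = sigma(w^T phi_n)
            R = y * (1.0 - y)                 # diagonal of the weighting matrix R
            grad = Phi.T @ (y - t)            # gradient Phi^T (y - t)
            H = Phi.T @ (Phi * R[:, None])    # Hessian Phi^T R Phi
            H += ridge * np.eye(M)            # tiny ridge for numerical stability
            w = w - np.linalg.solve(H, grad)  # Newton-Raphson step
        return w

    # Toy usage: non-separable 1-D data with a bias feature
    x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
    Phi = np.column_stack([np.ones_like(x), x])
    t = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
    print(irls_logistic(Phi, t))

The ridge term guards against a singular Hessian (for example, on perfectly separable data, where the ML weights diverge).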
From Generative Models to Logistic Models
For a wide class of generative models (for example, Gaussian class-conditional densities with a shared covariance), the posterior is exactly a logistic sigmoid of a linear function of the input: generative models <-> logistic regression. How about other discriminative functions?

Probit Regression
Probit function: the cumulative distribution function of the standard Gaussian,
  \Phi(a) = \int_{-\infty}^{a} N(\theta | 0, 1) \, d\theta,
used as the activation, p(t = 1 | a) = \Phi(a) with a = w^T \phi.

Labeling Noise Model
Allowing each label to be flipped with a small probability \epsilon,
  p(t = 1 | a) = \epsilon + (1 - 2\epsilon) \Phi(a),
bounds the activation away from 0 and 1, making the model robust to outliers and labeling errors.

Generalized Linear Models
Generalized linear model: y = f(w^T \phi).
Activation function: f.
Link function: f^{-1}, which maps the mean of the target distribution onto the linear predictor.

Canonical Link Function
If we choose the canonical link function for an exponential-family target distribution, the gradient of the error function takes the same simple form for every member of the family:
  \nabla E(w) = (1/s) \sum_{n=1}^N (y_n - t_n) \phi_n,
where s is a scale parameter of the target distribution.
Examples: Gaussian targets with the identity link (linear regression) and Bernoulli targets with the logit link (logistic regression); a numeric check follows below.
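A small sketch (the random data and helper names are mine, assumed for illustration) verifying the canonical-link gradient \Phi^T (y - t) for the Bernoulli/logit case (s = 1) against central finite differences:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def cross_entropy(w, Phi, t):
        y = sigmoid(Phi @ w)
        return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

    rng = np.random.default_rng(1)
    Phi = rng.normal(size=(40, 3))                 # random design matrix
    t = rng.integers(0, 2, size=40).astype(float)  # random binary targets
    w = rng.normal(size=3)

    # Canonical-link gradient: Phi^T (y - t)
    grad = Phi.T @ (sigmoid(Phi @ w) - t)

    # Central finite-difference check of the same gradient
    eps = 1e-6
    fd = np.array([(cross_entropy(w + eps * e, Phi, t)
                    - cross_entropy(w - eps * e, Phi, t)) / (2 * eps)
                   for e in np.eye(3)])
    print(np.allclose(grad, fd, atol=1e-4))        # expected: True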
Laplace Approximation for Posterior
Gaussian approximation around a mode z_0 of a distribution p(z):
  q(z) = N(z | z_0, A^{-1}),   A = -\nabla\nabla \ln p(z) |_{z = z_0},
obtained from a second-order Taylor expansion of \ln p(z) at the mode.

Illustration of Laplace Approximation
[Figure: a non-Gaussian density together with its Gaussian (Laplace) approximation centered at the mode.]

Evidence Approximation
The same expansion yields an approximation to the model evidence:
  \ln p(D) \simeq \ln p(D | \theta_{MAP}) + \ln p(\theta_{MAP}) + (M/2) \ln 2\pi - (1/2) \ln |A|,
where M is the number of parameters.

Bayesian Information Criterion
A cruder approximation of the Laplace approximation (assuming a broad prior and large N):
  \ln p(D) \simeq \ln p(D | \theta_{MAP}) - (M/2) \ln N.
When this is too crude, a more accurate evidence approximation is needed.

Bayesian Logistic Regression
Apply the Laplace approximation to the posterior over w, then approximate the predictive distribution by averaging the sigmoid over the resulting Gaussian.

Kernel Methods
Predictions are linear combinations of a kernel function evaluated at the training data points.
Kernel function <-> feature-space mapping: k(x, x') = \phi(x)^T \phi(x').
Linear kernel: k(x, x') = x^T x'.
Stationary kernels: k(x, x') = k(x - x'), functions of the difference of the inputs only.

Fast Evaluation of Inner Products of Feature Mappings by Kernel Functions
For x, z \in R^2, the kernel k(x, z) = (x^T z)^2 corresponds to the feature mapping \phi(x) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2)^T. Evaluating the inner product \phi(x)^T \phi(z) directly needs computing six feature values and 3 x 3 = 9 multiplications; evaluating the kernel function needs only 2 multiplications and a squaring.

Flexible Function in Input Space
Working with the kernel directly defines a flexible (possibly infinite-dimensional) class of functions in the input space without ever computing the feature mapping explicitly.

Gram Matrix
The Gram matrix K collects the kernel function evaluated at all pairs of the N data points, K_{nm} = k(x_n, x_m). A necessary and sufficient condition for a function k to be a valid kernel is that the Gram matrix is positive semidefinite for all possible choices of the set \{x_n\}; a numerical illustration follows below.
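A quick numerical illustration (the setup and function name are mine): build the Gram matrix of the Gaussian kernel on random points and confirm its smallest eigenvalue is nonnegative, consistent with positive semidefiniteness:

    import numpy as np

    def gaussian_gram(X, sigma=1.0):
        # K[n, m] = exp(-||x_n - x_m||^2 / (2 sigma^2))
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / (2.0 * sigma ** 2))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10, 3))       # 10 arbitrary points in R^3
    K = gaussian_gram(X)

    # Smallest eigenvalue should be >= 0 up to floating-point rounding.
    print(np.linalg.eigvalsh(K).min())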
Constructing Kernel Functions
New valid kernels can be built from existing ones, for example by sums, products, scaling by positive constants, and composition with certain functions. Why do these constructions preserve validity?

Example: Gaussian Kernel
Consider the Gaussian kernel
  k(x, x') = \exp( -\|x - x'\|^2 / (2\sigma^2) ).
Why is it a valid kernel?
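The preview stops at this question; what follows is a sketch of the standard argument (an assumption about how the slide continues), using the construction rules above. Expand the squared distance in the exponent:

  \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)
    = \exp\left(-\frac{x^T x}{2\sigma^2}\right)
      \exp\left(\frac{x^T x'}{\sigma^2}\right)
      \exp\left(-\frac{x'^T x'}{2\sigma^2}\right).

The middle factor is the exponential of the valid linear kernel x^T x' / \sigma^2, and the exponential of a valid kernel is valid because its power series involves only sums and products of valid kernels. The outer factors have the form f(x) k(x, x') f(x'), which also preserves validity. Hence the Gaussian kernel is a valid kernel.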
