CS 59000 Statistical Machine Learning
Lecture 15
Alan Qi

Outline
• Review of Laplace approximation and Bayesian logistic regression
• Kernel methods
• Kernel ridge regression
• Kernel PCA

Laplace Approximation for Posterior
Gaussian approximation around a mode z0 of the distribution. What value shall we assign to z0? (See the sketch after these notes.)

Evidence Approximation
(Equation slide; the details are not included in this preview.)

Bayesian Logistic Regression
(Review slide; the details are not included in this preview.)

Kernel Methods
Predictions are linear combinations of a kernel function evaluated at the training data points. A kernel function corresponds to a feature-space mapping: k(x, x') = φ(x)ᵀφ(x').
• Linear kernel: k(x, x') = xᵀx'
• Stationary kernels: k(x, x') = k(x - x')

Fast Evaluation of Inner Product of Feature Mappings by Kernel Functions
Computing the inner product explicitly requires six feature values and 3 x 3 = 9 multiplications; evaluating the kernel function directly takes only 2 multiplications and a squaring (verified numerically in the sketch after these notes).

Flexible Function in Input Space
(Figure: input space vs. feature space.)

Gram Matrix
The Gram matrix K contains the kernel function evaluated at all pairs of the N data points: K_nm = k(x_n, x_m). A necessary and sufficient condition for a function to be a valid kernel is that the Gram matrix is positive semidefinite for all possible choices of the set {x_1, ..., x_N}.

Kernel Trick
1. Reformulate an algorithm so that the input vector enters only in the form of inner products xᵀx'.
2. Replace the input x by its feature mapping φ(x).
3. Replace the inner product by a kernel function: φ(x)ᵀφ(x') → k(x, x').
Examples: kernel PCA, kernel Fisher discriminant, support vector machines.

Dual Representation for Ridge Regression
Dual variables a: the weight vector is expressed as w = Φᵀa, so the regression function is written entirely in terms of inner products of feature vectors.

Kernel Ridge Regression
Using the kernel trick, the cost function depends on the input only through the Gram matrix K = ΦΦᵀ. The equivalent cost function over the dual variables is
J(a) = ½ aᵀKKa - aᵀKt + ½ tᵀt + (λ/2) aᵀKa.
Minimizing over the dual variables gives a = (K + λI_N)⁻¹t, and predictions y(x) = k(x)ᵀ(K + λI_N)⁻¹t, where k(x) has elements k_n(x) = k(x_n, x) (see the sketch after these notes).

Constructing Kernel Function
(Equation slide; the details are not included in this preview.)

Combining Generative & Discriminative Models by Kernels
Since each modeling approach has distinct advantages, how can we combine them?
• Use generative models to construct kernels
• Use these kernels in discriminative approaches

Measure Probability Similarity by Kernels
• Simple inner product: k(x, x') = p(x) p(x')
• For a mixture distribution: k(x, x') = Σ_i p(x|i) p(x'|i) p(i)
• For infinite mixture models: k(x, x') = ∫ p(x|z) p(x'|z) p(z) dz
• For models with latent variables (e.g., hidden Markov models): k(X, X') = Σ_Z p(X|Z) p(X'|Z) p(Z)

Principal Component Analysis (PCA)
Assume the data {x_n} has zero mean. We have S u_i = λ_i u_i, where S = (1/N) Σ_n x_n x_nᵀ is the sample covariance and u_i is a normalized eigenvector: u_iᵀu_i = 1.

Feature Mapping
Map each point to feature space and consider the eigen-problem there: C v_i = λ_i v_i, with C = (1/N) Σ_n φ(x_n) φ(x_n)ᵀ.

Dual Variables
Suppose λ_i > 0 (why can it not be smaller than 0?); then the eigenvector can be expanded in the feature vectors, v_i = Σ_n a_in φ(x_n), which defines the dual variables a_i.

Eigen-problem in Feature Space (1)
Multiplying both sides by φ(x_l)ᵀ, we obtain K²a_i = λ_i N K a_i, which is satisfied by solving K a_i = λ_i N a_i.

Eigen-problem in Feature Space (2)
Normalization condition: v_iᵀv_i = 1, i.e., λ_i N a_iᵀa_i = 1. Projection coefficient: y_i(x) = φ(x)ᵀv_i = Σ_n a_in k(x, x_n) (see the sketch after these notes).

General Case for Non-zero Mean
Kernel matrix of the centered features: K̃ = K - 1_N K - K 1_N + 1_N K 1_N, where 1_N is the N x N matrix with every element equal to 1/N.

Kernel PCA on Synthetic Data
(Figure: contour plots of the projection coefficients in feature space.)

Limitations of Kernel PCA
Discussion:
• If N is big, it is computationally expensive: K is N x N, while S is only D x D.
• Not easy for low-rank
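A minimal sketch of the Gaussian approximation referenced in the Laplace slide above, assuming the standard textbook construction (z0 a mode of p(z), A the negative Hessian of ln p at z0; the slide's own equations are not in the preview):

```latex
% Laplace approximation: fit a Gaussian q(z) at a mode z_0 of p(z).
% z_0 is chosen as a mode, so the gradient of ln p vanishes there:
%   \nabla \ln p(z) \big|_{z = z_0} = 0.
A = -\left. \nabla \nabla \ln p(z) \right|_{z = z_0},
\qquad
q(z) = \frac{|A|^{1/2}}{(2\pi)^{M/2}}
       \exp\!\left( -\tfrac{1}{2} (z - z_0)^{\top} A \, (z - z_0) \right).
```

So z0 is assigned a mode of the distribution, typically found by running a numerical optimizer on ln p(z).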
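A numerical check of the fast-evaluation claim, assuming the usual 2-D quadratic kernel k(x, z) = (xᵀz)² with feature map φ(x) = (x1², √2·x1·x2, x2²) (the standard textbook example; the slide's own equations are not in the preview):

```python
import numpy as np

def phi(x):
    """Explicit feature map for the 2-D quadratic kernel:
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

def k_quad(x, z):
    """Direct kernel evaluation: k(x, z) = (x . z)^2, i.e. two
    multiplications inside the dot product, then one squaring."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Both routes compute the same feature-space inner product.
print(phi(x) @ phi(z))   # 1.0
print(k_quad(x, z))      # 1.0
assert np.isclose(phi(x) @ phi(z), k_quad(x, z))
```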

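A minimal kernel ridge regression sketch following the dual solution above, a = (K + λI_N)⁻¹t and y(x) = k(x)ᵀa. The Gaussian (RBF) kernel, the length scale, and the toy data are illustrative assumptions, not from the slides:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Gram matrix with K[n, m] = exp(-||a_n - b_m||^2 / (2 l^2))."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))

def fit(X, t, lam=0.1):
    """Dual variables: a = (K + lam * I_N)^(-1) t."""
    K = rbf_kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(X)), t)

def predict(X_train, a, X_new):
    """Prediction y(x) = sum_n a_n k(x, x_n), i.e. k(x)^T a."""
    return rbf_kernel(X_new, X_train) @ a

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(50, 1))
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)

a = fit(X, t)
X_test = np.linspace(-3.0, 3.0, 5)[:, None]
print(predict(X, a, X_test))   # roughly sin at the test inputs
```

Note that the fit and the predictions touch the data only through kernel evaluations, so any valid kernel can be swapped in without changing the algorithm.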

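A minimal kernel PCA sketch implementing the eigen-problem above: center the kernel matrix with K̃ = K - 1_N K - K 1_N + 1_N K 1_N, solve K̃ a_i = λ_i N a_i, rescale so that λ_i N a_iᵀa_i = 1, and read off projection coefficients y_i(x_n) = Σ_m a_im K̃_nm. The RBF kernel and the toy data are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))

def kernel_pca(X, n_components=2):
    N = len(X)
    K = rbf_kernel(X, X)
    # Center in feature space: K~ = K - 1N K - K 1N + 1N K 1N,
    # where 1N is the N x N matrix with all entries 1/N.
    one_n = np.full((N, N), 1.0 / N)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Solve Kc a_i = (lambda_i N) a_i; eigh sorts eigenvalues ascending.
    mu, A = np.linalg.eigh(Kc)
    mu = mu[::-1][:n_components]
    A = A[:, ::-1][:, :n_components]
    # Normalize v_i^T v_i = 1, i.e. (lambda_i N) a_i^T a_i = mu_i a_i^T a_i = 1.
    A = A / np.sqrt(np.maximum(mu, 1e-12))
    # Projection coefficients of the training points: y = Kc A.
    return Kc @ A, mu / N   # (projections, eigenvalues lambda_i)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
Y, lam = kernel_pca(X)
print(Y.shape, lam)   # (100, 2) and the two leading lambda_i
```

The N x N eigendecomposition here is exactly the cost the limitations slide points at: K grows with the number of data points N, whereas the covariance S handled by ordinary PCA is only D x D.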