CS 59000 Statistical Machine Learning
Lecture 15
Alan Qi

Outline
• Review of Laplace approximation and Bayesian logistic regression
• Kernel methods
• Kernel ridge regression
• Kernel PCA

Laplace Approximation for Posterior
Gaussian approximation around a mode z0 of the distribution. What value shall we assign to z0? (See the sketch after these notes.)

Evidence Approximation
(Equation slide; the details are not included in this preview.)

Bayesian Logistic Regression
(Review slide; the details are not included in this preview.)

Kernel Methods
Predictions are linear combinations of a kernel function evaluated at the training data points. A kernel function corresponds to a feature-space mapping: k(x, x') = φ(x)ᵀφ(x').
• Linear kernel: k(x, x') = xᵀx'
• Stationary kernels: k(x, x') = k(x - x')

Fast Evaluation of Inner Product of Feature Mappings by Kernel Functions
Computing the inner product explicitly requires six feature values and 3 x 3 = 9 multiplications; evaluating the kernel function directly takes only 2 multiplications and a squaring (verified numerically in the sketch after these notes).

Flexible Function in Input Space
(Figure: input space vs. feature space.)

Gram Matrix
The Gram matrix K contains the kernel function evaluated at all pairs of the N data points: K_nm = k(x_n, x_m). A necessary and sufficient condition for a function to be a valid kernel is that the Gram matrix is positive semidefinite for all possible choices of the set {x_1, ..., x_N}.

Kernel Trick
1. Reformulate an algorithm so that the input vector enters only in the form of inner products xᵀx'.
2. Replace the input x by its feature mapping φ(x).
3. Replace the inner product by a kernel function: φ(x)ᵀφ(x') → k(x, x').
Examples: kernel PCA, kernel Fisher discriminant, support vector machines.

Dual Representation for Ridge Regression
Dual variables a: the weight vector is expressed as w = Φᵀa, so the regression function is written entirely in terms of inner products of feature vectors.

Kernel Ridge Regression
Using the kernel trick, the cost function depends on the input only through the Gram matrix K = ΦΦᵀ. The equivalent cost function over the dual variables is
J(a) = ½ aᵀKKa - aᵀKt + ½ tᵀt + (λ/2) aᵀKa.
Minimizing over the dual variables gives a = (K + λI_N)⁻¹t, and predictions y(x) = k(x)ᵀ(K + λI_N)⁻¹t, where k(x) has elements k_n(x) = k(x_n, x) (see the sketch after these notes).

Constructing Kernel Function
(Equation slide; the details are not included in this preview.)

Combining Generative & Discriminative Models by Kernels
Since each modeling approach has distinct advantages, how can we combine them?
• Use generative models to construct kernels
• Use these kernels in discriminative approaches

Measure Probability Similarity by Kernels
• Simple inner product: k(x, x') = p(x) p(x')
• For a mixture distribution: k(x, x') = Σ_i p(x|i) p(x'|i) p(i)
• For infinite mixture models: k(x, x') = ∫ p(x|z) p(x'|z) p(z) dz
• For models with latent variables (e.g., hidden Markov models): k(X, X') = Σ_Z p(X|Z) p(X'|Z) p(Z)

Principal Component Analysis (PCA)
Assume the data {x_n} has zero mean. We have S u_i = λ_i u_i, where S = (1/N) Σ_n x_n x_nᵀ is the sample covariance and u_i is a normalized eigenvector: u_iᵀu_i = 1.

Feature Mapping
Map each point to feature space and consider the eigen-problem there: C v_i = λ_i v_i, with C = (1/N) Σ_n φ(x_n) φ(x_n)ᵀ.

Dual Variables
Suppose λ_i > 0 (why can it not be smaller than 0?); then the eigenvector can be expanded in the feature vectors, v_i = Σ_n a_in φ(x_n), which defines the dual variables a_i.

Eigen-problem in Feature Space (1)
Multiplying both sides by φ(x_l)ᵀ, we obtain K²a_i = λ_i N K a_i, which is satisfied by solving K a_i = λ_i N a_i.

Eigen-problem in Feature Space (2)
Normalization condition: v_iᵀv_i = 1, i.e., λ_i N a_iᵀa_i = 1. Projection coefficient: y_i(x) = φ(x)ᵀv_i = Σ_n a_in k(x, x_n) (see the sketch after these notes).

General Case for Non-zero Mean
Kernel matrix of the centered features: K̃ = K - 1_N K - K 1_N + 1_N K 1_N, where 1_N is the N x N matrix with every element equal to 1/N.

Kernel PCA on Synthetic Data
(Figure: contour plots of the projection coefficients in feature space.)

Limitations of Kernel PCA
Discussion:
• If N is big, it is computationally expensive: K is N x N, while S is only D x D.
• Not easy for low-rank
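A minimal sketch of the Gaussian approximation referenced in the Laplace slide above, assuming the standard textbook construction (z0 a mode of p(z), A the negative Hessian of ln p at z0; the slide's own equations are not in the preview):

```latex
% Laplace approximation: fit a Gaussian q(z) at a mode z_0 of p(z).
% z_0 is chosen as a mode, so the gradient of ln p vanishes there:
%   \nabla \ln p(z) \big|_{z = z_0} = 0.
A = -\left. \nabla \nabla \ln p(z) \right|_{z = z_0},
\qquad
q(z) = \frac{|A|^{1/2}}{(2\pi)^{M/2}}
       \exp\!\left( -\tfrac{1}{2} (z - z_0)^{\top} A \, (z - z_0) \right).
```

So z0 is assigned a mode of the distribution, typically found by running a numerical optimizer on ln p(z).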
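A numerical check of the fast-evaluation claim, assuming the usual 2-D quadratic kernel k(x, z) = (xᵀz)² with feature map φ(x) = (x1², √2·x1·x2, x2²) (the standard textbook example; the slide's own equations are not in the preview):

```python
import numpy as np

def phi(x):
    """Explicit feature map for the 2-D quadratic kernel:
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

def k_quad(x, z):
    """Direct kernel evaluation: k(x, z) = (x . z)^2, i.e. two
    multiplications inside the dot product, then one squaring."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Both routes compute the same feature-space inner product.
print(phi(x) @ phi(z))   # 1.0
print(k_quad(x, z))      # 1.0
assert np.isclose(phi(x) @ phi(z), k_quad(x, z))
```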

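A minimal kernel ridge regression sketch following the dual solution above, a = (K + λI_N)⁻¹t and y(x) = k(x)ᵀa. The Gaussian (RBF) kernel, the length scale, and the toy data are illustrative assumptions, not from the slides:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Gram matrix with K[n, m] = exp(-||a_n - b_m||^2 / (2 l^2))."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))

def fit(X, t, lam=0.1):
    """Dual variables: a = (K + lam * I_N)^(-1) t."""
    K = rbf_kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(X)), t)

def predict(X_train, a, X_new):
    """Prediction y(x) = sum_n a_n k(x, x_n), i.e. k(x)^T a."""
    return rbf_kernel(X_new, X_train) @ a

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(50, 1))
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)

a = fit(X, t)
X_test = np.linspace(-3.0, 3.0, 5)[:, None]
print(predict(X, a, X_test))   # roughly sin at the test inputs
```

Note that the fit and the predictions touch the data only through kernel evaluations, so any valid kernel can be swapped in without changing the algorithm.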

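A minimal kernel PCA sketch implementing the eigen-problem above: center the kernel matrix with K̃ = K - 1_N K - K 1_N + 1_N K 1_N, solve K̃ a_i = λ_i N a_i, rescale so that λ_i N a_iᵀa_i = 1, and read off projection coefficients y_i(x_n) = Σ_m a_im K̃_nm. The RBF kernel and the toy data are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))

def kernel_pca(X, n_components=2):
    N = len(X)
    K = rbf_kernel(X, X)
    # Center in feature space: K~ = K - 1N K - K 1N + 1N K 1N,
    # where 1N is the N x N matrix with all entries 1/N.
    one_n = np.full((N, N), 1.0 / N)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Solve Kc a_i = (lambda_i N) a_i; eigh sorts eigenvalues ascending.
    mu, A = np.linalg.eigh(Kc)
    mu = mu[::-1][:n_components]
    A = A[:, ::-1][:, :n_components]
    # Normalize v_i^T v_i = 1, i.e. (lambda_i N) a_i^T a_i = mu_i a_i^T a_i = 1.
    A = A / np.sqrt(np.maximum(mu, 1e-12))
    # Projection coefficients of the training points: y = Kc A.
    return Kc @ A, mu / N   # (projections, eigenvalues lambda_i)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
Y, lam = kernel_pca(X)
print(Y.shape, lam)   # (100, 2) and the two leading lambda_i
```

The N x N eigendecomposition here is exactly the cost the limitations slide points at: K grows with the number of data points N, whereas the covariance S handled by ordinary PCA is only D x D.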