CS 59000 Statistical Machine Learning
Lecture 12
Yuan (Alan) Qi

Outline
• Review of Laplace approximation, BIC, Bayesian logistic regression
• Kernel methods
• Kernel ridge regression
• Kernel construction
• Kernel principal component analysis

Laplace Approximation for Posterior
Gaussian approximation around the mode $z_0$ of $p(z)$:
$q(z) = \mathcal{N}(z \mid z_0, A^{-1})$, where $A = -\nabla\nabla \ln p(z)\big|_{z = z_0}$.

Evidence Approximation
Applying the Laplace approximation to the log marginal likelihood of a model with $M$ parameters:
$\ln p(D) \simeq \ln p(D \mid \theta_{MAP}) + \ln p(\theta_{MAP}) + \frac{M}{2}\ln 2\pi - \frac{1}{2}\ln|A|$.

Bayesian Information Criterion
A cruder approximation of the Laplace evidence approximation:
$\ln p(D) \simeq \ln p(D \mid \theta_{MAP}) - \frac{1}{2} M \ln N$.
BIC penalizes model complexity heavily; when this is too crude, a more accurate evidence approximation is needed.

Bayesian Logistic Regression
Exact Bayesian inference for logistic regression is intractable; the Laplace approximation gives a Gaussian approximation to the posterior over the weights. (A one-dimensional Laplace sketch appears in the code examples after these notes.)

Kernel Methods
Predictions are linear combinations of a kernel function evaluated at the training data points.
A kernel function corresponds to an inner product in a feature space: $k(x, x') = \phi(x)^T \phi(x')$.
Linear kernel: $k(x, x') = x^T x'$.
Stationary kernels depend only on the difference of their arguments: $k(x, x') = k(x - x')$.

Fast Evaluation of Inner Products of Feature Mappings by Kernel Functions
Example (2-D input): $k(x, z) = (x^T z)^2 = \phi(x)^T \phi(z)$ with $\phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)^T$.
Computing the explicit inner product requires evaluating six feature values and $3 \times 3 = 9$ multiplications; evaluating the kernel function requires only two multiplications and a squaring.

Kernel Trick
1. Reformulate an algorithm so that the input vector enters only in the form of inner products $x^T x'$.
2. Replace the input $x$ by its feature mapping: $x \rightarrow \phi(x)$.
3. Replace the inner product by a kernel function: $\phi(x)^T \phi(x') \rightarrow k(x, x')$.
Examples: kernel PCA, kernel Fisher discriminant, support vector machines. (See the quadratic-kernel sketch after these notes.)

Dual Representation for Ridge Regression
Ridge regression cost:
$J(w) = \frac{1}{2} \sum_{n=1}^{N} (w^T \phi(x_n) - t_n)^2 + \frac{\lambda}{2} w^T w$.
Setting the gradient to zero gives $w = \Phi^T a$, with dual variables
$a_n = -\frac{1}{\lambda} (w^T \phi(x_n) - t_n)$.

Kernel Ridge Regression
Using the kernel trick, define the Gram matrix $K = \Phi \Phi^T$, $K_{nm} = k(x_n, x_m)$. Now the cost function depends on the input only through the Gram matrix. Equivalent cost function over the dual variables:
$J(a) = \frac{1}{2} a^T K K a - a^T K t + \frac{1}{2} t^T t + \frac{\lambda}{2} a^T K a$.
Minimizing over the dual variables gives
$a = (K + \lambda I_N)^{-1} t$,
so the prediction for a new input is $y(x) = k(x)^T (K + \lambda I_N)^{-1} t$, where $k(x)$ has elements $k(x_n, x)$. (A NumPy sketch follows these notes.)

Constructing Kernel Functions
Example: consider the Gaussian kernel
$k(x, x') = \exp(-\|x - x'\|^2 / 2\sigma^2)$.
Why is it a valid kernel? Expand $\|x - x'\|^2 = x^T x - 2 x^T x' + x'^T x'$, so
$k(x, x') = \exp(-x^T x / 2\sigma^2) \exp(x^T x' / \sigma^2) \exp(-x'^T x' / 2\sigma^2)$,
which is valid by the standard construction rules: the exponential of a valid kernel is valid, and multiplying by $f(x) f(x')$ preserves validity.
Generalization: replace the inner product by any valid kernel $\kappa$:
$k(x, x') = \exp\{-\frac{1}{2\sigma^2} (\kappa(x, x) - 2\kappa(x, x') + \kappa(x', x'))\}$.

Combining Generative & Discriminative Models by Kernels
Since each modeling approach has distinct advantages, how can we combine them?
• Use generative models to construct kernels.
• Use these kernels in discriminative approaches.

Measuring Probability Similarity by Kernels
Simple inner product: $k(x, x') = p(x)\, p(x')$.
For a mixture distribution: $k(x, x') = \sum_i p(x \mid i)\, p(x' \mid i)\, p(i)$.
For infinite mixture models: $k(x, x') = \int p(x \mid z)\, p(x' \mid z)\, p(z)\, dz$.
For models with latent variables (e.g., hidden Markov models): $k(X, X') = \sum_Z p(X \mid Z)\, p(X' \mid Z)\, p(Z)$, where $X$ and $X'$ are observation sequences. (A mixture-kernel sketch follows these notes.)

Fisher Kernels
Fisher score: $g(\theta, x) = \nabla_\theta \ln p(x \mid \theta)$.
Fisher information matrix: $F = E_x[\, g(\theta, x)\, g(\theta, x)^T \,]$.
Fisher kernel: $k(x, x') = g(\theta, x)^T F^{-1} g(\theta, x')$.
Sample average: $F \simeq \frac{1}{N} \sum_{n=1}^{N} g(\theta, x_n)\, g(\theta, x_n)^T$.
(Sketched after these notes.)

Principal Component Analysis (PCA)
Assume the data has zero mean, $\sum_n x_n = 0$, and define the sample covariance $S = \frac{1}{N} \sum_n x_n x_n^T$. Each $u_i$ is a normalized eigenvector:
$S u_i = \lambda_i u_i$, with $u_i^T u_i = 1$.

Feature Mapping
Map each point into feature space, $x_n \rightarrow \phi(x_n)$, and consider the eigenproblem in feature space:
$C v_i = \lambda_i v_i$, where $C = \frac{1}{N} \sum_n \phi(x_n)\, \phi(x_n)^T$.

Dual Variables
Suppose $\lambda_i > 0$ (why can it not be smaller than 0? $C$ is positive semi-definite). Then we have
$v_i = \sum_n a_{in}\, \phi(x_n)$,
i.e., each eigenvector lies in the span of the mapped data points.

Eigenproblem in Feature Space (1)
Multiplying both sides by $\phi(x_l)^T$, we obtain
$K^2 a_i = \lambda_i N K a_i$,
which for nonzero eigenvalues reduces to
$K a_i = \lambda_i N a_i$.

Eigenproblem in Feature Space (2)
Normalization condition:
$1 = v_i^T v_i = a_i^T K a_i = \lambda_i N\, a_i^T a_i$.
Projection coefficient:
$y_i(x) = \phi(x)^T v_i = \sum_n a_{in}\, k(x, x_n)$.

General Case for Non-zero Mean
Kernel matrix for centered features:
$\tilde{K} = K - 1_N K - K 1_N + 1_N K 1_N$,
where $1_N$ is the $N \times N$ matrix with every element equal to $1/N$. (A kernel PCA sketch using this centering follows these notes.)

Kernel PCA on Synthetic Data
[Figure: contour plots of projection coefficients in feature space.]

Limitations of Kernel PCA
Discussion…
• If $N$ is big, kernel PCA is computationally expensive, since $K$ is $N \times N$ while $S$ is only $D \times D$.
• Not easy for low-rank …
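Code sketches. First, a minimal one-dimensional sketch of the Laplace approximation reviewed above: it finds the mode $z_0$ of an assumed unnormalized log density numerically and sets the Gaussian precision to the negative second derivative at the mode. The target density, the use of SciPy's minimize_scalar, and the finite-difference curvature are illustrative assumptions, not material from the slides.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def log_p(z):
    # Assumed unnormalized log density: Gaussian factor times a sigmoid factor.
    return -0.5 * z**2 + np.log(1.0 / (1.0 + np.exp(-3.0 * z)))

# Mode z0 of p(z): maximize log_p by minimizing its negative.
z0 = minimize_scalar(lambda z: -log_p(z)).x

# Precision A = -(d^2/dz^2) ln p(z) at z0, via a central finite difference.
eps = 1e-4
A = -(log_p(z0 + eps) - 2.0 * log_p(z0) + log_p(z0 - eps)) / eps**2

# Laplace approximation: q(z) = N(z | z0, A^{-1}).
print(f"q(z) = N(z | {z0:.3f}, {1.0 / A:.3f})")
```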
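As a quick check of the feature-map/kernel correspondence from the kernel-trick slide, this sketch verifies numerically that the quadratic kernel $k(x, z) = (x^T z)^2$ equals the inner product of the explicit feature maps $\phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)^T$; the specific test vectors are arbitrary.

```python
import numpy as np

def phi(x):
    # Explicit feature map for 2-D input: phi(x) = (x1^2, sqrt(2) x1 x2, x2^2).
    return np.array([x[0]**2, np.sqrt(2.0) * x[0] * x[1], x[1]**2])

def k(x, z):
    # Kernel evaluation: two multiplications (x^T z in 2-D) and one squaring.
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
assert np.isclose(phi(x) @ phi(z), k(x, z))  # same value, no explicit features needed
```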
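Next, a minimal NumPy sketch of kernel ridge regression as derived above, using the dual solution $a = (K + \lambda I_N)^{-1} t$ with a Gaussian kernel. The bandwidth sigma, regularizer lam, and toy sine data are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # Gram matrix K[n, m] = exp(-||x_n - x_m||^2 / (2 sigma^2)).
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-sq / (2.0 * sigma**2))

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(50, 1))                 # training inputs
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)      # noisy targets

lam = 0.1
K = gaussian_kernel(X, X)
a = np.linalg.solve(K + lam * np.eye(len(X)), t)         # dual variables

X_new = np.linspace(-3.0, 3.0, 5).reshape(-1, 1)
y_new = gaussian_kernel(X_new, X) @ a                    # y(x) = sum_n a_n k(x, x_n)
print(y_new)
```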
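A sketch of the mixture-based probability-similarity kernel, $k(x, x') = \sum_i p(x \mid i)\, p(x' \mid i)\, p(i)$, instantiated with an assumed two-component univariate Gaussian mixture.

```python
import numpy as np

# Assumed two-component Gaussian mixture: p(i), component means and stddevs.
weights = np.array([0.3, 0.7])
means = np.array([-1.0, 2.0])
stds = np.array([0.5, 1.0])

def gauss_pdf(x, mu, s):
    return np.exp(-(x - mu)**2 / (2.0 * s**2)) / (np.sqrt(2.0 * np.pi) * s)

def mixture_kernel(x, x_prime):
    # k(x, x') = sum_i p(x | i) p(x' | i) p(i): points that are likely under
    # the same component get a large kernel value.
    return np.sum(gauss_pdf(x, means, stds) * gauss_pdf(x_prime, means, stds) * weights)

print(mixture_kernel(2.1, 1.8), mixture_kernel(2.1, -1.0))
```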
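A Fisher kernel sketch for the simplest possible generative model, a univariate Gaussian $\mathcal{N}(x \mid \mu, \sigma^2)$ with $\theta = \mu$, applying the score, sample-average information, and kernel definitions from the slide; the model choice and parameter values are assumptions.

```python
import numpy as np

mu, sigma = 0.0, 1.0   # assumed fitted model parameters; theta = mu

def fisher_score(x):
    # g(theta, x) = d/d mu  ln N(x | mu, sigma^2) = (x - mu) / sigma^2.
    return (x - mu) / sigma**2

# Sample-average Fisher information (a scalar for this one-parameter model).
x_data = np.random.default_rng(1).normal(mu, sigma, size=1000)
F = np.mean(fisher_score(x_data) ** 2)

def fisher_kernel(x, x_prime):
    # k(x, x') = g(x) F^{-1} g(x').
    return fisher_score(x) * fisher_score(x_prime) / F

print(fisher_kernel(0.5, -1.2))
```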
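Finally, a kernel PCA sketch implementing the dual eigenproblem $\tilde{K} a_i = (\lambda_i N)\, a_i$ together with the Gram-matrix centering from the non-zero-mean slide; the RBF kernel and random data are assumptions.

```python
import numpy as np

def rbf(X1, X2, sigma=1.0):
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-sq / (2.0 * sigma**2))

X = np.random.default_rng(2).normal(size=(100, 2))
N = len(X)

# Center the Gram matrix: K~ = K - 1_N K - K 1_N + 1_N K 1_N.
K = rbf(X, X)
one_N = np.full((N, N), 1.0 / N)
K_tilde = K - one_N @ K - K @ one_N + one_N @ K @ one_N

# Solve K~ a_i = (lambda_i N) a_i; eigh returns ascending eigenvalues,
# so reverse to put the largest components first.
eigvals, eigvecs = np.linalg.eigh(K_tilde)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Enforce 1 = a_i^T K~ a_i: eigh gives unit-norm a_i, so rescale each by
# 1/sqrt(eigenvalue). Top eigenvalues are positive since K~ is PSD.
n_comp = 2
alphas = eigvecs[:, :n_comp] / np.sqrt(eigvals[:n_comp])

# Projection coefficients y_i(x_n) = sum_m a_{i m} k~(x_n, x_m) for the training set.
Y = K_tilde @ alphas
print(Y.shape)   # (100, 2)
```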