Purdue CS 59000 - Statistical Machine Learning


Statistical Machine Learning, Lecture 25
Yuan (Alan) Qi

Contents
• EM: General EM Algorithm; EM as Lower Bounding Methods; Lower Bound; Illustration of Lower Bound; Lower Bound Perspective of EM
• Sequential data and HMMs: Sequential Data; Markov Models; State Space Models; Hidden Markov Models; From Mixture Models to HMMs; HMMs; Samples from HMM; Inference: Forward-backward Algorithm; Viterbi Algorithm
• Review: Review; Example
• Gaussian processes: Gaussian Processes for Classification; Sample from GP Prior; Predictive Distribution; Laplace's Method for GP Classification (1)-(3); Predictive Distribution; Example
• Support vector machines: Support Vector Machines; Maximizing Margin; Optimization Problem; Lagrange Multiplier; Geometrical Illustration of Lagrange Multiplier; Lagrange Multiplier with Inequality Constraints; Karush-Kuhn-Tucker (KKT) Condition; Lagrange Function for SVM; Dual Variables; Dual Problem; Prediction; KKT Condition and Support Vectors; Solving Bias Term; Computational Complexity; Example: SVM Classification; Classification for Overlapping Classes; New Cost Function; Lagrange Function; KKT Condition; Gradients; Dual Lagrangian; Dual Lagrangian with Constraints; Support Vectors; Solve Bias Term; Interpretation from Regularization Framework; Regularized Logistic Regression; Visualization of Hinge Error Function; SVM for Regression; ε-insensitive Error Function; Slack Variables; Visualization of SVM Regression
• Graphical models: Bayesian Networks; Bayesian Curve Fitting; Bayesian Curve Fitting: Learning; Generative Models; Parameterized Conditional Distributions; Linear-Gaussian Models; Conditional Independence: Examples 1-3; D-separation; The Markov Blanket; Joint Distribution; Converting Directed to Undirected Graphs (1)-(2); Inference in Graphical Models; Inference on a Chain; Factor Graphs; Factor Graphs from Directed Graphs; Factor Graphs from Undirected Graphs; The Sum-Product Algorithm; Message from Factor to Variable; Message from Variable to Factor; Initialization; The Junction Tree Algorithm; Loopy Belief Propagation; The Max-Sum Algorithm; Related Algorithms
• Unsupervised learning: Unsupervised Learning; K-means Clustering: Goal; Cost Function; Stochastic Online Clustering; K-medoids Algorithm; Mixture of Gaussians; Conditional Probability; Maximum Likelihood; Identifiability; Maximum Likelihood Conditions (1)-(2); Expectation Maximization for Mixture Gaussians

Outline
• Review of EM
• Hidden Markov models
• Review of course content (after the midterm)

General EM Algorithm

EM as Lower Bounding Methods
Goal: maximize p(X | θ) = Σ_Z p(X, Z | θ).
Define
  L(q, θ) = Σ_Z q(Z) ln[ p(X, Z | θ) / q(Z) ]
  KL(q ‖ p) = −Σ_Z q(Z) ln[ p(Z | X, θ) / q(Z) ].
We have
  ln p(X | θ) = L(q, θ) + KL(q ‖ p).

Lower Bound
L(q, θ) is a functional of the distribution q(Z). Since KL(q ‖ p) ≥ 0 and ln p(X | θ) = L(q, θ) + KL(q ‖ p), L(q, θ) is a lower bound of the log likelihood function ln p(X | θ). (A numeric check of this bound appears after this section.)

Illustration of Lower Bound

Lower Bound Perspective of EM
• Expectation Step: maximize the functional lower bound L(q, θ_old) over the distribution q(Z); the maximum is attained at q(Z) = p(Z | X, θ_old).
• Maximization Step: maximize the lower bound L(q, θ) over the parameters θ.

Sequential Data
There is temporal dependence between data points.

Markov Models
By the chain rule, a joint distribution can be rewritten as
  p(x_1, …, x_N) = Π_{n=1..N} p(x_n | x_1, …, x_{n−1}).
Assuming conditional independence, we have
  p(x_1, …, x_N) = p(x_1) Π_{n=2..N} p(x_n | x_{n−1}),
which is known as a first-order Markov chain. (A small sampling sketch follows below.)

State Space Models
Important graphical models for many dynamic models, including Hidden Markov Models (HMMs) and linear dynamical systems.

Hidden Markov Models
Many applications, e.g., speech recognition, natural language processing, handwriting recognition, bio-sequence analysis.
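To make the decomposition ln p(X | θ) = L(q, θ) + KL(q ‖ p) concrete, here is a minimal numeric check (my own sketch, not from the lecture; all parameter values are assumptions) for a two-component 1-D Gaussian mixture: after an E-step sets q(Z) to the exact posterior responsibilities, the bound L(q, θ) coincides with the log likelihood, while any other choice of q gives a strictly smaller value.

```python
# Hypothetical sketch (not from the lecture): verify that
# ln p(X|theta) = L(q, theta) + KL(q || p(Z|X, theta)) for a 1-D,
# two-component Gaussian mixture, and that the bound is tight when
# q(Z) equals the exact posterior responsibilities.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 1, 100)])

# Assumed parameters theta = (mixing weights, means, standard deviations).
w, mu, sd = np.array([0.5, 0.5]), np.array([-1.0, 2.0]), np.array([1.0, 1.0])

joint = w * norm.pdf(X[:, None], mu, sd)        # p(x_n, z_n = k | theta), shape (N, 2)
log_lik = np.sum(np.log(joint.sum(axis=1)))     # ln p(X | theta)

q = joint / joint.sum(axis=1, keepdims=True)    # E-step: posterior responsibilities

# L(q, theta) = sum_n sum_k q_nk ln[ p(x_n, z_n = k | theta) / q_nk ]
L = np.sum(q * (np.log(joint) - np.log(q)))
print(np.isclose(L, log_lik))                   # True: the KL term vanishes

q_bad = np.full_like(q, 0.5)                    # any other q: strictly smaller bound
L_bad = np.sum(q_bad * (np.log(joint) - np.log(q_bad)))
print(L_bad < log_lik)                          # True
```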
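Similarly, a small sketch of the first-order Markov factorization above (again my own illustration, with an assumed initial distribution and transition matrix): sample a discrete chain and evaluate its joint probability by multiplying the factors p(x_1) and p(x_n | x_{n−1}).

```python
# Hypothetical sketch: sample a discrete first-order Markov chain and
# evaluate p(x_1, ..., x_N) = p(x_1) * prod_{n>=2} p(x_n | x_{n-1}).
import numpy as np

rng = np.random.default_rng(1)
pi = np.array([0.6, 0.4])          # assumed initial distribution p(x_1)
A = np.array([[0.9, 0.1],          # assumed transitions: A[i, j] = p(x_n = j | x_{n-1} = i)
              [0.3, 0.7]])

def sample_chain(N):
    x = [rng.choice(2, p=pi)]
    for _ in range(N - 1):
        x.append(rng.choice(2, p=A[x[-1]]))
    return x

def log_joint(x):
    # ln p(x_1) + sum_{n>=2} ln p(x_n | x_{n-1})
    return np.log(pi[x[0]]) + sum(np.log(A[i, j]) for i, j in zip(x, x[1:]))

x = sample_chain(10)
print(x, log_joint(x))
```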
From Mixture Models to HMMs
By turning a mixture model into a dynamic model, we obtain the HMM: model the dependence between two consecutive latent variables by a transition probability p(z_n | z_{n−1}).

HMMs
Prior on the initial latent variable: p(z_1).
Emission probabilities: p(x_n | z_n).
Joint distribution:
  p(X, Z) = p(z_1) [ Π_{n=2..N} p(z_n | z_{n−1}) ] Π_{n=1..N} p(x_n | z_n).

Samples from HMM
(a) Contours of constant probability density for the emission distributions corresponding to each of the three states of the latent variable. (b) A sample of 50 points drawn from the hidden Markov model, with lines connecting the successive observations.

Inference: Forward-backward Algorithm
Goal: compute the marginals of the latent variables.
The forward-backward algorithm performs exact inference as a special case of the sum-product algorithm on the HMM, using a factor graph representation that groups the emission density and transition probability into one factor at a time. (Runnable sketches of the forward-backward and Viterbi recursions appear below.)

Viterbi Algorithm
• Finds the most probable sequence of states.
• Special case of the max-sum algorithm on the HMM.
What if we want to find the most probable individual states?

Review
GP classification; SVMs; Lagrange multipliers; Bayesian networks; Markov random fields; the sum-product/max-sum algorithm; K-means; mixture of Gaussians and EM.

Example
t = sin(2π x_1); x_2 = x_1 + n; x_3 = e

Gaussian Processes for Classification
Likelihood: p(t | a) = Π_n σ(a_n)^{t_n} (1 − σ(a_n))^{1 − t_n}.
GP prior: p(a_N) = N(a_N | 0, C_N).
Covariance function: C(x_n, x_m) = k(x_n, x_m) + ν δ_{nm}.

Sample from GP Prior

Predictive Distribution
  p(t_{N+1} = 1 | t_N) = ∫ σ(a_{N+1}) p(a_{N+1} | t_N) da_{N+1}
has no analytical solution. Approximate this integration with:
• Laplace's method
• Variational Bayes
• Expectation propagation

Laplace's Method for GP Classification (1)-(3)
Log probability of the joint model: Ψ(a_N) = ln p(a_N) + ln p(t_N | a_N).
Newton-Raphson update:
  a_N ← C_N (I + W_N C_N)^{−1} { t_N − σ_N + W_N a_N },  W_N = diag[ σ(a_n)(1 − σ(a_n)) ].
(A small sketch of this iteration appears below.)

Predictive Distribution

Example

Support Vector Machines
Motivated by statistical learning theory: maximum margin classifiers. Margin: the smallest distance between the decision boundary and any of the samples.

Maximizing Margin
  arg max_{w,b} { (1/‖w‖) min_n [ t_n (wᵀφ(x_n) + b) ] }.
Since scaling w and b together will not change the above ratio, we set t_n (wᵀφ(x_n) + b) = 1 for the point closest to the decision surface. Data points for which the equality holds are said to be active constraints; the remainder are inactive.

Optimization Problem
Quadratic programming: minimize (1/2)‖w‖² subject to t_n (wᵀφ(x_n) + b) ≥ 1, n = 1, …, N.

Lagrange Multiplier
Maximize f(x) subject to g(x) = 0. The gradient of the constraint, ∇g(x), is orthogonal to the constraint surface, so at a solution ∇f + λ∇g = 0 for some multiplier λ.

Geometrical Illustration of Lagrange Multiplier

Lagrange Multiplier with Inequality Constraints
Maximize f(x) subject to g(x) ≥ 0.

Karush-Kuhn-Tucker (KKT) Condition
  g(x) ≥ 0,  λ ≥ 0,  λ g(x) = 0.

Lagrange Function for SVM
Quadratic programming: minimize (1/2)‖w‖² subject to t_n (wᵀφ(x_n) + b) ≥ 1.
Lagrange function:
  L(w, b, a) = (1/2)‖w‖² − Σ_n a_n { t_n (wᵀφ(x_n) + b) − 1 },  a_n ≥ 0.

Dual Variables
Setting the derivatives of L over w and b to zero, we obtain the following two equations:
  w = Σ_n a_n t_n φ(x_n),  Σ_n a_n t_n = 0.

Dual Problem
Maximize
  L̃(a) = Σ_n a_n − (1/2) Σ_n Σ_m a_n a_m t_n t_m k(x_n, x_m)
subject to a_n ≥ 0 and Σ_n a_n t_n = 0, where k(x_n, x_m) = φ(x_n)ᵀφ(x_m).

Prediction
  y(x) = Σ_n a_n t_n k(x, x_n) + b.

KKT Condition and Support Vectors
For every data point, either a_n = 0 or t_n y(x_n) = 1. In the latter case, we call the corresponding data points support vectors.

Solving Bias Term
For support vectors, t_n y(x_n) = 1, which gives
  b = (1/N_S) Σ_{n∈S} ( t_n − Σ_{m∈S} a_m t_m k(x_n, x_m) ).

Computational Complexity
Quadratic programming: when the dimensionality of the feature space is smaller than the number of data points, solving the dual problem is more costly. The dual representation, however, allows the use of kernels.

Example: SVM Classification
(A dual-solver sketch appears below.)
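The sketches below are my own illustrations of the algorithms reviewed above, under assumed notation: a discrete HMM with initial distribution π, transition matrix A, and emission matrix B. First, the forward-backward (α-β) recursions in scaled form, computing the posterior marginals γ(z_n) = p(z_n | X) and the log likelihood.

```python
# Hypothetical sketch of the forward-backward algorithm for a discrete HMM
# with initial distribution pi, transitions A, and emission matrix B, using
# per-step normalization for numerical stability.
import numpy as np

def forward_backward(obs, pi, A, B):
    N, K = len(obs), len(pi)
    alpha, beta, c = np.zeros((N, K)), np.zeros((N, K)), np.zeros(N)

    # Forward pass: alpha_n(k) proportional to p(x_1..x_n, z_n = k).
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for n in range(1, N):
        alpha[n] = (alpha[n - 1] @ A) * B[:, obs[n]]
        c[n] = alpha[n].sum(); alpha[n] /= c[n]

    # Backward pass: beta_n(k) proportional to p(x_{n+1}..x_N | z_n = k).
    beta[-1] = 1.0
    for n in range(N - 2, -1, -1):
        beta[n] = (A @ (B[:, obs[n + 1]] * beta[n + 1])) / c[n + 1]

    gamma = alpha * beta          # posterior marginals p(z_n | X)
    log_lik = np.log(c).sum()     # ln p(X)
    return gamma, log_lik

# Toy usage with assumed parameters:
pi = np.array([0.5, 0.5])
A = np.array([[0.8, 0.2], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.3, 0.7]])   # B[k, v] = p(x = v | z = k)
gamma, ll = forward_backward([0, 1, 1, 0], pi, A, B)
print(gamma.sum(axis=1))                 # each row sums to 1
```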
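Next, a minimal Viterbi sketch in log space for the most probable joint state sequence. Its output can differ from taking the per-step argmax of γ(z_n), which is the point behind the "most probable individual states" question above.

```python
# Hypothetical Viterbi sketch in log space: the most probable joint path
# argmax_Z p(Z, X), which may differ from per-step argmaxes of gamma.
import numpy as np

def viterbi(obs, pi, A, B):
    N, K = len(obs), len(pi)
    logp = np.log(pi) + np.log(B[:, obs[0]])
    back = np.zeros((N, K), dtype=int)
    for n in range(1, N):
        cand = logp[:, None] + np.log(A)   # cand[i, j]: best score arriving at j from i
        back[n] = cand.argmax(axis=0)
        logp = cand.max(axis=0) + np.log(B[:, obs[n]])
    path = [int(logp.argmax())]            # backtrack from the best final state
    for n in range(N - 1, 0, -1):
        path.append(int(back[n, path[-1]]))
    return path[::-1], logp.max()

# Toy usage with the same assumed parameters as the forward-backward sketch:
pi = np.array([0.5, 0.5])
A = np.array([[0.8, 0.2], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.3, 0.7]])
path, logp = viterbi([0, 1, 1, 0], pi, A, B)
print(path, logp)
```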
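For Laplace's method, a small sketch of the Newton-Raphson mode-finding iteration quoted above; the squared-exponential covariance, jitter, and toy targets are assumptions of mine, not the lecture's.

```python
# Hypothetical sketch of the Newton-Raphson (IRLS) step used by Laplace's
# method for GP classification: iterate
#   a <- C (I + W C)^{-1} (t - sigma + W a),  W = diag(sigma_n (1 - sigma_n)),
# to find the mode of p(a | t) for binary targets t in {0, 1}.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_mode(C, t, iters=50):
    N = len(t)
    a = np.zeros(N)
    for _ in range(iters):
        s = sigmoid(a)
        W = np.diag(s * (1 - s))
        a = C @ np.linalg.solve(np.eye(N) + W @ C, t - s + W @ a)
    return a   # posterior mode under the Laplace approximation

# Toy usage with an assumed squared-exponential covariance plus jitter:
X = np.linspace(-3, 3, 8)
C = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2) + 1e-6 * np.eye(8)
t = (X > 0).astype(float)
print(laplace_mode(C, t).round(2))
```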
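Finally, a sketch of the hard-margin SVM dual solved with a generic constrained optimizer (scipy's SLSQP). The data, linear kernel, and tolerance are assumptions of mine; in practice a dedicated QP solver would be used, but the objective, constraints, support-vector test, and bias formula match the slides above.

```python
# Hypothetical sketch: solve the hard-margin SVM dual
#   max_a  sum_n a_n - (1/2) sum_{n,m} a_n a_m t_n t_m k(x_n, x_m)
#   s.t.   a_n >= 0,  sum_n a_n t_n = 0
# on linearly separable toy data with a linear kernel.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
t = np.concatenate([-np.ones(20), np.ones(20)])

K = X @ X.T                                  # linear kernel k(x_n, x_m)
Q = (t[:, None] * t[None, :]) * K

res = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(),   # negated dual objective
               x0=np.zeros(len(t)),
               jac=lambda a: Q @ a - 1.0,
               bounds=[(0, None)] * len(t),
               constraints={'type': 'eq', 'fun': lambda a: a @ t},
               method='SLSQP', options={'maxiter': 500})
a = res.x

S = a > 1e-6                                 # support vectors: a_n > 0
# Bias from the KKT condition t_n y(x_n) = 1, averaged over support vectors:
b = np.mean(t[S] - (a * t) @ K[:, S])

y = (a * t) @ K + b                          # y(x_n) = sum_m a_m t_m k(x_n, x_m) + b
print(S.sum(), np.all(np.sign(y) == t))      # few SVs; all points on the right side
```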

