CS 59000 Statistical Machine Learning
Lecture 6
Alan Qi

Outline
- ML and Bayesian estimation of Gaussian distributions
- t-distributions and mixtures of Gaussians
- Exponential family

Bayes' Theorem for Gaussian Variables
Given
  p(x) = \mathcal{N}(x \mid \mu, \Lambda^{-1})
  p(y \mid x) = \mathcal{N}(y \mid Ax + b, L^{-1})
we have
  p(y) = \mathcal{N}(y \mid A\mu + b, L^{-1} + A \Lambda^{-1} A^T)
  p(x \mid y) = \mathcal{N}(x \mid \Sigma\{A^T L (y - b) + \Lambda\mu\}, \Sigma)
where
  \Sigma = (\Lambda + A^T L A)^{-1}.

Maximum Likelihood for the Gaussian (1)
Given i.i.d. data X = (x_1, \ldots, x_N)^T, the log likelihood function is given by
  \ln p(X \mid \mu, \Sigma) = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}(x_n - \mu)^T \Sigma^{-1} (x_n - \mu).
Sufficient statistics: \sum_n x_n and \sum_n x_n x_n^T.

Maximum Likelihood for the Gaussian (2)
Set the derivative of the log likelihood function to zero,
  \frac{\partial}{\partial\mu} \ln p(X \mid \mu, \Sigma) = \sum_{n=1}^{N} \Sigma^{-1}(x_n - \mu) = 0,
and solve to obtain
  \mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n.
Similarly,
  \Sigma_{ML} = \frac{1}{N}\sum_{n=1}^{N}(x_n - \mu_{ML})(x_n - \mu_{ML})^T.

Maximum Likelihood for the Gaussian (3)
Under the true distribution,
  E[\mu_{ML}] = \mu,  E[\Sigma_{ML}] = \frac{N-1}{N}\Sigma.
Hence define
  \tilde{\Sigma} = \frac{1}{N-1}\sum_{n=1}^{N}(x_n - \mu_{ML})(x_n - \mu_{ML})^T.
Is it biased? No: E[\tilde{\Sigma}] = \Sigma, so \tilde{\Sigma} is unbiased, whereas \Sigma_{ML} systematically underestimates the true covariance.

Sequential Estimation
Isolating the contribution of the Nth data point, x_N,
  \mu_{ML}^{(N)} = \frac{1}{N}\sum_{n=1}^{N} x_n = \mu_{ML}^{(N-1)} + \frac{1}{N}\left(x_N - \mu_{ML}^{(N-1)}\right),
i.e., the old estimate plus a correction given x_N, with correction weight 1/N.

Bayesian Inference for the Gaussian (1)
Assume \sigma^2 is known. Given i.i.d. data X = \{x_1, \ldots, x_N\}, the likelihood function for \mu is given by
  p(X \mid \mu) = \prod_{n=1}^{N} p(x_n \mid \mu) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left\{-\frac{1}{2\sigma^2}\sum_{n=1}^{N}(x_n - \mu)^2\right\}.
This has a Gaussian shape as a function of \mu (but it is not a distribution over \mu).

Bayesian Inference for the Gaussian (2)
Combined with a Gaussian prior over \mu,
  p(\mu) = \mathcal{N}(\mu \mid \mu_0, \sigma_0^2),
this gives the posterior
  p(\mu \mid X) \propto p(X \mid \mu)\, p(\mu).
Completing the square over \mu, we see that
  p(\mu \mid X) = \mathcal{N}(\mu \mid \mu_N, \sigma_N^2) \ldots

Bayesian Inference for the Gaussian (3)
... where
  \mu_N = \frac{\sigma^2}{N\sigma_0^2 + \sigma^2}\mu_0 + \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2}\mu_{ML},
  \frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2},
with \mu_{ML} = \frac{1}{N}\sum_n x_n.
Note: as N grows, the posterior mean approaches \mu_{ML} and the posterior variance shrinks toward zero.

Bayesian Inference for the Gaussian (4)
Example: for N = 0, 1, 2, and 10, with data points sampled from a Gaussian of mean 0.8 and variance 0.1. [Figure: the posterior over \mu, starting from the prior at N = 0, narrows around 0.8 as N grows.]

Bayesian Inference for the Gaussian (5)
Sequential estimation: the posterior obtained after observing N - 1 data points becomes the prior when we observe the Nth data point.

Bayesian Inference for the Gaussian (6)
Now assume \mu is known. The likelihood function for \lambda = 1/\sigma^2 is given by
  p(X \mid \lambda) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \lambda^{-1}) \propto \lambda^{N/2} \exp\left\{-\frac{\lambda}{2}\sum_{n=1}^{N}(x_n - \mu)^2\right\}.
This has a Gamma shape as a function of \lambda.

Bayesian Inference for the Gaussian (7)
The Gamma distribution:
  \mathrm{Gam}(\lambda \mid a, b) = \frac{1}{\Gamma(a)} b^a \lambda^{a-1} \exp(-b\lambda).

Bayesian Inference for the Gaussian (8)
Now we combine a Gamma prior, \mathrm{Gam}(\lambda \mid a_0, b_0), with the likelihood function for \lambda to obtain
  p(\lambda \mid X) \propto \lambda^{a_0 - 1} \lambda^{N/2} \exp\left\{-b_0\lambda - \frac{\lambda}{2}\sum_{n=1}^{N}(x_n - \mu)^2\right\},
which we recognize as \mathrm{Gam}(\lambda \mid a_N, b_N) with
  a_N = a_0 + \frac{N}{2},
  b_N = b_0 + \frac{1}{2}\sum_{n=1}^{N}(x_n - \mu)^2 = b_0 + \frac{N}{2}\sigma_{ML}^2.

Bayesian Inference for the Gaussian (9)
If both \mu and \lambda are unknown, the joint likelihood function is given by
  p(X \mid \mu, \lambda) = \prod_{n=1}^{N}\left(\frac{\lambda}{2\pi}\right)^{1/2} \exp\left\{-\frac{\lambda}{2}(x_n - \mu)^2\right\}.
We need a prior with the same functional dependence on \mu and \lambda.

Bayesian Inference for the Gaussian (10)
The Gaussian-gamma distribution:
  p(\mu, \lambda) = \mathcal{N}(\mu \mid \mu_0, (\beta\lambda)^{-1})\, \mathrm{Gam}(\lambda \mid a, b).
- Quadratic in \mu.
- Linear in \lambda.
- Gamma distribution over \lambda.
- Independent of \mu.

Bayesian Inference for the Gaussian (11)
The Gaussian-gamma distribution. [Figure: contour plot of the Gaussian-gamma density over (\mu, \lambda).]

Bayesian Inference for the Gaussian (12)
Multivariate conjugate priors:
- \mu unknown, \Lambda known: p(\mu) Gaussian.
- \Lambda unknown, \mu known: p(\Lambda) Wishart,
  \mathcal{W}(\Lambda \mid W, \nu) = B(W, \nu)\, |\Lambda|^{(\nu - D - 1)/2} \exp\left(-\frac{1}{2}\mathrm{Tr}(W^{-1}\Lambda)\right).
- \Lambda and \mu unknown: p(\mu, \Lambda) Gaussian-Wishart,
  p(\mu, \Lambda \mid \mu_0, \beta, W, \nu) = \mathcal{N}(\mu \mid \mu_0, (\beta\Lambda)^{-1})\, \mathcal{W}(\Lambda \mid W, \nu).

Student's t-Distribution (1)
If we integrate out the precision of a Gaussian with a Gamma prior, we obtain
  p(x \mid \mu, a, b) = \int_0^\infty \mathcal{N}(x \mid \mu, \tau^{-1})\, \mathrm{Gam}(\tau \mid a, b)\, d\tau.
Setting \nu = 2a and \lambda = a/b, we have
  \mathrm{St}(x \mid \mu, \lambda, \nu) = \frac{\Gamma((\nu+1)/2)}{\Gamma(\nu/2)} \left(\frac{\lambda}{\pi\nu}\right)^{1/2} \left[1 + \frac{\lambda(x-\mu)^2}{\nu}\right]^{-(\nu+1)/2}.

Student's t-Distribution (2)
[Figure: \mathrm{St}(x \mid \mu, \lambda, \nu) for several values of \nu; larger \nu approaches the Gaussian, smaller \nu gives heavier tails.]

Student's t-Distribution (3)
Robustness to outliers: Gaussian vs. t-distribution. [Figure: with a few outliers added to the data, the ML Gaussian fit is pulled toward them, while the t-distribution fit is barely affected.]

Student's t-Distribution (4)
The D-variate case:
  \mathrm{St}(x \mid \mu, \Lambda, \nu) = \frac{\Gamma(D/2 + \nu/2)}{\Gamma(\nu/2)} \frac{|\Lambda|^{1/2}}{(\pi\nu)^{D/2}} \left[1 + \frac{\Delta^2}{\nu}\right]^{-D/2 - \nu/2},
where \Delta^2 = (x - \mu)^T \Lambda (x - \mu).
Properties:
  E[x] = \mu (for \nu > 1),
  \mathrm{cov}[x] = \frac{\nu}{\nu - 2}\Lambda^{-1} (for \nu > 2),
  \mathrm{mode}[x] = \mu.

Mixtures of Gaussians (1)
Old Faithful data set. [Figure: single Gaussian vs. mixture of two Gaussians; the single Gaussian fits the two clusters poorly, while the two-component mixture captures both.]

Mixtures of Gaussians (2)
Combine simple models into a complex model:
  p(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k),
with components \mathcal{N}(x \mid \mu_k, \Sigma_k) and mixing coefficients \pi_k (here K = 3).

Mixtures of Gaussians (3)
[Figure: contours of the individual weighted components and of the resulting mixture density.]

Mixtures of Gaussians (4)
Determining the parameters \mu, \Sigma, and \pi using maximum log likelihood:
  \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln\left\{\sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)\right\}.
Log of a sum: no closed-form maximum. Solution: use standard, iterative numerical optimization methods, or the expectation-maximization (EM) algorithm (Chapter 9).
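The "log of a sum" is easy to see in code. The following is a minimal sketch, not from the slides (the function name and the test values are illustrative), that evaluates the mixture log likelihood for 1-D data using the log-sum-exp trick:

    import numpy as np

    def mixture_log_likelihood(X, pis, mus, sigmas):
        """ln p(X) = sum_n ln sum_k pi_k N(x_n | mu_k, sigma_k^2), 1-D data."""
        X = np.asarray(X)[:, None]            # shape (N, 1)
        mus = np.asarray(mus)[None, :]        # shape (1, K)
        sigmas = np.asarray(sigmas)[None, :]  # shape (1, K)
        # log of each weighted component density, shape (N, K)
        log_comp = (np.log(pis)
                    - 0.5 * np.log(2 * np.pi * sigmas**2)
                    - 0.5 * ((X - mus) / sigmas)**2)
        # log-sum-exp over components: the "log of a sum" that blocks a closed form
        m = log_comp.max(axis=1, keepdims=True)
        return float(np.sum(m.squeeze(1)
                            + np.log(np.exp(log_comp - m).sum(axis=1))))

    # Illustrative use: two well-separated clusters, K = 2 components
    rng = np.random.default_rng(0)
    X = np.concatenate([rng.normal(-2, 0.5, 100), rng.normal(3, 1.0, 100)])
    print(mixture_log_likelihood(X, [0.5, 0.5], [-2.0, 3.0], [0.5, 1.0]))

Because every parameter sits inside the inner sum over k, setting the gradient to zero couples all of them, which is why the slides defer to iterative optimization or EM.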
The Exponential Family (1)
  p(x \mid \eta) = h(x)\, g(\eta) \exp\{\eta^T u(x)\},
where \eta is the natural parameter and
  g(\eta) \int h(x) \exp\{\eta^T u(x)\}\, dx = 1,
so g(\eta) can be interpreted as a normalization coefficient.

The Exponential Family (2.1)
The Bernoulli distribution:
  p(x \mid \mu) = \mathrm{Bern}(x \mid \mu) = \mu^x (1-\mu)^{1-x} = (1-\mu) \exp\left\{\ln\left(\frac{\mu}{1-\mu}\right) x\right\}.
Comparing with the general form, we see that
  \eta = \ln\left(\frac{\mu}{1-\mu}\right),
and so
  \mu = \sigma(\eta) = \frac{1}{1 + \exp(-\eta)},
the logistic sigmoid.

The Exponential Family (2.2)
The Bernoulli distribution can hence be written as
  p(x \mid \eta) = \sigma(-\eta) \exp(\eta x),
where
  u(x) = x,  h(x) = 1,  g(\eta) = \sigma(-\eta).
Reminder: \sigma(-\eta) = 1 - \sigma(\eta).

The Exponential Family (3.1)
The Multinomial distribution:
  p(x \mid \mu) = \prod_{k=1}^{M} \mu_k^{x_k} = \exp\left\{\sum_{k=1}^{M} x_k \ln\mu_k\right\} = \exp(\eta^T x),
where \eta_k = \ln\mu_k, u(x) = x, h(x) = 1, and g(\eta) = 1.
NOTE: the \eta_k parameters are not independent, since the corresponding \mu_k must satisfy
  \sum_{k=1}^{M} \mu_k = 1.

The Exponential Family (3.2)
Let \mu_M = 1 - \sum_{k=1}^{M-1} \mu_k. This leads to
  \eta_k = \ln\left(\frac{\mu_k}{1 - \sum_{j=1}^{M-1}\mu_j}\right)
and
  \mu_k = \frac{\exp(\eta_k)}{1 + \sum_{j=1}^{M-1}\exp(\eta_j)},
the softmax function. Here the \eta_k parameters are independent. Note that 0 \le \mu_k \le 1 and \sum_{k=1}^{M-1} \mu_k \le 1.

The Exponential Family (3.3)
The Multinomial distribution can then be written as
  p(x \mid \eta) = \left(1 + \sum_{k=1}^{M-1}\exp(\eta_k)\right)^{-1} \exp(\eta^T x),
where
  u(x) = x,  h(x) = 1,  g(\eta) = \left(1 + \sum_{k=1}^{M-1}\exp(\eta_k)\right)^{-1}.

The Exponential Family (4)
The Gaussian distribution:
  p(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} = h(x)\, g(\eta) \exp\{\eta^T u(x)\},
where
  \eta = \begin{pmatrix} \mu/\sigma^2 \\ -1/(2\sigma^2) \end{pmatrix},  u(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix},  h(x) = (2\pi)^{-1/2},  g(\eta) = (-2\eta_2)^{1/2} \exp\left(\frac{\eta_1^2}{4\eta_2}\right).

ML for the Exponential Family (1)
From the definition of g(\eta) we get
  \nabla\left[g(\eta) \int h(x) \exp\{\eta^T u(x)\}\, dx\right] = 0.
Thus
  -\nabla \ln g(\eta) = E[u(x)].

ML for the Exponential Family (2)
Given a data set X = \{x_1, \ldots, x_N\}, the likelihood function is given by
  p(X \mid \eta) = \left(\prod_{n=1}^{N} h(x_n)\right) g(\eta)^N \exp\left\{\eta^T \sum_{n=1}^{N} u(x_n)\right\}.
Thus we have
  -\nabla \ln g(\eta_{ML}) = \frac{1}{N}\sum_{n=1}^{N} u(x_n).
Sufficient statistic: \sum_n u(x_n).

Conjugate priors
For any member of the exponential family, there exists a prior
  p(\eta \mid \chi, \nu) = f(\chi, \nu)\, g(\eta)^{\nu} \exp\{\nu \eta^T \chi\}.
Combining with the likelihood function, we get
  p(\eta \mid X, \chi, \nu) \propto g(\eta)^{\nu + N} \exp\left\{\eta^T \left(\sum_{n=1}^{N} u(x_n) + \nu\chi\right)\right\}.
The prior corresponds to \nu pseudo-observations with value \chi.

Posterior of Gaussian mean
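The posterior over a Gaussian mean derived in Bayesian Inference for the Gaussian (2)-(3) is exactly such a conjugate update. Here is a minimal numeric sketch of it; the prior settings and random seed below are illustrative choices, not from the lecture:

    import numpy as np

    # Known noise variance sigma^2; Gaussian prior N(mu | mu0, s0_sq) over the mean.
    sigma_sq = 0.1            # known sigma^2 (same variance as the slides' sampling example)
    mu0, s0_sq = 0.0, 1.0     # prior mean and variance (illustrative)

    rng = np.random.default_rng(0)
    X = rng.normal(0.8, np.sqrt(sigma_sq), size=10)   # N = 10 draws, true mean 0.8

    N, mu_ml = len(X), X.mean()
    # 1/sigma_N^2 = 1/sigma_0^2 + N/sigma^2
    sN_sq = 1.0 / (1.0 / s0_sq + N / sigma_sq)
    # mu_N = sigma^2/(N s0^2 + sigma^2) * mu0 + N s0^2/(N s0^2 + sigma^2) * mu_ML
    mu_N = (sigma_sq * mu0 + N * s0_sq * mu_ml) / (N * s0_sq + sigma_sq)
    print(f"mu_N = {mu_N:.3f}, sigma_N^2 = {sN_sq:.4f}")

Feeding the points in one at a time, with each posterior serving as the next prior, reproduces the same \mu_N and \sigma_N^2, matching the sequential view in Bayesian Inference for the Gaussian (5).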


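To close the loop on ML for the Exponential Family (2), a small sketch (illustrative; the variable names are mine) checking that for the Bernoulli the condition -\nabla \ln g(\eta_{ML}) = \frac{1}{N}\sum_n u(x_n) reduces to \sigma(\eta_{ML}) equaling the sample mean:

    import numpy as np

    def sigmoid(eta):
        """Logistic sigmoid sigma(eta) = 1 / (1 + exp(-eta))."""
        return 1.0 / (1.0 + np.exp(-eta))

    # Bernoulli draws with an arbitrary true parameter (illustrative value)
    rng = np.random.default_rng(1)
    x = (rng.random(10_000) < 0.3).astype(float)   # u(x_n) = x_n

    # For the Bernoulli, g(eta) = sigma(-eta), so -d/deta ln g(eta) = sigma(eta);
    # the ML condition sigma(eta_ML) = mean of u(x_n) is solved by the sample mean.
    mu_ml = x.mean()
    eta_ml = np.log(mu_ml / (1.0 - mu_ml))         # eta = ln(mu / (1 - mu))
    assert np.isclose(sigmoid(eta_ml), mu_ml)      # ML condition holds exactly
    print(f"mu_ML = {mu_ml:.4f}, eta_ML = {eta_ml:.4f}")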
