CS 59000 Statistical Machine Learning
Lecture 5, September 2009

Outline
• Review of ML estimation and Bayesian treatment of the Gaussian distribution
• t-distributions and mixtures of Gaussians
• The exponential family

Bayesian Inference for the Gaussian (1)

Assume σ² is known. Given i.i.d. data X = {x_1, ..., x_N}, the likelihood function for μ is given by

  p(X|\mu) = \prod_{n=1}^{N} \mathcal{N}(x_n|\mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 \right\}.

This has a Gaussian shape as a function of μ (but it is not a distribution over μ).

Bayesian Inference for the Gaussian (2)

Combined with a Gaussian prior over μ,

  p(\mu) = \mathcal{N}(\mu | \mu_0, \sigma_0^2),

this gives the posterior

  p(\mu|X) \propto p(X|\mu)\, p(\mu).

Completing the square over μ, we see that p(μ|X) = N(μ | μ_N, σ_N²), where

  \mu_N = \frac{\sigma^2}{N\sigma_0^2 + \sigma^2}\,\mu_0 + \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2}\,\mu_{ML}, \qquad \frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2},

with μ_ML = (1/N) Σ_n x_n the maximum likelihood estimate of the mean.
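These two update equations are easy to verify numerically. Below is a minimal sketch (not from the slides; the function name and example values are illustrative, and NumPy is assumed):

```python
import numpy as np

def gaussian_mean_posterior(x, sigma2, mu0, sigma02):
    """Posterior N(mu | mu_N, sigma_N^2) over the mean of a Gaussian
    with known variance sigma2, given the prior N(mu0, sigma02)."""
    N = len(x)
    mu_ml = np.mean(x)
    mu_N = (sigma2 * mu0 + N * sigma02 * mu_ml) / (N * sigma02 + sigma2)
    sigma_N2 = 1.0 / (1.0 / sigma02 + N / sigma2)
    return mu_N, sigma_N2

# Data drawn from a Gaussian of mean 0.8 and variance 0.1 (as in the
# slide's example); the prior is centered at 0.
x = np.random.normal(0.8, np.sqrt(0.1), size=10)
print(gaussian_mean_posterior(x, sigma2=0.1, mu0=0.0, sigma02=0.1))
```

As N grows, mu_N moves from the prior mean toward mu_ML and sigma_N2 shrinks, which is exactly the behavior illustrated on the next slide.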
Bayesian Inference for the Gaussian (4)

Example: the posterior p(μ|X) for N = 0, 1, 2, and 10. Data points are sampled from a Gaussian of mean 0.8 and variance 0.1. [Figure: the posterior narrows and concentrates near 0.8 as N increases.]

Bayesian Inference for the Gaussian (6)

Now assume μ is known. The likelihood function for the precision λ = 1/σ² is given by

  p(X|\lambda) = \prod_{n=1}^{N} \mathcal{N}(x_n|\mu, \lambda^{-1}) \propto \lambda^{N/2} \exp\left\{ -\frac{\lambda}{2} \sum_{n=1}^{N} (x_n - \mu)^2 \right\}.

This has a Gamma shape as a function of λ.

Bayesian Inference for the Gaussian (8)

Now we combine a Gamma prior, Gam(λ | a_0, b_0), with the likelihood function for λ to obtain

  p(\lambda|X) \propto \lambda^{a_0 - 1} \lambda^{N/2} \exp\left\{ -b_0\lambda - \frac{\lambda}{2} \sum_{n=1}^{N} (x_n - \mu)^2 \right\},

which we recognize as Gam(λ | a_N, b_N) with

  a_N = a_0 + \frac{N}{2}, \qquad b_N = b_0 + \frac{1}{2} \sum_{n=1}^{N} (x_n - \mu)^2 = b_0 + \frac{N}{2}\,\sigma_{ML}^2.
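The same kind of conjugate update, as a sketch in code (again illustrative, with the mean assumed known):

```python
import numpy as np

def gamma_precision_posterior(x, mu, a0, b0):
    """Posterior Gam(lam | a_N, b_N) over the precision of a Gaussian
    with known mean mu, given the prior Gam(a0, b0)."""
    x = np.asarray(x)
    a_N = a0 + len(x) / 2.0
    b_N = b0 + 0.5 * np.sum((x - mu) ** 2)
    return a_N, b_N
```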
Bayesian Inference for the Gaussian (9)

If both μ and λ are unknown, the joint likelihood function is given by

  p(X|\mu,\lambda) = \prod_{n=1}^{N} \left(\frac{\lambda}{2\pi}\right)^{1/2} \exp\left\{ -\frac{\lambda}{2}(x_n - \mu)^2 \right\} \propto \left[ \lambda^{1/2} \exp\left( -\frac{\lambda\mu^2}{2} \right) \right]^{N} \exp\left\{ \lambda\mu \sum_{n=1}^{N} x_n - \frac{\lambda}{2} \sum_{n=1}^{N} x_n^2 \right\}.

We need a prior with the same functional dependence on μ and λ.

Bayesian Inference for the Gaussian (10)

The Gaussian-gamma distribution:

  p(\mu,\lambda) = \mathcal{N}\left(\mu \,\middle|\, \mu_0, (\beta\lambda)^{-1}\right) \mathrm{Gam}(\lambda | a, b).

• The exponent of the Gaussian factor is quadratic in μ and linear in λ, matching the likelihood.
• The remaining factor is a Gamma distribution over λ, independent of μ.

Student's t-Distribution (1)

If we integrate out the precision of a Gaussian with a Gamma prior, we obtain

  p(x|\mu, a, b) = \int_0^\infty \mathcal{N}(x|\mu, \tau^{-1})\, \mathrm{Gam}(\tau | a, b)\, d\tau.

Setting ν = 2a and λ = a/b, we have Student's t-distribution

  \mathrm{St}(x|\mu,\lambda,\nu) = \frac{\Gamma(\nu/2 + 1/2)}{\Gamma(\nu/2)} \left(\frac{\lambda}{\pi\nu}\right)^{1/2} \left[ 1 + \frac{\lambda(x-\mu)^2}{\nu} \right]^{-(\nu+1)/2}.

Student's t-Distribution (2)

[Figure: St(x|μ, λ, ν) for several values of ν; as ν → ∞ it approaches a Gaussian.]

Student's t-Distribution

Robustness to outliers: Gaussian vs. t-distribution. [Figure: with a few outliers added, the fitted Gaussian is pulled toward them, while the heavier-tailed t-distribution is barely affected.]

Student's t-Distribution (3)

The D-variate case:

  \mathrm{St}(\mathbf{x}|\boldsymbol{\mu},\boldsymbol{\Lambda},\nu) = \frac{\Gamma(D/2 + \nu/2)}{\Gamma(\nu/2)} \frac{|\boldsymbol{\Lambda}|^{1/2}}{(\pi\nu)^{D/2}} \left[ 1 + \frac{\Delta^2}{\nu} \right]^{-(D+\nu)/2},

where Δ² = (x − μ)ᵀ Λ (x − μ).

Properties:

  E[x] = μ (for ν > 1),  cov[x] = ν/(ν − 2) · Λ⁻¹ (for ν > 2),  mode[x] = μ.
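The integral above says a t-distributed variable is a Gaussian whose precision is itself Gamma-distributed, which is easy to check by sampling. A minimal sketch (parameter values are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, mu = 2.0, 2.0, 0.0  # Gamma prior on the precision; nu = 2a, lam = a/b

# Draw tau ~ Gam(a, b) (rate b, so scale 1/b), then x ~ N(mu, 1/tau);
# marginally x follows St(x | mu, lam, nu).
tau = rng.gamma(shape=a, scale=1.0 / b, size=100_000)
x = rng.normal(mu, 1.0 / np.sqrt(tau))

# Heavier tails than a Gaussian of the same nominal precision lam = a/b:
g = rng.normal(mu, np.sqrt(b / a), size=100_000)
print(np.mean(np.abs(x) > 3.0), np.mean(np.abs(g) > 3.0))
```

The first tail probability comes out noticeably larger, which is the robustness-to-outliers property in miniature.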
Mixtures of Gaussians (1)

Old Faithful data set. [Figures: a single Gaussian fit vs. a mixture of two Gaussians; the mixture captures the two clusters that one Gaussian cannot.]

Mixtures of Gaussians (2)

Combine simple models into a complex model:

  p(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x|\mu_k, \Sigma_k),

where N(x | μ_k, Σ_k) is the k-th component and π_k its mixing coefficient. [Figure: an example with K = 3.]

Mixtures of Gaussians (3)

[Figure only: contours of the individual components and of the resulting mixture density.]

Mixtures of Gaussians (4)

Determining the parameters μ, Σ, and π by maximizing the log likelihood

  \ln p(X|\pi,\mu,\Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n|\mu_k, \Sigma_k) \right\}

involves the log of a sum, so there is no closed-form maximum. Solution: use standard, iterative, numeric optimization methods, or the expectation maximization algorithm (Chapter 9).
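The log of a sum is still straightforward to evaluate; the usual trick is log-sum-exp. A sketch for 1-D data (SciPy assumed; all parameter values are arbitrary placeholders):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def gmm_log_likelihood(x, pis, mus, sigmas):
    """ln p(X) = sum_n ln sum_k pi_k N(x_n | mu_k, sigma_k^2) for 1-D data,
    computed stably in the log domain."""
    x = np.asarray(x)[:, None]                              # shape (N, 1)
    log_terms = np.log(pis) + norm.logpdf(x, mus, sigmas)   # shape (N, K)
    return logsumexp(log_terms, axis=1).sum()

# An illustrative K = 3 mixture, echoing the slide's figure:
print(gmm_log_likelihood([0.1, 0.9, 2.0],
                         pis=[0.5, 0.3, 0.2],
                         mus=[0.0, 1.0, 2.0],
                         sigmas=[0.5, 0.5, 0.5]))
```

There is no closed-form maximizer of this quantity, but it is exactly what EM or a numerical optimizer would climb.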
The Exponential Family (1)

  p(x|\eta) = h(x)\, g(\eta) \exp\{\eta^{\mathrm{T}} u(x)\},

where η is the natural parameter and

  g(\eta) \int h(x) \exp\{\eta^{\mathrm{T}} u(x)\}\, dx = 1

(with the integral replaced by a sum for discrete variables), so g(η) can be interpreted as a normalization coefficient.

The Exponential Family (2.1)

The Bernoulli distribution:

  p(x|\mu) = \mu^{x}(1-\mu)^{1-x} = \exp\{x \ln \mu + (1-x)\ln(1-\mu)\} = (1-\mu) \exp\left\{ \ln\left(\frac{\mu}{1-\mu}\right) x \right\}.

Comparing with the general form, we see that

  \eta = \ln\left(\frac{\mu}{1-\mu}\right),

and so

  \mu = \sigma(\eta) = \frac{1}{1 + \exp(-\eta)}  (the logistic sigmoid).

The Exponential Family (2.2)

The Bernoulli distribution can hence be written as

  p(x|\eta) = \sigma(-\eta) \exp(\eta x),

where u(x) = x, h(x) = 1, and g(η) = σ(−η). Reminder: σ(−η) = 1 − σ(η).
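A quick numerical check of this reparameterization (a sketch; the value μ = 0.8 is arbitrary):

```python
import numpy as np

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

mu = 0.8
eta = np.log(mu / (1.0 - mu))        # natural parameter = log-odds
assert np.isclose(sigmoid(eta), mu)  # mu = sigma(eta)

# p(x|eta) = sigma(-eta) exp(eta x) recovers Bern(x|mu) at x = 0 and 1:
for x in (0, 1):
    assert np.isclose(sigmoid(-eta) * np.exp(eta * x),
                      mu**x * (1 - mu)**(1 - x))
```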
The Exponential Family (3.1)

The multinomial distribution (x is a one-of-M binary vector):

  p(x|\mu) = \prod_{k=1}^{M} \mu_k^{x_k} = \exp\left\{ \sum_{k=1}^{M} x_k \ln \mu_k \right\} = \exp(\eta^{\mathrm{T}} x),

where η_k = ln μ_k, u(x) = x, h(x) = 1, and g(η) = 1. NOTE: the η_k parameters are not independent, since the corresponding μ_k must satisfy

  \sum_{k=1}^{M} \mu_k = 1.

The Exponential Family (3.2)

Let μ_M = 1 − Σ_{k=1}^{M−1} μ_k. This leads to

  p(x|\mu) = \exp\left\{ \sum_{k=1}^{M-1} x_k \ln\left( \frac{\mu_k}{1 - \sum_{j=1}^{M-1}\mu_j} \right) + \ln\left( 1 - \sum_{j=1}^{M-1}\mu_j \right) \right\}

and

  \eta_k = \ln\left( \frac{\mu_k}{1 - \sum_{j=1}^{M-1}\mu_j} \right).

Here the η_k parameters are independent. Note that

  1 - \sum_{j=1}^{M-1}\mu_j = \frac{1}{1 + \sum_{j=1}^{M-1}\exp(\eta_j)}

and

  \mu_k = \frac{\exp(\eta_k)}{1 + \sum_{j=1}^{M-1}\exp(\eta_j)}  (the softmax function).

The Exponential Family (3.3)

The multinomial distribution can then be written as

  p(x|\eta) = \left( 1 + \sum_{k=1}^{M-1} \exp(\eta_k) \right)^{-1} \exp(\eta^{\mathrm{T}} x),

where u(x) = x, h(x) = 1, and g(η) = (1 + Σ_{k=1}^{M−1} exp(η_k))^{-1}.
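The softmax map from the independent natural parameters back to probabilities, as a sketch (the function name and test values are illustrative; η has length M − 1, with the M-th category recovered as the remainder):

```python
import numpy as np

def softmax_from_natural(eta):
    """mu_k = exp(eta_k) / (1 + sum_j exp(eta_j)) for k = 1..M-1;
    the last probability is mu_M = 1 / (1 + sum_j exp(eta_j))."""
    e = np.exp(np.asarray(eta, dtype=float))
    denom = 1.0 + e.sum()
    return np.append(e / denom, 1.0 / denom)

mu = softmax_from_natural([0.5, -1.0, 2.0])
print(mu, mu.sum())  # the M = 4 probabilities sum to 1
```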
The Exponential Family (4)

The Gaussian distribution:

  p(x|\mu,\sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\} = h(x)\, g(\eta) \exp\{\eta^{\mathrm{T}} u(x)\},

where

  \eta = \begin{pmatrix} \mu/\sigma^2 \\ -1/(2\sigma^2) \end{pmatrix}, \quad u(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix}, \quad h(x) = (2\pi)^{-1/2}, \quad g(\eta) = (-2\eta_2)^{1/2} \exp\left( \frac{\eta_1^2}{4\eta_2} \right).

ML for the Exponential Family (1)

From the definition of g(η) we get

  \nabla g(\eta) \int h(x) \exp\{\eta^{\mathrm{T}} u(x)\}\, dx + g(\eta) \int h(x) \exp\{\eta^{\mathrm{T}} u(x)\}\, u(x)\, dx = 0.

Thus

  -\nabla \ln g(\eta) = E[u(x)].

ML for the Exponential Family (2)

Given a data set X = {x_1, ..., x_N}, the likelihood function is given by

  p(X|\eta) = \left( \prod_{n=1}^{N} h(x_n) \right) g(\eta)^{N} \exp\left\{ \eta^{\mathrm{T}} \sum_{n=1}^{N} u(x_n) \right\}.

Setting the gradient of the log likelihood to zero, we thus have

  -\nabla \ln g(\eta_{ML}) = \frac{1}{N} \sum_{n=1}^{N} u(x_n).

The quantity Σ_n u(x_n) is the sufficient statistic: the ML estimator depends on the data only through it.
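For the Gaussian in this form, the sufficient statistics are Σ_n x_n and Σ_n x_n², and the ML parameters can be recovered from those two numbers alone. A sketch (function name illustrative):

```python
import numpy as np

def gaussian_ml_from_stats(N, sum_x, sum_x2):
    """ML mean and variance of a Gaussian, computed only from the
    sufficient statistics sum_n x_n and sum_n x_n^2."""
    mu_ml = sum_x / N
    sigma2_ml = sum_x2 / N - mu_ml**2
    return mu_ml, sigma2_ml

x = np.random.normal(0.8, np.sqrt(0.1), size=1000)
print(gaussian_ml_from_stats(len(x), x.sum(), (x**2).sum()))
print(x.mean(), x.var())  # matches the direct estimates
```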
Conjugate priors

For any member of the exponential family, there exists a prior

  p(\eta|\chi,\nu) = f(\chi,\nu)\, g(\eta)^{\nu} \exp\{\nu\, \eta^{\mathrm{T}} \chi\}.

Combining with the likelihood function, we get

  p(\eta|X,\chi,\nu) \propto g(\eta)^{N+\nu} \exp\left\{ \eta^{\mathrm{T}} \left( \sum_{n=1}^{N} u(x_n) + \nu\chi \right) \right\}.

The prior corresponds to ν pseudo-observations with value χ.
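Reading off the posterior's hyperparameters gives a generic update rule: ν_N = ν + N and χ_N = (νχ + Σ_n u(x_n)) / (ν + N). A sketch, using Bernoulli data (u(x) = x) as the example (all values illustrative):

```python
import numpy as np

def conjugate_update(chi, nu, u_values):
    """Exponential-family conjugate update in (chi, nu) form: the posterior
    has nu_N = nu + N pseudo-observations with average value chi_N."""
    u_values = np.asarray(u_values, dtype=float)
    N = len(u_values)
    nu_N = nu + N
    chi_N = (nu * chi + u_values.sum()) / nu_N
    return chi_N, nu_N

# A prior worth nu = 2 pseudo-observations with average value chi = 0.5,
# updated with 10 coin flips:
print(conjugate_update(0.5, 2, [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]))
```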
Posterior of Gaussian mean