MIT 6.867 - Lecture Notes

Machine learning: lecture 12
Tommi S. Jaakkola
MIT AI Lab
[email protected]

• Density estimation
  – Parametric, mixture models
  – Estimation via the EM algorithm
  – Examples

Why density estimation

The digits again...
• Possible uses:
  – understanding the generation process of examples
  – clustering
  – classification via class-conditional densities
  – inference based on incomplete observations

Parametric density models

• Probability model = a parameterized family of probability distributions
• Example: a simple multivariate Gaussian model

  p(x|\mu, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\}

• This is a generative model in the sense that we can generate x's

Sampling from a Gaussian

• 1-dimensional Gaussian probability density function (pdf) p(x|\mu, \sigma^2) and the corresponding cumulative distribution function (cdf) F_{\mu,\sigma^2}(x)

  [Figure: the pdf p(X = x) and the cdf P(X \le x) of a 1-dimensional Gaussian.]

• To draw a sample from a Gaussian, we can invert the cumulative distribution function

  F_{\mu,\sigma^2}(x) = \int_{-\infty}^{x} p(z|\mu, \sigma^2)\, dz

  u \sim \mathrm{Uniform}(0, 1) \;\Rightarrow\; x = F_{\mu,\sigma^2}^{-1}(u) \sim p(x|\mu, \sigma^2)

Multi-variate Gaussian samples

• A multivariate sample can be constructed from multiple independent one-dimensional Gaussian samples:

  z_i \sim p(z_i|\mu = 0, \sigma^2 = 1), \quad z = [z_1, \ldots, z_d]^T

  x = \Sigma^{1/2} z + \mu

  In this case x \sim p(x|\mu, \Sigma).
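To make the two sampling steps above concrete (inverse-CDF sampling of independent standard normals, then the affine map x = \Sigma^{1/2} z + \mu), here is a minimal NumPy/SciPy sketch. The function name sample_gaussian and the use of a Cholesky factor as the matrix square root are illustrative choices, not part of the original notes.

```python
import numpy as np
from scipy.stats import norm

def sample_gaussian(mu, Sigma, n_samples, rng=None):
    """Draw n_samples points from N(mu, Sigma) following the slides:
    z_i ~ N(0, 1) via inverse-CDF sampling, then x = Sigma^{1/2} z + mu."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    d = mu.shape[0]
    # u ~ Uniform(0, 1), z = F^{-1}(u) gives independent standard normals.
    u = rng.uniform(size=(n_samples, d))
    z = norm.ppf(u)
    # Any matrix square root of Sigma works; the Cholesky factor L
    # (Sigma = L L^T) is the usual choice.
    L = np.linalg.cholesky(Sigma)
    return z @ L.T + mu        # each row is one sample x ~ N(mu, Sigma)

# Example: 500 samples from a correlated 2-d Gaussian.
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])
X = sample_gaussian(mu, Sigma, 500)
```

In practice one would usually call rng.multivariate_normal(mu, Sigma, n_samples) directly; the explicit version above just mirrors the two steps described on the slides.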
Multi-variate density estimation

• A mixture of Gaussians model

  p(x|\theta) = \sum_{j=1}^{k} p_j\, p(x|\mu_j, \Sigma_j)

  where \theta = \{p_1, \ldots, p_k, \mu_1, \ldots, \mu_k, \Sigma_1, \ldots, \Sigma_k\} contains all the parameters of the mixture model. \{p_j\} are known as mixing proportions or coefficients.

Mixture density

• Data generation process:

  [Figure: the generative process, in which a component y \in \{1, 2\} is drawn from P(y) and then x is drawn from the component density p(x|y).]

  p(x|\theta) = \sum_{j=1,2} P(y = j) \cdot p(x|y = j)   (generic mixture)
              = \sum_{j=1,2} p_j \cdot p(x|\mu_j, \Sigma_j)   (mixture of Gaussians)

• Any data point x could have been generated in two ways
• If we are given just x, we don't know which mixture component this example came from

  p(x|\theta) = \sum_{j=1,2} p_j\, p(x|\mu_j, \Sigma_j)

• We can evaluate the posterior probability that an observed x was generated from the first mixture component

  P(y = 1|x, \theta) = \frac{P(y = 1) \cdot p(x|y = 1)}{\sum_{j=1,2} P(y = j) \cdot p(x|y = j)} = \frac{p_1\, p(x|\mu_1, \Sigma_1)}{\sum_{j=1,2} p_j\, p(x|\mu_j, \Sigma_j)}

• This solves a credit assignment problem

Mixture density: posterior sampling

• Consider sampling x from the mixture density, then y from the posterior over the components given x, and finally x' from the component density indicated by y:

  x \sim p(x|\theta)
  y \sim P(y|x, \theta)
  x' \sim p(x'|y, \theta)

  Is y a fair sample from the prior distribution P(y)?
  Is x' a fair sample from the mixture density p(x'|\theta)?

Mixture density estimation

• Suppose we want to estimate a two-component mixture of Gaussians model

  p(x|\theta) = p_1\, p(x|\mu_1, \Sigma_1) + p_2\, p(x|\mu_2, \Sigma_2)

• If each example x_i in the training set were labeled y_i = 1, 2 according to which mixture component (1 or 2) had generated it, then the estimation would be easy.
• Labeled examples ⇒ no credit assignment problem
• When examples are already assigned to mixture components (labeled), we can estimate each Gaussian independently. If \hat{n}_j is the number of examples labeled j, then for each j = 1, 2 we set

  \hat{p}_j \leftarrow \frac{\hat{n}_j}{n}

  \hat{\mu}_j \leftarrow \frac{1}{\hat{n}_j} \sum_{i: y_i = j} x_i

  \hat{\Sigma}_j \leftarrow \frac{1}{\hat{n}_j} \sum_{i: y_i = j} (x_i - \hat{\mu}_j)(x_i - \hat{\mu}_j)^T

Mixture density estimation: credit assignment

• Of course we don't have such labels, but we can guess what the labels might be based on our current mixture distribution.
• We get soft labels, or posterior probabilities of which Gaussian generated which example:

  \hat{p}(j|i) \leftarrow P(y_i = j|x_i, \theta), \quad \text{where } \sum_{j=1,2} \hat{p}(j|i) = 1 \text{ for all } i = 1, \ldots, n

  [Figure: two nearly identical Gaussians fit to the same 2-d data set.]

• When the Gaussians are almost identical (as in the figure), \hat{p}(1|i) \approx \hat{p}(2|i) for almost any available point x_i. Even slight differences can help us determine how we should modify the Gaussians.

The EM algorithm

E-step: softly assign examples to mixture components

  \hat{p}(j|i) \leftarrow P(y_i = j|x_i, \theta), \quad \text{for all } j = 1, 2 \text{ and } i = 1, \ldots, n

M-step: re-estimate the parameters (separately for the two Gaussians) based on the soft assignments.

  \hat{n}_j \leftarrow \sum_{i=1}^{n} \hat{p}(j|i) = \text{the soft number of examples labeled } j

  \hat{p}_j \leftarrow \frac{\hat{n}_j}{n}

  \hat{\mu}_j \leftarrow \frac{1}{\hat{n}_j} \sum_{i=1}^{n} \hat{p}(j|i)\, x_i

  \hat{\Sigma}_j \leftarrow \frac{1}{\hat{n}_j} \sum_{i=1}^{n} \hat{p}(j|i)\, (x_i - \hat{\mu}_j)(x_i - \hat{\mu}_j)^T

Mixture density estimation: example

  [Figures: successive EM iterations on a 2-d data set; the two Gaussian components gradually separate and tighten around the clusters.]

The EM algorithm cont'd

• Each iteration of the EM algorithm monotonically increases the (log-)likelihood of the n training examples x_1, \ldots, x_n:

  \log p(\text{data}|\theta) = \sum_{i=1}^{n} \log \overbrace{\left[ p_1\, p(x_i|\mu_1, \Sigma_1) + p_2\, p(x_i|\mu_2, \Sigma_2) \right]}^{p(x_i|\theta)}

  where \theta = \{p_1, p_2, \mu_1, \mu_2, \Sigma_1, \Sigma_2\} contains all the parameters of the mixture model.

  [Figure: the log-likelihood as a function of the EM iteration; it increases monotonically and levels off.]

Demo

Classification example

• A digit recognition problem (8x8 binary digits)
  Training set: n = 100 (50 examples of each digit).
  Test set: n = 400 (200 examples of each digit).
• We estimate a mixture of Gaussians model separately for each type of digit
  Class 1: P(x|\theta_1)  (e.g., a 3-component mixture density)
  Class 0: P(x|\theta_0)  (e.g., a 3-component mixture density)
• Assuming the examples in each class are equally likely a priori, we will classify new examples x according to

  \text{Class} = 1 \text{ if } \log \frac{P(x|\hat{\theta}_1)}{P(x|\hat{\theta}_0)} > 0, \text{ and Class} = 0 \text{ otherwise}

Classification example cont'd

• The figure gives the number of misclassified examples on the test set as a function of the number of mixture components in each class-conditional model.

  [Figure: test-set misclassification count vs. number of mixture components per class.]

• Anything wrong with this figure?
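To tie the E-step and M-step updates given earlier into one loop, here is a minimal NumPy/SciPy sketch of EM for a two-component Gaussian mixture on an n-by-d data matrix. The function name em_two_gaussians, the initialization, the small ridge on the covariances, and the fixed iteration count are illustrative assumptions, not anything prescribed in the notes.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_two_gaussians(X, n_iter=30, seed=0):
    """Fit p(x) = p1 N(x|mu1, Sigma1) + p2 N(x|mu2, Sigma2) by EM."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)

    # Initialization: equal mixing proportions, two random data points as
    # means, and the (slightly regularized) overall data covariance for both.
    p = np.array([0.5, 0.5])
    mu = X[rng.choice(n, size=2, replace=False)].copy()
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * 2)

    for _ in range(n_iter):
        # E-step: soft assignments p_hat(j|i) = P(y_i = j | x_i, theta).
        lik = np.column_stack([
            p[j] * multivariate_normal.pdf(X, mean=mu[j], cov=Sigma[j])
            for j in range(2)
        ])                                           # shape (n, 2)
        post = lik / lik.sum(axis=1, keepdims=True)

        # M-step: re-estimate the parameters from the soft assignments.
        n_hat = post.sum(axis=0)                     # soft counts n_hat_j
        p = n_hat / n
        for j in range(2):
            mu[j] = post[:, j] @ X / n_hat[j]
            diff = X - mu[j]
            # Small ridge keeps Sigma_j well conditioned.
            Sigma[j] = (post[:, j, None] * diff).T @ diff / n_hat[j] \
                       + 1e-6 * np.eye(d)
    return p, mu, Sigma
```

In practice one would also monitor the log-likelihood \sum_i \log p(x_i|\theta), which should increase monotonically across iterations as on the convergence plot above, and stop once it levels off.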

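For the class-conditional classification rule above, here is a small sketch of how fitted mixtures might be used to label a new example; the helper names mixture_loglik and classify are hypothetical, and any EM routine (such as the two-component sketch above, or one with more components per class as in the slides) could supply the fitted parameters.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_loglik(x, p, mu, Sigma):
    """log p(x|theta) for a fitted Gaussian mixture with parameters (p, mu, Sigma)."""
    density = sum(p[j] * multivariate_normal.pdf(x, mean=mu[j], cov=Sigma[j])
                  for j in range(len(p)))
    return np.log(density)

def classify(x, theta1, theta0):
    """Class = 1 if log P(x|theta1) - log P(x|theta0) > 0, else Class = 0."""
    return 1 if mixture_loglik(x, *theta1) > mixture_loglik(x, *theta0) else 0

# Hypothetical usage: theta1 = em_two_gaussians(X_class1) and
# theta0 = em_two_gaussians(X_class0), then label a new example x
# with classify(x, theta1, theta0).
```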
