MIT 6.867 - Learning Bayesian networks

Machine learning: lecture 24
Tommi S. Jaakkola
MIT CSAIL
[email protected]

• Learning Bayesian networks: complete data
  – estimating the parameters with fixed structure
  – learning the graph structure
• Learning Bayesian networks: incomplete data
  – EM and structural EM
• Review

Probabilities and conditional tables

• Simple example: three discrete variables, each taking m possible values, arranged in the graph x_1 -> x_3 <- x_2:

    P(x_1) = \theta_{x_1}, \qquad P(x_2) = \theta_{x_2}, \qquad P(x_3 \mid x_1, x_2) = \theta_{x_3|x_1,x_2}

• We assume that the model is fully parameterized in the sense that \{\theta_{x_1}\}, \{\theta_{x_2}\}, and \{\theta_{x_3|x_1,x_2}\} (one distribution for each distinct configuration of x_1 and x_2) are unrestricted and can be chosen independently of each other.

Likelihood and complete data

• When the observed data points are complete, the likelihood has a simple form:

    P(D \mid G, \theta) = \prod_{t=1}^{n} P(x_1^t)\, P(x_2^t)\, P(x_3^t \mid x_1^t, x_2^t)
                        = \prod_{x_1} \theta_{x_1}^{N(x_1)} \times \prod_{x_2} \theta_{x_2}^{N(x_2)} \times \prod_{x_1,x_2} \prod_{x_3} \theta_{x_3|x_1,x_2}^{N(x_1,x_2,x_3)}

  where N(\cdot) counts the occurrences of each configuration in the data. Each conditional table, such as \theta_{x_3|x_1,x_2} for a fixed x_1 and x_2, can be estimated separately based on the observed counts N(x_1, x_2, x_3).

ML parameter estimates

• The maximum likelihood estimates of the parameters \theta_{\cdot|x_1,x_2} = \{\theta_{1|x_1,x_2}, \ldots, \theta_{m|x_1,x_2}\}, for each fixed configuration of x_1 and x_2, are simply normalized counts (cf. Markov models):

    \hat{\theta}_{x_3|x_1,x_2} = \frac{N(x_1, x_2, x_3)}{N(x_1, x_2)}, \qquad \text{where } N(x_1, x_2) = \sum_{x_3} N(x_1, x_2, x_3)

Bayesian estimates: the prior

• We introduce independent priors, e.g., P_3(\theta_{\cdot|x_1,x_2}), across variables and for each distinct configuration of the parents:

    P(\theta \mid G) = P_1(\theta_\cdot) \times P_2(\theta_\cdot) \times \prod_{x_1,x_2} P_3(\theta_{\cdot|x_1,x_2})

• Moreover, we assume that these priors are Dirichlet:

    P_3(\theta_{\cdot|x_1,x_2}) = \frac{1}{Z_0} \prod_{x_3} \theta_{x_3|x_1,x_2}^{N_0(x_1,x_2,x_3) - 1}

  with hyper-parameters (parameters of the prior) N_0(x_1, x_2, x_3) \ge 0, interpreted as prior "counts". This prior is concentrated around

    \theta^0_{x_3|x_1,x_2} = \frac{N_0(x_1, x_2, x_3)}{N_0(x_1, x_2)}, \qquad \text{where } N_0(x_1, x_2) = \sum_{x_3} N_0(x_1, x_2, x_3),

  and more so for larger values of N_0(x_1, x_2).

Bayesian estimates: the posterior

• The posterior is also Dirichlet:

    P_3(\theta_{\cdot|x_1,x_2} \mid D) \propto \underbrace{\prod_{x_3} \theta_{x_3|x_1,x_2}^{N(x_1,x_2,x_3)}}_{\text{likelihood}} \cdot \underbrace{\prod_{x_3} \theta_{x_3|x_1,x_2}^{N_0(x_1,x_2,x_3) - 1}}_{\text{prior}}
                                       = \frac{1}{Z} \prod_{x_3} \theta_{x_3|x_1,x_2}^{N_0(x_1,x_2,x_3) + N(x_1,x_2,x_3) - 1}

  with hyper-parameters N_0(x_1, x_2, x_3) + N(x_1, x_2, x_3).
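Both estimates reduce to (smoothed) normalized counts. The following is a minimal Python sketch of how they could be computed; the toy data, the array names, and the uniform prior count N0 = 1 are illustrative assumptions, not part of the lecture:

```python
import numpy as np

m = 3                                        # each variable takes m values {0, ..., m-1}
rng = np.random.default_rng(0)
data = rng.integers(0, m, size=(100, 3))     # toy complete data: one row per (x1, x2, x3)

# Counts N(x1, x2, x3) and parent-configuration counts N(x1, x2)
N = np.zeros((m, m, m))
for x1, x2, x3 in data:
    N[x1, x2, x3] += 1
N_parent = N.sum(axis=2, keepdims=True)      # N(x1, x2)

# ML estimate: normalized counts (undefined where N(x1, x2) = 0)
theta_ml = np.divide(N, N_parent, out=np.full_like(N, np.nan), where=N_parent > 0)

# Bayesian posterior mean with uniform Dirichlet prior counts N0(x1, x2, x3) = 1:
# posterior hyper-parameters are N0 + N, so the mean is (N + 1) / (N(x1, x2) + m)
N0 = 1.0
theta_bayes = (N + N0) / (N_parent + m * N0)
```

The Dirichlet prior shrinks each table toward the prior counts, which keeps the estimate well defined even for parent configurations that never occur in the data, something the pure ML estimate does not handle.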
Bayesian score

• We can use the marginal likelihood (Bayesian score) as a model (graph) selection criterion:

    P(D \mid G) = \int P(D \mid G, \theta)\, P(\theta \mid G)\, d\theta

• The forms of the likelihood and the prior,

    P(D \mid G, \theta) = \prod_{x_1} \theta_{x_1}^{N(x_1)} \times \prod_{x_2} \theta_{x_2}^{N(x_2)} \times \prod_{x_1,x_2} \prod_{x_3} \theta_{x_3|x_1,x_2}^{N(x_1,x_2,x_3)}

    P(\theta \mid G) \propto \underbrace{\prod_{x_1} \theta_{x_1}^{N_0(x_1) - 1}}_{P_1(\theta_\cdot)} \times \underbrace{\prod_{x_2} \theta_{x_2}^{N_0(x_2) - 1}}_{P_2(\theta_\cdot)} \times \underbrace{\prod_{x_1,x_2} \prod_{x_3} \theta_{x_3|x_1,x_2}^{N_0(x_1,x_2,x_3) - 1}}_{P_3(\theta_{\cdot|x_1,x_2})}

  permit us to evaluate the Bayesian score locally.

Bayesian score: graphs

• The Bayesian score reduces to a product of local terms:

    P(D \mid G) = \int P(D \mid G, \theta)\, P(\theta \mid G)\, d\theta
                = \int \prod_{x_1} \theta_{x_1}^{N(x_1)} \times \prod_{x_2} \theta_{x_2}^{N(x_2)} \times \prod_{x_1,x_2} \prod_{x_3} \theta_{x_3|x_1,x_2}^{N(x_1,x_2,x_3)} \; P(\theta \mid G)\, d\theta
                = P(D_1 \mid G) \times P(D_2 \mid G) \times \prod_{x_1,x_2} P(D_{3|x_1,x_2} \mid G)

  where D_1 and D_2 denote the observations of x_1 and x_2, and D_{3|x_1,x_2} the observations of x_3 for a fixed parent configuration. Because the prior factorizes the same way as the likelihood, the integral splits into one Dirichlet integral per (conditional) table.
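Each local term above is a Dirichlet integral with a standard closed form: for prior counts N_0 and observed counts N it equals \frac{\Gamma(\sum N_0)}{\prod \Gamma(N_0)} \cdot \frac{\prod \Gamma(N_0 + N)}{\Gamma(\sum (N_0 + N))}. A minimal sketch, computed in log space and reusing the N and m arrays from the earlier snippet (function and variable names are illustrative):

```python
import numpy as np
from scipy.special import gammaln   # log Gamma function

def log_local_score(counts, prior_counts):
    """Log marginal likelihood of one (conditional) probability table.

    counts, prior_counts: arrays of shape (..., m); the last axis ranges over
    the child's values, any leading axes over parent configurations.
    Implements log of ∫ ∏ θ^N Dir(θ; N0) dθ, summed over parent configurations.
    """
    post = prior_counts + counts
    per_config = (gammaln(prior_counts.sum(-1)) - gammaln(post.sum(-1))
                  + (gammaln(post) - gammaln(prior_counts)).sum(-1))
    return per_config.sum()

# log P(D | G) for the graph x1 -> x3 <- x2: two root terms plus the x3 term
log_score = (log_local_score(N.sum(axis=(1, 2)), np.ones(m))      # P(D1 | G)
             + log_local_score(N.sum(axis=(0, 2)), np.ones(m))    # P(D2 | G)
             + log_local_score(N, np.ones((m, m, m))))            # ∏ P(D3|x1,x2 | G)
```

Because the score is a product of per-family terms, changing the parent set of one variable only requires recomputing that variable's local term, which is what makes local evaluation cheap during structure search.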
Learning Bayesian networks

• We can perform a greedy search over (equivalence classes of) Bayesian networks based on the score. (The slide shows the candidate three-variable graphs over x_1, x_2, x_3 considered during the search.) A toy sketch of such a search follows below.
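As a concrete but simplified instantiation, here is a hill-climbing sketch that starts from the empty graph and greedily toggles single directed edges, rescoring with the same Dirichlet-multinomial local score. The move set, the first-improvement rule, and all names are assumptions for illustration; edge reversals and equivalence classes are omitted:

```python
import itertools
import numpy as np
from scipy.special import gammaln

def family_log_score(data, child, parents, m, n0=1.0):
    """Log marginal likelihood of one node's table given its parent set
    (same Dirichlet-multinomial identity as in the previous sketch)."""
    shape = (m,) * len(parents) + (m,)
    counts = np.zeros(shape)
    for row in data:
        counts[tuple(row[list(parents)]) + (row[child],)] += 1
    prior = np.full(shape, n0)
    post = prior + counts
    return float((gammaln(prior.sum(-1)) - gammaln(post.sum(-1))
                  + (gammaln(post) - gammaln(prior)).sum(-1)).sum())

def graph_log_score(data, parent_sets, m):
    return sum(family_log_score(data, i, ps, m) for i, ps in enumerate(parent_sets))

def is_acyclic(parent_sets):
    # Repeatedly peel off nodes whose parents have all been removed already.
    remaining = set(range(len(parent_sets)))
    while remaining:
        roots = [i for i in remaining if not set(parent_sets[i]) & remaining]
        if not roots:
            return False        # every remaining node still has a parent: a cycle
        remaining -= set(roots)
    return True

def greedy_search(data, n_vars, m):
    parent_sets = [tuple() for _ in range(n_vars)]      # start from the empty graph
    best = graph_log_score(data, parent_sets, m)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.permutations(range(n_vars), 2):
            cand = [set(ps) for ps in parent_sets]
            cand[i].symmetric_difference_update({j})    # toggle edge j -> i
            cand = [tuple(sorted(s)) for s in cand]
            if not is_acyclic(cand):
                continue
            score = graph_log_score(data, cand, m)
            if score > best:                            # first-improvement move
                best, parent_sets, improved = score, cand, True
    return parent_sets, best

# Toy usage: couple x3 to x1 so the search has an edge to find
rng = np.random.default_rng(0)
data = rng.integers(0, 3, size=(500, 3))
data[:, 2] = (data[:, 0] + rng.integers(0, 2, size=500)) % 3
print(greedy_search(data, n_vars=3, m=3))
```

On this toy data the search should recover a dependence between x_1 and x_3, though the score alone cannot always resolve the direction of an edge within an equivalence class, which is why the slides speak of searching over equivalence classes.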

Review for the final

• The final is comprehensive
• Major concepts
  – regression, active learning
  – classification, margins, kernels, feature selection
  – over-fitting, regularization, generalization, model selection
  – latent variable models, estimation with incomplete data
  – clustering, objectives
  – graphs and probabilities, inference
