CS/CNS/EE 155: Probabilistic Graphical Models
Problem Set 2
Handed out: 21 Oct 2009
Due: 4 Nov 2009

Contents
  1  Tree Augmented Naive Bayes [40 points]
  2  Score equivalence [20 points]
  3  The Gaussian Distribution [30 points]
  4  MAP versus MPE [10 points]
  References

1 Tree Augmented Naive Bayes [40 points]

In this problem, you should hand in a printout of your MATLAB implementation. Also email a zip archive with the source code to the TAs. The training data set is given in the file trainingData.txt, available on the course webpage. It contains 200 training examples, one per row: the 1st column is the class variable C, and the 2nd to 6th columns are the attributes A1, A2, A3, A4, A5. The testing data set is given in the file testingData.txt; it contains 100 testing samples in the same format.

1. Learning a Naive Bayes model. You are asked to learn a naive Bayes network from the given training data. The structure of the naive Bayes network is as follows:

   Figure 1: Naive Bayes network.

   Estimate the parameters of the conditional probability distributions in the network using MLE on the training data. With the constructed naive Bayes network you can classify samples by applying Bayes' rule to compute the conditional class probabilities P(C | A1, A2, A3, A4, A5) and predicting the label with the highest probability.

   Please write down the parameters \theta_C and \theta_{A1|C}, and the percentage of classification error on the testing data set.

2. Learning a Tree Augmented Naive Bayes (TAN) model. Tree augmented naive Bayes models are formed by adding directed edges between attributes. After removing the class variable, the attributes must form a tree structure (no v-structures); see Fig. 2 for an example.

   Figure 2: An example of a tree augmented naive Bayes network.

   Use the following procedure to learn the tree augmented naive Bayes model for the training data, then draw the structure of the obtained model. (An illustrative MATLAB sketch of steps (a)-(c) follows the list.)

   (a) Compute I_{\hat{P}_D}(A_i; A_j | C) for each pair of attributes, i \neq j, where I_{\hat{P}_D}(A_i; A_j | C) is the conditional mutual information between A_i and A_j given the class variable, taken with respect to the empirical distribution \hat{P}_D on the training data:

           I_{\hat{P}_D}(X; Y \mid C) = \sum_{x,y,c} \hat{P}_D(x, y, c) \log \frac{\hat{P}_D(x, y \mid c)}{\hat{P}_D(x \mid c)\, \hat{P}_D(y \mid c)}

   (b) Build a complete undirected graph whose vertices are the attributes A1, A2, A3, A4, A5. Annotate the edge connecting A_i and A_j with the weight I_{\hat{P}_D}(A_i; A_j | C).

   (c) Build a maximum-weight spanning tree.

   (d) Transform the resulting undirected tree into a directed one by choosing a root variable and setting the direction of all edges to be outward from it.

   (e) Construct a tree augmented naive Bayes model by adding a vertex labeled C and adding a directed edge from C to each A_i.
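The MATLAB sketch below shows one possible way to realize steps (a)-(c): it estimates the conditional mutual information of every attribute pair from the empirical distribution, then grows a maximum-weight spanning tree with Prim's algorithm, directing each edge away from the chosen root as in step (d). This is only an illustrative sketch, not a reference solution; it assumes all variables are binary (0/1), as the inference questions in part 4 suggest, and all variable names are placeholders.

    % Sketch: conditional mutual information + maximum-weight spanning tree.
    % Assumes trainingData.txt has one sample per row: column 1 is C,
    % columns 2-6 are A1..A5, and all variables are binary (0/1).
    D = load('trainingData.txt');
    C = D(:,1);  A = D(:,2:6);
    nA = size(A,2);

    % Empirical conditional mutual information I(Ai; Aj | C) for every pair
    W = zeros(nA);
    for i = 1:nA
        for j = i+1:nA
            I = 0;
            for c = 0:1
                for x = 0:1
                    for y = 0:1
                        pxyc = mean(A(:,i)==x & A(:,j)==y & C==c);  % P(x,y,c)
                        if pxyc > 0
                            pc    = mean(C==c);                     % P(c)
                            pxy_c = pxyc / pc;                      % P(x,y|c)
                            px_c  = mean(A(C==c,i)==x);             % P(x|c)
                            py_c  = mean(A(C==c,j)==y);             % P(y|c)
                            I = I + pxyc * log(pxy_c / (px_c * py_c));
                        end
                    end
                end
            end
            W(i,j) = I;  W(j,i) = I;
        end
    end

    % Maximum-weight spanning tree via Prim's algorithm, rooted at A1;
    % growing the tree from the root directs every edge outward (step (d)).
    inTree = false(1,nA);  inTree(1) = true;
    edges = zeros(nA-1, 2);        % each row: parent attribute -> child attribute
    for k = 1:nA-1
        best = -inf;  bi = 0;  bj = 0;
        for i = find(inTree)
            for j = find(~inTree)
                if W(i,j) > best
                    best = W(i,j);  bi = i;  bj = j;
                end
            end
        end
        inTree(bj) = true;
        edges(k,:) = [bi bj];
    end
    disp(edges)

Any maximum-weight spanning-tree routine would do here; Prim's algorithm is used only because it fits in a few lines (equivalently, one could negate the weights and call a minimum-spanning-tree routine). Step (e) then simply adds C as a parent of every attribute.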
3. TAN for classification. Based on the structure above, you are asked to estimate the parameters of the conditional probability distributions from the training data set (using MLE). Then classify the testing data set by computing P(C | A1, A2, A3, A4, A5) and predicting the class with the highest probability.

   What is the percentage of classification error on the testing data set? Please compare this result with that of the naive Bayes structure, and explain why it performs better or worse.

4. Inference. Based on the TAN model constructed above (the network structure, \theta_C, and \theta_{A_i | Pa_i} with 1 \le i \le 5), answer the following questions:

   (a) If A1 = 1, A2 = 0, A3 = 0 are observed, what is the most likely assignment for (C, A4, A5)?

   (b) If A1 = 0, A2 = 0 are observed, what is the most likely assignment for A5? In this case, what is the probability of this most likely assignment?

2 Score equivalence [20 points]

1. K2 prior. Show that the Bayesian score with a K2 prior, in which we have a Dirichlet(1, 1, ..., 1) prior for each set of multinomial parameters, is not score-equivalent. Hint: construct a data set for which the score of the network X → Y differs from the score of the network X ← Y.

2. BDe score equivalence. Assume that we have a BDe prior specified by an equivalent sample size α and a prior distribution P_0. Prove the following:

   (a) Consider networks over the variables X and Y. Show that the BDe score of X → Y is equal to that of X ← Y.

   (b) Show that if G and G′ are identical except for a covered edge reversal of X → Y, then the BDe scores of the two networks are equal.

   (c) Show that the proof of score equivalence follows from the result in Part 2b and Theorem 3.9 of [KF09].

   (d) Given the above results, what are the implications for learning optimal trees?

3 The Gaussian Distribution [30 points]

Preamble. The multivariate Gaussian (or Normal) distribution over the D-dimensional continuous random vector x = [x_1, x_2, ..., x_D] has the joint probability density

    \mathcal{N}(x \mid \mu, \Sigma) = (2\pi)^{-D/2} |\Sigma|^{-1/2} \exp\!\left( -\tfrac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)

where \mu is the mean vector of length D, and \Sigma is the (symmetric and positive definite) covariance matrix of size D × D. We sometimes write x \sim \mathcal{N}(\mu, \Sigma) as a shorthand for the above.

Additionally, it is sometimes preferable to work with the precision matrix, which is just the inverse of the covariance, \Lambda = \Sigma^{-1}. In this case, we use the notation x \sim \mathcal{N}(\mu, \Lambda^{-1}).

We also point out the matrix inversion lemma

    (Z + U W V^\top)^{-1} = Z^{-1} - Z^{-1} U (W^{-1} + V^\top Z^{-1} U)^{-1} V^\top Z^{-1}

which will be useful below in rewriting the joint distribution as a factored distribution.

Let x and y be two jointly Gaussian random vectors:

    \begin{bmatrix} x \\ y \end{bmatrix} \sim p(x, y) = \mathcal{N}(\mu, \Sigma)
      = \mathcal{N}\!\left( \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix} A & C \\ C^\top & B \end{bmatrix} \right)
      = \mathcal{N}\!\left( \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix} \tilde{A} & \tilde{C} \\ \tilde{C}^\top & \tilde{B} \end{bmatrix}^{-1} \right).

1. Show that the marginal distribution of x is x \sim \mathcal{N}(\mu_x, A), i.e.,

       \int p(x, y)\, dy = \int \mathcal{N}\!\left( \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix} A & C \\ C^\top & B \end{bmatrix} \right) dy = \mathcal{N}(\mu_x, A).

2. (a) Show that the conditional distribution of x given y is

           x \mid y \sim \mathcal{N}\left( \mu_x + C B^{-1} (y - \mu_y),\; A - C B^{-1} C^\top \right).

   (b) Equivalently, show that x \mid y, expressed in terms of the precision matrix

           \Lambda = \begin{bmatrix} \tilde{A} & \tilde{C} \\ \tilde{C}^\top & \tilde{B} \end{bmatrix},

       is

           x \mid y \sim \mathcal{N}\left( \mu_x - \tilde{A}^{-1} \tilde{C} (y - \mu_y),\; \tilde{A}^{-1} \right).

3. Conjugate prior of the multivariate Gaussian with known covariance.

   Preamble. For a multivariate Gaussian \mathcal{N}(x \mid \mu, \Lambda^{-1}) in which both the mean \mu and the precision \Lambda are unknown, the conjugate prior is the Gaussian-Wishart distribution

       p(\mu, \Lambda \mid \mu_0, \beta, W, \nu) = \mathcal{N}(\mu \mid \mu_0, (\beta \Lambda)^{-1})\, \mathcal{W}(\Lambda \mid W, \nu)

   where the Wishart distribution is given by

       \mathcal{W}(\Lambda \mid W, \nu) = B(W, \nu)\, |\Lambda|^{(\nu - D - 1)/2} \exp\!\left( -\tfrac{1}{2} \operatorname{Tr}(W^{-1} \Lambda) \right)

   and

       B(W, \nu) = |W|^{-\nu/2} \left( 2^{\nu D/2}\, \pi^{D(D-1)/4} \prod_{i=1}^{D} \Gamma\!\left( \frac{\nu + 1 - i}{2} \right) \right)^{-1}

   and W is a D × D symmetric, positive definite matrix (see the appendix of