Machine Learning                                                        Srihari

Bayesian Network Representation
Sargur Srihari
[email protected]

Topics
• Joint and Conditional Distributions
• I-Maps
  – I-Map to Factorization
  – Factorization to I-Map
  – Perfect Map
• Knowledge Engineering
  – Picking Variables
  – Picking Structure
  – Picking Probabilities
  – Sensitivity Analysis

Joint Distribution
• A company is trying to hire recent graduates
• Goal is to hire intelligent employees
  – No way to test intelligence directly
  – But we have access to the student's SAT score
    • Which is informative but not fully indicative
• Two random variables
  – Intelligence: Val(I) = {i1, i0}, high and low
  – Score: Val(S) = {s1, s0}, high and low
• Joint distribution has 4 entries; three parameters are needed

Alternative Representation: Conditional Parameterization
• P(I, S) = P(I) P(S | I)
  – Representation more compatible with causality
    • Intelligence is influenced by genetics and upbringing
    • Score is influenced by Intelligence
  – Note: BNs are not required to follow causality, but they often do
• Need to specify P(I) and P(S | I)
  – Three binomial distributions (3 parameters) needed
    • One marginal, two conditionals: P(S | I = i0), P(S | I = i1)
[Figure: (a) the network I → S; (b) the network with Grade added: I → G, I → S]

Naïve Bayes Model
• Conditional parameterization combined with conditional independence assumptions
  – Val(G) = {g1, g2, g3} represents grades A, B, C
• SAT and Grade are independent given Intelligence (assumption)
  – Knowing intelligence, SAT gives no information about class grade
• Assertions
  – From probabilistic reasoning: P(I, S, G) = P(I) P(S | I) P(G | I, S)
  – From the assumption: P(G | I, S) = P(G | I)
  – Combining: P(I, S, G) = P(I) P(S | I) P(G | I)
• Three binomials and two 3-valued multinomials: 7 parameters
  – More compact than the joint distribution

BN for General Naïve Bayes Model
[Figure: class node C with children X1, X2, ..., Xn]
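The naïve Bayes parameterization above can be sketched numerically. This is an illustrative sketch only: the slides give the structure (I → S, I → G) but not the CPD values, so the numbers below are hypothetical; the point is that 1 + 2 + 4 = 7 free parameters suffice, and the encoded independence (S ⊥ G | I) holds by construction.

```python
from itertools import product

# Hypothetical CPDs for the naive Bayes model I -> {S, G}
# (values are illustrative; the slides specify only the structure)
P_I = {"i0": 0.7, "i1": 0.3}                       # 1 free parameter
P_S_given_I = {"i0": {"s0": 0.95, "s1": 0.05},     # 2 free parameters
               "i1": {"s0": 0.2,  "s1": 0.8}}
P_G_given_I = {"i0": {"g1": 0.2, "g2": 0.3, "g3": 0.5},   # 4 free parameters
               "i1": {"g1": 0.7, "g2": 0.2, "g3": 0.1}}

# Joint assembled from the factorization P(I,S,G) = P(I) P(S|I) P(G|I)
joint = {(i, s, g): P_I[i] * P_S_given_I[i][s] * P_G_given_I[i][g]
         for i, s, g in product(P_I, ("s0", "s1"), ("g1", "g2", "g3"))}

assert abs(sum(joint.values()) - 1.0) < 1e-12   # a valid distribution

# Check the encoded independence (S ⊥ G | I): P(s, g | i) = P(s|i) P(g|i)
for (i, s, g), p in joint.items():
    assert abs(p / P_I[i] - P_S_given_I[i][s] * P_G_given_I[i][g]) < 1e-12
```

The full joint over (I, S, G) has 2 × 2 × 3 = 12 entries (11 free parameters); the factored form needs only the 7 counted in the comments.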
• P(C, X1, ..., Xn) = P(C) ∏_{i=1}^{n} P(Xi | C)
• Encoded using a very small number of parameters
  – Linear in the number of variables

Application of Naïve Bayes Model
• Medical diagnosis
  – Pathfinder expert system for lymph-node disease (Heckerman et al., 1992)
    • Full BN agreed with the human expert in 50/53 cases
    • Naïve Bayes agreed in 47/53 cases

Graphs and Distributions
• Relating two concepts:
  – Independencies in distributions
  – Independencies in graphs
• An I-map is a relationship between the two

Independencies in a Distribution
• Let P be a distribution over X
• I(P) is the set of conditional independence assertions of the form (X ⊥ Y | Z) that hold in P

  X    Y    P(X,Y)
  x0   y0   0.08
  x0   y1   0.32
  x1   y0   0.12
  x1   y1   0.48

• X and Y are independent in P, e.g.,
  P(x1) = 0.48 + 0.12 = 0.6
  P(y1) = 0.32 + 0.48 = 0.8
  P(x1, y1) = 0.48 = 0.6 × 0.8
• Thus (X ⊥ Y | φ) ∈ I(P)

Independencies in a Graph
• Local conditional independence assertions (starting from leaf nodes):
  – Parents of a variable shield it from probabilistic influence
  – Once the values of the parents are known, there is no influence of ancestors
  – Information about descendants can change beliefs about a node
• P(D, I, G, S, L) = P(D) P(I) P(G | D, I) P(S | I) P(L | G)
• A graph with CPDs is equivalent to a set of independence assertions:
  I(G) = {(L ⊥ I, D, S | G), (S ⊥ D, G, L | I), (G ⊥ S | D, I), (I ⊥ D | φ), (D ⊥ I, S | φ)}

[Figure: Student network — Difficulty (D) and Intelligence (I) are parents of Grade (G); I is the parent of SAT (S); G is the parent of Letter (L)]
CPDs:
  P(D):  d0 = 0.6, d1 = 0.4
  P(I):  i0 = 0.7, i1 = 0.3
  P(S | I):  (s0, s1) given i0: 0.95, 0.05; given i1: 0.2, 0.8
  P(L | G):  (l0, l1) given g1: 0.1, 0.9; given g2: 0.4, 0.6; given g3: 0.99, 0.01
  P(G | I, D):  (g1, g2, g3)
    given i0, d0: 0.3, 0.4, 0.3
    given i0, d1: 0.05, 0.25, 0.7
    given i1, d0: 0.9, 0.08, 0.02
    given i1, d1: 0.5, 0.3, 0.2

• L is conditionally independent of all other nodes given its parent G
• S is conditionally independent of all other nodes given its parent I
• Even given its parents, G is NOT independent of its descendant L
• Nodes with no parents are marginally independent
  – D is independent of its non-descendants I and S
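The Student-network factorization and its local independencies can be checked by brute-force enumeration. A minimal sketch, using the CPD values quoted in the slides; the variable names and helper `marginal` are my own:

```python
from itertools import product

# CPDs of the Student network, with the values given in the slides
P_D = {"d0": 0.6, "d1": 0.4}
P_I = {"i0": 0.7, "i1": 0.3}
P_S = {"i0": {"s0": 0.95, "s1": 0.05}, "i1": {"s0": 0.2, "s1": 0.8}}
P_L = {"g1": {"l0": 0.1,  "l1": 0.9},
       "g2": {"l0": 0.4,  "l1": 0.6},
       "g3": {"l0": 0.99, "l1": 0.01}}
P_G = {("i0", "d0"): {"g1": 0.3,  "g2": 0.4,  "g3": 0.3},
       ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
       ("i1", "d0"): {"g1": 0.9,  "g2": 0.08, "g3": 0.02},
       ("i1", "d1"): {"g1": 0.5,  "g2": 0.3,  "g3": 0.2}}

# Joint assembled from P(D) P(I) P(G|D,I) P(S|I) P(L|G)
VARS = ("D", "I", "G", "S", "L")
joint = {(d, i, g, s, l): P_D[d] * P_I[i] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]
         for d, i, g, s, l in product(P_D, P_I, ("g1", "g2", "g3"),
                                      ("s0", "s1"), ("l0", "l1"))}
assert abs(sum(joint.values()) - 1.0) < 1e-9   # a valid distribution

def marginal(fix):
    """Sum the joint over all assignments consistent with fix (var name -> value)."""
    idx = {v: k for k, v in enumerate(VARS)}
    return sum(p for a, p in joint.items()
               if all(a[idx[v]] == val for v, val in fix.items()))

# (I ⊥ D | φ): nodes with no parents are marginally independent
for d, i in product(P_D, P_I):
    assert abs(marginal({"D": d, "I": i})
               - marginal({"D": d}) * marginal({"I": i})) < 1e-9

# (S ⊥ D, G, L | I): given its parent I, S is shielded from the rest;
# here we check P(s | i, d) = P(s | i) for every d
for i, s, d in product(P_I, ("s0", "s1"), P_D):
    p_s_given_i_d = marginal({"S": s, "I": i, "D": d}) / marginal({"I": i, "D": d})
    assert abs(p_s_given_i_d - P_S[i][s]) < 1e-9
```

Both assertions hold by construction of the factorized joint, which is exactly the point of the slide: the graph with its CPDs encodes the independence set I(G).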
I-Map
• Let G be a graph associated with a set of independencies I(G)
• Let P be a probability distribution with a set of independencies I(P)
• Then G is an I-map of P if I(G) ⊆ I(P)
• From the direction of the inclusion:
  – The distribution can have more independencies than the graph
  – The graph does not mislead about independencies existing in P

Example of I-Map
• Three graphs over X and Y: G0 has no edge, G1 is X → Y, G2 is Y → X
  – G0 encodes X ⊥ Y, i.e., I(G0) = {X ⊥ Y}
  – G1 encodes no independence: I(G1) = ∅
  – G2 encodes no independence: I(G2) = ∅
• First distribution:

  X    Y    P(X,Y)
  x0   y0   0.08
  x0   y1   0.32
  x1   y0   0.12
  x1   y1   0.48

  – X and Y are independent in P, so (X ⊥ Y) ∈ I(P)
  – G0 is an I-map of P; G1 and G2 are also I-maps of P
• Second distribution:

  X    Y    P(X,Y)
  x0   y0   0.4
  x0   y1   0.3
  x1   y0   0.2
  x1   y1   0.1

  – X and Y are not independent in P, so (X ⊥ Y) ∉ I(P)
  – G0 is not an I-map of P; G1 and G2 are I-maps of P
• If G is an I-map of P, then it captures some of the independencies, not necessarily all

I-map to Factorization
• A Bayesian network G encodes a set of conditional independence assumptions I(G)
• Every distribution P for which G is an I-map must satisfy these assumptions
  – Every element of I(G) must be in I(P)
• This is the key property that allows a compact representation

I-map to Factorization
• From the chain rule of probability:
  P(I, D, G, L, S) = P(I) P(D | I) P(G | I, D) P(L | I, D, G) P(S | I, D, G, L)
  – Relies on no assumptions
  – But is also not very helpful
    • The last factor requires evaluation of 24 conditional probabilities
• Apply the conditional independence assumptions induced by the graph
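The parameter counts above can be verified with a little arithmetic. A sketch (the helper `cpd_params` is my own): a CPD P(X | Pa) needs (|Val(X)| − 1) free values per joint configuration of the parents.

```python
# Cardinalities in the Student example: I, D, S, L binary; G ternary
card = {"D": 2, "I": 2, "G": 3, "S": 2, "L": 2}

def cpd_params(var, parents):
    """Free parameters of P(var | parents): (|var| - 1) values per
    joint parent configuration."""
    n_configs = 1
    for p in parents:
        n_configs *= card[p]
    return n_configs * (card[var] - 1)

# Chain-rule factor P(S | I, D, G, L): 2*2*3*2 = 24 parent
# configurations, so 24 conditional probabilities to specify
assert cpd_params("S", ["I", "D", "G", "L"]) == 24

# Full joint table: 2*2*3*2*2 - 1 = 47 free parameters
full = 1
for c in card.values():
    full *= c
assert full - 1 == 47

# BN factorization P(D) P(I) P(G|D,I) P(S|I) P(L|G): far fewer
bn = (cpd_params("D", []) + cpd_params("I", []) +
      cpd_params("G", ["D", "I"]) + cpd_params("S", ["I"]) +
      cpd_params("L", ["G"]))
assert bn == 1 + 1 + 8 + 2 + 3   # = 15
```

So the graph's independence assumptions cut the representation from 47 free parameters to 15, which is the compactness the slide is driving at.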