Machine Learning                                                        Srihari

Bayesian Network Representation
Sargur Srihari
[email protected]

Topics
• Joint and Conditional Distributions
• I-Maps
  – I-Map to Factorization
  – Factorization to I-Map
  – Perfect Map
• Knowledge Engineering
  – Picking Variables
  – Picking Structure
  – Picking Probabilities
  – Sensitivity Analysis

Joint Distribution
• A company is trying to hire recent graduates
• Goal is to hire intelligent employees
  – No way to test intelligence directly
  – But we have access to the student's SAT score
    • Which is informative but not fully indicative
• Two random variables
  – Intelligence: Val(I) = {i1, i0}, high and low
  – Score: Val(S) = {s1, s0}, high and low
• Joint distribution has 4 entries; three parameters are needed

Alternative Representation: Conditional Parameterization
• P(I, S) = P(I) P(S | I)
  – Representation more compatible with causality
    • Intelligence is influenced by genetics and upbringing
    • Score is influenced by Intelligence
  – Note: BNs are not required to follow causality, but they often do
• Need to specify P(I) and P(S | I)
  – Three binomial distributions (3 parameters) needed
    • One marginal, two conditionals: P(S | I = i0), P(S | I = i1)
[Figure: (a) the network I → S; (b) the network with Grade added: I → G, I → S]

Naïve Bayes Model
• Conditional parameterization combined with conditional independence assumptions
  – Val(G) = {g1, g2, g3} represents grades A, B, C
• SAT and Grade are independent given Intelligence (assumption)
  – Knowing intelligence, SAT gives no information about class grade
• Assertions
  – From probabilistic reasoning: P(I, S, G) = P(I) P(S | I) P(G | I, S)
  – From the assumption: P(G | I, S) = P(G | I)
  – Combining: P(I, S, G) = P(I) P(S | I) P(G | I)
• Three binomials and two 3-valued multinomials: 7 parameters
  – More compact than the joint distribution

BN for General Naïve Bayes Model
[Figure: class node C with children X1, X2, ..., Xn]
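The naïve Bayes parameterization above can be sketched numerically. This is an illustrative sketch only: the slides give the structure (I → S, I → G) but not the CPD values, so the numbers below are hypothetical; the point is that 1 + 2 + 4 = 7 free parameters suffice, and the encoded independence (S ⊥ G | I) holds by construction.

```python
from itertools import product

# Hypothetical CPDs for the naive Bayes model I -> {S, G}
# (values are illustrative; the slides specify only the structure)
P_I = {"i0": 0.7, "i1": 0.3}                       # 1 free parameter
P_S_given_I = {"i0": {"s0": 0.95, "s1": 0.05},     # 2 free parameters
               "i1": {"s0": 0.2,  "s1": 0.8}}
P_G_given_I = {"i0": {"g1": 0.2, "g2": 0.3, "g3": 0.5},   # 4 free parameters
               "i1": {"g1": 0.7, "g2": 0.2, "g3": 0.1}}

# Joint assembled from the factorization P(I,S,G) = P(I) P(S|I) P(G|I)
joint = {(i, s, g): P_I[i] * P_S_given_I[i][s] * P_G_given_I[i][g]
         for i, s, g in product(P_I, ("s0", "s1"), ("g1", "g2", "g3"))}

assert abs(sum(joint.values()) - 1.0) < 1e-12   # a valid distribution

# Check the encoded independence (S ⊥ G | I): P(s, g | i) = P(s|i) P(g|i)
for (i, s, g), p in joint.items():
    assert abs(p / P_I[i] - P_S_given_I[i][s] * P_G_given_I[i][g]) < 1e-12
```

The full joint over (I, S, G) has 2 × 2 × 3 = 12 entries (11 free parameters); the factored form needs only the 7 counted in the comments.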
• P(C, X1, ..., Xn) = P(C) ∏_{i=1}^{n} P(Xi | C)
• Encoded using a very small number of parameters
  – Linear in the number of variables

Application of Naïve Bayes Model
• Medical diagnosis
  – Pathfinder expert system for lymph-node disease (Heckerman et al., 1992)
    • Full BN agreed with the human expert in 50/53 cases
    • Naïve Bayes agreed in 47/53 cases

Graphs and Distributions
• Relating two concepts:
  – Independencies in distributions
  – Independencies in graphs
• An I-map is a relationship between the two

Independencies in a Distribution
• Let P be a distribution over X
• I(P) is the set of conditional independence assertions of the form (X ⊥ Y | Z) that hold in P

  X    Y    P(X,Y)
  x0   y0   0.08
  x0   y1   0.32
  x1   y0   0.12
  x1   y1   0.48

• X and Y are independent in P, e.g.,
  P(x1) = 0.48 + 0.12 = 0.6
  P(y1) = 0.32 + 0.48 = 0.8
  P(x1, y1) = 0.48 = 0.6 × 0.8
• Thus (X ⊥ Y | φ) ∈ I(P)

Independencies in a Graph
• Local conditional independence assertions (starting from leaf nodes):
  – Parents of a variable shield it from probabilistic influence
  – Once the values of the parents are known, there is no influence of ancestors
  – Information about descendants can change beliefs about a node
• P(D, I, G, S, L) = P(D) P(I) P(G | D, I) P(S | I) P(L | G)
• A graph with CPDs is equivalent to a set of independence assertions:
  I(G) = {(L ⊥ I, D, S | G), (S ⊥ D, G, L | I), (G ⊥ S | D, I), (I ⊥ D | φ), (D ⊥ I, S | φ)}

[Figure: Student network — Difficulty (D) and Intelligence (I) are parents of Grade (G); I is the parent of SAT (S); G is the parent of Letter (L)]
CPDs:
  P(D):  d0 = 0.6, d1 = 0.4
  P(I):  i0 = 0.7, i1 = 0.3
  P(S | I):  (s0, s1) given i0: 0.95, 0.05; given i1: 0.2, 0.8
  P(L | G):  (l0, l1) given g1: 0.1, 0.9; given g2: 0.4, 0.6; given g3: 0.99, 0.01
  P(G | I, D):  (g1, g2, g3)
    given i0, d0: 0.3, 0.4, 0.3
    given i0, d1: 0.05, 0.25, 0.7
    given i1, d0: 0.9, 0.08, 0.02
    given i1, d1: 0.5, 0.3, 0.2

• L is conditionally independent of all other nodes given its parent G
• S is conditionally independent of all other nodes given its parent I
• Even given its parents, G is NOT independent of its descendant L
• Nodes with no parents are marginally independent
  – D is independent of its non-descendants I and S
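The Student-network factorization and its local independencies can be checked by brute-force enumeration. A minimal sketch, using the CPD values quoted in the slides; the variable names and helper `marginal` are my own:

```python
from itertools import product

# CPDs of the Student network, with the values given in the slides
P_D = {"d0": 0.6, "d1": 0.4}
P_I = {"i0": 0.7, "i1": 0.3}
P_S = {"i0": {"s0": 0.95, "s1": 0.05}, "i1": {"s0": 0.2, "s1": 0.8}}
P_L = {"g1": {"l0": 0.1,  "l1": 0.9},
       "g2": {"l0": 0.4,  "l1": 0.6},
       "g3": {"l0": 0.99, "l1": 0.01}}
P_G = {("i0", "d0"): {"g1": 0.3,  "g2": 0.4,  "g3": 0.3},
       ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
       ("i1", "d0"): {"g1": 0.9,  "g2": 0.08, "g3": 0.02},
       ("i1", "d1"): {"g1": 0.5,  "g2": 0.3,  "g3": 0.2}}

# Joint assembled from P(D) P(I) P(G|D,I) P(S|I) P(L|G)
VARS = ("D", "I", "G", "S", "L")
joint = {(d, i, g, s, l): P_D[d] * P_I[i] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]
         for d, i, g, s, l in product(P_D, P_I, ("g1", "g2", "g3"),
                                      ("s0", "s1"), ("l0", "l1"))}
assert abs(sum(joint.values()) - 1.0) < 1e-9   # a valid distribution

def marginal(fix):
    """Sum the joint over all assignments consistent with fix (var name -> value)."""
    idx = {v: k for k, v in enumerate(VARS)}
    return sum(p for a, p in joint.items()
               if all(a[idx[v]] == val for v, val in fix.items()))

# (I ⊥ D | φ): nodes with no parents are marginally independent
for d, i in product(P_D, P_I):
    assert abs(marginal({"D": d, "I": i})
               - marginal({"D": d}) * marginal({"I": i})) < 1e-9

# (S ⊥ D, G, L | I): given its parent I, S is shielded from the rest;
# here we check P(s | i, d) = P(s | i) for every d
for i, s, d in product(P_I, ("s0", "s1"), P_D):
    p_s_given_i_d = marginal({"S": s, "I": i, "D": d}) / marginal({"I": i, "D": d})
    assert abs(p_s_given_i_d - P_S[i][s]) < 1e-9
```

Both assertions hold by construction of the factorized joint, which is exactly the point of the slide: the graph with its CPDs encodes the independence set I(G).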
I-Map
• Let G be a graph associated with a set of independencies I(G)
• Let P be a probability distribution with a set of independencies I(P)
• Then G is an I-map of P if I(G) ⊆ I(P)
• From the direction of the inclusion:
  – The distribution can have more independencies than the graph
  – The graph does not mislead about independencies existing in P

Example of I-Map
• Three graphs over X and Y: G0 has no edge, G1 is X → Y, G2 is Y → X
  – G0 encodes X ⊥ Y, i.e., I(G0) = {X ⊥ Y}
  – G1 encodes no independence: I(G1) = ∅
  – G2 encodes no independence: I(G2) = ∅
• First distribution:

  X    Y    P(X,Y)
  x0   y0   0.08
  x0   y1   0.32
  x1   y0   0.12
  x1   y1   0.48

  – X and Y are independent in P, so (X ⊥ Y) ∈ I(P)
  – G0 is an I-map of P; G1 and G2 are also I-maps of P
• Second distribution:

  X    Y    P(X,Y)
  x0   y0   0.4
  x0   y1   0.3
  x1   y0   0.2
  x1   y1   0.1

  – X and Y are not independent in P, so (X ⊥ Y) ∉ I(P)
  – G0 is not an I-map of P; G1 and G2 are I-maps of P
• If G is an I-map of P, then it captures some of the independencies, not necessarily all

I-map to Factorization
• A Bayesian network G encodes a set of conditional independence assumptions I(G)
• Every distribution P for which G is an I-map must satisfy these assumptions
  – Every element of I(G) must be in I(P)
• This is the key property that allows a compact representation

I-map to Factorization
• From the chain rule of probability:
  P(I, D, G, L, S) = P(I) P(D | I) P(G | I, D) P(L | I, D, G) P(S | I, D, G, L)
  – Relies on no assumptions
  – But is also not very helpful
    • The last factor requires evaluation of 24 conditional probabilities
• Apply the conditional independence assumptions induced by the graph
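The parameter counts above can be verified with a little arithmetic. A sketch (the helper `cpd_params` is my own): a CPD P(X | Pa) needs (|Val(X)| − 1) free values per joint configuration of the parents.

```python
# Cardinalities in the Student example: I, D, S, L binary; G ternary
card = {"D": 2, "I": 2, "G": 3, "S": 2, "L": 2}

def cpd_params(var, parents):
    """Free parameters of P(var | parents): (|var| - 1) values per
    joint parent configuration."""
    n_configs = 1
    for p in parents:
        n_configs *= card[p]
    return n_configs * (card[var] - 1)

# Chain-rule factor P(S | I, D, G, L): 2*2*3*2 = 24 parent
# configurations, so 24 conditional probabilities to specify
assert cpd_params("S", ["I", "D", "G", "L"]) == 24

# Full joint table: 2*2*3*2*2 - 1 = 47 free parameters
full = 1
for c in card.values():
    full *= c
assert full - 1 == 47

# BN factorization P(D) P(I) P(G|D,I) P(S|I) P(L|G): far fewer
bn = (cpd_params("D", []) + cpd_params("I", []) +
      cpd_params("G", ["D", "I"]) + cpd_params("S", ["I"]) +
      cpd_params("L", ["G"]))
assert bn == 1 + 1 + 8 + 2 + 3   # = 15
```

So the graph's independence assumptions cut the representation from 47 free parameters to 15, which is the compactness the slide is driving at.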