UB CSE 574 - Bayesian Network Representation



Bayesian Network Representation
Sargur Srihari
[email protected]

Topics
• Joint and Conditional Distributions
• I-Maps
  – I-Map to Factorization
  – Factorization to I-Map
  – Perfect Map
• Knowledge Engineering
  – Picking Variables
  – Picking Structure
  – Picking Probabilities
  – Sensitivity Analysis

Joint Distribution
• A company is trying to hire recent graduates
• The goal is to hire intelligent employees
  – There is no way to test intelligence directly
  – But we have access to the student's SAT score
    • Which is informative but not fully indicative
• Two random variables
  – Intelligence: Val(I) = {i1, i0}, high and low
  – Score: Val(S) = {s1, s0}, high and low
• The joint distribution P(I, S) has 4 entries; three parameters are needed

Alternative Representation: Conditional Parameterization
• P(I, S) = P(I) P(S | I)
  – This representation is more compatible with causality
    • Intelligence is influenced by genetics and upbringing
    • Score is influenced by Intelligence
    • Note: BNs are not required to follow causality, but they often do
• Need to specify P(I) and P(S | I)
  – Three binomial distributions (3 parameters) are needed
    • One marginal P(I), two conditionals P(S | I=i0) and P(S | I=i1)
[Figure: (a) the two-node network Intelligence → SAT; (b) the network with Intelligence as parent of both Grade and SAT]

Naïve Bayes Model
• Conditional parameterization combined with conditional independence assumptions
  – Val(G) = {g1, g2, g3} represents grades A, B, C
• SAT and Grade are independent given Intelligence (assumption)
  – Knowing intelligence, SAT gives no information about the class grade
• Assertions
  – From probabilistic reasoning: P(I, S, G) = P(I) P(S | I) P(G | I, S)
  – From the assumption: P(G | I, S) = P(G | I)
  – Combining: P(I, S, G) = P(I) P(S | I) P(G | I)
• Three binomials and two 3-valued multinomials: 7 parameters
  – More compact than the joint distribution

BN for General Naïve Bayes Model
[Figure: Class node C with children X1, X2, ..., Xn]
• P(C, X1, ..., Xn) = P(C) ∏i=1..n P(Xi | C)
• Encoded using a very small number of parameters, linear in the number of variables

Application of the Naïve Bayes Model
• Medical diagnosis
  – Pathfinder expert system for lymph-node disease (Heckerman et al., 1992)
    • The full BN agreed with the human expert in 50/53 cases
    • Naïve Bayes agreed in 47/53 cases

Graphs and Distributions
• Relating two concepts:
  – Independencies in distributions
  – Independencies in graphs
• An I-map is a relationship between the two

Independencies in a Distribution
• Let P be a distribution over X
• I(P) is the set of conditional independence assertions of the form (X ⊥ Y | Z) that hold in P

  X    Y    P(X,Y)
  x0   y0   0.08
  x0   y1   0.32
  x1   y0   0.12
  x1   y1   0.48

• X and Y are independent in P, e.g.,
  P(x1) = 0.48 + 0.12 = 0.6
  P(y1) = 0.32 + 0.48 = 0.8
  P(x1, y1) = 0.48 = 0.6 × 0.8
• Thus (X ⊥ Y | ∅) ∈ I(P)
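The independence claim above can be checked mechanically. Below is a minimal Python sketch (not part of the original slides) that tests whether (X ⊥ Y | ∅) ∈ I(P) for the table just given by verifying that the joint factorizes into its marginals at every entry; the tolerance value is an arbitrary choice.

```python
# A minimal sketch (not from the slides): test whether (X ⊥ Y | ∅) ∈ I(P)
# for the joint table above by checking P(x, y) = P(x) P(y) at every entry.

joint = {
    ("x0", "y0"): 0.08, ("x0", "y1"): 0.32,
    ("x1", "y0"): 0.12, ("x1", "y1"): 0.48,
}

# Marginals P(X) and P(Y), obtained by summing out the other variable
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# Independence holds iff the joint equals the product of marginals everywhere
independent = all(abs(p - p_x[x] * p_y[y]) < 1e-9
                  for (x, y), p in joint.items())
print(independent)  # True, e.g. P(x1, y1) = 0.48 = 0.6 * 0.8
```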
Independencies in a Graph
• Local conditional independence assertions (starting from the leaf nodes):
  P(D, I, G, S, L) = P(D) P(I) P(G | D, I) P(S | I) P(L | G)
  – L is conditionally independent of all other nodes given its parent G
  – S is conditionally independent of all other nodes given its parent I
  – Even given its parents, G is NOT independent of its descendant L
  – Nodes with no parents are marginally independent: D is independent of its non-descendants I and S
• Parents of a variable shield it from probabilistic influence
  – Once the values of the parents are known, ancestors have no further influence
  – Information about descendants can still change beliefs about a node
• The graph with its CPDs is equivalent to a set of independence assertions
  I(G) = {(L ⊥ I, D, S | G), (S ⊥ D, G, L | I), (G ⊥ S | D, I), (I ⊥ D | ∅), (D ⊥ I, S | ∅)}
[Figure: the student network Difficulty → Grade ← Intelligence → SAT, Grade → Letter, with CPD tables for P(D), P(I), P(G | D, I), P(S | I), and P(L | G)]

I-Map
• Let G be a graph associated with a set of independencies I(G)
• Let P be a probability distribution with a set of independencies I(P)
• Then G is an I-map of P if I(G) ⊆ I(P)
• From the direction of the inclusion:
  – the distribution can have more independencies than the graph
  – the graph does not mislead about independencies that exist in P

Example of I-Map
• Three candidate graphs over X and Y: G0 has no edge; G1 and G2 each connect X and Y with a single directed edge (in opposite directions)
  – G0 encodes X ⊥ Y, i.e., I(G0) = {X ⊥ Y}
  – G1 encodes no independence: I(G1) = ∅
  – G2 encodes no independence: I(G2) = ∅
• First distribution:

  X    Y    P(X,Y)
  x0   y0   0.08
  x0   y1   0.32
  x1   y0   0.12
  x1   y1   0.48

  – X and Y are independent in P, so G0, G1, and G2 are all I-maps of P
• Second distribution:

  X    Y    P(X,Y)
  x0   y0   0.4
  x0   y1   0.3
  x1   y0   0.2
  x1   y1   0.1

  – X and Y are not independent in P; thus (X ⊥ Y) ∉ I(P)
  – G0 is not an I-map of P; G1 and G2 are I-maps of P
• If G is an I-map of P, it captures some of the independencies, not all

I-Map to Factorization
• A Bayesian network G encodes a set of conditional independence assumptions I(G)
• Every distribution P for which G is an I-map should satisfy these assumptions
  – Every element of I(G) should be in I(P)
• This is the key property that allows a compact representation

I-Map to Factorization
• From the chain rule of probability:
  P(I, D, G, L, S) = P(I) P(D | I) P(G | I, D) P(L | I, D, G) P(S | I, D, G, L)
  – Relies on no assumptions
  – But it is also not very helpful
    • The last factor alone requires evaluation of 24 conditional probabilities
• Apply the conditional independence assumptions induced …
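The compactness argument can be illustrated numerically. The following is a minimal Python sketch (not from the slides) that counts independent parameters for the student-network factorization P(D) P(I) P(G | D, I) P(S | I) P(L | G) versus the full joint table, using the cardinalities stated in the slides (G has three values, all other variables are binary).

```python
# A minimal sketch (not from the slides): count independent parameters for the
# student-network factorization versus the full joint table.
# Cardinalities follow the slides: G takes 3 values, all other variables are binary.

from math import prod

card = {"D": 2, "I": 2, "G": 3, "S": 2, "L": 2}

def cpd_params(var, parents):
    """(card[var] - 1) free parameters for each configuration of the parents."""
    return (card[var] - 1) * prod(card[p] for p in parents)

# Bayesian-network factorization P(D) P(I) P(G|D,I) P(S|I) P(L|G)
bn = {"D": [], "I": [], "G": ["D", "I"], "S": ["I"], "L": ["G"]}
bn_total = sum(cpd_params(v, ps) for v, ps in bn.items())

# Unrestricted joint table over all five variables
joint_total = prod(card.values()) - 1

print("BN factorization parameters:", bn_total)     # 1 + 1 + 8 + 2 + 3 = 15
print("Full joint parameters:      ", joint_total)  # 2*2*3*2*2 - 1 = 47
```

Under these assumed cardinalities, the factored form needs 15 parameters against 47 for the unrestricted joint, which is the compactness the slides refer to.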

