UB CSE 574 - Conditional Independence

Conditional Independence
Sargur Srihari
[email protected]

Topics
1. What is conditional independence? Factorization of a probability distribution into marginals
2. Why is it important in machine learning?
3. Conditional independence from graphical models
4. Concept of "explaining away"
5. "D-separation" property in directed graphs
6. Examples
   1. Independent, identically distributed samples in
      1. Univariate parameter estimation
      2. Bayesian polynomial regression
   2. Naïve Bayes classifier
7. Directed graph as a filter
8. Markov blanket of a node

1. Conditional Independence
• Consider three variables a, b, c
• The conditional distribution of a given b and c is p(a|b,c)
• If p(a|b,c) does not depend on the value of b
  – we can write p(a|b,c) = p(a|c)
• We say that
  – a is conditionally independent of b given c

Factorizing into Marginals
• If the conditional distribution of a given b and c does not depend on b
  – we can write p(a|b,c) = p(a|c)
• This can be expressed in a slightly different way
  – written in terms of the joint distribution of a and b conditioned on c:
    p(a,b|c) = p(a|b,c)p(b|c)   (product rule)
             = p(a|c)p(b|c)     (using the statement above)
• That is, the joint distribution factorizes into a product of marginals
  – it says that variables a and b are statistically independent given c
• Shorthand notation for conditional independence: a ⊥⊥ b | c

An Example with Three Binary Variables
• a ∈ {A, ~A}, where A is Red; b ∈ {B, ~B}, where B is Blue; c ∈ {C, ~C}, where C is Green
• [Graph over nodes a, b, c] with p(a,b,c) = p(a)p(b,c|a) = p(a)p(b|a,c)p(c|a)
• Are there any conditional independences?
• Probabilities are assigned using a Venn diagram over a 7 x 7 square: every probability can be evaluated by inspection, as a shaded area relative to the total area
• There are 90 different probabilities in this problem!

Marginal probabilities (6):
  p(a): P(A) = 16/49, P(~A) = 33/49
  p(b): P(B) = 18/49, P(~B) = 31/49
  p(c): P(C) = 12/49, P(~C) = 37/49

Joint, two variables (12):
  p(a,b): P(A,B) = 6/49, P(A,~B) = 10/49, P(~A,B) = 12/49, P(~A,~B) = 21/49
  p(b,c): P(B,C) = 6/49, P(B,~C) = 12/49, P(~B,C) = 6/49, P(~B,~C) = 25/49
  p(a,c): P(A,C) = 4/49, P(A,~C) = 12/49, P(~A,C) = 8/49, P(~A,~C) = 25/49

Joint, three variables (8):
  p(a,b,c): P(A,B,C) = 2/49, P(A,B,~C) = 4/49, P(A,~B,C) = 2/49, P(A,~B,~C) = 8/49,
            P(~A,B,C) = 4/49, P(~A,B,~C) = 8/49, P(~A,~B,C) = 4/49, P(~A,~B,~C) = 17/49

Three Binary Variables Example (continued)
Single variables conditioned on single variables (16), obtained from the values above:
  p(a|b): P(A|B) = P(A,B)/P(B) = 1/3, P(~A|B) = 2/3, P(A|~B) = P(A,~B)/P(~B) = 10/31, P(~A|~B) = 21/31
  p(a|c): P(A|C) = P(A,C)/P(C) = 1/3, P(~A|C) = 2/3, P(A|~C) = P(A,~C)/P(~C) = 12/37, P(~A|~C) = 25/37
  p(b|c): P(B|C) = P(B,C)/P(C) = 1/2, P(~B|C) = 1/2, P(B|~C) = P(B,~C)/P(~C) = 12/37, P(~B|~C) = 25/37
  Similarly: P(B|A), P(~B|A), P(B|~A), P(~B|~A), P(C|A), P(~C|A), P(C|~A), P(~C|~A)
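The table above can be checked mechanically. The sketch below is not part of the original slides: it encodes the eight joint probabilities exactly as given, then recomputes marginals and conditionals from them; the dictionary layout and the helper function `marginal` are my own illustrative choices.

```python
# Minimal sketch: reproduce the three-binary-variable example numerically.
# The 8 joint probabilities are taken directly from the slide; everything
# else is recomputed from them with exact fractions.
from fractions import Fraction as F
from itertools import product

# Joint p(a, b, c); True means A/B/C, False means ~A/~B/~C
joint = {
    (True,  True,  True):  F(2, 49), (True,  True,  False): F(4, 49),
    (True,  False, True):  F(2, 49), (True,  False, False): F(8, 49),
    (False, True,  True):  F(4, 49), (False, True,  False): F(8, 49),
    (False, False, True):  F(4, 49), (False, False, False): F(17, 49),
}

def marginal(**fixed):
    """Sum the joint over all outcomes consistent with the fixed values."""
    total = F(0)
    for a, b, c in product([True, False], repeat=3):
        assignment = {"a": a, "b": b, "c": c}
        if all(assignment[k] == v for k, v in fixed.items()):
            total += joint[(a, b, c)]
    return total

# Marginals match the slide: P(A)=16/49, P(B)=18/49, P(C)=12/49
assert marginal(a=True) == F(16, 49)
assert marginal(b=True) == F(18, 49)
assert marginal(c=True) == F(12, 49)

# Conditionals, e.g. P(A|B) = P(A,B)/P(B) = 1/3 and P(B|C) = 1/2
assert marginal(a=True, b=True) / marginal(b=True) == F(1, 3)
assert marginal(b=True, c=True) / marginal(c=True) == F(1, 2)
```

Using exact fractions rather than floats keeps the equality checks meaningful, which matters on the next slides where we test whether products of conditionals match joint values exactly.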
Two variables conditioned on a single variable (24):
  p(a,b|c): P(A,B|C) = P(A,B,C)/P(C) = 1/6,     P(A,B|~C) = P(A,B,~C)/P(~C) = 4/37,
            P(A,~B|C) = P(A,~B,C)/P(C) = 1/6,    P(A,~B|~C) = P(A,~B,~C)/P(~C) = 8/37,
            P(~A,B|C) = P(~A,B,C)/P(C) = 1/3,    P(~A,B|~C) = P(~A,B,~C)/P(~C) = 8/37,
            P(~A,~B|C) = P(~A,~B,C)/P(C) = 1/3,  P(~A,~B|~C) = P(~A,~B,~C)/P(~C) = 17/37
  p(a,c|b): eight values
  p(b,c|a): eight values
Similarly, there are 24 values for one variable conditioned on two: p(a|b,c), etc.

Three Binary Variables: Conditional Independences
• There are no unconditional independence relationships:
  P(A,B) ≠ P(A)P(B), P(A,~B) ≠ P(A)P(~B), P(~A,B) ≠ P(~A)P(B)
• P(A,B|C) = 1/6 = P(A|C)P(B|C), but P(A,B|~C) = 4/37 ≠ P(A|~C)P(B|~C)
• [Graph over nodes c, a, b, with an arrow from b to a]
  p(a,b,c) = p(c)p(a,b|c) = p(c)p(a|b,c)p(b|c)
• If p(a,b|c) = p(a|c)p(b|c), then p(a|b,c) = p(a,b|c)/p(b|c) = p(a|c)
  – the arrow from b to a could then be eliminated
• If we knew the graph structure a priori, we could simplify some probability calculations

2. Importance of Conditional Independence
• An important concept for probability distributions
• Role in pattern recognition and machine learning
  – simplifies the structure of a model
  – reduces the computations needed to perform inference and learning
• Role of graphical models
  – testing for conditional independence from an expression of the joint distribution is time-consuming
  – it can instead be read directly from the graphical model
  – using the framework of d-separation ("d" for directed)

A Causal Bayesian Network
• [Graph over nodes: Age, Gender, Smoking, Exposure to Toxics, Cancer, Lung Tumor, Genetic Damage, Serum Calcium]
• Cancer is independent of Age and Gender given Exposure to Toxics and Smoking

3. Conditional Independence from Graphs
• Three example graphs, each with just three nodes
• Together they illustrate the concept of d-separation ("directed" separation)
  – Example 1 (tail-to-tail node): p(a,b,c) = p(a|c)p(b|c)p(c)
  – Example 2 (head-to-tail node): p(a,b,c) = p(a)p(c|a)p(b|c)
  – Example 3 (head-to-head node): p(a,b,c) = p(a)p(b)p(c|a,b)

First Example Without Conditioning
• The joint distribution is p(a,b,c) = p(a|c)p(b|c)p(c)
• If none of the variables are observed, we can investigate whether a and b are independent
  – marginalizing both sides with respect to c gives p(a,b) = Σ_c p(a|c)p(b|c)p(c)
• In general this does not factorize into p(a)p(b), and so a and b are not independent given the empty set ∅
• Note: independence may hold for a particular distribution because of specific numerical values, but it does not follow in general from the graph structure

First Example With Conditioning
• Condition on variable c
  1. By the product rule, p(a,b,c) = p(a,b|c)p(c)
  2. According to the graph, p(a,b,c) = p(a|c)p(b|c)p(c)
  3. Combining the two: p(a,b|c) = p(a|c)p(b|c)
• So we obtain the conditional independence property a ⊥⊥ b | c
• Graphical interpretation
  – Consider the path from node a to node b
  – Node c is tail-to-tail since it is connected to the tails of the two arrows
  – This causes a and b to be dependent
  – When we condition on node c, it
    • blocks the path from a to b
    • causes a and b to become conditionally independent
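To make the tail-to-tail behaviour concrete, here is a small sketch that is not from the slides: it builds a joint distribution of the form p(a,b,c) = p(a|c)p(b|c)p(c) from hypothetical numbers I chose for illustration (the specific fractions, variable names, and helper functions are assumptions, not values from the deck), then checks that a and b are dependent when nothing is observed but independent once c is conditioned on.

```python
# Minimal sketch of the tail-to-tail graph c -> a, c -> b with made-up numbers:
# p(a,b,c) = p(a|c) p(b|c) p(c). Marginally a and b are dependent; given c they
# are conditionally independent.
from fractions import Fraction as F

p_c = {0: F(1, 3), 1: F(2, 3)}                       # p(c)
p_a_given_c = {0: {0: F(1, 4), 1: F(3, 4)},          # p(a|c), indexed [c][a]
               1: {0: F(1, 2), 1: F(1, 2)}}
p_b_given_c = {0: {0: F(1, 5), 1: F(4, 5)},          # p(b|c), indexed [c][b]
               1: {0: F(3, 5), 1: F(2, 5)}}

joint = {(a, b, c): p_a_given_c[c][a] * p_b_given_c[c][b] * p_c[c]
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}

def p_ab(a, b):
    return sum(joint[(a, b, c)] for c in (0, 1))     # marginalize out c

def p_a(a):
    return sum(p_ab(a, b) for b in (0, 1))

def p_b(b):
    return sum(p_ab(a, b) for a in (0, 1))

# Without conditioning: p(a,b) != p(a) p(b), so a and b are NOT independent
assert p_ab(0, 0) != p_a(0) * p_b(0)

# Conditioned on c: p(a,b|c) = p(a|c) p(b|c) for every value of c
for c in (0, 1):
    for a in (0, 1):
        for b in (0, 1):
            assert joint[(a, b, c)] / p_c[c] == p_a_given_c[c][a] * p_b_given_c[c][b]
```

The first assertion mirrors the "without conditioning" slide (marginalizing over c generally destroys the factorization into p(a)p(b)); the nested loop mirrors the "with conditioning" slide, where dividing the joint by p(c) recovers p(a|c)p(b|c) exactly.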

