Inference in Bayesian Networks
Joseph E. Gonzalez
10-701 Recitation
© 2007 Joseph E. Gonzalez | Machine Learning Department | Carnegie Mellon University

Convenient Probabilistic Model
- Compact representation
- Resistant to over-fitting
- Captures underlying structure
- Creates a "language" for describing probability relationships
- Permits the construction of generic inference techniques:
  Variable Elimination, Variational Methods, Belief Propagation

Joint Probability
What can you compute with P(X1, X2, ..., Xn)?
- Joint probability
- Marginal probability
- Conditional distributions: P(X1 | X2, X3) (predictions)
- Most likely explanations: max P(X1, X4 | X2, X3)
- Samples from the distribution
- Information gain
- ...

What is a Bayesian Network?
- A directed acyclic graph (DAG)
- Conditional probability distributions / tables / functions
Example graph over nodes A, B, C, D, E, F, G with CPDs:
  P(A) = f1(A)
  P(B) = f2(B)
  P(C|A) = f3(C, A)
  P(D|B) = f4(D, B)
  P(E|C,D) = f5(E, C, D)
  P(F|D) = f6(F, D)
  P(G|E,F) = f7(G, E, F)

Conditional Probability: CPDs / CPTs
- CPD: conditional probability distribution
- CPT: conditional probability table
- Variables can be discrete or continuous
Discrete example (CPT):
  P(C|A)   A=T   A=F
  C=T      0.3   0.6
  C=F      0.7   0.4
Continuous example (Gaussian CPD):
  P(C|A) = (1 / sqrt(2*pi*sigma^2)) * exp( -(C - A)^2 / (2*sigma^2) )

Graphs Describe Probability Distributions
For the example graph:
  P(A,B,C,D,E,F,G) = P(A) P(B) P(C|A) P(D|B) P(E|C,D) P(F|D) P(G|E,F)

Writing the Joint Probability
For a directed acyclic graph G = (V, E), the Bayesian network defines
  P(X1, X2, ..., Xn) = prod_{i=1}^{n} P(Xi | PaG[Xi])
where PaG[Xi] are the parents of Xi. We say that P(X1, ..., Xn) factors
with respect to the graph G.
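The chain-rule factorization above can be made concrete with a short Python sketch (mine, not the slides'). It uses the two-node fragment A -> C with the P(C|A) table from the slides; the uniform prior P(A) = 0.5 is an assumed illustrative value.

```python
# Joint probability of a tiny network A -> C via the factorization
# P(A, C) = P(A) * P(C|A).  P(C|A) comes from the slide's CPT;
# P(A) is an assumed, illustrative uniform prior.
p_a = {True: 0.5, False: 0.5}                 # assumption: uniform P(A)
p_c_given_a = {                               # p_c_given_a[(c, a)] = P(C=c | A=a)
    (True, True): 0.3, (True, False): 0.6,
    (False, True): 0.7, (False, False): 0.4,
}

def joint(a, c):
    """P(A=a, C=c) = P(A=a) * P(C=c | A=a)."""
    return p_a[a] * p_c_given_a[(c, a)]

# A valid factorization sums to 1 over all joint assignments.
total = sum(joint(a, c) for a in (True, False) for c in (True, False))
```

Each factor is a locally normalized CPT, so the product automatically defines a normalized joint distribution; no global normalization constant is needed.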
Conditional Independence
If influence can flow from node A to node B (given the observations), then
A and B are NOT (conditionally) independent.
- Whether each node on a path is observed or unobserved determines whether
  the path is open or closed.

Independence (in the example graph)
- Is A independent of B?           Yes
- Is A independent of G?           No
- Is A independent of B given G?   No

Inference
- Computing a joint assignment P(A=a, B=b, ..., G=g):  easy
- Computing a marginal:  tricky to hard
- Computing a conditional:  requires a marginal

Why is it Hard to Marginalize?
You want P(X1, X2) but you have P(X1, ..., Xn), so you must marginalize
out the unobserved variables:
  P(X1, X2) = sum_{x3} sum_{x4} ... sum_{xn} P(X1, X2, x3, x4, ..., xn)
Naively this is a nested loop over every summed-out variable:
  for X3 = 1:b
    for X4 = 1:b
      for X5 = 1:b
        for X6 = 1:b
          ...
which is exponential in the number of variables being summed out.

Simple Algebra Example
How can we save some work? Suppose we want to compute
  sum_a sum_b sum_c sum_d f1(a) f2(b) f3(c) f4(d)
Rearrange the sums:
  [sum_a f1(a)] [sum_b f2(b)] [sum_c f3(c)] [sum_d f4(d)]
Much easier.

Coupled Variables
Suppose instead we want to compute
  sum_a sum_b sum_c sum_d f1(a) f2(b,a) f3(c,b) f4(d,c)
The factors are functions of their neighbors rather than constants, so each
pushed-in sum returns a function rather than a number:
  sum_a sum_b sum_c f1(a) f2(b,a) f3(c,b) [ sum_d f4(d,c) ]
  = sum_a sum_b sum_c f1(a) f2(b,a) f3(c,b) g1(c)
where g1(c) = sum_d f4(d,c) caches the computation.

What is a "Function"?
Assume the variables are discrete. Then a function is just a lookup table:
  g(x1, x2, ..., xn) = table[x1, x2, ..., xn]
Each entry in the table is a cached "marginalization".

Example :: Graph
Define some variables:
  (H)    HW Grade       = {P, F}
  (E)    Exam Grade     = {P, F}
  (P)    Project Grade  = {P, F}
  (O)    Overall Grade  = {P, F}
  (C)    Class Attend   = {T, F}
  (:-))  Happy          = {T, F}
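The sum-rearrangement trick for coupled variables above can be checked numerically with a short Python sketch (mine, not the slides'); the table values are arbitrary illustrative numbers.

```python
# Coupled chain: sum_{a,b,c,d} f1(a) f2(b,a) f3(c,b) f4(d,c),
# all variables binary.  Table values are arbitrary illustrative numbers.
f1 = [1.0, 2.0]
f2 = [[0.5, 1.5], [2.0, 0.25]]   # f2[b][a]
f3 = [[1.0, 0.5], [0.5, 2.0]]    # f3[c][b]
f4 = [[0.25, 1.0], [0.75, 0.5]]  # f4[d][c]

# Brute force: enumerate all 2^4 joint assignments.
brute = sum(f1[a] * f2[b][a] * f3[c][b] * f4[d][c]
            for a in range(2) for b in range(2)
            for c in range(2) for d in range(2))

# Variable elimination: push each sum inward and cache it as a table.
g1 = [sum(f4[d][c] for d in range(2)) for c in range(2)]          # eliminate d
g2 = [sum(f3[c][b] * g1[c] for c in range(2)) for b in range(2)]  # eliminate c
g3 = [sum(f2[b][a] * g2[b] for b in range(2)) for a in range(2)]  # eliminate b
fast = sum(f1[a] * g3[a] for a in range(2))
```

With n chained binary variables the brute-force sum touches 2^n terms, while the cached version does a constant amount of work per eliminated variable; both give the same answer.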
Example :: Joint Probability
The grade network factorizes as:
  P(C, H, E, O, P, :-)) = P(C) P(P) P(H|C) P(E|C) P(O|H,E,P) P(:-)|H,O)

Example :: Tables
  P(C):  C=t  0.8,  C=f  0.2
  P(P):  P=p  0.7,  P=f  0.3

  P(H|C)   H=p   H=f
  C=t      0.7   0.3
  C=f      0.4   0.6

  P(E|C)   E=p   E=f
  C=t      0.7   0.3
  C=f      0.4   0.6

  P(O|H,E,P)        O=p    O=f
  H=p, E=p, P=p     0.95   0.05
  H=p, E=p, P=f     0.60   0.40
  H=p, E=f, P=p     0.60   0.40
  H=p, E=f, P=f     0.30   0.70
  H=f, E=p, P=p     0.30   0.70
  H=f, E=p, P=f     0.20   0.80
  H=f, E=f, P=p     0.10   0.90
  H=f, E=f, P=f     0.01   0.99

  P(:-)|O,H)    :-)=t   :-)=f
  O=p, H=p      0.7     0.3
  O=p, H=f      0.8     0.2
  O=f, H=p      0.2     0.8
  O=f, H=f      0.1     0.9

Happiness and Class Attendance
  P(:-), C) = sum_H sum_O sum_P sum_E P(C, H, E, O, P, :-))
            = sum_{H,O,P,E} P(C) P(P) P(H|C) P(E|C) P(O|H,E,P) P(:-)|H,O)
            = sum_{H,O,E} P(C) P(H|C) P(E|C) P(:-)|H,O) [ sum_P P(P) P(O|H,E,P) ]
The bracketed sum becomes a "function" table g1(O, H, E).

The Function g1
  g1(O, H, E) = sum_P P(P) P(O|H,E,P)

  g1(O,H,E)       value
  O=p, H=p, E=p   0.7(0.95) + 0.3(0.60) = 0.845
  O=p, H=p, E=f   0.7(0.60) + 0.3(0.30) = 0.510
  O=p, H=f, E=p   0.7(0.30) + 0.3(0.20) = 0.270
  O=p, H=f, E=f   0.7(0.10) + 0.3(0.01) = 0.073
  O=f, H=p, E=p   0.7(0.05) + 0.3(0.40) = 0.155
  O=f, H=p, E=f   0.7(0.40) + 0.3(0.70) = 0.490
  O=f, H=f, E=p   0.7(0.70) + 0.3(0.80) = 0.730
  O=f, H=f, E=f   0.7(0.90) + 0.3(0.99) = 0.927

Happiness and Class Attendance (continued)
  P(:-), C) = sum_{H,O,E} P(C) P(H|C) P(E|C) P(:-)|H,O) g1(O, H, E)
            = sum_{H,O} P(C) P(H|C) P(:-)|H,O) [ sum_E P(E|C) g1(O, H, E) ]
Another table to compute:
  g2(O, H, C) = sum_E P(E|C) g1(O, H, E)
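The g1 table above can be reproduced with a small Python sketch (mine, not the slides'); the 'p'/'f' string encodings for pass/fail are my own convention.

```python
# g1(O, H, E) = sum_P P(P) * P(O | H, E, P), using the CPTs from the slides.
P_Pr = {'p': 0.7, 'f': 0.3}   # P(Project grade)

# P(O='p' | H, E, P); P(O='f' | ...) is the complement.
p_o_pass = {('p', 'p', 'p'): 0.95, ('p', 'p', 'f'): 0.60,
            ('p', 'f', 'p'): 0.60, ('p', 'f', 'f'): 0.30,
            ('f', 'p', 'p'): 0.30, ('f', 'p', 'f'): 0.20,
            ('f', 'f', 'p'): 0.10, ('f', 'f', 'f'): 0.01}

def p_o(o, h, e, p):
    q = p_o_pass[(h, e, p)]
    return q if o == 'p' else 1.0 - q

# One elimination step: sum out P, caching the result as a table over (O, H, E).
g1 = {(o, h, e): sum(P_Pr[p] * p_o(o, h, e, p) for p in ('p', 'f'))
      for o in ('p', 'f') for h in ('p', 'f') for e in ('p', 'f')}
```

The resulting dictionary is exactly the cached "function" table from the slides, e.g. g1(O=p, H=p, E=p) = 0.7(0.95) + 0.3(0.60) = 0.845.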
The Function g2
  g2(O, H, C) = sum_E P(E|C) g1(O, H, E)

  g2(O,H,C)       value
  O=p, H=p, C=t   0.7(0.845) + 0.3(0.510) = 0.7445
  O=p, H=p, C=f   0.4(0.845) + 0.6(0.510) = 0.6440
  O=p, H=f, C=t   0.7(0.270) + 0.3(0.073) = 0.2109
  O=p, H=f, C=f   0.4(0.270) + 0.6(0.073) = 0.1518
  O=f, H=p, C=t   0.7(0.155) + 0.3(0.490) = 0.2555
  O=f, H=p, C=f   0.4(0.155) + 0.6(0.490) = 0.3560
  O=f, H=f, C=t   0.7(0.730) + 0.3(0.927) = 0.7891
  O=f, H=f, C=f   0.4(0.730) + 0.6(0.927) = 0.8482

Happiness and Class Attendance (continued)
  P(:-), C) = sum_{H,O} P(C) P(H|C) P(:-)|H,O) g2(O, H, C)
            = sum_H P(C) P(H|C) [ sum_O P(:-)|H,O) g2(O, H, C) ]
Another table to compute:
  g3(:-), H, C) = sum_O P(:-)|H,O) g2(O, H, C)
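Carrying the elimination through to the end gives the full table P(:-), C). The Python sketch below is mine, not the slides': it redoes the g1 and g2 steps and then finishes the remaining two eliminations (O, then H). The 'p'/'f' and 't'/'f' string encodings are my own convention.

```python
# Complete variable-elimination pass for P(:-), C) using the slide CPTs.
V, B = ('p', 'f'), ('t', 'f')                # pass/fail and true/false values

P_C  = {'t': 0.8, 'f': 0.2}
P_Pr = {'p': 0.7, 'f': 0.3}
P_H  = {('p', 't'): 0.7, ('p', 'f'): 0.4,    # P(H=h | C=c), keyed (h, c)
        ('f', 't'): 0.3, ('f', 'f'): 0.6}
P_E  = dict(P_H)                             # P(E|C) has the same table

p_o_pass = {('p', 'p', 'p'): 0.95, ('p', 'p', 'f'): 0.60,  # P(O='p' | H, E, P)
            ('p', 'f', 'p'): 0.60, ('p', 'f', 'f'): 0.30,
            ('f', 'p', 'p'): 0.30, ('f', 'p', 'f'): 0.20,
            ('f', 'f', 'p'): 0.10, ('f', 'f', 'f'): 0.01}
joy_true = {('p', 'p'): 0.7, ('p', 'f'): 0.8,              # P(:-)='t' | O, H)
            ('f', 'p'): 0.2, ('f', 'f'): 0.1}

def P_O(o, h, e, p):
    return p_o_pass[(h, e, p)] if o == 'p' else 1.0 - p_o_pass[(h, e, p)]

def P_JOY(j, o, h):
    return joy_true[(o, h)] if j == 't' else 1.0 - joy_true[(o, h)]

# Eliminate P, E, O, and H in turn, caching each intermediate table.
g1 = {(o, h, e): sum(P_Pr[p] * P_O(o, h, e, p) for p in V)
      for o in V for h in V for e in V}
g2 = {(o, h, c): sum(P_E[(e, c)] * g1[(o, h, e)] for e in V)
      for o in V for h in V for c in B}
g3 = {(j, h, c): sum(P_JOY(j, o, h) * g2[(o, h, c)] for o in V)
      for j in B for h in V for c in B}
P_joy_C = {(j, c): P_C[c] * sum(P_H[(h, c)] * g3[(j, h, c)] for h in V)
           for j in B for c in B}
```

Because every variable except :-) and C gets summed out, P_joy_C is a proper joint over the two remaining variables and sums to 1; dividing by the marginal P(C) would then give the conditional P(:-) | C).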