Inference in Bayesian networksChapter 14.4–5Chapter 14.4–5 1Outline♦ Exact inference by enumeration♦ Exact inference by variable elimination♦ Approximate inference by stochastic simulation♦ Approximate inference by Markov chain Monte CarloChapter 14.4–5 2Inference tasksSimple queries: compute posterior marginal P(Xi|E = e)e.g., P (NoGas|Gauge = empty, Lights = on, Starts = false)Conjunctive queries: P(Xi,Xj|E = e)=P(Xi|E = e)P(Xj|Xi, E = e)Optimal decisions: decision networks include utility information;probabilistic inference required for P (outcome|action, evidence)Value of information: which evidence to seek next?Sensitivity analysis: which probability values are most critical?Explanation: why do I need a new starter motor?Chapter 14.4–5 3Inference by enumerationSlightly intelligent way to sum out variables from the joint without actuallyconstructing its explicit representationSimple query on the burglary network:BEJAMP(B|j, m)= P(B,j,m)/P (j, m)= αP(B,j, m)= αΣeΣaP(B, e, a, j, m)Rewrite full joint entries using product of CPT entries:P(B|j, m)= αΣeΣaP(B)P (e)P(a|B,e)P (j|a)P (m|a)= αP(B)ΣeP (e) ΣaP(a|B,e)P(j|a)P (m|a)Recursive depth-first enumeration: O(n) space, O(dn) timeChapter 14.4–5 4Enumeration algorithmfunction Enumeration-Ask(X, e,bn) returns a distribution over Xinputs: X, the query variablee, observed values for variables Ebn, a Bayesian network with variables {X}∪E ∪ YQ(X ) ← a distribution over X, initially emptyfor each value xiof X doextend e with value xifor XQ(xi) ← Enumerate-All(Va r s [bn], e)return Normalize(Q(X ))function Enumerate-All(vars, e) returns a real numberif Empty?(vars) then return 1.0Y ← First(vars)if Y has value y in ethen return P (y | Pa(Y )) × Enumerate-All(Rest(vars), e)else returnyP (y | Pa(Y )) × Enumerate-All(Rest(vars), ey)where eyis e extended with Y = yChapter 14.4–5 5Evaluation treeP(j|a).90P(m|a).70 .01P(m| a).05P(j| a)P(j|a).90P(m|a).70 .01P(m| a).05P(j| a)P(b).001P(e).002P( e).998P(a|b,e).95.06P( a|b, e).05P( a|b,e).94P(a|b, e)Enumeration is inefficient: repeated computatione.g., computes P (j|a)P (m|a) for each value of eChapter 14.4–5 6Inference by variable eliminationVariable elimination: carry out summations right-to-left,storing intermediate results (factors) to avoid recomputationP(B|j, m)= α P(B) BΣeP (e) EΣaP(a|B,e) AP (j|a) JP (m|a) M= αP(B)ΣeP (e)ΣaP(a|B,e)P(j|a)fM(a)= αP(B)ΣeP (e)ΣaP(a|B,e)fJ(a)fM(a)= αP(B)ΣeP (e)ΣafA(a, b, e)fJ(a)fM(a)= αP(B)ΣeP (e)f¯AJM(b, e) (sum out A)= αP(B)f¯E¯AJM(b) (sum out E)= αfB(b) × f¯E¯AJM(b)Chapter 14.4–5 7Variable elimination: Basic operationsSumming out a variable from a product of factors:move any constant factors outside the summationadd up submatrices in pointwise product of remaining factorsΣxf1×···×fk= f1×···×fiΣxfi+1×···×fk= f1×···×fi× f¯Xassuming f1,...,fido not depend on XPointwise product of factors f1and f2:f1(x1,...,xj,y1,...,yk) × f2(y1,...,yk,z1,...,zl)= f(x1,...,xj,y1,...,yk,z1,...,zl)E.g., f1(a, b) × f2(b, c)=f(a, b, c)Chapter 14.4–5 8Variable elimination algorithmfunction Elimination-Ask(X, e,bn) returns a distribution over Xinputs: X, the query variablee, evidence specified as an eventbn, a belief network specifying joint distribution P(X1,...,Xn)factors ← []; vars ← Reverse(Va r s [bn])for each var in vars dofactors ← [Make-Factor(var, e)|factors]if var is a hidden variable then factors ← Sum-Out(var, factors)return Normalize(Pointwise-Product(factors))Chapter 14.4–5 9Irrelevant variablesConsider the query P(JohnCalls|Burglary = true)BEJAMP (J|b)=αP(b)eP (e)aP (a|b, e)P (J|a)mP (m|a)Sum over m is identically 1; M is irrelevant to the queryThm 1: Y is irrelevant unless Y ∈ Ancestors({X}∪E)Here, X = JohnCalls, E = {Burglary},andAncestors({X}∪E)={Alarm, Earthquake}so MaryCalls is irrelevant(Compare this to backward chaining from the query in Horn clause KBs)Chapter 14.4–5 10Irrelevant variables contd.Defn: moral graph of Bayes net: marry all parents and drop arrowsDefn: A is m-separatedfrom B by C iff separated by C in the moral graphThm 2: Y is irrelevant if m-separated from X by EBEJAMFor P (JohnCalls|Alarm = true),bothBurglary and Earthquake are irrelevantChapter 14.4–5 11Complexity of exact inferenceSingly connected networks (or polytrees):– any two nodes are connected by at most one (undirected) path– time and space cost of variable elimination are O(dkn)Multiply connected networks:– can reduce 3SAT to exact inference ⇒ NP-hard– equivalent to counting 3SAT models ⇒ #P-completeA B C D1 2 3AND0.5 0.50.50.5LLLL1. A v B v C2. C v D v A3. B v C v DChapter 14.4–5 12Inference by stochastic simulationBasic idea:1) Draw N samples from a sampling distribution SCoin0.52) Compute an approximate posterior probabilityˆP3) Show this converges to the true probability POutline:– Sampling from an empty network– Rejection sampling: reject samples disagreeing with evidence– Likelihood weighting: use evidence to weight samples– Markov chain Monte Carlo (MCMC): sample from a stochastic processwhose stationary distribution is the true posteriorChapter 14.4–5 13Sampling from an empty networkfunction Prior-Sample(bn) returns an event sampled from bninputs: bn, a belief network specifying joint distribution P(X1,...,Xn)x ← an event with n elementsfor i =1to n doxi← a random sample from P(Xi| parents(Xi))given the values of Parents(Xi) in xreturn xChapter 14.4–5 14ExampleCloudyRainSprinkler WetGrassCTF.80.20P(R|C)CTF.10.50P(S|C)SRTTTFFTFF.90.90.99P(W|S,R)P(C).50.01Chapter 14.4–5 15ExampleCloudyRainSprinkler WetGrassCTF.80.20P(R|C)CTF.10.50P(S|C)SRTTTFFTFF.90.90.99P(W|S,R)P(C).50.01Chapter 14.4–5 16ExampleCloudyRainSprinkler WetGrassCTF.80.20P(R|C)CTF.10.50P(S|C)SRTTTFFTFF.90.90.99P(W|S,R)P(C).50.01Chapter 14.4–5 17ExampleCloudyRainSprinkler WetGrassCTF.80.20P(R|C)CTF.10.50P(S|C)SRTTTFFTFF.90.90.99P(W|S,R)P(C).50.01Chapter 14.4–5 18ExampleCloudyRainSprinkler WetGrassCTF.80.20P(R|C)CTF.10.50P(S|C)SRTTTFFTFF.90.90.99P(W|S,R)P(C).50.01Chapter 14.4–5 19ExampleCloudyRainSprinkler WetGrassCTF.80.20P(R|C)CTF.10.50P(S|C)SRTTTFFTFF.90.90.99P(W|S,R)P(C).50.01Chapter 14.4–5 20ExampleCloudyRainSprinkler WetGrassCTF.80.20P(R|C)CTF.10.50P(S|C)SRTTTFFTFF.90.90.99P(W|S,R)P(C).50.01Chapter 14.4–5 21Sampling from an empty network
View Full Document