CMSC828G Principles of Data Mining, Lecture #15
• Readings:
– handouts
• Today's Lecture:
– Inference
– Learning BNs
– slides courtesy of Nir Friedman
• Upcoming Dates:
– Midterm Thursday

Inference in Bayesian Networks

Variable Elimination
General idea:
• Write the query in the form
  P(X_1, e) = Σ_{x_2} Σ_{x_3} … Σ_{x_k} Π_i P(x_i | pa_i)
• Iteratively:
– Move all irrelevant terms outside of the innermost sum
– Perform the innermost sum, getting a new term
– Insert the new term into the product

A More Complex Example
• The "Asia" network: Visit to Asia (V), Smoking (S), Tuberculosis (T), Lung Cancer (L), Bronchitis (B), Abnormality in Chest (A), X-Ray (X), Dyspnea (D)
• Initial factors:
  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
• We want to compute P(d)
• Need to eliminate: v, s, x, t, l, a, b

Eliminate: v
Compute: f_v(t) = Σ_v P(v) P(t|v)
  ⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
Note: f_v(t) = P(t)
• In general, the result of elimination is not necessarily a probability term

Eliminate: s
Compute: f_s(b,l) = Σ_s P(s) P(b|s) P(l|s)
  ⇒ f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|a,b)
• In general, the result of elimination may be a function of several variables

Eliminate: x
Compute: f_x(a) = Σ_x P(x|a)
  ⇒ f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|a,b)
Note: f_x(a) = 1 for all values of a!
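The elimination steps above can be sketched in code. This is a minimal sketch, not the lecture's implementation: factors are stored as (variables, table) pairs, and one elimination step multiplies every factor that mentions the chosen variable and sums that variable out. The two-node example network and its CPT values are made up for illustration.

```python
from itertools import product

# A factor is a pair (vars, table): `vars` is a tuple of variable names,
# and `table` maps each assignment (a tuple of values, one per variable)
# to a number. All names and CPT values below are illustrative.

def eliminate(factors, var, domains):
    """Sum `var` out of the product of all factors that mention it."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    # Scope of the new factor: every other variable in the touching factors.
    scope = tuple(sorted({v for vs, _ in touching for v in vs} - {var}))
    table = {}
    for assign in product(*(domains[v] for v in scope)):
        ctx = dict(zip(scope, assign))
        total = 0.0
        for x in domains[var]:          # innermost sum over the eliminated variable
            ctx[var] = x
            prod = 1.0
            for vs, t in touching:      # product of all factors mentioning `var`
                prod *= t[tuple(ctx[v] for v in vs)]
            total += prod
        table[assign] = total
    return rest + [(scope, table)]

# Tiny two-node chain V -> T, mirroring the first step:
# f_v(t) = sum_v P(v) P(t|v) = P(t)
domains = {"V": (0, 1), "T": (0, 1)}
P_v = (("V",), {(0,): 0.99, (1,): 0.01})
P_t_given_v = (("V", "T"), {(0, 0): 0.95, (0, 1): 0.05, (1, 0): 0.2, (1, 1): 0.8})
[f_v] = eliminate([P_v, P_t_given_v], "V", domains)
print(f_v)  # the marginal P(T), as a factor over ("T",)
```

Calling `eliminate` once per variable in the chosen ordering, then multiplying whatever factors remain, reproduces the rewriting sequence on the slides.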
Eliminate: t
Compute: f_t(a,l) = Σ_t f_v(t) P(a|t,l)
  ⇒ f_s(b,l) f_x(a) f_t(a,l) P(d|a,b)

Eliminate: l
Compute: f_l(a,b) = Σ_l f_s(b,l) f_t(a,l)
  ⇒ f_x(a) f_l(a,b) P(d|a,b)

Eliminate: a, b
Compute: f_a(b,d) = Σ_a f_x(a) f_l(a,b) P(d|a,b), then f_b(d) = Σ_b f_a(b,d)
  ⇒ f_a(b,d) ⇒ f_b(d) = P(d)

Variable Elimination
• We now understand variable elimination as a sequence of rewriting operations
• Actual computation is done in the elimination step
• Exactly the same computation procedure applies to Markov networks
• Computation depends on the order of elimination

Complexity of variable elimination
• Suppose in one elimination step we compute
  f'_x(x, y_1, …, y_k) = Π_{i=1}^m f_i(x, y_{1,i}, …, y_{l_i,i})
  f_x(y_1, …, y_k) = Σ_x f'_x(x, y_1, …, y_k)
This requires:
• m · |Val(X)| · Π_i |Val(Y_i)| multiplications
– For each value of x, y_1, …, y_k, we do m multiplications
• |Val(X)| · Π_i |Val(Y_i)| additions
– For each value of y_1, …, y_k, we do |Val(X)| additions
• Complexity is exponential in the number of variables in the intermediate factor!

Understanding Variable Elimination
• We want to select "good" elimination orderings that reduce complexity
• We start by attempting to understand variable elimination via the graph we are working with
• This will reduce the problem of finding a good ordering to a well-understood graph-theoretic operation
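The operation counts on the complexity slide are easy to turn into a helper. A sketch, following the slide's convention of counting m multiplications per cell and |Val(X)| additions per assignment of the Y's; the example sizes are made up.

```python
from math import prod

def elimination_cost(m, val_x, val_ys):
    """Operation counts for one elimination step, per the slide:
    multiply m factors over X, Y_1..Y_k into f'_x, then sum out X.
    val_x is |Val(X)|; val_ys lists |Val(Y_i)| for each Y_i."""
    cells = val_x * prod(val_ys)  # size of the intermediate factor f'_x
    mults = m * cells             # m multiplications per cell of f'_x
    adds = val_x * prod(val_ys)   # |Val(X)| additions per Y-assignment
    return mults, adds

# e.g. m = 3 factors sharing a binary X with four binary neighbours Y_1..Y_4
print(elimination_cost(3, 2, [2, 2, 2, 2]))  # (96, 32)
```

Because `prod(val_ys)` multiplies one term per Y in the intermediate factor, both counts grow exponentially with the number of such variables, which is exactly the slide's point.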
Undirected graph representation
• At each stage of the procedure, we have an algebraic term that we need to evaluate
• In general this term is of the form
  Σ_{y_1} … Σ_{y_n} Π_i f_i(Z_i)
  where the Z_i are sets of variables
• We now draw a graph with an undirected edge X--Y if X and Y are arguments of some factor
– that is, if X and Y are in some Z_i
• Note: this is the Markov network that describes the probability on the variables we have not yet eliminated

Chordal Graphs
• elimination ordering ⇒ undirected chordal graph
  (figure: the Asia network and its induced chordal graph)
• Maximal cliques are factors in the elimination
• Factors in the elimination are cliques in the graph
• Complexity is exponential in the size of the largest clique in the graph

Induced Width
• The size of the largest clique in the induced graph is thus an indicator of the complexity of variable elimination
• This quantity is called the induced width of the graph according to the specified ordering
• Finding a good ordering for a graph is equivalent to finding the minimal induced width of the graph

General Networks
From graph theory:
Thm:
• Finding an ordering that minimizes the induced width is NP-hard
However,
• There are reasonable heuristics for finding a "relatively" good ordering
• There are provable approximations to the best induced width
• If the graph has a small induced width, there are algorithms that find it in polynomial time

Elimination on Trees
• Formally, for any tree, there is an elimination ordering with induced width = 1
Thm:
• Inference on trees is linear in the number of variables

PolyTrees
• A polytree is a network where there is at most one path from one variable to another
  (figure: an example polytree on nodes A through H)
Thm:
• Inference in a polytree is linear in the representation size of the network
– This assumes a tabular CPT representation

Approaches to inference
• Exact inference
– Inference in simple chains
– Variable elimination
– Clustering / join tree algorithms
• Approximate inference
– Stochastic simulation / sampling methods
– Markov chain Monte Carlo methods
– Mean field theory
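The induced width of a given ordering can be computed directly on the interaction graph: eliminating a node connects all of its remaining neighbours into a clique, and the width is the largest neighbourhood met along the way. A sketch; note it counts the neighbours of the eliminated node, so the largest clique is one node larger, and the tiny 4-cycle below is a stand-in for the lecture's Asia network.

```python
def induced_width(edges, order):
    """Width of eliminating the nodes of an undirected graph in `order`:
    the largest set of un-eliminated neighbours seen at any step."""
    nbrs = {v: set() for v in order}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    width = 0
    for v in order:
        n = nbrs.pop(v)
        width = max(width, len(n))
        for a in n:                 # fill-in: make v's neighbourhood a clique
            nbrs[a].discard(v)
            nbrs[a] |= n - {a}
    return width

# A 4-cycle A-B-C-D: every ordering forces at least one fill-in edge, width 2
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
print(induced_width(edges, ["A", "B", "C", "D"]))  # 2
```

On a chain such as A-B-C, eliminating the leaves first gives width 1, matching the tree result on the slides; trying all orderings (or a min-fill style greedy choice) is how one would search for a good ordering, since finding the optimum is NP-hard.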
Stochastic simulation
• Suppose you are given values for some subset of the variables, G, and want to infer values for the unknown variables, U
• Randomly generate a very large number of instantiations from the BN
– Generate instantiations for all variables: start at the root variables and work your way "forward"
• Only keep those instantiations that are consistent with the values for G
• Use the frequency of values for U to get estimated probabilities
• Accuracy of the results depends on the size of the sample (it asymptotically approaches the exact values)
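The steps above are rejection sampling, and they fit in a few lines. A sketch on a toy two-variable network V -> T; the variable names, CPT values, and sample size are all made up for illustration.

```python
import random

# Toy network V -> T (illustrative values, not from the lecture).
P_V1 = 0.5                       # P(V = 1)
P_T1_given_V = {0: 0.1, 1: 0.8}  # P(T = 1 | V)

def sample_forward(rng):
    """Generate an instantiation of all variables, roots first ('forward')."""
    v = int(rng.random() < P_V1)
    t = int(rng.random() < P_T1_given_V[v])
    return {"V": v, "T": t}

def estimate(query_var, evidence, n, seed=0):
    """Estimate P(query_var = 1 | evidence) by keeping only the samples
    consistent with the evidence G and reading off frequencies."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n):
        s = sample_forward(rng)
        if all(s[k] == val for k, val in evidence.items()):  # consistent with G?
            kept += 1
            hits += s[query_var]
    return hits / kept if kept else float("nan")

# Exact answer: P(V=1 | T=1) = 0.8*0.5 / (0.8*0.5 + 0.1*0.5) = 8/9 (about 0.889)
print(estimate("V", {"T": 1}, 100_000))
```

The weakness the slide hints at is visible here: samples inconsistent with the evidence are thrown away, so rare evidence wastes most of the work, which is one motivation for the MCMC methods listed under approximate inference.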