CMSC 828G, Principles of Data Mining, Lecture #15

• Readings: handouts
• Today's lecture: inference; learning BNs (slides courtesy of Nir Friedman)
• Upcoming dates: midterm Thursday

Inference in Bayesian Networks

Variable Elimination

General idea:
• Write the query in the form
  $P(X_1, e) = \sum_{x_2} \sum_{x_3} \cdots \sum_{x_k} \prod_i P(x_i \mid pa_i)$
• Iteratively:
  – Move all irrelevant terms outside of the innermost sum
  – Perform the innermost sum, producing a new term
  – Insert the new term into the product

A More Complex Example

• The "Asia" network has eight variables: Visit to Asia ($V$), Smoking ($S$), Tuberculosis ($T$), Lung Cancer ($L$), Bronchitis ($B$), Abnormality in Chest ($A$), X-Ray ($X$), and Dyspnea ($D$)
• Its joint distribution gives the initial factors
  $P(v)\, P(s)\, P(t \mid v)\, P(l \mid s)\, P(b \mid s)\, P(a \mid t, l)\, P(x \mid a)\, P(d \mid a, b)$
• We want to compute $P(d)$, so we need to eliminate $v, s, x, t, l, a, b$

Eliminate $v$: compute $f_v(t) = \sum_v P(v)\, P(t \mid v)$, leaving
  $P(s)\, P(l \mid s)\, P(b \mid s)\, f_v(t)\, P(a \mid t, l)\, P(x \mid a)\, P(d \mid a, b)$
Note: $f_v(t) = P(t)$. In general, the result of an elimination step is not necessarily a probability term.

Eliminate $s$: compute $f_s(b, l) = \sum_s P(s)\, P(b \mid s)\, P(l \mid s)$, leaving
  $f_v(t)\, f_s(b, l)\, P(a \mid t, l)\, P(x \mid a)\, P(d \mid a, b)$
Summing over $s$ produces a factor with two arguments, $f_s(b, l)$. In general, the result of an elimination step may be a function of several variables.

Eliminate $x$: compute $f_x(a) = \sum_x P(x \mid a)$, leaving
  $f_v(t)\, f_s(b, l)\, f_x(a)\, P(a \mid t, l)\, P(d \mid a, b)$
Note: $f_x(a) = 1$ for all values of $a$!

Eliminate $t$: compute $f_t(a, l) = \sum_t f_v(t)\, P(a \mid t, l)$, leaving
  $f_s(b, l)\, f_x(a)\, f_t(a, l)\, P(d \mid a, b)$

Eliminate $l$: compute $f_l(a, b) = \sum_l f_s(b, l)\, f_t(a, l)$, leaving
  $f_x(a)\, f_l(a, b)\, P(d \mid a, b)$

Eliminate $a$, then $b$: compute $f_a(b, d) = \sum_a f_x(a)\, f_l(a, b)\, P(d \mid a, b)$ and then $f_b(d) = \sum_b f_a(b, d)$, which is the answer: $f_b(d) = P(d)$.

Variable Elimination, Summarized

• We now understand variable elimination as a sequence of rewriting operations
• The actual computation is done in the elimination steps
• Exactly the same computation procedure applies to Markov networks
• The cost of the computation depends on the order of elimination (a runnable sketch of the procedure follows)
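The elimination trace above can be made concrete with a small factor library. The sketch below is my own illustration, not part of the lecture: the Factor class, the helper names (multiply, sum_out, eliminate, cpt), and all CPT probabilities are assumptions chosen only to make the example runnable over binary variables.

```python
import itertools

class Factor:
    """A table over an ordered tuple of binary variable names."""
    def __init__(self, names, table):
        self.names = tuple(names)   # e.g. ('t', 'v')
        self.table = table          # dict: assignment tuple -> float

def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    names = f.names + tuple(v for v in g.names if v not in f.names)
    table = {}
    for assign in itertools.product([0, 1], repeat=len(names)):
        env = dict(zip(names, assign))
        table[assign] = (f.table[tuple(env[v] for v in f.names)] *
                         g.table[tuple(env[v] for v in g.names)])
    return Factor(names, table)

def sum_out(f, var):
    """Marginalize var out of factor f."""
    i = f.names.index(var)
    table = {}
    for assign, p in f.table.items():
        key = assign[:i] + assign[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return Factor(f.names[:i] + f.names[i + 1:], table)

def eliminate(factors, order):
    """Variable elimination: for each variable in order, multiply the
    factors that mention it, sum it out, and put the new factor back."""
    factors = list(factors)
    for var in order:
        touched = [f for f in factors if var in f.names]
        rest = [f for f in factors if var not in f.names]
        prod = touched[0]
        for f in touched[1:]:
            prod = multiply(prod, f)
        factors = rest + [sum_out(prod, var)]
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

def cpt(names, p_one):
    """Binary CPT for names = (child, *parents); p_one maps each parent
    assignment tuple to P(child = 1 | parents)."""
    table = {}
    for pa in itertools.product([0, 1], repeat=len(names) - 1):
        table[(1,) + pa] = p_one[pa]
        table[(0,) + pa] = 1.0 - p_one[pa]
    return Factor(names, table)

# The Asia network's eight CPTs; the probabilities are made up.
asia = [
    cpt(('v',), {(): 0.01}),
    cpt(('s',), {(): 0.50}),
    cpt(('t', 'v'), {(0,): 0.01, (1,): 0.05}),
    cpt(('l', 's'), {(0,): 0.01, (1,): 0.10}),
    cpt(('b', 's'), {(0,): 0.30, (1,): 0.60}),
    cpt(('a', 't', 'l'), {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 1.0}),
    cpt(('x', 'a'), {(0,): 0.05, (1,): 0.98}),
    cpt(('d', 'a', 'b'), {(0, 0): 0.10, (0, 1): 0.80, (1, 0): 0.70, (1, 1): 0.90}),
]

# Same elimination order as the trace above; the result is a factor on d.
p_d = eliminate(asia, ['v', 's', 'x', 't', 'l', 'a', 'b'])
print(p_d.names, p_d.table)   # ('d',) with entries summing to 1
```

Because every input is a proper CPT, the two entries of the final factor sum to 1, matching the claim above that $f_b(d) = P(d)$.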
Complexity of Variable Elimination

• Suppose that in one elimination step we compute
  $f_x(y_1, \ldots, y_k) = \sum_x f'_x(x, y_1, \ldots, y_k)$
  where
  $f'_x(x, y_1, \ldots, y_k) = \prod_{i=1}^{m} f_i(x, y_{1,i}, \ldots, y_{l_i,i})$
• This requires:
  – $m \cdot |\mathrm{Val}(X)| \cdot \prod_i |\mathrm{Val}(Y_i)|$ multiplications: for each value of $x, y_1, \ldots, y_k$, we do $m$ multiplications
  – $|\mathrm{Val}(X)| \cdot \prod_i |\mathrm{Val}(Y_i)|$ additions: for each value of $y_1, \ldots, y_k$, we do $|\mathrm{Val}(X)|$ additions
• Complexity is exponential in the number of variables in the intermediate factor!

Understanding Variable Elimination

• We want to select "good" elimination orderings that reduce complexity
• We start by attempting to understand variable elimination via the graph we are working with
• This reduces the problem of finding a good ordering to a well-understood graph-theoretic operation

Undirected Graph Representation

• At each stage of the procedure, we have an algebraic term that we need to evaluate
• In general this term is of the form
  $\sum_{y_1} \cdots \sum_{y_n} \prod_i f_i(Z_i)$
  where the $Z_i$ are sets of variables
• We now draw a graph with an undirected edge between $X$ and $Y$ whenever $X$ and $Y$ are arguments of some factor, that is, whenever $X, Y \in Z_i$ for some $i$
• Note: this is the Markov network that describes the probability distribution on the variables we have not yet eliminated

Chordal Graphs

[Figure: the Asia network and the undirected chordal graph induced by the elimination]

• An elimination ordering induces an undirected chordal graph
• Maximal cliques in the graph are factors in the elimination, and factors in the elimination are cliques in the graph
• Complexity is exponential in the size of the largest clique in the graph

Induced Width

• The size of the largest clique in the induced graph is thus an indicator of the complexity of variable elimination
• This quantity is called the induced width of the graph according to the specified ordering
• Finding a good ordering for a graph is equivalent to finding an ordering of minimal induced width

General Networks

From graph theory:
Thm: Finding an ordering that minimizes the induced width is NP-hard.
However:
• There are reasonable heuristics for finding "relatively" good orderings
• There are provable approximations to the best induced width
• If the graph has a small induced width, there are algorithms that find it in polynomial time

Elimination on Trees

• Formally, for any tree there is an elimination ordering with induced width 1
Thm: Inference on trees is linear in the number of variables.

Polytrees

• A polytree is a network in which there is at most one path from any one variable to another
Thm: Inference in a polytree is linear in the representation size of the network.
– This assumes a tabular CPT representation

Approaches to Inference

• Exact inference:
  – Inference in simple chains
  – Variable elimination
  – Clustering / join-tree algorithms
• Approximate inference:
  – Stochastic simulation / sampling methods
  – Markov chain Monte Carlo methods
  – Mean field theory

Stochastic Simulation

• Suppose you are given values for some subset $G$ of the variables and want to infer values for the unknown variables $U$
• Randomly generate a very large number of instantiations from the BN
  – Generate instantiations for all variables: start at the root variables and work your way "forward"
• Keep only those instantiations that are consistent with the given values for $G$
• Use the frequency of values for $U$ to get estimated probabilities
• The accuracy of the results depends on the size of the sample; the estimates asymptotically approach the exact probabilities (a sketch of this procedure appears below)
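The procedure just described is forward (rejection) sampling. Below is a minimal sketch of it, again my own illustration rather than the lecture's: the four-variable network, its CPT numbers, and the function names are all assumptions, kept small so the example stays readable.

```python
import random

# Each entry is (variable, parents, CPT), where the CPT maps a tuple of
# parent values to P(variable = 1 | parents). Listed in topological order;
# the network and its numbers are made up for illustration.
NETWORK = [
    ('s', (),         {(): 0.50}),
    ('l', ('s',),     {(0,): 0.01, (1,): 0.10}),
    ('b', ('s',),     {(0,): 0.30, (1,): 0.60}),
    ('d', ('l', 'b'), {(0, 0): 0.10, (0, 1): 0.80, (1, 0): 0.70, (1, 1): 0.90}),
]

def forward_sample():
    """Instantiate every variable, roots first, each drawn given its parents."""
    inst = {}
    for var, parents, table in NETWORK:
        p_one = table[tuple(inst[p] for p in parents)]
        inst[var] = 1 if random.random() < p_one else 0
    return inst

def rejection_sample(evidence, query, n=100_000):
    """Estimate P(query = 1 | evidence): generate n instantiations, keep
    only those consistent with the evidence, and read off the frequency."""
    kept = hits = 0
    for _ in range(n):
        inst = forward_sample()
        if all(inst[v] == val for v, val in evidence.items()):
            kept += 1
            hits += inst[query]
    return hits / kept if kept else float('nan')

# For example, estimate P(l = 1 | d = 1).
print(rejection_sample({'d': 1}, 'l'))
```

Note that every instantiation inconsistent with the evidence is thrown away, so when the evidence is unlikely most samples are wasted; that inefficiency is one motivation for the Markov chain Monte Carlo methods listed above.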

