UCI ICS 171 - On Combining Graph-based Variance Reduction Schemes - D2392959

Home> Schools> University of California, Irvine> (ICS) > ICS 171> On Combining Graph-based Variance Reduction Schemes

DOC PREVIEW

UCI ICS 171 - On Combining Graph-based Variance Reduction Schemes

School name University of California, Irvine

Course Ics 171- Introduction to Artificial Intelligence

Pages 8

This preview shows page 1-2-3 out of 8 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

On Combining Graph-based Variance Reduction schemesVibhav Gogate Rina DechterComputer science & EngineeringUniversity of Washington, SeattleSeattle, WA 98195, [email protected] of Information and Computer SciencesUniversity of California, IrvineIrvine, CA 92697, [email protected] this paper, we consider two variance reductionschemes that exploit the structure of the primalgraph of the graphical model: Rao-Blackwellisedw-cutset sampling and AND/OR sampling. Weshow that the two schemes are orthogonal andcan be combined to further reduce the variance.Our combination yields a new family of estima-tors which trade time and space with variance.We demonstrate experimentally that the new es-timators are superior, often yielding an order ofmagnitude improvement over previous schemeson several benchmarks.1 IntroductionImportance sampling (Rubinstein, 1981) is a generalscheme which can be used to approximate variousweighted counting tasks defined over graphical mod-els such as computing the probability of evidence in aBayesian network, computing the partition function of aMarkov network and counting the number of solutions ofa constraint network. The main idea is to transform theweighted counts or summation into an expectation usinga special distribution called the proposal distribution, gen-erate samples from the proposal and estimate the weightedcounts by a weighted average (also called the sample mean)over the generated samples. It is well known that the qual-ity of estimation is highly dependent on the variance of thesample mean and therefore significant research has focusedon reducing its variance (Liu, 2001).In this paper, we consider two graph-based variance re-duction schemes in the context of graphical models: theRao-Blackwellised w-cutset sampling scheme (Bidyuk andAppearing in Proceedings of the 13thInternational Conference onArtificial Intelligence and Statistics (AISTATS) 2010, Chia La-guna Resort, Sardinia, Italy. Volume 9 of JMLR: W&CP 9. Copy-right 2010 by the authors.Dechter, 2007) and the AND/OR sampling scheme (Gogateand Dechter, 2008). Based on the Rao-Blackwell theo-rem (Casella and Robert, 1996) and w-cutset condition-ing (Dechter, 1990), the w-cutset sampling scheme com-bines sampling with exact inference. The idea is to sampleonly a subset C of variables, called the w-cutset and exactlymarginalize out the remaining variables conditioned oneach sampled assignment. The AND/OR sampling scheme,on the other hand, reduces variance by exploiting condi-tional independencies uncovered by the AND/OR tree orgraph (Dechter and Mateescu, 2007) to derive a differentsample mean from the same set of input samples. Previ-ously in (Gogate and Dechter, 2008), we considered two al-ternative AND/OR sample means: one based on AND/ORtree which has the same time and space complexity as theconventional OR tree approach but has smaller varianceand the second based on AND/OR graph which is moreexpensive to compute but has the smallest variance.The main idea in this paper is to combine these twoschemes by performing AND/OR tree or graph samplingover the w-cutset variables and exact inference over theremaining variables conditioned on each sampled assign-ment. We show that this yields new sample means, whichhave smaller variance than the sample means of AND/ORsampling and w-cutset sampling. However, they are moreexpensive to compute both time and space wise and thusthere is a trade-off.We conducted extensive experimental evaluation of all thenew schemes proposed on several benchmark probabilis-tic and deterministic networks. Our results show that asthe networks get larger and harder, exploiting more de-composition improves the accuracy of the estimates as afunction of time. In particular, the scheme that exploitsthe most decomposition, the AND/OR w-cutset graph sam-pling scheme is superior to all the other schemes.The rest of the paper is organized as follows. In Section 2,we present notation and background. Section 3 describesAND/OR w-cutset tree sampling and Section 4 describesAND/OR w-cutset graph sampling. Complexity versusvariance trade-offs are discussed in Section 5. ExperimentsOn Combining Graph-based Variance Reduction schemesare described in Section 6 and Section 7 concludes.2 BackgroundWe start by presenting notation and preliminaries on graph-ical models. Then we present an overview of importancesampling, w-cutset sampling and AND/OR sampling.We denote variables by upper case letters (e.g. X,Y, . ..) andvalues of variables by lower case letters (e.g. x, y, . . .). Setsof variables are denoted by bold upper case letters, (e.g.X = {X1, . . . , Xn}) while sets of values are denoted by boldlower case letters (e.g. x = {x1, . . . , xn}). We denote by Dithe set of possible values of Xi(also called as the domain ofXi).∑x∈Xdenotes the sum over the possible values of vari-ables in X, namely,∑x1∈X1∑x2∈X2. . .∑xn∈Xn. The expectedvalue EQ[X] of a random variable X with respect to a distri-bution Q is defined as: EQ[X] =∑x∈XxQ(x). The varianceVQ[X] of X is defined as: VQ[X] =∑x∈X(x− EQ[X])2.Definition 1 (Graphical models). (Pearl, 1988) A graph-ical model is a three-tuple M = hX, D, Fi where X ={X1, . . . , Xn} is a set ofrandom variables, D = {D1, . . . , Dn}is a set of domains where Diis the domain of Xiand F ={F1, , . . . , Fm} is a set of non-negative real valued functionswhere each Fiis defined over a subset of variables Si⊂ X,called its scope. A graphical model represents a joint dis-tribution over X given by: PM(x) =1Z∏mi=1Fi(x) where Zis a normalization constant given by: Z =∑x∈X∏mi=1Fi(x).We will often refer to Z as weighted counts. The primalgraph of a graphical model is an undirected graph whichhas variables as its vertices and an edge between any twovariables which are included in the scope of a function.We will focus on the query of computing the weightedcounts Z. It is easy to show that the weighted counts spe-cialize to the probability of evidence of a Bayesian net-work, the partition function of a Markov network and thenumber of solutions of a constraint network.2.1 Importance SamplingImportance sampling (Rubinstein, 1981; Liu, 2001) isa general Monte Carlo simulation technique which canbe used for estimating various statistics of a given tar-get distribution such as PM. Since it is often hard tosample from PM, the main idea is to generate samplesfrom another easy-to-simulate distribution Q called theproposal

View Full Document