The Cross-Entropy Method
6.454 Seminar, 17 September 2003
Guy Weichenberg

Outline

• Introduction.
• CE method for rare-event simulation (RES).
  – Example.
• CE method for combinatorial optimization problems (COPs).
  – Example.
• Enhancements.
• Convergence.
• Conclusions.

Introduction

• The CE method has broadened from a rare-event simulation technique into a generic tool for solving different NP-hard problems.
• The CE method is a global iterative search algorithm comprising the following two steps:
  1. Generate random data samples using a set of dynamic parameters.
  2. Update the parameters governing random data generation using the data samples themselves, with the objective of improving future data samples.

CE Method for Rare-Event Simulation

Preliminaries

• Let X = (X_1, ..., X_n) be a random vector taking values in some space 𝒳.
• Let S be a real-valued function on 𝒳.
• Let f(·; u) be a probability density function on 𝒳 parameterized by u.

Suppose we are interested in the probability of occurrence of a rare event. Specifically, we are interested in the very small probability l that S(X) is greater than or equal to a real number γ under f(·; u):

  l = P_u(S(X) ≥ γ) = E_u I_{S(X) ≥ γ}.

Crude Monte Carlo Simulation

• In the CMC approach to estimating l, we simply draw a random sample X_1, ..., X_N from f(·; u) and compute

  (1/N) Σ_{i=1}^N I_{S(X_i) ≥ γ}

to arrive at an unbiased estimate of l.
• For rare events, most terms in the above summation will be zero, thus requiring a very large value of N to obtain a meaningful estimate of l.

Importance Sampling

• Using an importance sampling density g, we represent l as:

  l = ∫ I_{S(x) ≥ γ} [f(x; u) / g(x)] g(x) dx = E_g [I_{S(X) ≥ γ} f(X; u) / g(X)].

• Hence, an unbiased estimator of l is

  l̂ = (1/N) Σ_{i=1}^N I_{S(X_i) ≥ γ} W(X_i),

where W(x) = f(x; u)/g(x) is the likelihood ratio (LR), and X_1, ..., X_N are i.i.d. vectors drawn from g.

Importance Sampling (continued)

• The optimal importance sampling density is

  g*(x) = I_{S(x) ≥ γ} f(x; u) / l,

for which the estimate has zero variance and only one sample is required.
• The problem with the IS approach is obtaining g*, as it depends on l, the very quantity we are attempting to estimate.
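To make the variance contrast concrete, here is a minimal sketch (not from the slides) comparing the CMC estimator with an IS estimator that uses a density of the same exponential form, on a one-dimensional rare event where the exact answer P_u(X ≥ γ) = e^(−γ/u) is available for checking. The tilted mean v = u + γ is an assumed heuristic, not something prescribed above:

```python
import numpy as np

rng = np.random.default_rng(0)
u, gamma, N = 1.0, 10.0, 100_000        # true l = exp(-gamma/u) ≈ 4.54e-5

# Crude Monte Carlo: with l ≈ 4.5e-5, only a handful of the N indicator
# terms are nonzero, so the estimate is very noisy.
x = rng.exponential(u, N)
l_cmc = np.mean(x >= gamma)

# Importance sampling with a density of the same form, f(.; v), whose
# larger mean v makes the rare region {x >= gamma} be sampled frequently.
v = u + gamma                           # assumed heuristic for the tilted mean
w = None                                # likelihood ratio, computed below
y = rng.exponential(v, N)
w = (v / u) * np.exp(-y * (1.0 / u - 1.0 / v))   # W(y) = f(y; u) / f(y; v)
l_is = np.mean((y >= gamma) * w)

print(f"CMC: {l_cmc:.2e}  IS: {l_is:.2e}  exact: {np.exp(-gamma/u):.2e}")
```

With the same N, the IS estimate typically lands within a few percent of the exact value, while the CMC estimate rests on only a few nonzero indicator terms.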
In addition, it is often convenient to choose an importance sampling density of the same form as f(·; u).

Importance Sampling (continued)

• When choosing an importance sampling density of the form f(·; v), we must find the density f(·; v) which is "closest" to g*. A convenient measure of "closeness" is the Kullback-Leibler (K-L) distance:

  D(g, h) = E_g ln [g(X)/h(X)] = ∫ g(x) ln g(x) dx − ∫ g(x) ln h(x) dx.

• Minimizing the K-L distance between g* and f(·; v) is equivalent to the following maximization problem:

  max_v E_u I_{S(X) ≥ γ} ln f(X; v) = max_v E_w I_{S(X) ≥ γ} W(X; u, w) ln f(X; v),

where W(x; u, w) = f(x; u)/f(x; w) is the LR between f(·; u) and f(·; w) at x.

Importance Sampling (continued)

• The maximizing v* can be estimated by solving the stochastic counterpart:

  v̂* = argmax_v (1/N) Σ_{i=1}^N I_{S(X_i) ≥ γ} W(X_i; u, w) ln f(X_i; v),    (1)

where X_1, ..., X_N are i.i.d. vectors drawn from f(·; w).

• When the function to be maximized in (1) is differentiable with respect to v, the solution may be obtained analytically by solving the following system of equations:

  (1/N) Σ_{i=1}^N I_{S(X_i) ≥ γ} W(X_i; u, w) ∇ ln f(X_i; v) = 0.    (2)

Importance Sampling (continued)

• The approaches implied by (1) and (2) for estimating v* only yield meaningful estimates when not too many of the I_{S(X_i) ≥ γ} terms are zero (i.e. when the event of interest is not rare).

CE Algorithm for Rare-Event Simulation

For rare events, we estimate v* using the following iterative algorithm, in which the v̂_t and γ̂_t associated with each iteration progressively approach v* and γ, respectively:

1. Define v̂_0 = u. Set t = 1.
2. Generate samples X_1, ..., X_N from the density f(·; v̂_{t−1}) and compute the sample (1 − ρ)-quantile γ̂_t of the performances (i.e. γ̂_t = S_(⌈(1−ρ)N⌉)), provided γ̂_t is less than γ. Otherwise, set γ̂_t = γ.
3. Use the same sample X_1, ..., X_N to solve the stochastic program:

  v̂_t = argmax_v (1/N) Σ_{i=1}^N I_{S(X_i) ≥ γ̂_t} W(X_i; u, v̂_{t−1}) ln f(X_i; v).

4. If γ̂_t < γ, set t = t + 1 and reiterate from Step 2. Otherwise, proceed with Step 5.
5.
Estimate the rare-event probability l using the LR estimate

  l̂ = (1/N_1) Σ_{i=1}^{N_1} I_{S(X_i) ≥ γ} W(X_i; u, v̂_T),

where T denotes the final number of iterations.

Shortest Path Example

Suppose the edge weights X = (X_1, ..., X_5) are independent and exponentially distributed with means u = (u_1, ..., u_5).

[Figure: a network between nodes A and B whose five edges have weights X_1, ..., X_5.]

Let S(X) be the shortest path length from A to B. We wish to estimate

  l = P_u(S(X) ≥ γ) = E_u I_{S(X) ≥ γ}.

Example (continued)

Step 1: Set t = 1, and the initial density to:

  f(x; v̂_0) = exp(−Σ_{j=1}^5 x_j/u_j) Π_{j=1}^5 (1/u_j).

Step 2: Generate samples X_1, ..., X_N from the density:

  f(x; v̂_{t−1}) = exp(−Σ_{j=1}^5 x_j/v̂_{t−1,j}) Π_{j=1}^5 (1/v̂_{t−1,j}).

Step 3: Solve the stochastic program:

  v̂_t = argmax_v (1/N) Σ_{i=1}^N I_{S(X_i) ≥ γ̂_t} W(X_i; u, v̂_{t−1}) ln f(X_i; v),

Example (continued)

where

  W(X_i; u, v̂_{t−1}) = exp(−Σ_{j=1}^5 X_{ij} (1/u_j − 1/v̂_{t−1,j})) Π_{j=1}^5 (v̂_{t−1,j}/u_j),

by solving the following set of equations (obtained from (2)):

  Σ_{i=1}^N I_{S(X_i) ≥ γ̂_t} W(X_i; u, v̂_{t−1}) (X_{ij}/v_j² − 1/v_j) = 0,   j = 1, ..., 5.

Thus,

  v̂_{t,j} = [Σ_{i=1}^N I_{S(X_i) ≥ γ̂_t} W(X_i; u, v̂_{t−1}) X_{ij}] / [Σ_{i=1}^N I_{S(X_i) ≥ γ̂_t} W(X_i; u, v̂_{t−1})].

Example (continued)

Performance comparison:

• CMC: Using 10^8 samples, we obtain l̂ = 1.30 × 10^−5 with relative error (Var(l̂)^{1/2}/l̂) of 0.03 in 6350 seconds.
• CE: Using N = 1000, N_1 = 10^5 and ρ = 0.1, we obtain l̂ = 1.34 × 10^−5 with relative error of 0.03 in 3 seconds (5 iterations).

CE Method for Combinatorial Optimization

Preliminaries

• Let 𝒳 be a finite set of states.
• Let S be a real-valued performance function we wish to maximize over 𝒳.
• Let γ* be the maximum value of S.
• Let x* be the state at which this maximum occurs.
• Let f(·; v) be a probability mass function on 𝒳, parameterized by the vector v.

Recasting the COP

Main idea: Recast the deterministic COP into a probabilistic framework, where the RES technique can be used.

• For a certain parameter vector u and threshold γ, define l(γ) as the probability that the performance function S exceeds γ:

  l(γ) = P_u(S(X) ≥ γ) = Σ_x I_{S(x) ≥ γ} f(x; u) = E_u I_{S(X) ≥ γ}.

• In typical COPs, |𝒳| is very large and l(γ*) = f(x*; u) ≈ 1/|𝒳| is consequently very small, so l(γ*) is a rare event.

CE Algorithm for Combinatorial Optimization

1. Define v̂_0 = u. Set t = 1.
2.
Generate samples X_1, ..., X_N from the density f(·; v̂_{t−1}) and compute the sample (1 − ρ)-quantile γ̂_t of the performances (i.e. γ̂_t = S_(⌈(1−ρ)N⌉)), provided γ̂_t is less than γ. Otherwise, set γ̂_t = γ.
3. Use the same sample X_1, ..., X_N to solve the stochastic program:

  v̂_t = argmax_v (1/N) Σ_{i=1}^N I_{S(X_i) ≥ γ̂_t} ln f(X_i; v).    (3)

4. If for some t ≥ d: γ̂_t = γ̂_{t−1} = ··· = γ̂_{t−d}
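As a concrete instance of the optimization algorithm above, the following sketch (an illustrative toy problem, not from the slides) runs CE with an independent-Bernoulli pmf f(·; p) on the task of recovering a hidden binary string, for which the stochastic program (3) has a closed-form solution: each p_j becomes the mean of X_{ij} over the elite samples. The smoothed parameter update with weight α is an assumed enhancement that guards against premature convergence of entries to 0 or 1:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy COP: maximize S(x) = number of bits of x agreeing with a hidden
# target x_star (hypothetical problem, chosen so gamma* = n is known).
n, N, rho, alpha = 20, 200, 0.1, 0.7
x_star = rng.integers(0, 2, n)
S = lambda X: (X == x_star).sum(axis=-1)

p = np.full(n, 0.5)                     # v_0 = u: uniform over {0, 1}^n
for t in range(60):
    X = (rng.random((N, n)) < p).astype(int)    # samples from f(.; p)
    perf = S(X)
    gamma_t = np.quantile(perf, 1 - rho)        # sample (1 - rho)-quantile
    elite = X[perf >= gamma_t]                  # samples with S(X_i) >= gamma_t
    # Closed-form argmax of (3) for the Bernoulli pmf, smoothed with the
    # previous parameters so no entry locks to 0 or 1 too early.
    p = alpha * elite.mean(axis=0) + (1 - alpha) * p
    if gamma_t >= n:                            # maximum performance reached
        break

best = (p > 0.5).astype(int)
print(S(best), "out of", n)
```

Rounding the final p recovers the maximizer here because the parameters concentrate on the elite samples; for harder COPs the stopping rule above (γ̂_t unchanged for d consecutive iterations) is used instead of a known γ*.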