The Cross-Entropy Method
6.454 Seminar, 17 September 2003
Guy Weichenberg

Outline

• Introduction.
• CE method for rare-event simulation (RES).
  – Example.
• CE method for combinatorial optimization problems (COPs).
  – Example.
• Enhancements.
• Convergence.
• Conclusions.

Introduction

• The CE method has broadened from a rare-event simulation technique into a generic tool for solving different NP-hard problems.
• The CE method is a global iterative search algorithm comprising the following two steps:
  1. Generate random data samples using a set of dynamic parameters.
  2. Update the parameters governing random data generation using the data samples themselves, with the objective of improving future data samples.

CE Method for Rare-Event Simulation

Preliminaries

• Let X = (X_1, ..., X_n) be a random vector taking values in some space 𝒳.
• Let S be a real-valued function on 𝒳.
• Let f(·; u) be a probability density function on 𝒳 parameterized by u.

Suppose we are interested in the probability of occurrence of a rare event. Specifically, we are interested in the very small probability l that S(X) is greater than or equal to a real number γ under f(·; u):

  l = P_u(S(X) ≥ γ) = E_u I_{S(X) ≥ γ}.

Crude Monte Carlo Simulation

• In the CMC approach to estimating l, we simply draw a random sample X_1, ..., X_N from f(·; u) and compute

  (1/N) Σ_{i=1}^N I_{S(X_i) ≥ γ}

to arrive at an unbiased estimate of l.
• For rare events, most terms in the above summation will be zero, thus requiring a very large value of N to obtain a meaningful estimate of l.

Importance Sampling

• Using an importance sampling density g, we represent l as:

  l = ∫ I_{S(x) ≥ γ} [f(x; u) / g(x)] g(x) dx = E_g [I_{S(X) ≥ γ} f(X; u) / g(X)].

• Hence, an unbiased estimator of l is

  l̂ = (1/N) Σ_{i=1}^N I_{S(X_i) ≥ γ} W(X_i),

where W(x) = f(x; u)/g(x) is the likelihood ratio (LR), and X_1, ..., X_N are i.i.d. vectors drawn from g.

Importance Sampling (continued)

• The optimal importance sampling density is

  g*(x) = I_{S(x) ≥ γ} f(x; u) / l,

for which the estimate has zero variance and only one sample is required.
• The problem with the IS approach is obtaining g*, as it depends on l, the very quantity we are attempting to estimate.
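To make the variance contrast concrete, here is a minimal sketch (not from the slides) comparing the CMC estimator with an IS estimator that uses a density of the same exponential form, on a one-dimensional rare event where the exact answer P_u(X ≥ γ) = e^(−γ/u) is available for checking. The tilted mean v = u + γ is an assumed heuristic, not something prescribed above:

```python
import numpy as np

rng = np.random.default_rng(0)
u, gamma, N = 1.0, 10.0, 100_000        # true l = exp(-gamma/u) ≈ 4.54e-5

# Crude Monte Carlo: with l ≈ 4.5e-5, only a handful of the N indicator
# terms are nonzero, so the estimate is very noisy.
x = rng.exponential(u, N)
l_cmc = np.mean(x >= gamma)

# Importance sampling with a density of the same form, f(.; v), whose
# larger mean v makes the rare region {x >= gamma} be sampled frequently.
v = u + gamma                           # assumed heuristic for the tilted mean
w = None                                # likelihood ratio, computed below
y = rng.exponential(v, N)
w = (v / u) * np.exp(-y * (1.0 / u - 1.0 / v))   # W(y) = f(y; u) / f(y; v)
l_is = np.mean((y >= gamma) * w)

print(f"CMC: {l_cmc:.2e}  IS: {l_is:.2e}  exact: {np.exp(-gamma/u):.2e}")
```

With the same N, the IS estimate typically lands within a few percent of the exact value, while the CMC estimate rests on only a few nonzero indicator terms.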
In addition, it is often convenient to choose an importance sampling density of the same form as f(·; u).

Importance Sampling (continued)

• When choosing an importance sampling density of the form f(·; v), we must find the density f(·; v) which is "closest" to g*. A convenient measure of "closeness" is the Kullback-Leibler (K-L) distance:

  D(g, h) = E_g ln [g(X)/h(X)] = ∫ g(x) ln g(x) dx − ∫ g(x) ln h(x) dx.

• Minimizing the K-L distance between g* and f(·; v) is equivalent to the following maximization problem:

  max_v E_u I_{S(X) ≥ γ} ln f(X; v) = max_v E_w I_{S(X) ≥ γ} W(X; u, w) ln f(X; v),

where W(x; u, w) = f(x; u)/f(x; w) is the LR between f(·; u) and f(·; w) at x.

Importance Sampling (continued)

• The maximizing v* can be estimated by solving the stochastic counterpart:

  v̂* = argmax_v (1/N) Σ_{i=1}^N I_{S(X_i) ≥ γ} W(X_i; u, w) ln f(X_i; v),    (1)

where X_1, ..., X_N are i.i.d. vectors drawn from f(·; w).

• When the function to be maximized in (1) is differentiable with respect to v, the solution may be obtained analytically by solving the following system of equations:

  (1/N) Σ_{i=1}^N I_{S(X_i) ≥ γ} W(X_i; u, w) ∇ ln f(X_i; v) = 0.    (2)

Importance Sampling (continued)

• The approaches implied by (1) and (2) for estimating v* only yield meaningful estimates when not too many of the I_{S(X_i) ≥ γ} terms are zero (i.e. when the event of interest is not rare).

CE Algorithm for Rare-Event Simulation

For rare events, we estimate v* using the following iterative algorithm, in which the v̂_t and γ̂_t associated with each iteration progressively approach v* and γ, respectively:

1. Define v̂_0 = u. Set t = 1.
2. Generate samples X_1, ..., X_N from the density f(·; v̂_{t−1}) and compute the sample (1 − ρ)-quantile γ̂_t of the performances (i.e. γ̂_t = S_(⌈(1−ρ)N⌉)), provided γ̂_t is less than γ. Otherwise, set γ̂_t = γ.
3. Use the same sample X_1, ..., X_N to solve the stochastic program:

  v̂_t = argmax_v (1/N) Σ_{i=1}^N I_{S(X_i) ≥ γ̂_t} W(X_i; u, v̂_{t−1}) ln f(X_i; v).

4. If γ̂_t < γ, set t = t + 1 and reiterate from Step 2. Otherwise, proceed with Step 5.
5.
Estimate the rare-event probability l using the LR estimate

  l̂ = (1/N_1) Σ_{i=1}^{N_1} I_{S(X_i) ≥ γ} W(X_i; u, v̂_T),

where T denotes the final number of iterations.

Shortest Path Example

Suppose the edge weights X = (X_1, ..., X_5) are independent and exponentially distributed with means u = (u_1, ..., u_5).

[Figure: a network between nodes A and B whose five edges have weights X_1, ..., X_5.]

Let S(X) be the shortest path length from A to B. We wish to estimate

  l = P_u(S(X) ≥ γ) = E_u I_{S(X) ≥ γ}.

Example (continued)

Step 1: Set t = 1, and the initial density to:

  f(x; v̂_0) = exp(−Σ_{j=1}^5 x_j/u_j) Π_{j=1}^5 (1/u_j).

Step 2: Generate samples X_1, ..., X_N from the density:

  f(x; v̂_{t−1}) = exp(−Σ_{j=1}^5 x_j/v̂_{t−1,j}) Π_{j=1}^5 (1/v̂_{t−1,j}).

Step 3: Solve the stochastic program:

  v̂_t = argmax_v (1/N) Σ_{i=1}^N I_{S(X_i) ≥ γ̂_t} W(X_i; u, v̂_{t−1}) ln f(X_i; v),

Example (continued)

where

  W(X_i; u, v̂_{t−1}) = exp(−Σ_{j=1}^5 X_{ij} (1/u_j − 1/v̂_{t−1,j})) Π_{j=1}^5 (v̂_{t−1,j}/u_j),

by solving the following set of equations (obtained from (2)):

  Σ_{i=1}^N I_{S(X_i) ≥ γ̂_t} W(X_i; u, v̂_{t−1}) (X_{ij}/v_j² − 1/v_j) = 0,   j = 1, ..., 5.

Thus,

  v̂_{t,j} = [Σ_{i=1}^N I_{S(X_i) ≥ γ̂_t} W(X_i; u, v̂_{t−1}) X_{ij}] / [Σ_{i=1}^N I_{S(X_i) ≥ γ̂_t} W(X_i; u, v̂_{t−1})].

Example (continued)

Performance comparison:

• CMC: Using 10^8 samples, we obtain l̂ = 1.30 × 10^−5 with relative error (Var(l̂)^{1/2}/l̂) of 0.03 in 6350 seconds.
• CE: Using N = 1000, N_1 = 10^5 and ρ = 0.1, we obtain l̂ = 1.34 × 10^−5 with relative error of 0.03 in 3 seconds (5 iterations).

CE Method for Combinatorial Optimization

Preliminaries

• Let 𝒳 be a finite set of states.
• Let S be a real-valued performance function we wish to maximize over 𝒳.
• Let γ* be the maximum value of S.
• Let x* be the state at which this maximum occurs.
• Let f(·; v) be a probability mass function on 𝒳, parameterized by the vector v.

Recasting the COP

Main idea: Recast the deterministic COP into a probabilistic framework, where the RES technique can be used.

• For a certain parameter vector u and threshold γ, define l(γ) as the probability that the performance function S exceeds γ:

  l(γ) = P_u(S(X) ≥ γ) = Σ_x I_{S(x) ≥ γ} f(x; u) = E_u I_{S(X) ≥ γ}.

• In typical COPs, |𝒳| is very large and l(γ*) = f(x*; u) ≈ 1/|𝒳| is consequently very small, so l(γ*) is a rare event.

CE Algorithm for Combinatorial Optimization

1. Define v̂_0 = u. Set t = 1.
2.
Generate samples X_1, ..., X_N from the density f(·; v̂_{t−1}) and compute the sample (1 − ρ)-quantile γ̂_t of the performances (i.e. γ̂_t = S_(⌈(1−ρ)N⌉)), provided γ̂_t is less than γ. Otherwise, set γ̂_t = γ.
3. Use the same sample X_1, ..., X_N to solve the stochastic program:

  v̂_t = argmax_v (1/N) Σ_{i=1}^N I_{S(X_i) ≥ γ̂_t} ln f(X_i; v).    (3)

4. If for some t ≥ d: γ̂_t = γ̂_{t−1} = ··· = γ̂_{t−d}
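As a concrete instance of the optimization algorithm above, the following sketch (an illustrative toy problem, not from the slides) runs CE with an independent-Bernoulli pmf f(·; p) on the task of recovering a hidden binary string, for which the stochastic program (3) has a closed-form solution: each p_j becomes the mean of X_{ij} over the elite samples. The smoothed parameter update with weight α is an assumed enhancement that guards against premature convergence of entries to 0 or 1:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy COP: maximize S(x) = number of bits of x agreeing with a hidden
# target x_star (hypothetical problem, chosen so gamma* = n is known).
n, N, rho, alpha = 20, 200, 0.1, 0.7
x_star = rng.integers(0, 2, n)
S = lambda X: (X == x_star).sum(axis=-1)

p = np.full(n, 0.5)                     # v_0 = u: uniform over {0, 1}^n
for t in range(60):
    X = (rng.random((N, n)) < p).astype(int)    # samples from f(.; p)
    perf = S(X)
    gamma_t = np.quantile(perf, 1 - rho)        # sample (1 - rho)-quantile
    elite = X[perf >= gamma_t]                  # samples with S(X_i) >= gamma_t
    # Closed-form argmax of (3) for the Bernoulli pmf, smoothed with the
    # previous parameters so no entry locks to 0 or 1 too early.
    p = alpha * elite.mean(axis=0) + (1 - alpha) * p
    if gamma_t >= n:                            # maximum performance reached
        break

best = (p > 0.5).astype(int)
print(S(best), "out of", n)
```

Rounding the final p recovers the maximizer here because the parameters concentrate on the elite samples; for harder COPs the stopping rule above (γ̂_t unchanged for d consecutive iterations) is used instead of a known γ*.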