MIT 6.454 - The Cross-Entropy Method

The Cross-Entropy Method
Guy Weichenberg
17 September 2003

1 Introduction

This report is a summary of the theory underlying the Cross-Entropy (CE) method, as discussed in the tutorial by de Boer, Kroese, Mannor and Rubinstein [1]. For a more thorough discussion of the method and its applications, please refer to the original tutorial and the references cited therein.

The CE method, pioneered by Rubinstein in 1997 as an adaptive algorithm for estimating probabilities of rare events, has since been broadened into a generic and efficient tool for solving a myriad of NP-hard problems. Beyond its original purpose, the CE method has been employed in deterministic and stochastic combinatorial optimization problems (COPs) and in continuous multi-extremal optimization problems.

This report is organized as follows. In Section 2, we discuss the fundamental theory of the CE method and specialize the method to rare-event simulation (RES) and COPs. In Section 3, we consider more sophisticated versions of the CE method and briefly discuss convergence issues. We conclude the report in Section 4.

2 The Cross-Entropy method

Regardless of the application at hand, the crux of the CE method remains the same. Abstractly, the CE method is an iterative algorithm comprising the following two steps:

1. Generate random data samples using a set of dynamic parameters.

2. Update the parameters governing the random data generation using the data samples themselves, with the objective of improving future data samples.

In the remainder of this section, we specialize the above general algorithm to RES and COPs. Refinements to the CE method and convergence are discussed in Section 3.

2.1 Rare-event simulation

The estimation of the probability of rare events often arises in assessing the performance of various engineering systems. If analytical or asymptotic characterizations of the system are unavailable, as is often the case, one must resort to simulation techniques. The simplest and least efficient simulation technique is Crude Monte Carlo (CMC) simulation, where the system is simulated under its normal operating parameters for a very long time. A more clever simulation technique is Importance Sampling (IS), where the system is simulated under a different (but related) set of parameters which render the occurrence of the rare event of interest more likely. The difficulty with the IS technique, as we shall see, lies in obtaining an optimal (or near-optimal) alternative set of parameters under which to simulate the system. The CE method, when used in the context of RES, acts as an adaptive IS technique in that it iteratively refines estimates of the optimal set of alternative IS parameters.

We now discuss the theory behind the CE method when applied to RES. Let $X = (X_1, \ldots, X_n)$ be a random vector taking values in some space $\mathcal{X}$, let $S$ be a real-valued function on $\mathcal{X}$, and let $f(\cdot; u)$ be a probability density function on $\mathcal{X}$ parameterized by $u$. Suppose now that we are interested in the probability of occurrence of a rare event. Specifically, we are interested in the very small probability $l$ that $S(X)$ is greater than or equal to a real number $\gamma$ under $f(\cdot; u)$:
\[
l = P_u(S(X) \geq \gamma) = E_u I_{\{S(X) \geq \gamma\}}.
\]
In a CMC approach to estimating $l$, we would simply draw a random sample $X_1, \ldots, X_N$ from $f(\cdot; u)$ and compute
\[
\frac{1}{N} \sum_{i=1}^{N} I_{\{S(X_i) \geq \gamma\}}
\]
to arrive at an unbiased estimate of $l$. Clearly, for rare events, most of the terms in the above summation will be zero, thus requiring a very large value of $N$ to obtain a meaningful estimate of $l$.
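To make the inefficiency of CMC for rare events concrete, the following Python sketch estimates a small exceedance probability by crude Monte Carlo. The setup is an illustrative assumption rather than an example from [1]: $X$ is a standard normal vector, $S(x)$ is the sum of its components, and $\gamma$ is chosen so that the event $\{S(X) \geq \gamma\}$ is rare.

```python
import numpy as np

# Illustrative setup (an assumption for this sketch, not an example from [1]):
# X is an n-dimensional standard normal vector under f(.; u), S(x) is the sum
# of its components, and gamma is chosen so that {S(X) >= gamma} is rare.
rng = np.random.default_rng(0)
n, gamma, N = 5, 10.0, 1_000_000

def S(x):
    # Performance function: the sum of the components of x.
    return x.sum(axis=-1)

# Crude Monte Carlo: draw N samples from the nominal density f(.; u) and
# average the indicator terms I{S(X_i) >= gamma}.
X = rng.standard_normal((N, n))
l_hat = np.mean(S(X) >= gamma)

# For this setup the true probability is about 3.9e-6, so even with a million
# samples only a handful of indicator terms are nonzero and l_hat is noisy.
print(l_hat)
```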
An IS approach to estimating $l$ would proceed as follows. We first note that, using an importance sampling density $g$, we can represent $l$ as
\[
l = \int I_{\{S(x) \geq \gamma\}} \frac{f(x; u)}{g(x)} \, g(x) \, dx = E_g I_{\{S(X) \geq \gamma\}} \frac{f(X; u)}{g(X)}.
\]
Hence, an unbiased estimator of $l$ is
\[
\hat{l} = \frac{1}{N} \sum_{i=1}^{N} I_{\{S(X_i) \geq \gamma\}} W(X_i), \tag{1}
\]
where $W(x) = f(x; u)/g(x)$ is the likelihood ratio (LR), and $X_1, \ldots, X_N$ are i.i.d. vectors drawn from $g$. The optimal importance sampling density is
\[
g^*(x) = \frac{I_{\{S(x) \geq \gamma\}} f(x; u)}{l},
\]
in that the resulting estimate (1) has zero variance and only one sample is required. As we mentioned earlier, the problem with the IS approach is obtaining $g^*$, as this density depends on the quantity $l$ which we are attempting to estimate. In addition, it is often convenient to choose an importance sampling density of the same form as $f(\cdot; u)$. In this case, we are faced with the task of finding the density $f(\cdot; v)$ which is "closest" to $g^*$ in some sense. A convenient measure of "closeness" for this task is the Kullback-Leibler (K-L) distance, which is defined as
\[
D(g, h) = E_g \ln \frac{g(X)}{h(X)} = \int g(x) \ln g(x) \, dx - \int g(x) \ln h(x) \, dx.
\]
Minimizing the K-L distance between $g^*$ and $f(\cdot; v)$ is equivalent to the following maximization problem:
\[
\max_v E_u I_{\{S(X) \geq \gamma\}} \ln f(X; v),
\]
which, using another importance sampling density $f(\cdot; w)$, can be rewritten as
\[
\max_v E_w I_{\{S(X) \geq \gamma\}} W(X; u, w) \ln f(X; v), \tag{2}
\]
where $W(x; u, w) = f(x; u)/f(x; w)$ is the LR between $f(\cdot; u)$ and $f(\cdot; w)$ at $x$. The value $v^*$ which maximizes (2) can be estimated by solving the stochastic counterpart of (2):
\[
\hat{v}^* = \arg\max_v \frac{1}{N} \sum_{i=1}^{N} I_{\{S(X_i) \geq \gamma\}} W(X_i; u, w) \ln f(X_i; v), \tag{3}
\]
where $X_1, \ldots, X_N$ are i.i.d. vectors drawn from $f(\cdot; w)$. In instances where the function to be maximized in (3) is convex and differentiable with respect to $v$, the solution may be obtained analytically by solving the following system of equations (footnote 1):
\[
\frac{1}{N} \sum_{i=1}^{N} I_{\{S(X_i) \geq \gamma\}} W(X_i; u, w) \nabla \ln f(X_i; v) = 0. \tag{4}
\]
Note, however, that the approaches implied by (3) and (4) for estimating $v^*$ only yield meaningful results when not too many of the $I_{\{S(X_i) \geq \gamma\}}$ terms are zero, that is, when the event of interest is not rare. For rare events, we must therefore resort to an alternative technique, such as an algorithm based on the CE method.

Footnote 1: The convenience of the K-L distance measure is now apparent, since it lends itself to analytic solutions. On the other hand, alternative metrics, such as variance minimization, generally involve complicated numerical optimization techniques [1].

We approach the task of estimating $v^*$ by following an iterative algorithm in which the two parameters $v_t$ and $\gamma_t$ associated with each iteration progressively approach $v^*$ and $\gamma$, respectively. Specifically, we begin by assigning $v_0 = u$, generating $N$ samples $X_1, \ldots, X_N$ from the density $f(\cdot; v_0)$, and assigning $\gamma_1$ to the value which makes the probability $l_1 = E_{v_0} I_{\{S(X) \geq \gamma_1\}}$ approximately equal to $\rho$, a specified constant. We complete the first iteration by letting $v_1$ be equal to the optimal vector parameter for estimating $l_1$. Continuing in this way, $\gamma_t$ will increase ...
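The multilevel iteration just described takes a particularly compact form when $f(\cdot; v)$ belongs to a family for which the stochastic program (3) can be solved in closed form. The following Python sketch is one plausible rendering of the idea, not pseudocode from [1]; the Gaussian family $N(v, I)$, the performance function, and the values of $\gamma$, $\rho$ and $N$ are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the multilevel CE iteration, under assumed choices:
# the parametric family is N(v, I) with fixed identity covariance, S(x) is the
# sum of the components, and gamma, rho, N are hypothetical values. For this
# family the stochastic program (3) has a closed-form solution: the updated
# mean is the likelihood-ratio-weighted average of the elite samples.
rng = np.random.default_rng(1)
n, gamma, rho, N = 5, 10.0, 0.1, 2000
u = np.zeros(n)                  # nominal parameter u

def S(x):
    return x.sum(axis=-1)

def log_f(x, v):
    # Log-density of N(v, I) up to an additive constant; the constant cancels
    # in the likelihood ratio because both densities share the same covariance.
    return -0.5 * np.sum((x - v) ** 2, axis=-1)

v = u.copy()                     # v_0 = u
gamma_t = -np.inf
while gamma_t < gamma:
    X = rng.standard_normal((N, n)) + v            # sample from f(.; v_t)
    scores = S(X)
    # Set gamma_{t+1} to the (1 - rho)-quantile of the scores, capped at gamma,
    # so that roughly a fraction rho of the samples is "elite".
    gamma_t = min(np.quantile(scores, 1.0 - rho), gamma)
    elite = scores >= gamma_t
    # Likelihood ratios W(X_i; u, v_t) = f(X_i; u) / f(X_i; v_t).
    W = np.exp(log_f(X, u) - log_f(X, v))
    w = W * elite
    v = (w[:, None] * X).sum(axis=0) / w.sum()     # CE update of the mean

# Final importance-sampling estimate of l via estimator (1), with g = f(.; v).
X = rng.standard_normal((N, n)) + v
W = np.exp(log_f(X, u) - log_f(X, v))
l_hat = np.mean((S(X) >= gamma) * W)
print(v, l_hat)
```

Keeping the covariance fixed is a simplification that preserves the closed-form mean update; for Gaussian families the covariance can be updated analogously, but that refinement is omitted here for brevity.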

