MIT 6.454 - The Cross-Entropy Method

The Cross-Entropy Method
Guy Weichenberg
17 September 2003

1 Introduction

This report is a summary of the theory underlying the Cross-Entropy (CE) method, as discussed in the tutorial by de Boer, Kroese, Mannor and Rubinstein [1]. For a more thorough discussion of the method and its applications, please refer to the original tutorial and the references cited therein.

The CE method, pioneered by Rubinstein in 1997 as an adaptive algorithm for estimating probabilities of rare events, has since been broadened into a generic and efficient tool for solving a myriad of NP-hard problems. Beyond its original purpose, the CE method has been employed in deterministic and stochastic combinatorial optimization problems (COPs) and in continuous multi-extremal optimization problems.

This report is organized as follows. In Section 2, we discuss the fundamental theory of the CE method and specialize the method to rare-event simulation (RES) and COPs. In Section 3, we consider more sophisticated versions of the CE method and briefly discuss convergence issues. We conclude the report in Section 4.

2 The Cross-Entropy method

Regardless of the application at hand, the crux of the CE method remains the same. Abstractly, the CE method is an iterative algorithm comprising the following two steps:

1. Generate random data samples using a set of dynamic parameters.

2. Update the parameters governing the random data generation using the data samples themselves, with the objective of improving future data samples.

In the remainder of this section, we specialize the above general algorithm to RES and COPs. Refinements to the CE method and convergence are discussed in Section 3.

2.1 Rare-event simulation

The estimation of the probability of rare events often arises in assessing the performance of various engineering systems. If analytical or asymptotic characterizations of the system are unavailable, as is often the case, one must resort to simulation techniques. The simplest and least efficient simulation technique is Crude Monte Carlo (CMC) simulation, where the system is simulated under its normal operating parameters for a very long time. A more clever simulation technique is Importance Sampling (IS), where the system is simulated under a different (but related) set of parameters which render the occurrence of the rare event of interest more likely. The difficulty with the IS technique, as we shall see, lies in obtaining an optimal (or near-optimal) alternative set of parameters under which to simulate the system. The CE method, when used in the context of RES, acts as an adaptive IS technique in that it iteratively refines estimates of the optimal set of alternative IS parameters.

We now discuss the theory behind the CE method when applied to RES. Let $X = (X_1, \ldots, X_n)$ be a random vector taking values in some space $\mathcal{X}$, let $S$ be a real-valued function on $\mathcal{X}$, and let $f(\cdot; u)$ be a probability density function on $\mathcal{X}$ parameterized by $u$. Suppose now that we are interested in the probability of occurrence of a rare event. Specifically, we are interested in the very small probability $l$ that $S(X)$ is greater than or equal to a real number $\gamma$ under $f(\cdot; u)$:
\[
l = P_u(S(X) \geq \gamma) = E_u I_{\{S(X) \geq \gamma\}}.
\]
In a CMC approach to estimating $l$, we would simply draw a random sample $X_1, \ldots, X_N$ from $f(\cdot; u)$ and compute
\[
\frac{1}{N} \sum_{i=1}^{N} I_{\{S(X_i) \geq \gamma\}}
\]
to arrive at an unbiased estimate of $l$. Clearly, for rare events, most of the terms in the above summation will be zero, thus requiring a very large value of $N$ to obtain a meaningful estimate of $l$.
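To make the inefficiency of CMC for rare events concrete, the following Python sketch estimates a small exceedance probability by crude Monte Carlo. The setup is an illustrative assumption rather than an example from [1]: $X$ is a standard normal vector, $S(x)$ is the sum of its components, and $\gamma$ is chosen so that the event $\{S(X) \geq \gamma\}$ is rare.

```python
import numpy as np

# Illustrative setup (an assumption for this sketch, not an example from [1]):
# X is an n-dimensional standard normal vector under f(.; u), S(x) is the sum
# of its components, and gamma is chosen so that {S(X) >= gamma} is rare.
rng = np.random.default_rng(0)
n, gamma, N = 5, 10.0, 1_000_000

def S(x):
    # Performance function: the sum of the components of x.
    return x.sum(axis=-1)

# Crude Monte Carlo: draw N samples from the nominal density f(.; u) and
# average the indicator terms I{S(X_i) >= gamma}.
X = rng.standard_normal((N, n))
l_hat = np.mean(S(X) >= gamma)

# For this setup the true probability is about 3.9e-6, so even with a million
# samples only a handful of indicator terms are nonzero and l_hat is noisy.
print(l_hat)
```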
An IS approach to estimating $l$ would proceed as follows. We first note that, using an importance sampling density $g$, we can represent $l$ as
\[
l = \int I_{\{S(x) \geq \gamma\}} \frac{f(x; u)}{g(x)} \, g(x) \, dx = E_g I_{\{S(X) \geq \gamma\}} \frac{f(X; u)}{g(X)}.
\]
Hence, an unbiased estimator of $l$ is
\[
\hat{l} = \frac{1}{N} \sum_{i=1}^{N} I_{\{S(X_i) \geq \gamma\}} W(X_i), \tag{1}
\]
where $W(x) = f(x; u)/g(x)$ is the likelihood ratio (LR), and $X_1, \ldots, X_N$ are i.i.d. vectors drawn from $g$. The optimal importance sampling density is
\[
g^*(x) = \frac{I_{\{S(x) \geq \gamma\}} f(x; u)}{l},
\]
in that the resulting estimate (1) has zero variance and only one sample is required. As we mentioned earlier, the problem with the IS approach is obtaining $g^*$, as this density depends on the quantity $l$ which we are attempting to estimate. In addition, it is often convenient to choose an importance sampling density of the same form as $f(\cdot; u)$. In this case, we are faced with the task of finding the density $f(\cdot; v)$ which is "closest" to $g^*$ in some sense. A convenient measure of "closeness" for this task is the Kullback-Leibler (K-L) distance, which is defined as
\[
D(g, h) = E_g \ln \frac{g(X)}{h(X)} = \int g(x) \ln g(x) \, dx - \int g(x) \ln h(x) \, dx.
\]
Minimizing the K-L distance between $g^*$ and $f(\cdot; v)$ is equivalent to the following maximization problem:
\[
\max_v E_u I_{\{S(X) \geq \gamma\}} \ln f(X; v),
\]
which, using another importance sampling density $f(\cdot; w)$, can be rewritten as
\[
\max_v E_w I_{\{S(X) \geq \gamma\}} W(X; u, w) \ln f(X; v), \tag{2}
\]
where $W(x; u, w) = f(x; u)/f(x; w)$ is the LR between $f(\cdot; u)$ and $f(\cdot; w)$ at $x$. The value $v^*$ which maximizes (2) can be estimated by solving the stochastic counterpart of (2):
\[
\hat{v}^* = \arg\max_v \frac{1}{N} \sum_{i=1}^{N} I_{\{S(X_i) \geq \gamma\}} W(X_i; u, w) \ln f(X_i; v), \tag{3}
\]
where $X_1, \ldots, X_N$ are i.i.d. vectors drawn from $f(\cdot; w)$. In instances where the function to be maximized in (3) is convex and differentiable with respect to $v$, the solution may be obtained analytically by solving the following system of equations (footnote 1):
\[
\frac{1}{N} \sum_{i=1}^{N} I_{\{S(X_i) \geq \gamma\}} W(X_i; u, w) \nabla \ln f(X_i; v) = 0. \tag{4}
\]
Note, however, that the approaches implied by (3) and (4) for estimating $v^*$ only yield meaningful results when not too many of the $I_{\{S(X_i) \geq \gamma\}}$ terms are zero, that is, when the event of interest is not rare. For rare events, we must therefore resort to an alternative technique, such as an algorithm based on the CE method.

Footnote 1: The convenience of the K-L distance measure is now apparent, since it lends itself to analytic solutions. On the other hand, alternative metrics, such as variance minimization, generally involve complicated numerical optimization techniques [1].

We approach the task of estimating $v^*$ by following an iterative algorithm in which the two parameters $v_t$ and $\gamma_t$ associated with each iteration progressively approach $v^*$ and $\gamma$, respectively. Specifically, we begin by assigning $v_0 = u$, generating $N$ samples $X_1, \ldots, X_N$ from the density $f(\cdot; v_0)$, and assigning $\gamma_1$ to the value which makes the probability $l_1 = E_{v_0} I_{\{S(X) \geq \gamma_1\}}$ approximately equal to $\rho$, a specified constant. We complete the first iteration by letting $v_1$ be equal to the optimal vector parameter for estimating $l_1$. Continuing in this way, $\gamma_t$ will increase ...
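The multilevel iteration just described takes a particularly compact form when $f(\cdot; v)$ belongs to a family for which the stochastic program (3) can be solved in closed form. The following Python sketch is one plausible rendering of the idea, not pseudocode from [1]; the Gaussian family $N(v, I)$, the performance function, and the values of $\gamma$, $\rho$ and $N$ are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the multilevel CE iteration, under assumed choices:
# the parametric family is N(v, I) with fixed identity covariance, S(x) is the
# sum of the components, and gamma, rho, N are hypothetical values. For this
# family the stochastic program (3) has a closed-form solution: the updated
# mean is the likelihood-ratio-weighted average of the elite samples.
rng = np.random.default_rng(1)
n, gamma, rho, N = 5, 10.0, 0.1, 2000
u = np.zeros(n)                  # nominal parameter u

def S(x):
    return x.sum(axis=-1)

def log_f(x, v):
    # Log-density of N(v, I) up to an additive constant; the constant cancels
    # in the likelihood ratio because both densities share the same covariance.
    return -0.5 * np.sum((x - v) ** 2, axis=-1)

v = u.copy()                     # v_0 = u
gamma_t = -np.inf
while gamma_t < gamma:
    X = rng.standard_normal((N, n)) + v            # sample from f(.; v_t)
    scores = S(X)
    # Set gamma_{t+1} to the (1 - rho)-quantile of the scores, capped at gamma,
    # so that roughly a fraction rho of the samples is "elite".
    gamma_t = min(np.quantile(scores, 1.0 - rho), gamma)
    elite = scores >= gamma_t
    # Likelihood ratios W(X_i; u, v_t) = f(X_i; u) / f(X_i; v_t).
    W = np.exp(log_f(X, u) - log_f(X, v))
    w = W * elite
    v = (w[:, None] * X).sum(axis=0) / w.sum()     # CE update of the mean

# Final importance-sampling estimate of l via estimator (1), with g = f(.; v).
X = rng.standard_normal((N, n)) + v
W = np.exp(log_f(X, u) - log_f(X, v))
l_hat = np.mean((S(X) >= gamma) * W)
print(v, l_hat)
```

Keeping the covariance fixed is a simplification that preserves the closed-form mean update; for Gaussian families the covariance can be updated analogously, but that refinement is omitted here for brevity.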

