Stanford EE 364B - The Linear Collider Collaboration - D2280583

Home> Schools> Stanford University> Electrical Engineering (EE) > EE 364B> The Linear Collider Collaboration

Stanford EE 364B - The Linear Collider Collaboration

Course Ee 364b- Convex Optimization II (cme 364b)

Pages 15

Download Save

Unformatted text preview:

Ellipsoid MethodS. BoydNotes for EE364b, Stanford UniversityMay 26, 2014These notes were taken from the book Linear Controller Design: Limits of Performance,by Boyd and Barratt [BB91], and edited (at various times over many years) by StephenBoyd, Joelle Skaf, and Nicholas Moehle.Contents1 Basic ellipsoid algorithm 22 Deep-cut ellipsoid algorithm 63 Ellipsoid algorithm with constraints 84 Numerical examples 95 Ellipsoid method for other problems 12A Notes and references 141rx(k)rx(k+1)rE(k)E(k+1)g(k)Figure 1: The shaded half ellipsoid, known to contain a minimizer, is enclosed bythe ellipsoid of smallest volume, denoted E(k+1), centered at x(k+1).1 Basic ellipsoid algorithmEllipsoid method idea. We consid er the problem of minimizing a convex function f :Rn→ R. The ellipsoid algorithm generates a “decreasing” sequence of elli psoids in Rnthatare guar anteed to contain a minimizi n g point, using the idea that given a subgradient (orquasigradient) at a point, we can find a half-space containing the point that is guaranteednot to contain any minimizers of f, i.e., a cutting-plane.Suppose that we have an ellipsoid E(k)that is guaranteed to contain a minimizer of f. Inthe basic ellipsoid algorith m , we comput e a subgradi ent g(k)of f at the center, x(k), of E(k).We t h en know that the half ellipsoidE(k)∩ {z | g(k)T(z −x(k)) ≤ 0}contains a minimizer of f. We compute t h e ellipsoid E(k+1)of minimum volume that containsthe sliced half ellipsoid; E(k+1)is then guaranteed to contain a m inimizer of f, as shown infigure 1. The process is then repeated.We will see that the volume of the ellipsoid decreases, geometrically, which will allow usto show that the method converges, i n d e ed with a complexity estimate that i s good (at leastin theor y) .Special case: Bisection on R. When n = 1, this algorithm i s the standard bisectionmethod on R. To see this, we note that an ellipsoid in R is nothing but an interva l. Weevaluate the derivative (or a subgradient) of f at the midpoint (which is the center), anddepending on its sign, the sliced half ellipsoid is either the left or right half of the interval . InR, a half ellipsoid is also an interval, so the minimum volume (in t h i s case, length) ellipsoidthat cover s the half ellipsoid is just the interval, which has exactly half the length of theprevious ellipsoid.2Ellipsoid method update. We now describe the algorithm more explicitly. An ellipsoi dE can be described asE = {z | (z − x)TP−1(z − x) ≤ 1}.where P ∈ Sn++. The center of the ell i p soi d is x, and the matrix P gives the size and shapeor orientation of E: the square roots of the eigenvalues of P are the lengths of the semi - axesof E. The volume of E i s given byvol(E) = βn√detP ,where βn= πn/2/Γ(n/2 + 1) i s the volume of the unit ball in Rn.For n > 1, the minimum volume ellipsoid that contains the half ellipsoid{z | (z − x)TP−1(z − x) ≤ 1, gT(z − x) ≤ 0}is given byE+= {z | (z − x+)T(P+)−1(z − x+) ≤ 1},wherex+= x −1n + 1P ˜g, (1)P+=n2n2− 1P −2n + 1P ˜g˜gTP, (2)and˜g =1qgTP ggis a normalized subgradient.Let’s give some interpretations of these equations. The step given in (1) is in the directionof the negative subgradient, in the coordinates given by the matrix P . The step P ˜g i s thestep to th e boundary of E+, so (1) is a step that is a fracti on 1/(n + 1) to the boundary.From (2), we can think of E(k+1)as slightly thinner t h a n E(k)in the direct i on g(k), and slightlyenlarged over all.Basic ellipsoid method. The basic ellipsoid algorithm is:Ellipsoid methodgiven an initial ellipsoid (P(0), x(0)) containing a minimizer of f .k := 0.repeatCompute a subgradient: g(k)∈ ∂f(x(k)).Normalize the subgradi ent: ˜g :=1√g(k)TP(k)g(k)g(k).3Update elli p s oi d center: x(k+1):= x(k)−1n+1P(k)˜g.Update elli p s oi d shape: P(k+1):=n2n2−1P(k)−2n+1P(k)˜g˜gTP(k).k := k + 1.The ellipsoid method is not a descent method, so we keep track of the best point found.We defin ef(k)best= minj=0,...,kf(x(j)).Proof of convergence. In this section we show that the ellipsoid algorithm converges,i.e.,limk→∞f(k)best= f⋆,provided our original ellipsoid E(0), which we take to be a ball, i.e., x(0)= 0, P(0)= R2I,contains a minimizing point in its interior. Even though E(k+1)can be larger than E(k)inthe sense of maximum semi-axis (λmax(P(k+1)) > λmax(P(k)) is possible), it turn s out thatits volume is less:vol(E(k+1)) =nn + 1(n+1)/2nn − 1(n−1)/2vol(E(k))< e−1/2nvol(E(k)),by a factor that only depends on the dimension n. The first line is simpl e calculation fromthe formula for updating the mat r ix P that describes the ellipsoid (2) and basic determinantidentities.We suppose that x⋆∈ E(0)and that f(K)best≥ f⋆+ ǫ, where ǫ > 0. This means that fork = 1, . . . , K, f(x(k)) > f⋆+ ǫ. Then every po int z excluded in iterations 0, . . . , K hasf(z) ≥ f⋆+ ǫ, since at iteration k the function values in each excluded half-space are at l ea stf(x(k)).We defin eG = maxg ∈ ∂f (x), x ∈ E(0)kgk,as the m a xi mum length of the subgradients over the initial ellipsoid. (In fact, G is a Lipschitzconstant for f over the initial ellipsoid.)Then in the ballB = {z | kz − x⋆k ≤ ǫ/G}.we have f(z) ≤ f⋆+ǫ. We assume without loss of generality that B ⊆ E(0), and consequentlyno point of B was excl u ded in iterations 1, . . . , K, so we haveB ⊆ E(k).Thus, vol(E(k)) ≥ vol(B), soe−K/2nvol(E(0)) ≥ (ǫ/G)nβn.4For E(0)= {z | kzk ≤ R} we have vol(E(0)) = Rnβn, so, taking logs,−K2n+ n log R ≥ n logǫG,and ther efor eK ≤ 2n2logRGǫ.Thus to comp u t e f⋆with error at most ǫ, it takes no more than 2n2log RG/ǫ iterations ofthe ellipsoid algorithm; this numb e r grows slowly with both dimension n and accuracy ǫ (atleast, from the point of view of complexity theory).Interpretation. From the initial ellipsoid (a ball of radius R) and the Lipschitz b oundon f (i.e., G), we know that our original point is n o more th a n RG suboptimal. After Ksteps we have produced a point that is ǫ suboptimal. Thus the ratio RG/ǫ is the fractionalreduction in our ign o r an ce about the number f⋆. The complexity analysis above says thatthe number of steps required is proportional to the log of this rat i o. This is exactly like thebisection method in R (and indee d , it is the bisect i on method for n = 1). We might saythat th e ellipsoid method is a generaliz at i on of the bisection method

View Full Document


School:
Email:
New Password:
Confirm Password:

Stanford EE 364B - The Linear Collider Collaboration

Sign up for free to view:

Please select your school