Subgradients

• subgradients
• strong and weak subgradient calculus
• optimality conditions via subgradients
• directional derivatives

Prof. S. Boyd, EE364b, Stanford University


Basic inequality

recall the basic inequality for convex differentiable f:

    f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)

• first-order approximation of f at x is a global underestimator
• (∇f(x), −1) supports epi f at (x, f(x))

what if f is not differentiable?


Subgradient of a function

g is a subgradient of f (not necessarily convex) at x if

    f(y) ≥ f(x) + gᵀ(y − x) for all y

[figure: a nondifferentiable f with affine underestimators f(x₁) + g₁ᵀ(x − x₁), f(x₂) + g₂ᵀ(x − x₂), and f(x₂) + g₃ᵀ(x − x₂)]

g₂, g₃ are subgradients at x₂; g₁ is a subgradient at x₁

• g is a subgradient of f at x iff (g, −1) supports epi f at (x, f(x))
• g is a subgradient iff f(x) + gᵀ(y − x) is a global (affine) underestimator of f
• if f is convex and differentiable, ∇f(x) is a subgradient of f at x

subgradients come up in several contexts:
• algorithms for nondifferentiable convex optimization
• convex analysis, e.g., optimality conditions and duality for nondifferentiable problems

(if f(y) ≤ f(x) + gᵀ(y − x) for all y, then g is a supergradient)


Example

f = max{f₁, f₂}, with f₁, f₂ convex and differentiable

[figure: f₁ and f₂ crossing at x₀, with f = max{f₁, f₂} the upper envelope]

• f₁(x₀) > f₂(x₀): unique subgradient g = ∇f₁(x₀)
• f₂(x₀) > f₁(x₀): unique subgradient g = ∇f₂(x₀)
• f₁(x₀) = f₂(x₀): subgradients form a line segment [∇f₁(x₀), ∇f₂(x₀)]


Subdifferential

• the set of all subgradients of f at x is called the subdifferential of f at x, denoted ∂f(x)
• ∂f(x) is a closed convex set (can be empty)

if f is convex,
• ∂f(x) is nonempty, for x ∈ relint dom f
• ∂f(x) = {∇f(x)}, if f is differentiable at x
• if ∂f(x) = {g}, then f is differentiable at x and g = ∇f(x)


Example

f(x) = |x|

[figure: lefthand plot shows f(x) = |x|; righthand plot shows the set {(x, g) | x ∈ R, g ∈ ∂f(x)}, i.e., g = −1 for x < 0, g = 1 for x > 0, and g ∈ [−1, 1] at x = 0]


Subgradient calculus

• weak subgradient calculus: formulas for finding one subgradient g ∈ ∂f(x)
• strong subgradient calculus: formulas for finding the whole subdifferential ∂f(x), i.e., all subgradients of f at x
• many algorithms for nondifferentiable convex optimization require only one subgradient at each step, so weak calculus suffices
• some algorithms, optimality conditions, etc., need the whole subdifferential
• roughly speaking: if you can compute f(x), you can usually compute a g ∈ ∂f(x)
• we'll assume that f is convex, and x ∈ relint dom f


Some basic rules

• ∂f(x) = {∇f(x)} if f is differentiable at x
• scaling: ∂(αf) = α ∂f (if α > 0)
• addition: ∂(f₁ + f₂) = ∂f₁ + ∂f₂ (RHS is addition of point-to-set mappings)
• affine transformation of variables: if g(x) = f(Ax + b), then ∂g(x) = Aᵀ ∂f(Ax + b)
• finite pointwise maximum: if f = max_{i=1,...,m} fᵢ, then

    ∂f(x) = Co ∪ {∂fᵢ(x) | fᵢ(x) = f(x)},

i.e., the convex hull of the union of subdifferentials of 'active' functions at x

for f(x) = max{f₁(x), . . . , f_m(x)}, with f₁, . . . , f_m differentiable,

    ∂f(x) = Co {∇fᵢ(x) | fᵢ(x) = f(x)}

example: f(x) = ‖x‖₁ = max{sᵀx | sᵢ ∈ {−1, 1}}

[figure: ∂f(x) is the square [−1, 1]² at x = (0, 0), the segment {1} × [−1, 1] at x = (1, 0), and the single point (1, 1) at x = (1, 1)]
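to make the weak rule concrete, here is a minimal numpy sketch (our own illustration, not from the slides) that returns one subgradient of f(x) = ‖x‖₁ by differentiating an active function sᵀx, and numerically checks the defining inequality:

```python
import numpy as np

def subgrad_l1(x):
    """Return one subgradient of f(x) = ||x||_1.

    f is the pointwise max of the affine functions s^T x over s_i in
    {-1, 1}; an active s has s_i = sign(x_i) where x_i != 0, and either
    sign where x_i == 0, so its gradient s is a subgradient of f at x.
    """
    s = np.sign(x)
    s[s == 0] = 1.0          # any value in [-1, 1] works where x_i == 0
    return s

# sanity check: the subgradient inequality f(y) >= f(x) + g^T (y - x)
x = np.array([1.0, 0.0, -2.0])
g = subgrad_l1(x)
rng = np.random.default_rng(0)
for _ in range(1000):
    y = rng.standard_normal(3)
    assert np.abs(y).sum() >= np.abs(x).sum() + g @ (y - x) - 1e-12
```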
Pointwise supremum

if f = sup_{α∈A} f_α,

    cl Co ∪ {∂f_β(x) | f_β(x) = f(x)} ⊆ ∂f(x)

(usually get equality, but some technical conditions must hold, e.g., A compact, f_α continuous in x and α)

roughly speaking, ∂f(x) is the closure of the convex hull of the union of subdifferentials of active functions


Weak rule for pointwise supremum

f = sup_{α∈A} f_α

• find any β for which f_β(x) = f(x) (assuming the supremum is achieved)
• choose any g ∈ ∂f_β(x)
• then, g ∈ ∂f(x)


example

f(x) = λ_max(A(x)) = sup_{‖y‖₂=1} yᵀA(x)y

where A(x) = A₀ + x₁A₁ + · · · + x_nA_n, with Aᵢ ∈ Sᵏ

• f is the pointwise supremum of g_y(x) = yᵀA(x)y over ‖y‖₂ = 1
• g_y is affine in x, with ∇g_y(x) = (yᵀA₁y, . . . , yᵀA_ny)
• hence, ∂f(x) ⊇ Co {∇g_y | A(x)y = λ_max(A(x))y, ‖y‖₂ = 1} (in fact equality holds here)

to find one subgradient at x, choose any unit eigenvector y associated with λ_max(A(x)); then

    (yᵀA₁y, . . . , yᵀA_ny) ∈ ∂f(x)
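this eigenvector recipe is only a few lines of numpy; the sketch below (the function name and random test data are our own) assumes the Aᵢ are supplied as a list of symmetric matrices:

```python
import numpy as np

def subgrad_lambda_max(x, A0, As):
    """Return f(x) = lambda_max(A(x)) and one subgradient g.

    A(x) = A0 + x_1 As[0] + ... + x_n As[n-1], all matrices symmetric.
    Weak rule: take a unit eigenvector y for the largest eigenvalue of
    A(x); then g = (y^T A_1 y, ..., y^T A_n y).
    """
    Ax = A0 + sum(xi * Ai for xi, Ai in zip(x, As))
    w, V = np.linalg.eigh(Ax)       # eigenvalues in ascending order
    y = V[:, -1]                    # unit eigenvector for lambda_max
    g = np.array([y @ Ai @ y for Ai in As])
    return w[-1], g

# usage on random symmetric data
rng = np.random.default_rng(0)
def sym(k):
    M = rng.standard_normal((k, k))
    return (M + M.T) / 2
A0 = sym(4)
As = [sym(4) for _ in range(3)]
fx, g = subgrad_lambda_max(np.zeros(3), A0, As)
```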
Expectation

• f(x) = E f(x, ω), with f convex in x for each ω, ω a random variable
• for each ω, choose any g_ω ∈ ∂f(x, ω) (so ω ↦ g_ω is a function)
• then, g = E g_ω ∈ ∂f(x)

Monte Carlo method for (approximately) computing f(x) and a g ∈ ∂f(x):
• generate independent samples ω₁, . . . , ω_K from the distribution of ω
• f(x) ≈ (1/K) Σᵢ₌₁ᴷ f(x, ωᵢ)
• for each i choose gᵢ ∈ ∂_x f(x, ωᵢ)
• g = (1/K) Σᵢ₌₁ᴷ gᵢ is an (approximate) subgradient

(more on this later; a code sketch appears at the end of these notes)


Minimization

define g(y) as the optimal value of

    minimize   f₀(x)
    subject to fᵢ(x) ≤ yᵢ, i = 1, . . . , m

(fᵢ convex; variable x)

with λ⋆ an optimal dual variable, we have

    g(z) ≥ g(y) − Σᵢ₌₁ᵐ λ⋆ᵢ(zᵢ − yᵢ)

i.e., −λ⋆ is a subgradient of g at y


Composition

• f(x) = h(f₁(x), . . . , f_k(x)), with h convex nondecreasing, fᵢ convex
• find q ∈ ∂h(f₁(x), . . . , f_k(x)), gᵢ ∈ ∂fᵢ(x)
• then, g = q₁g₁ + · · · + q_kg_k ∈ ∂f(x)
• reduces to the standard chain rule for differentiable h, fᵢ

proof:

    f(y) = h(f₁(y), . . . , f_k(y))
         ≥ h(f₁(x) + g₁ᵀ(y − x), . . . , f_k(x) + g_kᵀ(y − x))
         ≥ h(f₁(x), . . . , f_k(x)) + qᵀ(g₁ᵀ(y − x), . . . , g_kᵀ(y − x))
         = f(x) + gᵀ(y − x)

(the first inequality uses h nondecreasing together with fᵢ(y) ≥ fᵢ(x) + gᵢᵀ(y − x); the second is the subgradient inequality for h at (f₁(x), . . . , f_k(x)))


Subgradients and sublevel sets

g is a subgradient at x means f(y) ≥ f(x) + gᵀ(y − x)

hence f(y) ≤ f(x) =⇒ gᵀ(y − x) ≤ 0

[figure: the sublevel set {x | f(x) ≤ f(x₀)}, with a subgradient g ∈ ∂f(x₀) at a kink x₀ and the gradient ∇f(x₁) at a smooth point x₁, both pointing outward]

• f differentiable at x₀: ∇f(x₀) is normal to the sublevel set {x | f(x) ≤ f(x₀)}
• f nondifferentiable at x₀: a subgradient defines a supporting hyperplane to the sublevel set through x₀


Quasigradients

g ≠ 0 is a quasigradient of f at x if

    gᵀ(y − x) ≥ 0 =⇒ f(y) ≥ f(x)

holds for all y

[figure: a quasigradient g at x, with the sublevel set {y | f(y) ≤ f(x)} contained in the halfspace gᵀ(y − x) ≤ 0]

quasigradients at x form a cone
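finally, as promised after the expectation rule above, a sketch of the Monte Carlo method for an approximate value and subgradient; the helper names and the test function f(x, ω) = |ωᵀx| are our own illustration:

```python
import numpy as np

def mc_value_and_subgrad(x, sample_omega, f, subgrad_f, K=1000):
    """Monte Carlo estimates of f(x) = E f(x, omega) and of a subgradient.

    Draws K independent samples omega_i, then averages the values
    f(x, omega_i) and per-sample subgradients g_i in d_x f(x, omega_i).
    """
    vals, grads = [], []
    for _ in range(K):
        w = sample_omega()
        vals.append(f(x, w))
        grads.append(subgrad_f(x, w))
    return np.mean(vals), np.mean(grads, axis=0)

# illustration: f(x, omega) = |omega^T x|, convex in x for each omega
rng = np.random.default_rng(0)
n = 5
sample = lambda: rng.standard_normal(n)
value = lambda x, w: abs(w @ x)
subgrad = lambda x, w: np.sign(w @ x) * w   # a subgradient of |w^T x| in x
fhat, ghat = mc_value_and_subgrad(np.ones(n), sample, value, subgrad)
```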