Subgradient Methods

• subgradient method and step size rules
• convergence results and proof
• optimal step size and alternating projections
• speeding up subgradient methods

Prof. S. Boyd, EE364b, Stanford University


Subgradient method

the subgradient method is a simple algorithm to minimize a nondifferentiable convex function f:

    x^{(k+1)} = x^{(k)} − α_k g^{(k)}

• x^{(k)} is the kth iterate
• g^{(k)} is any subgradient of f at x^{(k)}
• α_k > 0 is the kth step size

not a descent method, so we keep track of the best point so far:

    f_best^{(k)} = min_{i=1,...,k} f(x^{(i)})


Step size rules

step sizes are fixed ahead of time

• constant step size: α_k = α (constant)
• constant step length: α_k = γ / ‖g^{(k)}‖_2 (so ‖x^{(k+1)} − x^{(k)}‖_2 = γ)
• square summable but not summable: step sizes satisfy

    Σ_{k=1}^∞ α_k² < ∞,   Σ_{k=1}^∞ α_k = ∞

• nonsummable diminishing: step sizes satisfy

    lim_{k→∞} α_k = 0,   Σ_{k=1}^∞ α_k = ∞


Assumptions

• f⋆ = inf_x f(x) > −∞, with f(x⋆) = f⋆
• ‖g‖_2 ≤ G for all g ∈ ∂f (equivalent to a Lipschitz condition on f)
• ‖x^{(1)} − x⋆‖_2 ≤ R

these assumptions are stronger than needed, just to simplify the proofs


Convergence results

define f̄ = lim_{k→∞} f_best^{(k)}

• constant step size: f̄ − f⋆ ≤ G²α/2, i.e., converges to G²α/2-suboptimal (converges to f⋆ if f is differentiable and α is small enough)
• constant step length: f̄ − f⋆ ≤ Gγ/2, i.e., converges to Gγ/2-suboptimal
• diminishing step size rule: f̄ = f⋆, i.e., converges


Convergence proof

key quantity: Euclidean distance to the optimal set, not the function value

let x⋆ be any minimizer of f

    ‖x^{(k+1)} − x⋆‖_2² = ‖x^{(k)} − α_k g^{(k)} − x⋆‖_2²
                        = ‖x^{(k)} − x⋆‖_2² − 2 α_k g^{(k)T} (x^{(k)} − x⋆) + α_k² ‖g^{(k)}‖_2²
                        ≤ ‖x^{(k)} − x⋆‖_2² − 2 α_k (f(x^{(k)}) − f⋆) + α_k² ‖g^{(k)}‖_2²

using f⋆ = f(x⋆) ≥ f(x^{(k)}) + g^{(k)T} (x⋆ − x^{(k)})

apply this recursively to get

    ‖x^{(k+1)} − x⋆‖_2² ≤ ‖x^{(1)} − x⋆‖_2² − 2 Σ_{i=1}^k α_i (f(x^{(i)}) − f⋆) + Σ_{i=1}^k α_i² ‖g^{(i)}‖_2²
                        ≤ R² − 2 Σ_{i=1}^k α_i (f(x^{(i)}) − f⋆) + G² Σ_{i=1}^k α_i²

now we use

    Σ_{i=1}^k α_i (f(x^{(i)}) − f⋆) ≥ (f_best^{(k)} − f⋆) (Σ_{i=1}^k α_i)

to get

    f_best^{(k)} − f⋆ ≤ (R² + G² Σ_{i=1}^k α_i²) / (2 Σ_{i=1}^k α_i)

constant step size: for α_k = α we get

    f_best^{(k)} − f⋆ ≤ (R² + G² k α²) / (2 k α)

the righthand side converges to G²α/2 as k → ∞

constant step length: for α_k = γ / ‖g^{(k)}‖_2 we get

    f_best^{(k)} − f⋆ ≤ (R² + Σ_{i=1}^k α_i² ‖g^{(i)}‖_2²) / (2 Σ_{i=1}^k α_i) ≤ (R² + γ² k) / (2 γ k / G)

the righthand side converges to Gγ/2 as k → ∞

square summable but not summable step sizes: suppose the step sizes satisfy

    Σ_{k=1}^∞ α_k² < ∞,   Σ_{k=1}^∞ α_k = ∞

then

    f_best^{(k)} − f⋆ ≤ (R² + G² Σ_{i=1}^k α_i²) / (2 Σ_{i=1}^k α_i)

as k → ∞, the numerator converges to a finite number and the denominator converges to ∞, so f_best^{(k)} → f⋆


Stopping criterion

• terminating when (R² + G² Σ_{i=1}^k α_i²) / (2 Σ_{i=1}^k α_i) ≤ ε is really, really slow
• optimal choice of α_i to achieve (R² + G² Σ_{i=1}^k α_i²) / (2 Σ_{i=1}^k α_i) ≤ ε for the smallest k:

    α_i = (R/G)/√k,   i = 1, ..., k

  number of steps required: k = (RG/ε)²
• the truth: there really isn't a good stopping criterion for the subgradient method ...


Example: Piecewise linear minimization

    minimize f(x) = max_{i=1,...,m} (a_i^T x + b_i)

to find a subgradient of f: find an index j for which

    a_j^T x + b_j = max_{i=1,...,m} (a_i^T x + b_i)

and take g = a_j

subgradient method: x^{(k+1)} = x^{(k)} − α_k a_j

problem instance with n = 20 variables, m = 100 terms, f⋆ ≈ 1.1; constant step length, γ = 0.05, 0.01, 0.005, first 100 iterations

[plot: f(x^{(k)}) − f⋆ vs. k for γ = 0.05, 0.01, 0.005]
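
As a concrete illustration of the method and the constant step length rule, here is a minimal NumPy sketch for the piecewise-linear problem above. The function name pwl_subgradient, the random problem data, and the chosen γ are illustrative assumptions, not the instance behind the plots in these slides.

```python
# Minimal sketch of the subgradient method for piecewise-linear minimization,
#   minimize  f(x) = max_i (a_i^T x + b_i),
# using the constant step length rule alpha_k = gamma / ||g^(k)||_2.
# The problem data below is random and only illustrative; it is not the
# n = 20, m = 100 instance used for the plots in these slides.
import numpy as np

def pwl_subgradient(A, b, x0, gamma, iters):
    """Subgradient method with constant step length; returns the best point/value found."""
    x = x0.copy()
    x_best, f_best = x.copy(), np.inf
    for _ in range(iters):
        vals = A @ x + b
        j = int(np.argmax(vals))        # index achieving the max
        f_val, g = vals[j], A[j]        # f(x) and a subgradient g = a_j
        if f_val < f_best:              # not a descent method: track the best point so far
            x_best, f_best = x.copy(), f_val
        x = x - (gamma / np.linalg.norm(g)) * g
    return x_best, f_best

rng = np.random.default_rng(0)
n, m = 20, 100
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_best, f_best = pwl_subgradient(A, b, np.zeros(n), gamma=0.05, iters=3000)
print("best value found:", f_best)
```

Only the step computation depends on the step size rule; for example, with f⋆ known, the same loop would implement Polyak's rule (covered below) by replacing the constant step length with (f_val - f_star) / (g @ g).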

f_best^{(k)} − f⋆, constant step length γ = 0.05, 0.01, 0.005

[plot: f_best^{(k)} − f⋆ vs. k over 3000 iterations for γ = 0.05, 0.01, 0.005]

diminishing step rule α_k = 0.1/√k and square summable step size rule α_k = 1/k

[plot: f_best^{(k)} − f⋆ vs. k over 3000 iterations for α_k = 0.1/√k and α_k = 1/k]


Optimal step size when f⋆ is known

• choice due to Polyak:

    α_k = (f(x^{(k)}) − f⋆) / ‖g^{(k)}‖_2²

  (can also be used when the optimal value is estimated)
• motivation: start with the basic inequality

    ‖x^{(k+1)} − x⋆‖_2² ≤ ‖x^{(k)} − x⋆‖_2² − 2 α_k (f(x^{(k)}) − f⋆) + α_k² ‖g^{(k)}‖_2²

  and choose α_k to minimize the righthand side
• this yields

    ‖x^{(k+1)} − x⋆‖_2² ≤ ‖x^{(k)} − x⋆‖_2² − (f(x^{(k)}) − f⋆)² / ‖g^{(k)}‖_2²

  (in particular, ‖x^{(k)} − x⋆‖_2 decreases at each step)
• applying this recursively,

    Σ_{i=1}^k (f(x^{(i)}) − f⋆)² / ‖g^{(i)}‖_2² ≤ R²

  and so

    Σ_{i=1}^k (f(x^{(i)}) − f⋆)² ≤ R² G²

  which proves f(x^{(k)}) → f⋆

PWL example with Polyak's step size, α_k = 0.1/√k, and α_k = 1/k

[plot: f_best^{(k)} − f⋆ vs. k over 3000 iterations for Polyak's step size, α_k = 0.1/√k, and α_k = 1/k]


Finding a point in the intersection of convex sets

C = C_1 ∩ ··· ∩ C_m is nonempty, with C_1, ..., C_m ⊆ R^n closed and convex

find a point in C by minimizing

    f(x) = max{dist(x, C_1), ..., dist(x, C_m)}

with dist(x, C_j) = f(x), a subgradient of f is

    g = ∇dist(x, C_j) = (x − P_{C_j}(x)) / ‖x − P_{C_j}(x)‖_2

subgradient update with the optimal step size:

    x^{(k+1)} = x^{(k)} − α_k g^{(k)}
              = x^{(k)} − f(x^{(k)}) (x^{(k)} − P_{C_j}(x^{(k)})) / ‖x^{(k)} − P_{C_j}(x^{(k)})‖_2
              = P_{C_j}(x^{(k)})

• a version of the famous alternating projections algorithm
• at each step, project the current point onto the farthest set
• for m = 2 sets, projections alternate onto one set, then the other
• convergence: dist(x^{(k)}, C) → 0 as k → ∞


Alternating projections

first few iterations:

[figure: iterates x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}, ... projecting alternately onto C_1 and C_2]

eventually converges to a point x* ∈ C_1 ∩ C_2


Example: Positive semidefinite matrix completion

• some entries of a matrix in S^n are fixed; find values for the others so the completed matrix is PSD
• C_1 = S^n_+, C_2 is the (affine) set in S^n with the specified fixed entries
• projection onto C_1 by eigenvalue decomposition and truncation: for X = Σ_{i=1}^n λ_i q_i q_i^T,

    P_{C_1}(X) = Σ_{i=1}^n max{0, λ_i} q_i q_i^T

• projection of X onto C_2 by re-setting the specified entries to their fixed values

specific example: 50 × 50 matrix missing ...
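
The following is a minimal sketch of the alternating-projections iteration for PSD matrix completion, assuming the fixed entries are encoded by a symmetric boolean mask and a matrix of values; the helper names proj_psd, proj_fixed, and complete_psd and the synthetic 50 × 50 test data are illustrative, not taken from the slides.

```python
# Minimal sketch of alternating projections for PSD matrix completion:
# C1 = S^n_+ (the PSD cone), C2 = symmetric matrices whose specified entries
# equal the given fixed values. The mask/values encoding of the fixed entries
# and the helper names are illustrative assumptions.
import numpy as np

def proj_psd(X):
    """Project a symmetric matrix onto C1 = S^n_+ by eigenvalue truncation."""
    lam, Q = np.linalg.eigh(X)
    return (Q * np.maximum(lam, 0.0)) @ Q.T       # Q diag(max{0, lam_i}) Q^T

def proj_fixed(X, mask, values):
    """Project onto C2 by re-setting the specified entries to their fixed values."""
    Y = X.copy()
    Y[mask] = values[mask]
    return Y

def complete_psd(mask, values, iters=500):
    """Alternate projections onto C1 and C2 (mask and values assumed symmetric)."""
    X = np.where(mask, values, 0.0)               # start from the partially specified matrix
    for _ in range(iters):
        X = proj_fixed(proj_psd(X), mask, values)
    return X

# synthetic 50 x 50 test case with a symmetric subset of entries fixed
rng = np.random.default_rng(0)
n = 50
B = rng.standard_normal((n, n))
true_X = B @ B.T                                  # a PSD matrix to sample entries from
mask = rng.random((n, n)) < 0.3
mask = mask | mask.T                              # symmetric pattern, roughly half the entries fixed
X = complete_psd(mask, np.where(mask, true_X, 0.0))
print("smallest eigenvalue of completion:", np.linalg.eigvalsh(X).min())
```

This is the subgradient method with Polyak's step size applied to f(x) = max_j dist(x, C_j): with f⋆ = 0 the optimal step moves the iterate all the way to the projection onto the farthest set, and with m = 2 sets it reduces to plain alternation. The per-iteration cost is dominated by the eigenvalue decomposition in proj_psd.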

