Subgradient Methods

• subgradient method and step size rules
• convergence results and proof
• optimal step size and alternating projections
• speeding up subgradient methods

Prof. S. Boyd, EE364b, Stanford University


Subgradient method

the subgradient method is a simple algorithm to minimize a nondifferentiable convex function f:

    x^{(k+1)} = x^{(k)} − α_k g^{(k)}

• x^{(k)} is the kth iterate
• g^{(k)} is any subgradient of f at x^{(k)}
• α_k > 0 is the kth step size

not a descent method, so we keep track of the best point so far:

    f_best^{(k)} = min_{i=1,...,k} f(x^{(i)})


Step size rules

step sizes are fixed ahead of time

• constant step size: α_k = α (constant)
• constant step length: α_k = γ / ‖g^{(k)}‖_2 (so ‖x^{(k+1)} − x^{(k)}‖_2 = γ)
• square summable but not summable: step sizes satisfy

    Σ_{k=1}^∞ α_k² < ∞,   Σ_{k=1}^∞ α_k = ∞

• nonsummable diminishing: step sizes satisfy

    lim_{k→∞} α_k = 0,   Σ_{k=1}^∞ α_k = ∞


Assumptions

• f⋆ = inf_x f(x) > −∞, with f(x⋆) = f⋆
• ‖g‖_2 ≤ G for all g ∈ ∂f (equivalent to a Lipschitz condition on f)
• ‖x^{(1)} − x⋆‖_2 ≤ R

these assumptions are stronger than needed, just to simplify the proofs


Convergence results

define f̄ = lim_{k→∞} f_best^{(k)}

• constant step size: f̄ − f⋆ ≤ G²α/2, i.e., converges to G²α/2-suboptimal (converges to f⋆ if f is differentiable and α is small enough)
• constant step length: f̄ − f⋆ ≤ Gγ/2, i.e., converges to Gγ/2-suboptimal
• diminishing step size rule: f̄ = f⋆, i.e., converges


Convergence proof

key quantity: Euclidean distance to the optimal set, not the function value

let x⋆ be any minimizer of f

    ‖x^{(k+1)} − x⋆‖_2² = ‖x^{(k)} − α_k g^{(k)} − x⋆‖_2²
                        = ‖x^{(k)} − x⋆‖_2² − 2 α_k g^{(k)T} (x^{(k)} − x⋆) + α_k² ‖g^{(k)}‖_2²
                        ≤ ‖x^{(k)} − x⋆‖_2² − 2 α_k (f(x^{(k)}) − f⋆) + α_k² ‖g^{(k)}‖_2²

using f⋆ = f(x⋆) ≥ f(x^{(k)}) + g^{(k)T} (x⋆ − x^{(k)})

apply this recursively to get

    ‖x^{(k+1)} − x⋆‖_2² ≤ ‖x^{(1)} − x⋆‖_2² − 2 Σ_{i=1}^k α_i (f(x^{(i)}) − f⋆) + Σ_{i=1}^k α_i² ‖g^{(i)}‖_2²
                        ≤ R² − 2 Σ_{i=1}^k α_i (f(x^{(i)}) − f⋆) + G² Σ_{i=1}^k α_i²

now we use

    Σ_{i=1}^k α_i (f(x^{(i)}) − f⋆) ≥ (f_best^{(k)} − f⋆) (Σ_{i=1}^k α_i)

to get

    f_best^{(k)} − f⋆ ≤ (R² + G² Σ_{i=1}^k α_i²) / (2 Σ_{i=1}^k α_i)

constant step size: for α_k = α we get

    f_best^{(k)} − f⋆ ≤ (R² + G² k α²) / (2 k α)

the righthand side converges to G²α/2 as k → ∞

constant step length: for α_k = γ / ‖g^{(k)}‖_2 we get

    f_best^{(k)} − f⋆ ≤ (R² + Σ_{i=1}^k α_i² ‖g^{(i)}‖_2²) / (2 Σ_{i=1}^k α_i) ≤ (R² + γ² k) / (2 γ k / G)

the righthand side converges to Gγ/2 as k → ∞

square summable but not summable step sizes: suppose the step sizes satisfy

    Σ_{k=1}^∞ α_k² < ∞,   Σ_{k=1}^∞ α_k = ∞

then

    f_best^{(k)} − f⋆ ≤ (R² + G² Σ_{i=1}^k α_i²) / (2 Σ_{i=1}^k α_i)

as k → ∞, the numerator converges to a finite number and the denominator converges to ∞, so f_best^{(k)} → f⋆


Stopping criterion

• terminating when (R² + G² Σ_{i=1}^k α_i²) / (2 Σ_{i=1}^k α_i) ≤ ε is really, really slow
• optimal choice of α_i to achieve (R² + G² Σ_{i=1}^k α_i²) / (2 Σ_{i=1}^k α_i) ≤ ε for the smallest k:

    α_i = (R/G)/√k,   i = 1, ..., k

  number of steps required: k = (RG/ε)²
• the truth: there really isn't a good stopping criterion for the subgradient method ...


Example: Piecewise linear minimization

    minimize f(x) = max_{i=1,...,m} (a_i^T x + b_i)

to find a subgradient of f: find an index j for which

    a_j^T x + b_j = max_{i=1,...,m} (a_i^T x + b_i)

and take g = a_j

subgradient method: x^{(k+1)} = x^{(k)} − α_k a_j

problem instance with n = 20 variables, m = 100 terms, f⋆ ≈ 1.1; constant step length, γ = 0.05, 0.01, 0.005, first 100 iterations

[plot: f(x^{(k)}) − f⋆ vs. k for γ = 0.05, 0.01, 0.005]
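
As a concrete illustration of the method and the constant step length rule, here is a minimal NumPy sketch for the piecewise-linear problem above. The function name pwl_subgradient, the random problem data, and the chosen γ are illustrative assumptions, not the instance behind the plots in these slides.

```python
# Minimal sketch of the subgradient method for piecewise-linear minimization,
#   minimize  f(x) = max_i (a_i^T x + b_i),
# using the constant step length rule alpha_k = gamma / ||g^(k)||_2.
# The problem data below is random and only illustrative; it is not the
# n = 20, m = 100 instance used for the plots in these slides.
import numpy as np

def pwl_subgradient(A, b, x0, gamma, iters):
    """Subgradient method with constant step length; returns the best point/value found."""
    x = x0.copy()
    x_best, f_best = x.copy(), np.inf
    for _ in range(iters):
        vals = A @ x + b
        j = int(np.argmax(vals))        # index achieving the max
        f_val, g = vals[j], A[j]        # f(x) and a subgradient g = a_j
        if f_val < f_best:              # not a descent method: track the best point so far
            x_best, f_best = x.copy(), f_val
        x = x - (gamma / np.linalg.norm(g)) * g
    return x_best, f_best

rng = np.random.default_rng(0)
n, m = 20, 100
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_best, f_best = pwl_subgradient(A, b, np.zeros(n), gamma=0.05, iters=3000)
print("best value found:", f_best)
```

Only the step computation depends on the step size rule; for example, with f⋆ known, the same loop would implement Polyak's rule (covered below) by replacing the constant step length with (f_val - f_star) / (g @ g).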

f_best^{(k)} − f⋆, constant step length γ = 0.05, 0.01, 0.005

[plot: f_best^{(k)} − f⋆ vs. k over 3000 iterations for γ = 0.05, 0.01, 0.005]

diminishing step rule α_k = 0.1/√k and square summable step size rule α_k = 1/k

[plot: f_best^{(k)} − f⋆ vs. k over 3000 iterations for α_k = 0.1/√k and α_k = 1/k]


Optimal step size when f⋆ is known

• choice due to Polyak:

    α_k = (f(x^{(k)}) − f⋆) / ‖g^{(k)}‖_2²

  (can also be used when the optimal value is estimated)
• motivation: start with the basic inequality

    ‖x^{(k+1)} − x⋆‖_2² ≤ ‖x^{(k)} − x⋆‖_2² − 2 α_k (f(x^{(k)}) − f⋆) + α_k² ‖g^{(k)}‖_2²

  and choose α_k to minimize the righthand side
• this yields

    ‖x^{(k+1)} − x⋆‖_2² ≤ ‖x^{(k)} − x⋆‖_2² − (f(x^{(k)}) − f⋆)² / ‖g^{(k)}‖_2²

  (in particular, ‖x^{(k)} − x⋆‖_2 decreases at each step)
• applying this recursively,

    Σ_{i=1}^k (f(x^{(i)}) − f⋆)² / ‖g^{(i)}‖_2² ≤ R²

  and so

    Σ_{i=1}^k (f(x^{(i)}) − f⋆)² ≤ R² G²

  which proves f(x^{(k)}) → f⋆

PWL example with Polyak's step size, α_k = 0.1/√k, and α_k = 1/k

[plot: f_best^{(k)} − f⋆ vs. k over 3000 iterations for Polyak's step size, α_k = 0.1/√k, and α_k = 1/k]


Finding a point in the intersection of convex sets

C = C_1 ∩ ··· ∩ C_m is nonempty, with C_1, ..., C_m ⊆ R^n closed and convex

find a point in C by minimizing

    f(x) = max{dist(x, C_1), ..., dist(x, C_m)}

with dist(x, C_j) = f(x), a subgradient of f is

    g = ∇dist(x, C_j) = (x − P_{C_j}(x)) / ‖x − P_{C_j}(x)‖_2

subgradient update with the optimal step size:

    x^{(k+1)} = x^{(k)} − α_k g^{(k)}
              = x^{(k)} − f(x^{(k)}) (x^{(k)} − P_{C_j}(x^{(k)})) / ‖x^{(k)} − P_{C_j}(x^{(k)})‖_2
              = P_{C_j}(x^{(k)})

• a version of the famous alternating projections algorithm
• at each step, project the current point onto the farthest set
• for m = 2 sets, projections alternate onto one set, then the other
• convergence: dist(x^{(k)}, C) → 0 as k → ∞


Alternating projections

first few iterations:

[figure: iterates x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}, ... projecting alternately onto C_1 and C_2]

eventually converges to a point x* ∈ C_1 ∩ C_2


Example: Positive semidefinite matrix completion

• some entries of a matrix in S^n are fixed; find values for the others so the completed matrix is PSD
• C_1 = S^n_+, C_2 is the (affine) set in S^n with the specified fixed entries
• projection onto C_1 by eigenvalue decomposition and truncation: for X = Σ_{i=1}^n λ_i q_i q_i^T,

    P_{C_1}(X) = Σ_{i=1}^n max{0, λ_i} q_i q_i^T

• projection of X onto C_2 by re-setting the specified entries to their fixed values

specific example: 50 × 50 matrix missing ...
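
The following is a minimal sketch of the alternating-projections iteration for PSD matrix completion, assuming the fixed entries are encoded by a symmetric boolean mask and a matrix of values; the helper names proj_psd, proj_fixed, and complete_psd and the synthetic 50 × 50 test data are illustrative, not taken from the slides.

```python
# Minimal sketch of alternating projections for PSD matrix completion:
# C1 = S^n_+ (the PSD cone), C2 = symmetric matrices whose specified entries
# equal the given fixed values. The mask/values encoding of the fixed entries
# and the helper names are illustrative assumptions.
import numpy as np

def proj_psd(X):
    """Project a symmetric matrix onto C1 = S^n_+ by eigenvalue truncation."""
    lam, Q = np.linalg.eigh(X)
    return (Q * np.maximum(lam, 0.0)) @ Q.T       # Q diag(max{0, lam_i}) Q^T

def proj_fixed(X, mask, values):
    """Project onto C2 by re-setting the specified entries to their fixed values."""
    Y = X.copy()
    Y[mask] = values[mask]
    return Y

def complete_psd(mask, values, iters=500):
    """Alternate projections onto C1 and C2 (mask and values assumed symmetric)."""
    X = np.where(mask, values, 0.0)               # start from the partially specified matrix
    for _ in range(iters):
        X = proj_fixed(proj_psd(X), mask, values)
    return X

# synthetic 50 x 50 test case with a symmetric subset of entries fixed
rng = np.random.default_rng(0)
n = 50
B = rng.standard_normal((n, n))
true_X = B @ B.T                                  # a PSD matrix to sample entries from
mask = rng.random((n, n)) < 0.3
mask = mask | mask.T                              # symmetric pattern, roughly half the entries fixed
X = complete_psd(mask, np.where(mask, true_X, 0.0))
print("smallest eigenvalue of completion:", np.linalg.eigvalsh(X).min())
```

This is the subgradient method with Polyak's step size applied to f(x) = max_j dist(x, C_j): with f⋆ = 0 the optimal step moves the iterate all the way to the projection onto the farthest set, and with m = 2 sets it reduces to plain alternation. The per-iteration cost is dominated by the eigenvalue decomposition in proj_psd.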

