CS281B/Stat241B (Spring 2008)                         Statistical Learning Theory
Lecture 22: Online Convex Optimization
Lecturer: Alexander Rakhlin                           Scribe: Jiening Zhan

1 Recap: Prediction with Expert Advice

In the last lecture, the following online prediction algorithm based on expert advice was presented. Given a convex loss function l(·):

For t = 1, ..., T
    player observes the expert predictions f_{1,t}, ..., f_{N,t}
    player predicts p_t
    adversary reveals the outcome y_t
    player suffers loss l(p_t, y_t)
    expert i suffers loss l(f_{i,t}, y_t)
End

The goal is to minimize the regret,

    R_T = \sum_{t=1}^T l(p_t, y_t) - \min_{i \in [N]} \sum_{t=1}^T l(f_{i,t}, y_t)    (1)

Using the exponential weights w_{i,t} = e^{-\eta L_{i,t}}, where L_{i,t} denotes the cumulative loss of expert i, and the predictions

    p_t = \frac{\sum_i w_{i,t} f_{i,t}}{\sum_i w_{i,t}},

the regret was shown to satisfy R_T \le \sqrt{(T/2) \log N}.

Based on this algorithm, the online linear optimization and online convex optimization algorithms are derived.

2 Online Linear Optimization

Let \Delta_N denote the N-dimensional simplex.

For t = 1, ..., T
    player predicts w_t \in \Delta_N (w_t is essentially a probability distribution)
    adversary reveals l_t \in R^N, where l_t(i) = l(f_{i,t}, y_t)
    player suffers loss w_t \cdot l_t
End

The regret is defined as

    R_T = \sum_{t=1}^T w_t \cdot l_t - \min_{w^* \in \Delta_N} \sum_{t=1}^T w^* \cdot l_t    (2)

Note that for the simplex, the minimizing distribution w^* places all of the probability on the best expert.

Figure 1: Online Linear Optimization (K is the simplex)

3 Online Convex Optimization

Let K be a convex region, and for every t \in [T] let l_t : R^N \to R be convex.

For t = 1, ..., T
    player predicts w_t \in K
    adversary reveals l_t(\cdot)
    player suffers loss l_t(w_t)
End

The regret is

    R_T = \sum_{t=1}^T l_t(w_t) - \min_{w^* \in K} \sum_{t=1}^T l_t(w^*)    (3)

Figures 2 and 3 give an example of the online convex optimization setting. In this setting the adversary is quite "handicapped": the worst it can do is change the loss function from round to round, as demonstrated in Figure 3. However, we can always choose a w^* such that the adversary cannot produce too much loss.

Figure 2: Online Convex Optimization

Figure 3: The adversary is quite handicapped in this setting

4 Algorithm: Online Gradient Descent (Zinkevich, 2003)

For t = 1, ..., T
    predict w_t
    observe l_t(\cdot)
    update w_{t+1} = \Pi_K(w_t - \eta \nabla l_t(w_t)), where \Pi_K(\cdot) is the Euclidean projection onto the set K
End

Theorem 4.1. Let G = \max_{t \in [T]} \|\nabla l_t(w_t)\| and let D be the diameter of K. The online gradient descent algorithm attains regret R_T \le GD\sqrt{T}.

Proof. Let \bar{w}_{t+1} = w_t - \eta \nabla l_t(w_t), let w^* = \arg\min_{w \in K} \sum_{t=1}^T l_t(w), and write \nabla_t = \nabla l_t(w_t). Since w^* \in K and Euclidean projection onto a convex set does not increase the distance to points of that set,

    \|w_{t+1} - w^*\|^2 \le \|\bar{w}_{t+1} - w^*\|^2    (4)
                         = \|w_t - \eta \nabla_t - w^*\|^2    (5)
                         = \|w_t - w^*\|^2 + \eta^2 \|\nabla_t\|^2 - 2\eta \nabla_t \cdot (w_t - w^*)    (6)

Rearranging terms, it can be seen that

    \nabla_t \cdot (w_t - w^*) \le \frac{\|w_t - w^*\|^2 - \|w_{t+1} - w^*\|^2}{2\eta} + \frac{\eta}{2} \|\nabla_t\|^2    (7)

For a convex loss function l_t,

    l_t(w_t) - l_t(w^*) \le \nabla_t \cdot (w_t - w^*)    (8)

Summing over t = 1, ..., T, the squared-distance terms in (7) telescope:

    \sum_{t=1}^T (l_t(w_t) - l_t(w^*)) \le \sum_{t=1}^T \nabla_t \cdot (w_t - w^*)    (9)
                                        \le \frac{\|w_1 - w^*\|^2}{2\eta} + \frac{\eta}{2} \sum_{t=1}^T \|\nabla_t\|^2    (10)
                                        \le \frac{D^2}{2\eta} + \frac{\eta}{2} T G^2    (11)

Setting \eta = \frac{D}{G\sqrt{T}}, we get R_T \le GD\sqrt{T}.

Let K be the simplex of dimension N. It follows that D = 1 and G \le \sqrt{N}, so online gradient descent attains regret R_T \le \sqrt{TN}. Compared with the bound R_T \le \sqrt{(T/2) \log N} for prediction with exponential weights, this bound scales much more quickly in N.

The online gradient descent algorithm also behaves differently from the prediction with exponential weights algorithm. Suppose we are choosing weights from a three-dimensional simplex and at time t we choose the weight w_t. If there is no loss, l_t = (0, 0, 0), then for both algorithms w_{t+1} = w_t; our position does not change. However, if l_t = (1, 0, 0), then at time t + 1 online gradient descent moves a distance \eta (before projection), while exponential weights scales the weight of the first expert down by the multiplicative factor e^{-\eta} and renormalizes, so the size of its step depends on the current position. This is demonstrated in Figure 4.

Figure 4: Exponential weights algorithm vs. online gradient descent
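To make the comparison concrete, here is a minimal sketch (not part of the notes) of both update rules on the simplex with linear losses, assuming NumPy. The helper names project_to_simplex, ogd_on_simplex, and exp_weights are illustrative, and the projection uses a standard sort-based method for Euclidean projection onto the probability simplex, which the notes do not specify.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (standard sort-based method)."""
    u = np.sort(v)[::-1]                      # coordinates in decreasing order
    css = np.cumsum(u)
    j = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / j > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def ogd_on_simplex(losses, eta):
    """Online gradient descent over K = simplex with linear losses l_t(w) = w . l_t."""
    T, N = losses.shape
    w = np.full(N, 1.0 / N)                   # start from the uniform distribution
    total = 0.0
    for t in range(T):
        total += w @ losses[t]                # suffer loss l_t(w_t)
        # the gradient of a linear loss is l_t itself; project the step back onto K
        w = project_to_simplex(w - eta * losses[t])
    return total

def exp_weights(losses, eta):
    """Prediction with exponential weights: w_{i,t} proportional to exp(-eta * L_{i,t})."""
    T, N = losses.shape
    L = np.zeros(N)                           # cumulative loss of each expert
    total = 0.0
    for t in range(T):
        w = np.exp(-eta * L)
        w /= w.sum()
        total += w @ losses[t]
        L += losses[t]
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, N = 1000, 3
    losses = rng.uniform(size=(T, N))         # adversary's loss vectors l_t in [0, 1]^N
    best = losses.sum(axis=0).min()           # loss of the best fixed expert in hindsight
    # eta = D / (G sqrt(T)) as in Theorem 4.1, with D = 1 and G <= sqrt(N) as in the notes
    print("OGD regret:        ", ogd_on_simplex(losses, eta=1.0 / np.sqrt(N * T)) - best)
    # a standard tuning of eta for exponential weights with losses in [0, 1]
    print("exp-weights regret:", exp_weights(losses, eta=np.sqrt(8.0 * np.log(N) / T)) - best)
```

The sketch only makes the two update rules concrete: the additive, projected step of online gradient descent versus the multiplicative reweighting of exponential weights discussed above.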
5 Bregman Divergence

Suppose R : R^N \to R is strictly convex with continuous first-order partial derivatives.

Definition. The Bregman divergence between x and y with respect to R is

    D_R(x, y) = R(x) - R(y) - \nabla R(y) \cdot (x - y)    (12)

Note that the Bregman divergence is in general not symmetric.

Example.

    R(x) = \frac{1}{2}\|x\|^2  \quad\Rightarrow\quad  D_R(x, y) = \frac{1}{2}\|x - y\|^2    (13)

Example.

    R(x) = \sum_{i=1}^N (x_i \log x_i - x_i)  \quad\Rightarrow\quad  D_R(x, y) = \sum_{i=1}^N x_i \log\frac{x_i}{y_i} + \sum_{i=1}^N (y_i - x_i)    (14)

This is the unnormalized KL divergence; it reduces to KL(x, y) when x and y both lie on the simplex.

Property.

    D_{A+B}(x, y) = D_A(x, y) + D_B(x, y)    (15)

Property.

    D_R(u, v) + D_R(v, w) = D_R(u, w) + (u - v) \cdot (\nabla R(w) - \nabla R(v))    (16)

Property. The Bregman projection onto a convex set K exists and is unique. Let w' be the Bregman projection of the point w onto the convex set K; then

    w' = \arg\min_{v \in K} D_R(v, w)    (17)

Property. Generalized Pythagorean theorem: for all u \in K,

    D_R(u, w) \ge D_R(u, w') + D_R(w', w)
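As a quick numerical illustration of Definition (12) and the two examples above, here is a small sketch, again assuming NumPy; the names bregman, R_sq, and R_negent are illustrative and not from the notes.

```python
import numpy as np

def bregman(R, grad_R, x, y):
    """Bregman divergence D_R(x, y) = R(x) - R(y) - grad R(y) . (x - y), as in (12)."""
    return R(x) - R(y) - grad_R(y) @ (x - y)

# Example (13): R(x) = (1/2)||x||^2  =>  D_R(x, y) = (1/2)||x - y||^2
R_sq, grad_sq = (lambda x: 0.5 * x @ x), (lambda x: x)

# Example (14): R(x) = sum_i (x_i log x_i - x_i)
#   =>  D_R(x, y) = sum_i x_i log(x_i / y_i) + sum_i (y_i - x_i)  (unnormalized KL)
R_negent, grad_negent = (lambda x: np.sum(x * np.log(x) - x)), (lambda x: np.log(x))

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.1, 0.6, 0.3])

# the generic definition and the closed forms agree
print(bregman(R_sq, grad_sq, x, y), 0.5 * np.sum((x - y) ** 2))
print(bregman(R_negent, grad_negent, x, y), np.sum(x * np.log(x / y)) + np.sum(y - x))

# the divergence is generally not symmetric, as noted after (12)
print(bregman(R_negent, grad_negent, x, y), bregman(R_negent, grad_negent, y, x))
```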

