CS281B/Stat241B (Spring 2008) Statistical Learning Theory Lecture: 25Follow the perturbed leader, online shortest pathLecturer: Sasha Rakhlin Scribes: Barlas O˘guz, David Burkett1 Oracle RegretWe begin by a lemmaLemma 1.1. Ifxt+1= arg minx∈KηtXs=1ls(x) + R(x)then for any u ∈ KTXt=1lt(xt+1) ≤TXt=1lt(u) + η−1(R(u) −R(x1)) (1)where x1= arg min R(x).This says that the hypothetical forecaster who knows ltbefore having to predict xtis almost perfect.Proof. Use induction.T = 0 true, because R(x1) ≤ R(u), ∀u ∈ K.Suppose statement holds for T − 1.T −1Xt=1lt(xt+1) + η−1R(x1) ≤T −1Xt=1lt(u) + η−1R(u), ∀u (2)in particular, this holds for u = xT +1.T −1Xt=1lt(xt+1) + η−1R(x1) ≤T −1Xt=1lt(xT +1) + η−1R(xT +1) (3)Add lT(xT +1) to both sidesTXt=1lt(xt+1) + η−1R(x1) ≤TXt=1lt(xT +1) + η−1R(xT +1) (4)TXt=1lt(xt+1) + η−1R(x1) ≤TXt=1lt(u) + η−1R(u), ∀u (5)As a consequence, for x∗= arg minx∈KPTt=1lt(x),TXt=1lt(xt) −TXt=1lt(x∗) ≤TXt=1(lt(xt) − lt(xt+1)) + η−1(R(x∗) − R(x1)) (6)12 Follow the perturbed leader, online shortest path2 Regret Bounds for Follow the Perturbed LeaderNow we focus on linear lt(.)Solve xt+1= arg minx∈KηPts=1lsx + rx, where r is drawn at the beginning of the game from distributionf.Assume oblivious adversary: l1. . . lTare chosen by the adversary before the game.Theorem 2.1 (A). Suppose lt∈ Rn+, K ⊂ Rn+, f (r) has support on Rn+.∀u ∈ K, ETXt=1ltxt≤TXt=1ltu +TXt=1ltZ{r:f (r)≥f (r−ηlt)}xtf(r) dr + η−1E supx∈Krx (7)Proof. Let x∗= arg minPTt=1ltx.TXt=1ltxt−TXt=1ltx∗≤TXt=1lt(xt− xt+1) + η−1(rx∗− rx1) (8)and since rx ≥ 0,TXt=1ltxt−TXt=1ltx∗≤TXt=1lt(xt− xt+1) + η−1rx∗(9)Taking expectations, we haveE(TXt=1ltxt−TXt=1ltx∗) ≤ E(TXt=1(ltxt− ltxt+1)) + η−1(rx∗− rx1) (10)We want to have E(PTt=1(ltxt− ltxt+1)) small.Eltxt=Zltarg minx((ηt−1Xs=1ls+ r)x)f (r) dr (11)Eltxt+1=Zltarg minx((ηtXs=1ls+ r)x)f (r) dr (12)Let r0= r + ltEltxt=Zltarg minx((ηT −1Xs=1ls+ r0)x)f(r0− ηlt) dr0(13)Elt(xt− xt+1) =Zltxt(f(r) − f(r − ηlt))dr = ltZxt(f(r) − f(r − ηlt))dr (14)≤ ltZ{r:f (r)>f (r−ηlt)}xtf(r)dr = ltExt1{r:f (r)>f (r−ηlt)}(15)Now, we prove an analogous theorem where we relax the restriction to the positive orthant.Follow the perturbed leader, online shortest path 3Theorem 2.2 (B).ETXt=1ltxt≤ sup r, tf(r)f(r − ηlt)·Xlt+ η−1E supx∈Kr · x + η−1E supx∈K−r · x¸(16)for any u ∈ KProof.Eltxt=Zltargminx∈K"Ãηt−1Xs=1ls+ r!x#f(r)dr (17)≤ supr,tf(r)f(r − ηlt)Zltargminx∈K"ÃηtXs=1ls+ r!x#f(r)dr (18)= supr,tf(r)f(r − ηlt)Eltxt+1(19)By lemma 1.1,∀u ∈ K,TXt=1ltxt+1≤TXt=1lt· u + η−1supx∈K(r · x) + η−1supx∈K(−r · x) (20)So,ETXt=1ltxt= supr,tf(r)f(r − ηlt)EÃTXt=1lt· u + η−1supx∈K(r · x) + η−1supx∈K(−r · x)!(21)3 Examples3.1 Expert SettingHere, K is the n−simplex. We will draw r ∼ Unif¡[0, 1]N¢and lt∈ [0, 1]NApplying Theorem A,ERT≤TXt=1ltZ{r:f (r)>f (r−ηlt)}xtf(r)dr + η−1E supx∈Kr · x| {z }≤1(22)≤TXt=1Z{r:f (r)>f (r−ηlt)}f(r)dr + η−1(23)≤TXt=1V ol({r : ∃i, ri− ηlt(i) < 0}) + η−1(24)≤TXt=1ηNXi=1lt(i) + η−1(25)≤ η−1+ T ηN (26)= 2√T N (with η =1√T N) (27)4 Follow the perturbed leader, online shortest pathBy using Theorem B, this result can be improved to replace the N term with log(N).3.2 Online Shortest PathIn this setting, there is a fixed DAG with labeled vertices u and v such that v is reachable from u. At eachtime step, the player picks a path from u to v, and then the opponent reveals the cost of each edge. Theloss is the cost of the chosen path.Each path can be associated with some x ∈ {0, 1}|E|, and the set of paths is P ⊆ {0, 1}|E|.The adversary picks lt∈ Rn+, so the loss can be written as lt· xtas usual.To use the Follow the Perturbed Leader methodology, we draw r ∼ Unif([0, 1]|E|). Suppose, lt∈ [0, 1]|E|and the length of the longest path (number of edges) is ξ.Then, applying Theorem A,ltZ{r:f (r)>f (r−ηlt)}xtf(r)dr ≤ ||lt||∞¯¯¯¯¯¯¯¯¯¯Z{r:f (r)>f (r−ηlt)}xtf(r)dr¯¯¯¯¯¯¯¯¯¯1(28)≤ ξZ{r:f (r)>f (r−ηlt)}f(r)dr (29)≤ ξ|E|η (30)So,ERT≤ η−1ξ + ξ|E|ηT (31)= 2ξp|E|T (with η =1√|E|T) (32)By using Theorem B, this result can be improved to replace the |E| term with
View Full Document