EECS281B/STAT241B: Advanced Topics in Statistical Learning Spring 2009Lecture 13 — March 4Lecturer: Martin Wainwright Scribe: G ia Vinh Anh PhamNote: These lecture notes a re still rough, and have only have been mildly proofread. This is the danger environment.13.1 RecapTheorem 13.1 (Hoeffding). Say X1, . . . , Xnare independent random variables with ai≤Xi≤ biP"1nnXi=1(Xi− E[Xi])> ǫ#≤ 2 exp−2nǫ21nPni=1(bi− ai)2Proof: Using Chernoff inequality,P"1nnXi=1(Xi− E[Xi]) > ǫ#≤ exp(−λnǫ)E"exp λnXi=1(Xi− E[Xi])!#= exp(−λnǫ)E"nYi=1exp (λ(Xi− E[Xi]))#= exp(−λnǫ)nYi=1E [exp (λ(Xi− E[Xi]))]Since eλtis a convex function, and E [Xi− E[Xi]] = 0 we haveE [exp (λ(Xi− E[Xi]))] ≤−aibi− aiebiλ+bibi− aieaiλ= eaiλ1 +aibi− ai−aibi− aiebiλDefine g(u) = log(1 − c + ceu) − cu where c =−aibi−ai, then the above inequality is equivalenttoE [exp (λ(Xi− E[Xi]))] ≤ eg(λ(ai−bi)We note thatg( 0) = 0g′(u) =ceu1 − c + ceu− c ⇒ g′(0) = 0g′′(u) =ceu(1 − c)(1 − c + ceu)2≤14(ceu+ 1 − c)2(1 − c + ceu)2=14, ∀u13-1EECS281B/STAT241B Lecture 13 — March 4 Spring 2009Using Taylor series expansion we can writeg( u) = g(0) + g′(0)u +g′′(¯u)2u2≤18u2So,E [exp (λ(Xi− E[Xi]))] ≤ eg(λ(ai−bi)≤18λ2(ai− bi)2ThereforeP"1nnXi=1(Xi− E[Xi]) > ǫ#≤ exp −λnǫ +λ28nXi=1(bi− ai)2!≤ exp−2nǫ21nPni=1(bi− ai)2 13.2 Martingale inequalities(Way of relaxing independence assumptions)Definition: A sequence of integrable random variables Z1, Z2, . . . is a martingale ifE [Zn+1|Z1, . . . , Zn] = Zn, ∀n = 1, 2, . . .Example:Say V1, V2, . . . are independent zero mean ra ndom variables. Define Sn=Pni=1Vi. This is amartingale:E [Sn+1|S1, . . . , Sn] = E [Sn+ Vn+1|S1, . . . , Sn]= Sn+ E [Vn+1|S1, . . . , Sn]= Sn+ E [Vn+1] (by independence)= SnDefinition: A sequence of random variables V1, V2, . . . is a martingale difference sequence(MDS) ifE [Vn+1|V1, . . . , Vn] = 0, ∀n = 1, 2, . . .Proposition: If Z1, Z2, . . . is a martingale then {Vn|Vn+1= Zn+1− Zn} is a MDS.Lemma 13.2 (Azuma-Hoeffding). Say V1, V2, . . . is a martingale difference sequence withai≤ Vi≤ bi, i = 1, . . . , n. Then ∀ǫ > 0P"1nnXi=1Vi> ǫ#≤ 2 exp−2nǫ21nPni=1(bi− ai)213-2EECS281B/STAT241B Lecture 13 — March 4 Spring 2009Proof: Claim that ∀λ ∈ R , i = 1, 2 . . .:E [exp(λVi)|V1, . . . , Vi−1] ≤ expλ28(bi− ai)2(This is because Vi|V1, . . . , Vi−1is a zero-mean random va riable bounded in [ai, bi])Use Chernoff inequality again, ∀λ > 0P"1nnXi=1Vi> ǫ#≤ exp(−λnǫ)E"nYi=1exp (λVi)#= exp(−λnǫ)EV1...Vn−1"EVn"nYi=1exp (λVi) |V1, . . . , Vn−1##= exp(−λnǫ)EV1...Vn−1"n−1Yi=1exp (λVi) EVn[exp (λVn) |V1, . . . , Vn−1]#≤ exp(−λnǫ)E"n−1Yi=1exp (λVi)#expλ28(bn− an)2≤ . . . ≤ exp(−λ2nǫ) exp nXi=1λ28(bi− ai)2!.From this point, the remainder of the proof follows that of the ordinary Hoeffding inequalityfor sums of independent RVs. In particular, we optimize our bound over λ > 0, and therebyobtain thatP"1nnXi=1Vi> ǫ#≤ exp−2nǫ21nPni=1(bi− ai)2,as claimed. Reference book: Grimmett & Stirzaker, Probability & random processes has further back-ground on martingales and their properties.13.3 Complexity of function classesRecapIn proof of classical Glivenko-Cantelli theorem, needed to bound cardinality of{I(Z(1)≤ t), . . . , I(Zn≤ t)|t ∈ R}Saw this cardinality is at most n + 1.More generally, let A be a class of subsets in Rd.(In previous case, A = Ilef t= {(−∞, t]|t ∈ R})Definition13-3EECS281B/STAT241B Lecture 13 — March 4 Spring 20091. For Z(1), . . . , Z(n)∈ Rd, defineS(A, {Z(i)}ni=1) = card{A ∩ {Z(i)}ni=1|A ∈ A}Say that A shatters {Z(i)}ni=1if S(A, {Z(i)}ni=1) = 2n2. The nthshatter coefficient iss(A, n) = maxZ(1),...,Z(n)S(A, {Z(i)}ni=1)3. The Vapnik-Chernovenkis (VC) dimension of A isVA= sup{n ∈ N|s(A, n) = 2n}Examples:1. VIlef t= 12. Consider A = {(s, t]|s < t; s, t ∈ R}. ThenS(A, 1) = 2S(A, 2) = 22S(A, 3) = 7 < 23S(A, n) =n(n+1)2+ 1So VA= 2Theorem 13.3 (Vapnik-Chernovenkis). For any class of sets APsupA∈AˆPn(A) − P (A)> ǫ≤ 8s(A, n) exp−nǫ232whereˆPn(A) =1nPni=1I[Z(i)∈
View Full Document