DOC PREVIEW
Berkeley COMPSCI C281B - Lecture Notes

This preview shows page 1 out of 4 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS281B/Stat241B (Spring 2008) Statistical Learning Theory Lecture: 18ΠF(n) for parameterized F , covering numbers, Rn(F )Lecturer: Peter Bartlett Scribe: Eilyan Bitar1 ΠF(n) for parameterized FF = {x → sign(f(a, x)) | a ∈ Rd, f : Rd× R → R}For a family of classifiers F , linear in a, we have dv c(F ) = d, where d = the number of parameters.Example. f(w, x) = sin(wx) ⇒ dv c(F ) = ∞, even though f is smooth.Set w = πc, where c has a binary representation 0.b1b2....bn1.Set xi= 2ifor i = 1, ..., n. Thensin(wxi) = sin(2i× π × 0.b1....bn1)= sin(π × b1....bi.bi+1....bn1)= sin(π × bi.b1+1....bn1)which implies that sign(sin(wxi)) = bi. Hence, we can always find a set of size n ∀n.Example (Neural Nets).f(θ, x) =kXi=1αiσ(βTix)|{z }squashingfunction+αoFor what σ : R → [0, 1] is dv c(F ) < ∞ ?For instance, if σ(α ) =11 + e−α+ cα3e−α2sin(α)| {z }Looks like a sigmoid buthas a sinusoid hidden in it, we have dv c(F ) = ∞. Take note that σ is convex left ofzero and concave right of zero.12 ΠF(n) for parameterized F , covering numbers, Rn(F )Consider the function h : Rd|{z}a× Rm|{z}x→ {+ − 1} that can be computed by an algorithm that takes as input,(a, x) ∈ Rd× Rm, and returns as h(a, x) after ≤ t operations:• arithmetic, (+, −, ×, ÷)• conditionals (<, >, ≤, ≥)• outputs ±1Definition. For a class, F , of real valued functions oncont.in az}|{Rd×X , we say h is a k − combination of sign(F )if:H = {x → g(sign(f1(a, x)), ...., sign(fk(a, x))) | a ∈ Rd} for fixed g : {±1}k→ {±1} and f1, ...., fk∈ F .E.g. For a t − step computable h, we have a 2t-combination of sign(F ) for F = polynomials of degree ≤ 2t.Theorem 1.1. For H a k − combination of sign(F ),ΠH(n) ≤dXi=0knimax{fj}∈F,{xj}∈XCCi\j=1{a | fj(a, xj) = 0}|{z }number of connected componentsin the solution setExample. Linear threshold function (1 − combination of sign(F ))• fjis linear in a.• CCi\j=1{a | fj(a, xj) = 0}| {z }defines a subspace= 0 or 1Corollary 1.2. For F , polynomialy parameterized, with degree ≤ m, we haveΠH(n) ≤ 22enkmdddv c(H) ≤ 2d log (2ekm)ΠF(n) for parameterized F , covering numbers, Rn(F ) 3• Hence, t − step computable, H has dv c(H) ≤ 4d(t + 2)(using ΠH(n) < 2n⇒ dv c(H) < n).Note: With the addition of exponentials in the model of computation, we have dv c(H) = O(t2d2).Proof. Proof idea of previous theorem.• ΠH(n) = max{|H|S| : S ⊆ X , |S| = n}• Zij= {a | fi(a, xj) = 0}, assume regular intersections between these subspaces.Lemma 1.3 (Warren 1960).CCRd−[i,jZij≤XI⊆{(i,j)}CC \i∈IZi!Summarize: dv c(H) = O(dt) for t − step computeable h: 2t− combination of sign(F ) for F = polynomialwith degree ≤ 2t.2 Covering NumbersDefinition. For a metric space, (S, ρ), and a subspace, T ⊆ S, we say thatˆT is an ε − coverof T if∀ t ∈ T, ∃ˆt ∈ˆT such that ρ(t,ˆt) < ε.Definition. The ε − covering numberof (T, ρ):N(ε, T, ρ) = min{|ˆT | :ˆT is an ε − cover of T }.Note: Entrop y := log N (ε, T, ρ)Example. T ⊆ [0, 1]nis a d-dimensional subspace. A bound on the covering number for this subspace canbe found in terms of a uniform grid of ε-balls over the subspace, i.e.N(ε, T, L2(Pn)) ≤1εd4 ΠF(n) for parameterized F , covering numbers, Rn(F )Consider,• F ⊆ [−1, 1]X• S = {x1, ...., xn} ⊆ X .• F|s= {(f(x1), ...., f(xn) | f ∈ F } ⊆ [−1, 1]n.• L2(ˆP ), ρ(u, v) =1nPi(ui− vi)21/2.Theorem 2.1.ˆRn(F ) ≤ infα>0s2 log(N(α, F, L2(ˆP ))n+ αProof. Fix α, α-coverˆF ofF .ˆRn(F ) = Eεsupf∈F1nnXi=1εif(xi)= E supˆf∈ˆFsupf∈F ∩Bα(ˆf)1nXεiˆf(xi) +1nXεi(f(xi) −ˆf(xi))| {z }h ε|{z}||ε||=1,f −ˆf|{z}k·k≤αiL2( ˆp)≤ E"supˆf∈ˆF1nXεiˆf(xi)+ α#Note:• F =Sˆf∈ˆF(F ∩ Bα(ˆf))• |ˆF | = N(α, F, L2(ˆP ))⇒ log N(α, F ) = d log(1/α) for the linear case.Set α =1√n, ⇒ Rn(F ) ≤q2d


View Full Document

Berkeley COMPSCI C281B - Lecture Notes

Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?