DOC PREVIEW
UT Dallas CS 6375 - IISC_VC dimension

This preview shows page 1 out of 4 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

E0 370 Statistical Learning Theory Lecture 4 (Aug 23, 2011)VC-Dimension and Sauer’s LemmaLecturer: Shivani Agarwal Scribe: Achintya Kundu1 IntroductionIn the previous lecture we saw the technique of uniform convergence for obtaining confidence bounds on thegeneralization error erD[hS] when the function hSis selected from a function class H of sufficiently limited‘capacity’. Specifically, we saw that if hSis selected from H ⊆ {−1, 1}X, then for any 0 < δ < 1, withprobability at least 1 − δ (over the draw of S ∼ Dm),erD[hS] ≤ erS[hS] +s8ln ΠH(2m) + ln(4δ)m. (1)In order for this result to be meaningful (i.e. to give a non-trivial bound), the growth function ΠH(2m) needsto grow ‘slowly’ in m; in particular, it needs to have sub-exponential growth in m.In this lecture we meet the VC-dimension, a fundamental combinatorial parameter associated with a classof binary-valued functions H. We will see that if H has finite VC-dimension, then the growth function of Hgrows polynomially in m, yielding a meaningful uniform convergence result and generalization error bound.2 Vapnik-Chervonenkis DimensionDefinition. Let H ⊆ {−1, 1}Xand let xm1= (x1, . . . , xm) ∈ Xm. We say xm1is shattered by H ifH|xm1=2m; i.e. if ∀b ∈ {−1, 1}m, ∃hb∈ H s.t. (hb(x1), . . . , hb(xm)) = (b1, . . . , bm). The Vapnik-Chervonenkis(VC) dimension of H, denoted by VCdim(H), is the cardinality of the largest set of points in X that can beshattered by H:VCdim(H) = maxnm ∈ NΠH(m) = 2mo.If H shatters arbitrarily large sets of points in X , then VCdim(H) = ∞.Example 1 (Intervals on the real line). Let X = R, and let H be the class of all binary-valued functionson R that label all points within some closed interval as +1 and all points outside the interval as −1:H =nh : X → {−1, 1}h(x) = 1 if x ∈ [a, b] and −1 otherwise, for some a ≤ boClearly, any set of 2 points x1< x2in R can be shattered by H: consider the functions in H correspondingto the intervals [x1− 2, x1− 1], [x1− 1,x1+x22], [x1+x22, x2+ 1], and [x1− 1, x2+ 1]; these functions realize allpossible binary labelings of the 2 points. Moreover, no set of 3 points x1< x2< x3in R can be shatteredby H: no function in H can label x2as negative and x1, x3as positive. Therefore VCdim(H) = 2.Example 2 (Axis-parallel rectangles in the plane). Let X = R2, and let H be the class of all binary-valuedfunctions on X that label all points within some axis-parallel rectangle as +1 and all points outside therectangle as −1:H =nh : X → {−1, 1}h(x) = 1 if x ∈ [a, b] × [c, d] and −1 otherwise, for some a ≤ b, c ≤ doFigure 1 shows a set of 4 points in R2that are shattered by H; therefore VCdim(H) ≥ 4 (note that there arealso sets of 4 points that are not shattered by H, such as any set that includes 3 points on a line; however12 VC-Dimension and Sauer’s Lemmathis does not matter: to show VCdim(H) ≥ d, we just need some set of d points to be shattered by H).Moreover, it can be verified that no set of 5 points in R2can be shattered by H (in any set of 5 points, thereis at least one point which is neither the sole extreme left/right nor the sole extreme top/bottom; there isno function in H that can label this point negative and all others positive). Therefore VCdim(H) = 4................................................................ .Figure 1: Four points in R2that can be shattered using axis-parallel rectangles.Example 3 (Linear classifiers in Rn). Let X = Rn, and let H be the class of linear classifiers in Rn:H =nh : X → {−1, 1}h(x) = sign(w · x + b) for some w ∈ Rn, b ∈ Ro.Consider first the case n = 2. Figure 2 shows a set of 3 points in R2that are shattered by H. Moreover, itis easy to verify that no set of 4 points in R2can be shattered by H. Therefore VCdim(H) in this case is 3.In general, for linear classifiers in Rn, VCdim(H) = n + 1........... ............. .Figure 2: Three points in R2that can be shattered using linear classifiers.Example 4 (Finite unions of intervals on the real line). Let X = R, and let H be the class of all functionson R that label all points within some finite union of closed intervals as 1 and all other points as −1:H =nh : X → {−1, 1}h(x) = 1 if x ∈Ski=1[ai, bi] and −1 otherwise, for some k ∈ N, ai≤ bi∀i ∈ [k]o.Then VCdim(H) = ∞.3 Sauer’s LemmaTheorem 3.1 (Sauer’s Lemma; Sauer, 1972; Shelah, 1972; Vapnik & Chervonenkis, 1971). Let H ⊆{−1, 1}Xwith VCdim(H) = d < ∞. Then for all m ∈ N,ΠH(m) ≤dXi=0mi.Proof. By induction on both m and d.VC-Dimension and Sauer’s Lemma 3Base case. There are two base cases to consider:(i) d = 0, m ≥ 1. In this case H can contain only a single function, and we have ΠH(m) = 1; moreover,Pdi=0mi=m0= 1. Therefore the hypothesis holds.(ii) m = 1, d ≥ 1. In this case ΠH(m) = 2; moreover,Pdi=0mi=10+11= 2. Therefore the hypothesisholds in this case as well.Induction step. Let m > 1, d > 0. Assume the hypothesis for all (m0, d0) such that m0< m or d0< d (inparticular, we will need only the cases (m − 1, d − 1) and (m − 1, d)). We will show the hypothesis is truefor (m, d).Let xm1= (x1, . . . , xm) ∈ Xm. ConsiderH|xm1=n(h(x1), . . . , h(xm))h ∈ HoandH|xm−11=n(h(x1), . . . , h(xm−1))h ∈ Ho.All sequences in H|xm−11appear either once or twice in H|xm1(followed by either 1 or −1 in the m-th position,or both). Let H3be the set of all sequences in H|xm−11that appear twice in H|xm1:H3=n(y1, . . . , ym−1) ∈ H|xm−11∃h, h0∈ H s.t. h(xi) = h0(xi) = yi∀i ∈ [m − 1] and h(xm) 6= h0(xm)o.Then clearlyH|xm1=H|xm−11+ |H3| . (2)Now H|xm−11is a restriction to m − 1 points of functions from the class H of VC-dimension d. Moreover,H3can be viewed as a restriction to m − 1 points of functions from a class of VC-dimension at most d − 1(to see this, note that to any subset of the m − 1 points x1, . . . , xm−1that is shattered in H3, we can addxmto obtain a strictly larger set of points that is shattered by H; the claim follows since VCdim(H) = d).Applying the inductive hypothesis then givesH|xm1≤dXi=0m − 1i+d−1Xi=0m − 1i(3)=dXi=0m − 1i+m − 1i − 1(wherem−1−1= 0) (4)=dXi=0mi. (5)Since xm1∈ Xmwas arbitrary, the result follows.Corollary 3.2. Let H ⊆ {−1,


View Full Document

UT Dallas CS 6375 - IISC_VC dimension

Documents in this Course
ensemble

ensemble

17 pages

em

em

17 pages

dtree

dtree

41 pages

cv

cv

9 pages

bayes

bayes

19 pages

vc

vc

24 pages

svm-2

svm-2

16 pages

svm-1

svm-1

18 pages

rl

rl

18 pages

mle

mle

16 pages

mdp

mdp

19 pages

knn

knn

11 pages

intro

intro

19 pages

hmm-train

hmm-train

26 pages

hmm

hmm

28 pages

hmm-train

hmm-train

26 pages

hmm

hmm

28 pages

ensemble

ensemble

17 pages

em

em

17 pages

dtree

dtree

41 pages

cv

cv

9 pages

bayes

bayes

19 pages

vc

vc

24 pages

svm-2

svm-2

16 pages

svm-1

svm-1

18 pages

rl

rl

18 pages

mle

mle

16 pages

mdp

mdp

19 pages

knn

knn

11 pages

intro

intro

19 pages

vc

vc

24 pages

svm-2

svm-2

16 pages

svm-1

svm-1

18 pages

rl

rl

18 pages

mle

mle

16 pages

mdp

mdp

19 pages

knn

knn

11 pages

intro

intro

19 pages

hmm-train

hmm-train

26 pages

hmm

hmm

28 pages

ensemble

ensemble

17 pages

em

em

17 pages

dtree

dtree

41 pages

cv

cv

9 pages

bayes

bayes

19 pages

vc

vc

24 pages

svm-2

svm-2

16 pages

svm-1

svm-1

18 pages

rl

rl

18 pages

mle

mle

16 pages

mdp

mdp

19 pages

knn

knn

11 pages

intro

intro

19 pages

hmm-train

hmm-train

26 pages

hmm

hmm

28 pages

ensemble

ensemble

17 pages

em

em

17 pages

dtree

dtree

41 pages

cv

cv

9 pages

bayes

bayes

19 pages

hw2

hw2

2 pages

hw1

hw1

4 pages

hw0

hw0

2 pages

hw5

hw5

2 pages

hw3

hw3

3 pages

20.mdp

20.mdp

19 pages

19.em

19.em

17 pages

16.svm-2

16.svm-2

16 pages

15.svm-1

15.svm-1

18 pages

14.vc

14.vc

24 pages

9.hmm

9.hmm

28 pages

5.mle

5.mle

16 pages

3.bayes

3.bayes

19 pages

2.dtree

2.dtree

41 pages

1.intro

1.intro

19 pages

21.rl

21.rl

18 pages

CNF-DNF

CNF-DNF

2 pages

ID3

ID3

4 pages

mlHw6

mlHw6

3 pages

MLHW3

MLHW3

4 pages

MLHW4

MLHW4

3 pages

ML-HW2

ML-HW2

3 pages

vcdimCMU

vcdimCMU

20 pages

hw0

hw0

2 pages

hw3

hw3

3 pages

hw2

hw2

2 pages

hw1

hw1

4 pages

9.hmm

9.hmm

28 pages

5.mle

5.mle

16 pages

3.bayes

3.bayes

19 pages

2.dtree

2.dtree

41 pages

1.intro

1.intro

19 pages

15.svm-1

15.svm-1

18 pages

14.vc

14.vc

24 pages

hw2

hw2

2 pages

hw1

hw1

4 pages

hw0

hw0

2 pages

hw3

hw3

3 pages

9.hmm

9.hmm

28 pages

5.mle

5.mle

16 pages

3.bayes

3.bayes

19 pages

2.dtree

2.dtree

41 pages

1.intro

1.intro

19 pages

Load more
Download IISC_VC dimension
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view IISC_VC dimension and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view IISC_VC dimension 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?