UT Dallas CS 6375 - vcdimCMU

Nov 20th, 2001
Copyright © 2001, Andrew W. Moore

VC-dimension for characterizing classifiers

Andrew W. Moore
Associate Professor
School of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~awm
[email protected]

Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials. Comments and corrections gratefully received.

Slide 2: A learning machine
• A learning machine f takes an input x and transforms it, somehow using weights α, into a predicted output y_est = +/- 1.
• α is some vector of adjustable parameters.

Slide 3: Examples
• f(x,b) = sign(x·x − b)
  (figure: example points, with a legend marking which denote +1 and which denote −1)

Slide 4: Examples
• f(x,w) = sign(x·w)

Slide 5: Examples
• f(x,w,b) = sign(x·w + b)

Slide 6: How do we characterize "power"?
• Different machines have different amounts of "power".
• Tradeoff between:
  • More power: can model more complex classifiers, but might overfit.
  • Less power: not going to overfit, but restricted in what it can model.
• How do we characterize the amount of power?
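The machines on slides 2–5 are easy to sketch in code. A minimal illustration (my own, not from the slides; uses NumPy and treats sign(0) as +1):

```python
import numpy as np

def f(x, w, b=0.0):
    """Linear-threshold learning machine f(x,w,b) = sign(x.w + b):
    maps input x to a predicted output +1 or -1, using adjustable
    parameters w (weights) and b (offset). With b = 0 this is the
    slide-4 machine sign(x.w)."""
    return 1 if np.dot(x, w) + b >= 0 else -1

def f_circle(x, b):
    """The slide-3 "trivial circle" machine f(x,b) = sign(x.x - b):
    +1 outside (or on) the circle of squared radius b, -1 inside."""
    return 1 if np.dot(x, x) - b >= 0 else -1

print(f(np.array([2.0, 1.0]), np.array([1.0, -1.0])))  # -> 1
print(f_circle(np.array([0.5, 0.5]), b=1.0))           # -> -1 (0.5 < 1)
```

The "adjustable parameter vector α" of slide 2 is just (w, b) here; different choices of α carve the input space into +1 and −1 regions in different ways.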
Slides 7-8: Some definitions
• Given some machine f,
• and under the assumption that all training points (x_k, y_k) were drawn i.i.d. from some distribution,
• and under the assumption that future test points will be drawn from the same distribution,
• define:

  TESTERR(α) = R(α) = E[ (1/2) |y − f(x,α)| ] = probability of misclassification

  (R(α) is the official terminology; TESTERR(α) is the terminology we'll use.)

  TRAINERR(α) = R_emp(α) = (1/R) Σ_{k=1..R} (1/2) |y_k − f(x_k,α)| = fraction of training set misclassified

  where R = number of training set data points.

Slides 9-10: Vapnik-Chervonenkis dimension, and what it is used for
• Given some machine f, let h be its VC dimension.
• h is a measure of f's power (h does not depend on the choice of training set).
• Vapnik showed that with probability 1 − η,

  TESTERR(α) ≤ TRAINERR(α) + sqrt( [ h (log(2R/h) + 1) − log(η/4) ] / R )

• This gives us a way to estimate the error on future data based only on the training error and the VC-dimension of f.
• But given machine f, how do we define and compute h?

Slide 11: Shattering
• Machine f can shatter a set of points x_1, x_2, …, x_r if and only if: for every possible training set of the form (x_1,y_1), (x_2,y_2), …, (x_r,y_r), there exists some value of α that gets zero training error.
• There are 2^r such training sets to consider, each with a different combination of +1's and −1's for the y's.

Slides 12-13: Shattering
• Question: Can f(x,w) = sign(x·w) shatter the two points shown? (figure not preserved)
• Answer: No problem. There are four training sets to consider, handled by w = (0,1), w = (0,−1), w = (2,−3), and w = (−2,3).

Slides 14-15: Shattering
• Question: Can f(x,b) = sign(x·x − b) shatter the two points shown? (figure not preserved)
• Answer: No way, my friend.

Slide 16: Definition of VC dimension
Given machine f, the VC-dimension h is the maximum number of points that can be arranged so that f shatters them.
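The shattering definition is directly checkable by brute force: enumerate all 2^r labelings and ask whether some parameter setting gets zero training error on each one. A sketch (my own illustration; it approximates each continuous parameter family with a finite grid, which suffices for these small examples):

```python
import itertools
import numpy as np

def can_shatter(points, classifiers):
    """True iff, for every one of the 2^r labelings of the points,
    some classifier in the (finite, sampled) family labels all of
    them correctly, i.e. achieves zero training error."""
    r = len(points)
    for labels in itertools.product([-1, 1], repeat=r):
        if not any(all(clf(x) == y for x, y in zip(points, labels))
                   for clf in classifiers):
            return False  # this labeling is unrealizable
    return True

# Family f(x,w) = sign(x.w), sampled over a grid of weight vectors.
ws = [np.array([a, b]) for a in np.linspace(-3, 3, 13)
                       for b in np.linspace(-3, 3, 13)]
linear = [lambda x, w=w: 1 if np.dot(x, w) >= 0 else -1 for w in ws]

# Family f(x,b) = sign(x.x - b), sampled over thresholds b.
circle = [lambda x, b=b: 1 if np.dot(x, x) - b >= 0 else -1
          for b in np.linspace(-1.0, 10.0, 50)]

pts = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
print(can_shatter(pts, linear))  # -> True: all 4 labelings realizable
print(can_shatter(pts, circle))  # -> False: can't give the near point +1 and the far point -1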
Example: What's the VC dimension of f(x,b) = sign(x·x − b)?

Slide 17: VC dim of trivial circle
Answer = 1: we can't even shatter two points! (But it's clear we can shatter 1.)

Slide 18: Reformulated circle
Example: For 2-d inputs, what's the VC dimension of f(x,q,b) =
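The Vapnik bound from slides 9–10 can also be evaluated numerically to see how the complexity penalty behaves. A sketch (my own; it assumes natural logarithms and uses hypothetical example values for TRAINERR, h, R, and η):

```python
import math

def vc_bound(trainerr, h, R, eta=0.05):
    """Vapnik's bound: with probability 1 - eta,
    TESTERR <= TRAINERR + sqrt( (h*(log(2R/h) + 1) - log(eta/4)) / R )."""
    penalty = math.sqrt((h * (math.log(2 * R / h) + 1)
                         - math.log(eta / 4)) / R)
    return trainerr + penalty

# The penalty grows with the VC dimension h and shrinks like 1/sqrt(R),
# capturing the power/overfitting tradeoff of slide 6:
for h in (1, 10, 100):
    print(h, vc_bound(trainerr=0.05, h=h, R=1000))
```

Notice the bound depends on f only through h and on the data only through TRAINERR and R, which is exactly why it lets us estimate future error from the training set alone.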


