CMU CS 10701 - learning-theory-mid-review

More details
  General: http://www.learning-with-kernels.org/
  Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz

PAC learning, VC Dimension and Margin-based Bounds
  Machine Learning 10701/15781
  Carlos Guestrin, Carnegie Mellon University
  March 6th, 2006
  © 2006 Carlos Guestrin

Announcements 1
  Midterm on Wednesday: open book, texts, notes; no laptops; bring a calculator.

Announcements 2
  Final project details are out: http://www.cs.cmu.edu/~guestrin/Class/10701/projects.html
  Great opportunity to apply ideas from class and learn more.
  Example project: take a dataset, define a learning task, apply learning algorithms, design your own extension, evaluate your ideas.
  Many suggestions are on the webpage, but you can also do your own.
  Boring stuff:
    Individually or in groups of two students.
    It's worth 20% of your final grade.
    You need to submit a one-page proposal on Wed. 3/22 (just after the break).
    A 5-page initial write-up (milestone) is due on 4/12 (20% of the project grade).
    An 8-page final write-up is due 5/8 (60% of the grade).
    A poster session for all students will be held on Friday 5/5, 2-5pm, in the NSH atrium (20% of the grade).
    You can use late days on write-ups; each student in a team will be charged a late day per day.
  MOST IMPORTANT:

What now
  We have explored many ways of learning from data. But:
    How good is our classifier, really?
    How much data do I need to make it good enough?

How likely is the learner to pick a bad hypothesis?
  Probability that an h with $\mathrm{error}_{true}(h) \ge \epsilon$ gets m data points right: at most $(1 - \epsilon)^m$.
  There are k hypotheses consistent with the data. How likely is the learner to pick a bad one?

Union bound
  $P(A \text{ or } B \text{ or } C \text{ or } D \text{ or } \dots) \le P(A) + P(B) + P(C) + P(D) + \dots$

Review: Generalization error in finite hypothesis spaces [Haussler '88]
  Theorem: Hypothesis space H finite, dataset D with m i.i.d. samples, $0 < \epsilon < 1$: for any learned hypothesis h that is consistent on the training data,
    $P(\mathrm{error}_{true}(h) > \epsilon) \le |H| e^{-m\epsilon}$
  Even if h makes zero errors on the training data, it may make errors on test data.

Using a PAC bound
  Typically, 2 use cases:
    1: Pick $\epsilon$ and $\delta$, give you m.
    2: Pick m and $\delta$, give you $\epsilon$.

Limitations of the Haussler '88 bound
  Consistent classifier
  Size of hypothesis space

Simpler question: What's the expected error of a hypothesis?
  The error of a hypothesis is like estimating the parameter of a coin.
  Chernoff bound: for m i.i.d. coin flips $x_1, \dots, x_m$, where $x_i \in \{0, 1\}$ and $\theta = P(x_i = 1)$. For $0 < \epsilon < 1$:
    $P\left(\theta - \frac{1}{m}\sum_i x_i > \epsilon\right) \le e^{-2m\epsilon^2}$

But we are comparing many hypotheses: Union bound
  For each hypothesis $h_i$:
    $P(\mathrm{error}_{true}(h_i) - \mathrm{error}_{train}(h_i) > \epsilon) \le e^{-2m\epsilon^2}$
  What if I am comparing two hypotheses, $h_1$ and $h_2$?

Generalization bound for |H| hypotheses
  Theorem: Hypothesis space H finite, dataset D with m i.i.d. samples, $0 < \epsilon < 1$: for any learned hypothesis h,
    $P(\mathrm{error}_{true}(h) - \mathrm{error}_{train}(h) > \epsilon) \le |H| e^{-2m\epsilon^2}$

PAC bound and Bias-Variance tradeoff
  Or, after moving some terms around, with probability at least $1 - \delta$:
    $\mathrm{error}_{true}(h) \le \mathrm{error}_{train}(h) + \sqrt{\frac{\ln|H| + \ln\frac{1}{\delta}}{2m}}$
  Important: the PAC bound holds for all h, but doesn't guarantee that the algorithm finds the best h.

What about the size of the hypothesis space?
  How large is the hypothesis space?

Boolean formulas with n binary features

Number of decision trees of depth k
  Recursive solution. Given n attributes:
    $H_k$ = number of decision trees of depth k
    $H_0 = 2$
    $H_{k+1}$ = (choices of root attribute) × (possible left subtrees) × (possible right subtrees) = $n \cdot H_k \cdot H_k$
  Write $L_k = \log_2 H_k$:
    $L_0 = 1$
    $L_{k+1} = \log_2 n + 2 L_k$
    So $L_k = (2^k - 1)(1 + \log_2 n) + 1$

PAC bound for decision trees of depth k
  Bad: the number of points is exponential in depth.
  But, for m data points, a decision tree can't get too big: the number of leaves is never more than the number of data points.

Number of decision trees with k leaves
  $H_k$ = number of decision trees with k leaves
  $H_0 = 2$
  Reminder
  Loose bound

PAC bound for decision trees with k leaves: Bias-Variance revisited

What did we learn from decision trees?
  Bias-Variance tradeoff formalized.
  Moral of the story: complexity of learning is not measured in terms of the size of the hypothesis space, but in the maximum number of points that allows consistent classification.
    Complexity m: no bias, lots of variance.
    Lower than m: some bias, less variance.

What about continuous hypothesis spaces?
  Continuous hypothesis space: $|H| = \infty$. Infinite variance?
  As with decision trees, we only care about the maximum number of points that can be classified exactly.

How many points can a linear boundary classify exactly? (1-D)

How many points can a linear boundary classify exactly? (2-D)

How many points can a linear boundary classify exactly? (d-D)

Shattering a set of points

VC dimension

PAC bound using VC dimension
  The number of training points that can be classified exactly is the VC dimension.
  It measures the relevant size of the hypothesis space, as with decision trees with k leaves.
  Bound for infinite-dimension hypothesis spaces.

Examples of VC dimension
  Linear classifiers: $VC(H) = d + 1$, for d features plus constant term b.
  Neural networks: $VC(H)$ = number of parameters. Local minima mean NNs will probably not find the best parameters.
  1-Nearest neighbor

Another VC dim. example
  What's the VC dim. of decision stumps in 2-D?

PAC bound for SVMs
  SVMs use a linear classifier. For d features, $VC(H) = d + 1$.

VC dimension and SVMs: Problems
  Doesn't take the margin into account.
  What about kernels?
    Polynomials: the number of features grows really fast (bad bound). n = input features, p = degree of polynomial.
    Gaussian kernels can classify any set of points exactly.

Margin-based VC dimension
  H: class of linear classifiers $w \cdot x$ ($b = 0$)
  Canonical form: $\min_j |w \cdot x_j| = 1$
  $VC(H) = R^2\, w \cdot w$
    Doesn't depend on the number of features.
    $R^2 = \max_j x_j \cdot x_j$ (magnitude of the data)
    $R^2$ is bounded even for Gaussian kernels, so the VC dimension is bounded.
  Large margin, low $w \cdot w$, low VC dimension. Very cool!

Applying margin VC to SVMs
  $VC(H) = R^2\, w \cdot w$
  $R^2 = \max_j x_j \cdot x_j$ (magnitude of the data); doesn't depend on the choice of w.
  SVMs minimize $w \cdot w$.
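
The two use cases of a PAC bound can be sketched numerically. The snippet below assumes the standard forms of the finite-hypothesis bounds, |H| exp(-m ε) ≤ δ for a consistent learner (Haussler '88) and |H| exp(-2 m ε²) ≤ δ for the general case; the function names are hypothetical and chosen only for illustration.

```python
import math


def haussler_sample_size(h_size, epsilon, delta):
    """Use case 1: pick epsilon and delta, get m.

    Returns the smallest integer m with h_size * exp(-m * epsilon) <= delta,
    i.e. m >= (ln|H| + ln(1/delta)) / epsilon. Assumes a consistent learner.
    """
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)


def agnostic_error_gap(h_size, m, delta):
    """Use case 2: pick m and delta, get epsilon.

    Solves h_size * exp(-2 * m * epsilon**2) = delta for epsilon, which
    bounds error_true(h) - error_train(h) with probability >= 1 - delta.
    """
    return math.sqrt((math.log(h_size) + math.log(1.0 / delta)) / (2.0 * m))


if __name__ == "__main__":
    # 1000 hypotheses, 10% error tolerance, 95% confidence.
    print(haussler_sample_size(1000, 0.1, 0.05))  # 100
    # Gap between training and true error with 500 samples.
    print(agnostic_error_gap(1000, 500, 0.05))
```

Note the different dependence on ε: the consistent-learner bound needs m on the order of 1/ε, while the general bound needs m on the order of 1/ε².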
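
The depth-k decision tree count can be checked directly. This sketch evaluates the recursion H_0 = 2, H_{k+1} = n · H_k · H_k with exact integers and compares it against the closed form L_k = (2^k − 1)(1 + log2 n) + 1. The last function plugs ln|H_k| into the consistent-learner bound m ≥ (ln|H| + ln(1/δ)) / ε, which is an assumption about which bound is applied; all function names are hypothetical.

```python
import math


def count_trees(n, k):
    """Number of depth-k decision trees over n attributes, by the recursion
    H_0 = 2, H_{k+1} = n * H_k * H_k (exact integer arithmetic)."""
    h = 2
    for _ in range(k):
        h = n * h * h
    return h


def log2_count_closed_form(n, k):
    """Closed form L_k = (2^k - 1) * (1 + log2 n) + 1 for log2 H_k."""
    return (2 ** k - 1) * (1 + math.log2(n)) + 1


def tree_sample_size(n, k, epsilon, delta):
    """Samples suggested by m >= (ln|H_k| + ln(1/delta)) / epsilon,
    using ln|H_k| = L_k * ln 2. Grows exponentially with depth k."""
    ln_h = log2_count_closed_form(n, k) * math.log(2)
    return math.ceil((ln_h + math.log(1.0 / delta)) / epsilon)


if __name__ == "__main__":
    for k in range(5):
        exact = math.log2(count_trees(4, k))
        print(k, exact, log2_count_closed_form(4, k), tree_sample_size(4, k, 0.1, 0.05))
```

Since L_k roughly doubles with each extra level of depth, so does the sample size, which is why the slides turn to counting leaves instead.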
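
Shattering can be made concrete in the 1-D case. The brute-force sketch below checks whether every labeling of a set of distinct points on a line is realized by some classifier of the form sign(s · (x − t)); it illustrates VC(H) = d + 1 for d = 1. The function names and the choice of candidate thresholds are assumptions made for this example.

```python
from itertools import product


def realizable_1d(points, labels):
    """True if some 1-D linear classifier sign(s * (x - t)), s in {+1, -1},
    reproduces the labels (+1 / -1) on the given distinct points."""
    xs = sorted(points)
    # One candidate threshold below all points, one between each pair of
    # neighbours, and one above all points covers every distinct split.
    cuts = [xs[0] - 1.0]
    cuts += [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
    cuts += [xs[-1] + 1.0]
    for t in cuts:
        for s in (1, -1):
            predicted = [1 if s * (x - t) > 0 else -1 for x in points]
            if predicted == list(labels):
                return True
    return False


def shattered_1d(points):
    """True if all 2^len(points) labelings are realizable."""
    return all(
        realizable_1d(points, labels)
        for labels in product((-1, 1), repeat=len(points))
    )


if __name__ == "__main__":
    print(shattered_1d([0.0, 1.0]))       # True: two points can be shattered
    print(shattered_1d([0.0, 1.0, 2.0]))  # False: (+, -, +) is not realizable
```

Two points can be shattered but no three points can, so the VC dimension of 1-D linear classifiers is 2.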