DOC PREVIEW
CMU CS 10701 - Boosting

This preview shows page 1-2-16-17-18-34-35 out of 35 pages.

Save
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Boosting Can we make dumb learners smart Aarti Singh Machine Learning 10 701 15 781 Oct 11 2010 Slides Courtesy Carlos Guestrin Freund Schapire 1 Project Proposal Due Today 2 Why boost weak learners Goal Automatically categorize type of call requested Collect Calling card Person to person etc Easy to find rules of thumb that are often correct E g If card occurs in utterance then predict calling card Hard to find single highly accurate prediction rule 3 Fighting the bias variance tradeoff Simple a k a weak learners e g na ve Bayes logistic regression decision stumps or shallow decision trees Are good Low variance don t usually overfit Are bad High bias can t solve hard learning problems Can we make weak learners always good No But often yes 4 Voting Ensemble Methods Instead of learning a single weak classifier learn many weak classifiers that are good at different parts of the input space Output class Weighted vote of each classifier Classifiers that are most sure will vote with more conviction Classifiers will be most sure about a particular part of the space On average do better than single classifier H X Y 1 1 h1 X h2 X H X h1 X h2 X H X sign t ht X t 1 1 1 1 weights 5 Voting Ensemble Methods Instead of learning a single weak classifier learn many weak classifiers that are good at different parts of the input space Output class Weighted vote of each classifier Classifiers that are most sure will vote with more conviction Classifiers will be most sure about a particular part of the space On average do better than single classifier But how do you force classifiers ht to learn about different parts of the input space weigh the votes of different classifiers t 6 Boosting Schapire 89 Idea given a weak learner run it multiple times on reweighted training data then let learned classifiers vote On each iteration t weight each training example by how incorrectly it was classified Learn a weak hypothesis ht A strength for this hypothesis t Final classifier H X sign t ht X Practically useful Theoretically interesting 7 Learning from weighted data Consider a weighted dataset D i weight of i th training example xi yi Interpretations i th training example counts as D i examples If I were to resample data I would get more samples of heavier data points Now in all calculations whenever used i th training example counts as D i examples e g in MLE redefine Count Y y to be weighted count Unweighted data m Count Y y 1 Y i y i 1 Weights D i m Count Y y D i 1 Y i y i 1 8 AdaBoost Freund Schapire 95 Initially equal weights Na ve bayes decision stump weak weak Magic ve Increase weight if wrong on pt i yi ht xi 1 0 9 AdaBoost Freund Schapire 95 Initially equal weights Na ve bayes decision stump weak weak Magic ve Increase weight if wrong on pt i yi ht xi 1 0 Weights for all pts must sum to 1 Dt 1 i 1 t 10 AdaBoost Freund Schapire 95 Initially equal weights Na ve bayes decision stump weak weak Magic ve Increase weight if wrong on pt i yi ht xi 1 0 11 What t to choose for hypothesis ht Weight Update Rule Freund Schapire 95 Weighted training error Does ht get ith point wrong t 0 if ht perfectly classifies all weighted data pts t 1 if ht perfectly wrong ht perfectly right t 0 5 t t t 0 12 Boosting Example Decision Stumps 13 Boosting Example Decision Stumps 14 Analyzing training error Analysis reveals What t to choose for hypothesis ht t weighted training error If each weak learner ht is slightly better than random guessing t 0 5 then training error of AdaBoost decays exponentially fast in number of rounds T Training Error 15 Analyzing training error Training error of final classifier is bounded by Convex upper bound Where exp loss If boosting can make upper bound 0 then training error 0 0 1 loss 1 0 16 Analyzing training error Training error of final classifier is bounded by Where Proof Using Weight Update Rule Wts of all pts add to 1 17 Analyzing training error Training error of final classifier is bounded by Where If Zt 1 training error decreases exponentially even though weak learners may not be good t 0 5 Training error Upper bound t 18 What t to choose for hypothesis ht Training error of final classifier is bounded by Where If we minimize t Zt we minimize our training error We can tighten this bound greedily by choosing t and ht on each iteration to minimize Zt 19 What t to choose for hypothesis ht We can minimize this bound by choosing t on each iteration to minimize Zt For boolean target function this is accomplished by Freund Schapire 97 Proof 20 What t to choose for hypothesis ht We can minimize this bound by choosing t on each iteration to minimize Zt For boolean target function this is accomplished by Freund Schapire 97 Proof 21 Dumb classifiers made Smart Training error of final classifier is bounded by grows as t moves away from 1 2 If each classifier is at least slightly better than random t 0 5 AdaBoost will achieve zero training error exponentially fast in number of rounds T What about test error 22 Boosting results Digit recognition Schapire 1989 Test Error Training Error Boosting often but not always Robust to overfitting Test set error decreases even after training error is zero 23 Generalization Error Bounds Freund Schapire 95 bias variance large small T small small large T large tradeoff T number of boosting rounds d VC dimension of weak learner measures complexity of classifier m number of training examples 24 Generalization Error Bounds Freund Schapire 95 With high probability Boosting can overfit if T is large Boosting often Contradicts experimental results Robust to overfitting Test set error decreases even after training error is zero Need better analysis tools margin based bounds 25 Margin Based Bounds Schapire Freund Bartlett Lee 98 With high probability Boosting increases the margin very aggressively since it concentrates on the hardest examples If margin is large more weak learners agree and hence more rounds does not necessarily imply that final classifier is getting more complex Bound is independent of number of rounds T Boosting can still overfit if margin is too small weak learners are too complex or perform arbitrarily close to random guessing 26 Boosting Experimental Results Freund Schapire 1996 error Comparison of C4 5 decision trees vs Boosting decision stumps depth 1 trees C4 5 vs Boosting C4 5 27 benchmark datasets error error 27 Train Test Train Test Overfits Overfits Overfits Overfits 28 Boosting and Logistic Regression Logistic regression assumes And tries to


View Full Document

CMU CS 10701 - Boosting

Documents in this Course
lecture

lecture

12 pages

lecture

lecture

17 pages

HMMs

HMMs

40 pages

lecture

lecture

15 pages

lecture

lecture

20 pages

Notes

Notes

10 pages

Notes

Notes

15 pages

Lecture

Lecture

22 pages

Lecture

Lecture

13 pages

Lecture

Lecture

24 pages

Lecture9

Lecture9

38 pages

lecture

lecture

26 pages

lecture

lecture

13 pages

Lecture

Lecture

5 pages

lecture

lecture

18 pages

lecture

lecture

22 pages

Boosting

Boosting

11 pages

lecture

lecture

16 pages

lecture

lecture

20 pages

Lecture

Lecture

20 pages

Lecture

Lecture

39 pages

Lecture

Lecture

14 pages

Lecture

Lecture

18 pages

Lecture

Lecture

13 pages

Exam

Exam

10 pages

Lecture

Lecture

27 pages

Lecture

Lecture

15 pages

Lecture

Lecture

24 pages

Lecture

Lecture

16 pages

Lecture

Lecture

23 pages

Lecture6

Lecture6

28 pages

Notes

Notes

34 pages

lecture

lecture

15 pages

Midterm

Midterm

11 pages

lecture

lecture

11 pages

lecture

lecture

23 pages

Lecture

Lecture

49 pages

Lecture

Lecture

22 pages

Lecture

Lecture

16 pages

Lecture

Lecture

18 pages

Lecture

Lecture

35 pages

lecture

lecture

22 pages

lecture

lecture

24 pages

Midterm

Midterm

17 pages

exam

exam

15 pages

Lecture12

Lecture12

32 pages

lecture

lecture

19 pages

Lecture

Lecture

32 pages

boosting

boosting

11 pages

pca-mdps

pca-mdps

56 pages

bns

bns

45 pages

mdps

mdps

42 pages

svms

svms

10 pages

Notes

Notes

12 pages

lecture

lecture

42 pages

lecture

lecture

29 pages

lecture

lecture

15 pages

Lecture

Lecture

12 pages

Lecture

Lecture

24 pages

Lecture

Lecture

22 pages

Midterm

Midterm

5 pages

mdps-rl

mdps-rl

26 pages

Load more
Download Boosting
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Boosting and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Boosting and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?