Machine Learning - 10601
Boosting
Geoff Gordon, Miroslav Dudík
(partly based on slides of Rob Schapire and Carlos Guestrin)
http://www.cs.cmu.edu/~ggordon/10601/
November 9, 2009

Ensembles of trees

BAGGING and RANDOM FORESTS
• learn many big trees
• each tree aims to fit the same target concept
  – random training sets
  – randomized tree growth
• voting ≈ averaging: DECREASE in VARIANCE

BOOSTING
• learn many small trees (weak classifiers)
• each tree “specializes” to a different part of the target concept
  – reweight training examples
  – higher weights where there are still errors
• voting increases expressivity: DECREASE in BIAS

Boosting
• boosting = a general method of converting rough rules of thumb (e.g., decision stumps) into a highly accurate prediction rule
• technically:
  – assume we are given a “weak” learning algorithm that can consistently find classifiers (“rules of thumb”) at least slightly better than random, say, accuracy ≥ 55% (in a two-class setting)
  – given sufficient data, a boosting algorithm can provably construct a single classifier with very high accuracy, say, 99%

AdaBoost [Freund-Schapire 1995]
• weak classifiers = decision stumps (vertical or horizontal half-planes)

A typical run of AdaBoost
• training error rapidly drops (combining weak learners increases expressivity)
• test error does not increase with the number of trees T (robustness to overfitting)

Bounding true error [Freund-Schapire 1997]
• T = number of rounds
• d = VC dimension of the weak learner
• m = number of training examples
• the bound grows with T: true error ≤ training error + Õ(√(T·d/m))

Bounding true error (a first guess)
• a typical run contradicts the naïve bound: test error does not grow with T, even though the bound does

Finer analysis: margins [Schapire et al. 1998]
• empirical evidence: the margin distribution on the training data continues to improve even after training error reaches zero
• theoretical evidence: large margins ⇒ simple classifiers

More technically…
• the bound depends on:
  – d = VC dimension of the weak learner
  – m = number of training examples
  – the entire distribution of training margins
• unlike the previous bound, it does not depend on the number of rounds T

Practical advantages of AdaBoost

Application: detecting faces [Viola-Jones 2001]

Caveats
• “hard” predictions can slow down learning!

Confidence-rated predictions [Schapire-Singer 1999]
• confidence-rated predictions help a lot!

Logistic regression vs AdaBoost
• loss in logistic regression: log loss
• loss in AdaBoost: exponential loss
• benefits of the model-fitting view

What you should know about boosting
• weak classifiers → strong classifiers
  – weak: slightly better than random on training data
  – strong: eventually zero error on training data
• AdaBoost prevents overfitting by increasing margins
• regimes where AdaBoost overfits
  – weak learner too strong: use small trees or stop early
  – data noisy: stop early
• AdaBoost vs logistic regression
  – exponential loss vs log loss
  – single-coordinate updates vs full updates
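
To make the AdaBoost procedure on these slides concrete, here is a minimal sketch of the algorithm with decision-stump weak learners. The function names and the brute-force stump search are illustrative choices for this sketch, not taken from the lecture.

```python
# Minimal AdaBoost sketch with decision-stump weak learners (illustrative only).
import numpy as np

def train_stump(X, y, w):
    """Find the axis-aligned threshold stump with the smallest weighted error.

    X: (n, d) features; y: (n,) labels in {-1, +1}; w: (n,) weights summing to 1.
    Returns (feature index, threshold, sign).
    """
    n, d = X.shape
    best = (0, 0.0, 1, np.inf)               # (feature, threshold, sign, error)
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = sign * np.where(X[:, j] > thr, 1, -1)
                err = np.sum(w[pred != y])    # weighted misclassification error
                if err < best[3]:
                    best = (j, thr, sign, err)
    return best[:3]

def stump_predict(stump, X):
    j, thr, sign = stump
    return sign * np.where(X[:, j] > thr, 1, -1)

def adaboost(X, y, T):
    """Run T rounds of AdaBoost; returns a list of (alpha, stump) pairs."""
    n = len(y)
    w = np.full(n, 1.0 / n)                   # start with uniform example weights
    ensemble = []
    for _ in range(T):
        stump = train_stump(X, y, w)
        pred = stump_predict(stump, X)
        eps = np.sum(w[pred != y])            # weighted error of this weak classifier
        if eps >= 0.5:                        # weak learner no better than random: stop
            break
        alpha = 0.5 * np.log((1 - eps) / (eps + 1e-12))
        ensemble.append((alpha, stump))
        w *= np.exp(-alpha * y * pred)        # upweight mistakes, downweight correct points
        w /= w.sum()                          # renormalize to a distribution
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump_predict(s, X) for a, s in ensemble)
    return np.sign(score)
```

As on the slides, each round the weak learner “specializes” to the currently hard examples because their weights grow, and the final vote is a weighted majority of the stumps.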
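
The logistic regression vs AdaBoost comparison comes down to the loss each applies to the margin y·f(x). The snippet below is a small illustrative sketch (function names are my own) that evaluates the two losses side by side.

```python
# Compare the exponential loss (AdaBoost) and the log loss (logistic regression)
# as functions of the margin y * f(x). Illustrative sketch.
import numpy as np

def exp_loss(margin):
    return np.exp(-margin)                    # AdaBoost: exp(-y * f(x))

def log_loss(margin):
    return np.log(1.0 + np.exp(-margin))      # logistic regression: log(1 + exp(-y * f(x)))

margins = np.linspace(-2, 2, 9)
for m, e, l in zip(margins, exp_loss(margins), log_loss(margins)):
    print(f"margin {m:+.1f}: exp loss {e:6.3f}, log loss {l:6.3f}")
# Both are smooth surrogates for the 0/1 error; the exponential loss penalizes
# large negative margins far more aggressively than the log loss.
```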