CMU CS 10601 - lecture

Machine Learning - 10601: Boosting
Geoff Gordon, Miroslav Dudík
[partly based on slides of Rob Schapire and Carlos Guestrin]
http://www.cs.cmu.edu/~ggordon/10601/
November 9, 2009

Ensembles of trees

BAGGING and RANDOM FORESTS
• learn many big trees
• each tree aims to fit the same target concept
  – random training sets
  – randomized tree growth
• voting ≈ averaging: DECREASE in VARIANCE

BOOSTING
• learn many small trees (weak classifiers)
• each tree 'specializes' to a different part of the target concept
  – reweight training examples
  – higher weights where still errors
• voting increases expressivity: DECREASE in BIAS

Boosting
• boosting = general method of converting rough rules of thumb (e.g., decision stumps) into a highly accurate prediction rule
• technically:
  – assume a given "weak" learning algorithm that can consistently find classifiers ("rules of thumb") at least slightly better than random, say, accuracy ≥ 55% (in the two-class setting)
  – given sufficient data, a boosting algorithm can provably construct a single classifier with very high accuracy, say, 99%

AdaBoost [Freund-Schapire 1995]

weak classifiers = decision stumps (vertical or horizontal half-planes)

A typical run of AdaBoost
• training error rapidly drops (combining weak learners increases expressivity)
• test error does not increase with the number of trees T (robustness to overfitting)
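The AdaBoost procedure itself appears on the slides above only as figures, so it is not in the text preview. As a rough illustration, here is a minimal Python/NumPy sketch of the standard Freund-Schapire algorithm with axis-aligned decision stumps (the "vertical or horizontal half-planes" mentioned above); the function and variable names are illustrative, not taken from the slides.

import numpy as np

def train_stump(X, y, w):
    # Exhaustively pick the axis-aligned stump (feature, threshold, sign)
    # with the lowest weighted training error under the current weights w.
    best = None
    for j in range(X.shape[1]):
        vals = np.unique(X[:, j])
        thresholds = np.concatenate(([vals[0] - 1.0],
                                     (vals[:-1] + vals[1:]) / 2.0,
                                     [vals[-1] + 1.0]))
        for thr in thresholds:
            for sign in (+1, -1):
                pred = sign * np.where(X[:, j] > thr, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best   # (weighted error, feature index, threshold, sign)

def stump_predict(X, stump):
    _, j, thr, sign = stump
    return sign * np.where(X[:, j] > thr, 1, -1)

def adaboost(X, y, T=50):
    # AdaBoost with decision stumps; labels y must be +1 / -1.
    m = len(y)
    w = np.full(m, 1.0 / m)                 # D_1(i) = 1/m
    ensemble = []
    for t in range(T):
        stump = train_stump(X, y, w)
        eps = max(stump[0], 1e-12)          # weighted error of weak classifier h_t
        if eps >= 0.5:                      # no better than random: stop
            break
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        pred = stump_predict(X, stump)
        w = w * np.exp(-alpha * y * pred)   # up-weight mistakes, down-weight correct examples
        w /= w.sum()                        # renormalize so D_{t+1} is a distribution
        ensemble.append((alpha, stump))
    return ensemble

def predict(X, ensemble):
    # Final classifier: H(x) = sign( sum_t alpha_t * h_t(x) )
    votes = sum(alpha * stump_predict(X, stump) for alpha, stump in ensemble)
    return np.sign(votes)

A typical use would be ensemble = adaboost(X_train, y_train, T=100) followed by predict(X_test, ensemble). With stumps as weak learners each round is cheap, and the reweighting forces later stumps to focus on the examples the current ensemble still gets wrong.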
Bounding true error [Freund-Schapire 1997]
• T = number of rounds
• d = VC dimension of the weak learner
• m = number of training examples

Bounding true error (a first guess)
• a typical run "contradicts" the naïve bound: the naïve bound grows with the number of rounds T, yet the observed test error does not

Finer analysis: margins [Schapire et al. 1998]

Empirical evidence: margin distribution

Theoretical evidence: large margins → simple classifiers

More technically…
• previously: bound depends on T, d, and m
• new generalization bound depends on:
  – d = VC dimension of the weak learner
  – m = number of training examples
  – the entire distribution of training margins
(both bounds are stated informally after the summary below)

Practical advantages of AdaBoost

Application: detecting faces [Viola-Jones 2001]

Caveats

"Hard" predictions can slow down learning!

Confidence-rated predictions [Schapire-Singer 1999]
• confidence-rated predictions help a lot! (see the sketch below)

Loss in logistic regression

Loss in AdaBoost

Logistic regression vs AdaBoost

Benefits of model-fitting view

What you should know about boosting
• weak classifiers → strong classifiers
  – weak: slightly better than random on training data
  – strong: eventually zero error on training data
• AdaBoost prevents overfitting by increasing margins
• regimes when AdaBoost overfits
  – weak learner too strong: use smaller trees or stop early
  – data noisy: stop early or regularize α_t
• AdaBoost vs logistic regression (compared below)
  – exponential loss vs log loss
  – single-coordinate updates vs full updates
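For reference, the two generalization bounds alluded to on the "Bounding true error" and "Finer analysis: margins" slides are roughly of the following form, up to constants and logarithmic factors; the precise statements are in Freund-Schapire 1997 and Schapire et al. 1998.

% Naive bound: depends on the number of rounds T, so it keeps growing as boosting runs longer
\Pr_{\mathcal{D}}\left[H(x) \neq y\right] \;\le\; \widehat{\Pr}_{S}\left[H(x) \neq y\right] + \tilde{O}\!\left(\sqrt{\frac{T\,d}{m}}\right)

% Margin-based bound: independent of T; here f(x) = \sum_t \alpha_t h_t(x) / \sum_t \alpha_t
% is the normalized vote and \theta > 0 is any margin threshold
\Pr_{\mathcal{D}}\left[y\,f(x) \le 0\right] \;\le\; \widehat{\Pr}_{S}\left[y\,f(x) \le \theta\right] + \tilde{O}\!\left(\sqrt{\frac{d}{m\,\theta^{2}}}\right)

The margin bound explains the "typical run" picture: as long as boosting keeps pushing training margins up, the first term keeps shrinking even though T grows, so the test error need not increase.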


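The confidence-rated predictions of Schapire-Singer 1999 replace the hard ±1 output of a weak classifier with a real-valued confidence. For a weak hypothesis that partitions the data into cells (e.g., the two sides of a stump), the loss-minimizing confidence for a cell works out to half the log-ratio of positive to negative weight in that cell, usually with smoothing. A minimal sketch, with illustrative names and an assumed smoothing constant:

import numpy as np

def cell_confidence(y, w, in_cell, eps=1e-6):
    # Real-valued prediction for one cell of a domain-partitioning weak hypothesis:
    # 0.5 * ln(W_plus / W_minus), smoothed by eps so a pure cell does not yield an
    # infinite confidence.
    w_plus = w[in_cell & (y == +1)].sum()    # total weight of positives in the cell
    w_minus = w[in_cell & (y == -1)].sum()   # total weight of negatives in the cell
    return 0.5 * np.log((w_plus + eps) / (w_minus + eps))

def confidence_rated_stump(X, y, w, feature, threshold):
    # A stump that outputs a confidence for each side of the threshold instead of +/-1;
    # the per-round weight alpha_t is effectively folded into these values.
    right = X[:, feature] > threshold
    c_left = cell_confidence(y, w, ~right)
    c_right = cell_confidence(y, w, right)
    return np.where(right, c_right, c_left)

With confidence-rated outputs the weight update becomes D_{t+1}(i) ∝ D_t(i) exp(−y_i h_t(x_i)), with α_t absorbed into h_t, and learning no longer stalls on "hard" ±1 predictions, which is the "help a lot" point on the slide.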

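The "exponential loss vs log loss" bullet in the summary can be made concrete: both are decreasing functions of the margin y·f(x), but AdaBoost's exponential loss penalizes confidently wrong predictions far more aggressively than the logistic loss, one reason noisy data calls for early stopping or regularizing α_t. A small illustration (function names are ours, not from the slides):

import numpy as np

def exp_loss(margin):
    # AdaBoost minimizes the exponential loss exp(-y f(x))
    return np.exp(-margin)

def log_loss(margin):
    # logistic regression minimizes the log loss ln(1 + exp(-y f(x)))
    return np.log(1.0 + np.exp(-margin))

margins = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(exp_loss(margins))   # ~ [7.39, 2.72, 1.00, 0.37, 0.14]
print(log_loss(margins))   # ~ [2.13, 1.31, 0.69, 0.31, 0.13]

For large negative margins the exponential loss grows exponentially while the log loss grows only about linearly, which matches the "data noisy: stop early or regularize α_t" caveat in the summary.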
