CMU CS 10701 - Decision Trees, cont. Boosting

©Carlos Guestrin 2005-2007

Slide 1: Decision Trees, cont. Boosting
Machine Learning – 10-701/15-781
Carlos Guestrin, Carnegie Mellon University
October 1st, 2007

Slide 2: A Decision Stump

Slide 3: The final tree

Slide 4: Basic Decision Tree Building Summarized
BuildTree(DataSet, Output)
- If all output values are the same in DataSet, return a leaf node that says "predict this unique output".
- If all input values are the same, return a leaf node that says "predict the majority output".
- Else find the attribute X with the highest information gain. Suppose X has n_X distinct values (i.e. X has arity n_X). Create and return a non-leaf node with n_X children. The i'th child should be built by calling BuildTree(DS_i, Output), where DS_i consists of all those records in DataSet for which X equals the i'th distinct value of X.

Slide 5: MPG test set error

Slide 6: MPG test set error
The test set error is much worse than the training set error... why?

Slide 7: Decision trees & Learning Bias
mpg   cylinders  displacement  horsepower  weight  acceleration  modelyear  maker
good  4          low           low         low     high          75to78     asia
bad   6          medium        medium      medium  medium        70to74     america
bad   4          medium        medium      medium  low           75to78     europe
bad   8          high          high        high    low           70to74     america
bad   6          medium        medium      medium  medium        70to74     america
bad   4          low           medium      low     medium        70to74     asia
bad   4          low           medium      low     low           70to74     asia
bad   8          high          high        high    low           75to78     america
...   ...        ...           ...         ...     ...           ...        ...
bad   8          high          high        high    low           70to74     america
good  8          high          medium      high    high          79to83     america
bad   8          high          high        high    low           75to78     america
good  4          low           low         low     low           79to83     america
bad   6          medium        medium      medium  high          75to78     america
good  4          medium        low         low     low           79to83     america
good  4          low           low         medium  high          79to83     america
bad   8          high          high        high    low           70to74     america
good  4          low           medium      low     medium        75to78     europe
bad   5          medium        medium      medium  medium        75to78     europe

Slide 8: Decision trees will overfit
- Standard decision trees have no learning bias:
  - Training set error is always zero (if there is no label noise)
  - Lots of variance
  - Will definitely overfit!
- Must bias towards simpler trees.
- Many strategies for picking simpler trees:
  - Fixed depth
  - Fixed number of leaves
  - Or something smarter...

Slide 9: Consider this split

Slide 10: A chi-square test
- Suppose that mpg was completely uncorrelated with maker.
- What is the chance we'd have seen data of at least this apparent level of association anyway?

Slide 11: A chi-square test
- Suppose that mpg was completely uncorrelated with maker.
- What is the chance we'd have seen data of at least this apparent level of association anyway?
- By using a particular kind of chi-square test, the answer is 7.2%.
- (Such simple hypothesis tests are very easy to compute; unfortunately there is not enough time to cover them in the lecture, but in your homework you'll have fun! :) )
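As a rough illustration (a sketch, not code from the slides), the test behind that 7.2% figure can be written in a few lines of Python: build the maker-vs-mpg contingency table for the records reaching the node, compare it with the counts expected under independence, and read off the upper-tail chi-square probability. The helper name pchance and the toy counts below are illustrative assumptions, and scipy is assumed to be available.

    # Sketch of the chi-square independence test behind "pchance": how likely is
    # a maker-vs-mpg table with at least this much apparent association if the
    # two variables were truly uncorrelated?
    import numpy as np
    from scipy.stats import chi2

    def pchance(counts):
        """counts[i, j] = number of records with maker value i and mpg class j."""
        counts = np.asarray(counts, dtype=float)
        row = counts.sum(axis=1, keepdims=True)   # per-maker totals
        col = counts.sum(axis=0, keepdims=True)   # per-mpg-class totals
        expected = row @ col / counts.sum()       # expected counts under independence
        stat = ((counts - expected) ** 2 / expected).sum()
        dof = (counts.shape[0] - 1) * (counts.shape[1] - 1)
        return chi2.sf(stat, dof)                 # upper-tail probability of the statistic

    # Hypothetical counts (not the dataset from the slides): one row per maker,
    # columns are [bad mpg, good mpg].
    print(pchance([[10, 4], [3, 5], [2, 3]]))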
Slide 12: Using Chi-squared to avoid overfitting
- Build the full decision tree as before.
- But when you can grow it no more, start to prune:
  - Beginning at the bottom of the tree, delete splits in which pchance > MaxPchance.
  - Continue working your way up until there are no more prunable nodes.
- MaxPchance is a magic parameter you must specify to the decision tree, indicating your willingness to risk fitting noise.

Slide 13: Pruning example
With MaxPchance = 0.1, you will see the following MPG decision tree:
Note the improved test set accuracy compared with the unpruned tree.

Slide 14: MaxPchance
Technical note: MaxPchance is a regularization parameter that helps us bias towards simpler models.
[Plot: expected test set error vs. MaxPchance. Decreasing MaxPchance gives high bias; increasing it gives high variance.]
We'll learn to choose the value of these magic parameters soon!

Slide 15: Real-Valued inputs
What should we do if some of the inputs are real-valued?
mpg   cylinders  displacement  horsepower  weight  acceleration  modelyear  maker
good  4          97            75          2265    18.2          77         asia
bad   6          199           90          2648    15            70         america
bad   4          121           110         2600    12.8          77         europe
bad   8          350           175         4100    13            73         america
bad   6          198           95          3102    16.5          74         america
bad   4          108           94          2379    16.5          73         asia
bad   4          113           95          2228    14            71         asia
bad   8          302           139         3570    12.8          78         america
...   ...        ...           ...         ...     ...           ...        ...
good  4          120           79          2625    18.6          82         america
bad   8          455           225         4425    10            70         america
good  4          107           86          2464    15.5          76         europe
bad   5          131           103         2830    15.9          78         europe
- Infinite number of possible split values!
- Finite dataset, so only a finite number of relevant splits!
- Idea One: Branch on each possible real value.

Slide 16: The "one branch for each numeric value" idea
Hopeless: with such a high branching factor it will shatter the dataset and overfit.

Slide 17: Threshold splits
- Binary tree, split on attribute X:
  - One branch: X < t
  - Other branch: X >= t

Slide 18: Choosing threshold split
- Binary tree, split on attribute X:
  - One branch: X < t
  - Other branch: X >= t
- Search through possible values of t: seems hard!
- But only a finite number of t's are important:
  - Sort data according to X into {x1, ..., xm}
  - Consider split points of the form xi + (xi+1 - xi)/2

Slide 19: A better idea: thresholded splits
- Suppose X is real valued.
- Define IG(Y|X:t) as H(Y) - H(Y|X:t)
- Define H(Y|X:t) = H(Y|X < t) P(X < t) + H(Y|X >= t) P(X >= t)
- IG(Y|X:t) is the information gain for predicting Y if all you know is whether X is greater than or less than t.
- Then define IG*(Y|X) = max_t IG(Y|X:t)
- For each real-valued attribute, use IG*(Y|X) for assessing its suitability as a split (a code sketch follows the last slide below).
- Note: we may split on an attribute multiple times, with different thresholds.

Slide 20: Example with MPG

Slide 21: Example tree using reals

Slide 22: What you need to know about decision trees
- Decision trees are one of the most popular data mining tools:
  - Easy to understand
  - Easy to implement
  - Easy to use
  - Computationally cheap (to solve heuristically)
- Information gain to select attributes (ID3, C4.5, ...)
- Presented for classification, but can be used for regression and density estimation too.
- Decision trees will overfit!
  - Zero-bias classifier → lots of variance
  - Must use tricks to find "simple trees", e.g.:
    - Fixed depth / early stopping
    - Pruning
    - Hypothesis testing

Slide 23: Acknowledgements
Some of the material
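To make the thresholded-split computation from slides 18 and 19 concrete, here is a short Python sketch (assumed names and toy data, not code from the lecture) of IG*(Y|X) = max_t IG(Y|X:t): sort the records by X, try the midpoint between each pair of consecutive distinct values as t, and keep the threshold with the largest information gain.

    # Sketch of IG*(Y|X) = max_t IG(Y|X:t) for a real-valued attribute X,
    # using the midpoint candidate thresholds x_i + (x_{i+1} - x_i)/2.
    from collections import Counter
    from math import log2

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def best_threshold_ig(xs, ys):
        """Return (IG*(Y|X), best threshold t); ys are discrete labels."""
        pairs = sorted(zip(xs, ys))
        n = len(pairs)
        h_y = entropy(list(ys))
        best_ig, best_t = 0.0, None
        for i in range(n - 1):
            x_lo, x_hi = pairs[i][0], pairs[i + 1][0]
            if x_lo == x_hi:
                continue                          # no threshold fits between equal values
            t = x_lo + (x_hi - x_lo) / 2          # midpoint split point from slide 18
            left = [y for x, y in pairs if x < t]
            right = [y for x, y in pairs if x >= t]
            h_cond = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
            ig = h_y - h_cond                     # IG(Y|X:t) = H(Y) - H(Y|X:t)
            if ig > best_ig:
                best_ig, best_t = ig, t
        return best_ig, best_t

    # Toy usage with made-up horsepower values and mpg labels:
    hp  = [75, 90, 110, 175, 95, 94, 79, 225]
    mpg = ["good", "bad", "bad", "bad", "bad", "bad", "good", "bad"]
    print(best_threshold_ig(hp, mpg))             # roughly (0.81, 84.5)

The IG* returned here is the score slide 19 proposes for judging a real-valued attribute's suitability as a split.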

