Decision Trees (cont.); Boosting
Machine Learning 10-701/15-781
Carlos Guestrin
Carnegie Mellon University
October 1st, 2007

A Decision Stump
- [figure]

The final tree
- [figure]

Basic Decision Tree Building Summarized
BuildTree(DataSet, Output)
- If all output values are the same in DataSet, return a leaf node that says "predict this unique output".
- If all input values are the same, return a leaf node that says "predict the majority output".
- Else find the attribute X with highest Info Gain. Suppose X has nX distinct values (i.e. X has arity nX).
  - Create and return a non-leaf node with nX children.
  - The i-th child should be built by calling BuildTree(DS_i, Output), where DS_i consists of all those records in DataSet for which X = the i-th distinct value of X.

MPG Test set error
- [figure]

MPG Test set error
- The test set error is much worse than the training set error. Why?

Decision trees & Learning Bias
- The MPG training data:

  mpg   cylinders  displacement  horsepower  weight  acceleration  modelyear  maker
  good  4          low           low         low     high          75to78     asia
  bad   6          medium        medium      medium  medium        70to74     america
  bad   4          medium        medium      medium  low           75to78     europe
  bad   8          high          high        high    low           70to74     america
  bad   6          medium        medium      medium  medium        70to74     america
  bad   4          low           medium      low     medium        70to74     asia
  bad   4          low           medium      low     low           70to74     asia
  bad   8          high          high        high    low           75to78     america
  bad   8          high          high        high    low           70to74     america
  good  8          high          medium      high    high          79to83     america
  bad   8          high          high        high    low           75to78     america
  good  4          low           low         low     low           79to83     america
  bad   6          medium        medium      medium  high          75to78     america
  good  4          medium        low         low     low           79to83     america
  good  4          low           low         medium  high          79to83     america
  bad   8          high          high        high    low           70to74     america
  good  4          low           medium      low     medium        75to78     europe
  bad   5          medium        medium      medium  medium        75to78     europe

Decision trees will overfit
- Standard decision trees have no learning bias:
  - Training set error is always zero (if there is no label noise)
  - Lots of variance
  - Will definitely overfit
- Must bias towards simpler trees.
- Many strategies for picking simpler trees:
  - Fixed depth
  - Fixed number of leaves
  - Or something smarter...

Consider this split
- [figure]

A chi-square test
- Suppose that mpg was completely uncorrelated with maker.
- What is the chance we'd have seen data of at least this apparent level of association anyway?

A chi-square test
- Suppose that mpg was completely uncorrelated with maker.
- What is the chance we'd have seen data of at least this apparent level of association anyway?
- By using a particular kind of chi-square test, the answer is 7.2%.
- Such simple hypothesis tests are very easy to compute; unfortunately there is not enough time to cover them in lecture, but in your homework you'll have fun!

Using Chi-squared to avoid overfitting
- Build the full decision tree as before.
- But when you can grow it no more, start to prune:
  - Beginning at the bottom of the tree, delete splits in which pchance > MaxPchance.
  - Continue working your way up until there are no more prunable nodes.
- MaxPchance is a magic parameter you must specify to the decision tree, indicating your willingness to risk fitting noise.

Pruning example
- With MaxPchance = 0.1, you will see the following MPG decision tree: [figure]
- Note the improved test set accuracy compared with the unpruned tree.

MaxPchance
- Technical note: MaxPchance is a regularization parameter that helps us bias towards simpler models.
- [plot: expected test set error vs. MaxPchance; decreasing MaxPchance gives high bias, increasing it gives high variance]
- We'll learn to choose the value of these magic parameters soon!
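Before turning pruning into code, it helps to see how small the BuildTree procedure summarized earlier really is. The sketch below is not the lecture's implementation; it is a minimal Python version for categorical attributes, with illustrative names (entropy, info_gain, build_tree) and a plain dict as the tree node.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(Y) for a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(records, labels, attr):
    """IG(Y | attr) = H(Y) - sum_v P(attr = v) * H(Y | attr = v)."""
    n = len(labels)
    remainder = 0.0
    for v in set(r[attr] for r in records):
        subset = [y for r, y in zip(records, labels) if r[attr] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def build_tree(records, labels, attrs):
    """BuildTree(DataSet, Output): records are dicts of {attribute: value}."""
    # Base case 1: all outputs identical -> leaf predicting that unique output.
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}
    # Base case 2: no attributes left, or all inputs identical -> majority leaf.
    if not attrs or all(r == records[0] for r in records):
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    # Otherwise split on the attribute with highest information gain.
    best = max(attrs, key=lambda a: info_gain(records, labels, a))
    children = {}
    for v in set(r[best] for r in records):
        idx = [i for i, r in enumerate(records) if r[best] == v]
        children[v] = build_tree([records[i] for i in idx],
                                 [labels[i] for i in idx],
                                 [a for a in attrs if a != best])
    return {"split": best, "children": children}
```

A call like build_tree(rows, mpg_labels, ["cylinders", "maker", ...]) grows the full, unpruned tree; as the overfitting slide notes, it drives the training error to zero whenever there is no label noise, which is exactly why the pruning step matters.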
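The pruning rule needs a pchance value for every split. One standard way to get it (a sketch only; not necessarily the exact test intended for the lecture or homework) is a chi-square independence test on the contingency table of split-attribute value versus class label among the records reaching that node, here delegated to SciPy's chi2_contingency. The prune function works bottom-up on the dict-based nodes from the previous sketch and collapses a split into a majority leaf when pchance > MaxPchance.

```python
from collections import Counter
from scipy.stats import chi2_contingency

def pchance(values, labels):
    """p-value that the observed association between a split attribute and the
    class labels could have arisen by chance (chi-square independence test)."""
    rows, cols = sorted(set(values)), sorted(set(labels))
    counts = Counter(zip(values, labels))
    table = [[counts[(r, c)] for c in cols] for r in rows]
    _, p, _, _ = chi2_contingency(table)
    return p

def prune(node, records, labels, max_pchance=0.1):
    """Bottom-up pruning: delete splits whose pchance exceeds MaxPchance."""
    if "leaf" in node:
        return node
    attr = node["split"]
    # Prune the subtrees first, so decisions are made from the bottom of the tree up.
    for v, child in node["children"].items():
        idx = [i for i, r in enumerate(records) if r[attr] == v]
        node["children"][v] = prune(child, [records[i] for i in idx],
                                    [labels[i] for i in idx], max_pchance)
    # If the split now leads only to leaves and looks like noise, collapse it.
    if all("leaf" in c for c in node["children"].values()):
        if pchance([r[attr] for r in records], labels) > max_pchance:
            return {"leaf": Counter(labels).most_common(1)[0][0]}
    return node
```

Sweeping max_pchance trades bias against variance exactly as the MaxPchance plot suggests: small values prune aggressively (high bias), large values barely prune at all (high variance).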
Real-Valued inputs
- What should we do if some of the inputs are real-valued?

  mpg   cylinders  displacement  horsepower  weight  acceleration  modelyear  maker
  good  4          97            75          2265    18.2          77         asia
  bad   6          199           90          2648    15            70         america
  bad   4          121           110         2600    12.8          77         europe
  bad   8          350           175         4100    13            73         america
  bad   6          198           95          3102    16.5          74         america
  bad   4          108           94          2379    16.5          73         asia
  bad   4          113           95          2228    14            71         asia
  bad   8          302           139         3570    12.8          78         america
  good  4          120           79          2625    18.6          82         america
  bad   8          455           225         4425    10            70         america
  good  4          107           86          2464    15.5          76         europe
  bad   5          131           103         2830    15.9          78         europe

- Infinite number of possible split values!
- Finite dataset: only a finite number of relevant splits.
- Idea One: branch on each possible real value.

"One branch for each numeric value" idea
- Hopeless: with such a high branching factor we will shatter the dataset and overfit.

Threshold splits
- Binary tree: split on attribute X.
  - One branch: X < t
  - Other branch: X >= t

Choosing threshold split
- Binary tree: split on attribute X.
- Search through possible values of t:
  - One branch: X < t
  - Other branch: X >= t
- Seems hard!
- But only a finite number of t's are important:
  - Sort the data according to X into {x_1, ..., x_m}.
  - Consider split points of the form x_i + (x_{i+1} - x_i)/2.

A better idea: thresholded splits
- Suppose X is real-valued.
- Define IG(Y|X:t) as H(Y) - H(Y|X:t).
- Define H(Y|X:t) = H(Y|X < t) P(X < t) + H(Y|X >= t) P(X >= t).
- IG(Y|X:t) is the information gain for predicting Y if all you know is whether X is greater than or less than t.
- Then define IG*(Y|X) = max_t IG(Y|X:t).
- For each real-valued attribute, use IG*(Y|X) for assessing its suitability as a split.
- Note: we may split on an attribute multiple times, with different thresholds.

Example with MPG
- [figure]

Example tree using reals
- [figure]
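The IG(Y|X:t) recipe above turns into very little code: sort on X, try only the midpoints between consecutive distinct values as thresholds, and keep the gain-maximizing one. This is a minimal sketch under the same illustrative naming conventions as the earlier blocks (the entropy helper is repeated so the snippet stands alone):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(Y) for a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold_split(xs, labels):
    """Return (IG*(Y|X), t*) for a real-valued attribute X.

    IG(Y|X:t) = H(Y) - [P(X < t) H(Y|X < t) + P(X >= t) H(Y|X >= t)],
    maximized over thresholds t = x_i + (x_{i+1} - x_i) / 2.
    """
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    xs_sorted = [xs[i] for i in order]
    ys_sorted = [labels[i] for i in order]
    base = entropy(ys_sorted)

    best_gain, best_t = 0.0, None
    for i in range(n - 1):
        if xs_sorted[i] == xs_sorted[i + 1]:
            continue  # equal values cannot be separated by any threshold
        t = xs_sorted[i] + (xs_sorted[i + 1] - xs_sorted[i]) / 2
        left, right = ys_sorted[:i + 1], ys_sorted[i + 1:]
        gain = base - (len(left) / n * entropy(left) +
                       len(right) / n * entropy(right))
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t
```

IG*(Y|X) from this routine is what gets compared against the categorical attributes' information gains when choosing a split, and because the threshold is re-chosen at every node, the same real-valued attribute can win repeatedly with different thresholds, as the slide notes.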
What you need to know about decision trees
- Decision trees are one of the most popular data mining tools:
  - Easy to understand
  - Easy to implement
  - Easy to use
  - Computationally cheap (to solve heuristically)
- Information gain to select attributes (ID3, C4.5, ...)
- Presented for classification; can be used for regression and density estimation too.
- Decision trees will overfit!
  - Zero-bias classifier with lots of variance
  - Must use tricks to find "simple trees", e.g.:
    - Fixed depth / early stopping
    - Pruning
    - Hypothesis testing

Acknowledgements
- Some of the material in the decision trees presentation is courtesy of Andrew Moore, from his excellent collection of ML tutorials: http://www.cs.cmu.edu/~awm/tutorials

Announcements
- Homework 1 due Wednesday at the beginning of class. Start early!
- Exam dates set. Midterm: Thursday, Oct. 25th, 5

(c) Carlos Guestrin, 2005-2007