Decision Trees Greg Grudic Notes borrowed from Thomas G Dietterich and Tom Mitchell 1 Outline Decision Tree Representations ID3 and C4 5 learning algorithms Quinlan 1986 CART learning algorithm Breiman et al 1985 Entropy Information Gain Overfitting 2 1 Training Data Example When Does Greg Play Tennis 3 4 2 5 6 3 7 Learning Algorithm for Decision Trees S x1 y1 x N y N x x1 xd x j y 0 1 What happens if features are not binary What about regression 8 4 Choosing the Best Attribute 9 10 5 Entropy 11 Entropy 12 6 Information Gain 13 Training Example 14 7 Selecting the Next Attribute 15 16 8 Non Boolean Features Features with multiple discrete values Multi way splits Test for one value versus the rest Group values into disjoint sets Real valued features Use thresholds Regression Splits based on mean squared error metric 17 Hypothesis Space Search 18 9 Overfitting 19 Overfitting in Decision Trees 20 10 Validation Data is Used to Control Overfitting Prune tree to reduce error on validation set 21 11
View Full Document