Unformatted text preview:

Some Common Nonlinear Learning Methods Learning methods we ve seen so far Linear classifiers na ve Bayes logistic regression perceptron support vector machines Linear regression ridge regression lasso regularizers Nonlinear classifiers neural networks On beyond linearity Decision tree learning What decision trees are How to learn them How to prune them Rule learning Nearest neighbor learning DECISION TREE LEARNING OVERVIEW Decision tree learning A decision tree Another format a set of rules if O sunny and H 70 then PLAY else if O sunny and H 70 then DON T PLAY else if O overcast then PLAY else if O rain and windy then DON T PLAY One rule per leaf in the tree else if O rain and windy then PLAY Simpler rule set if O sunny and H 70 then DON T PLAY else if O rain and windy then DON T PLAY else PLAY A regression tree Play 33 Play 24 Play 48 Play 18 Play 45m 45 60 40 Play 37 Play 30m Play 5 Play 0m 0m Play 0 Play 32 Play 0m 0m Play 20m 30m Motivations for trees and rules Often you can find a fairly accurate classifier which is small and easy to understand Sometimes this gives you useful insight into a problem or helps you debug a feature set Sometimes features interact in complicated ways Trees can find interactions e g sunny and humid that linear classifiers can t Again sometimes this gives you some insight into the problem Trees are very inexpensive at test time You don t always even need to compute all the features of an example You can even build classifiers that take this into account Sometimes that s important e g bloodPressure 100 vs MRIScan normal might have different costs to compute An example using rules On the Onion data Translation if enlarge is in the set valued attribute wordsArticle then class fromOnion this rule is correct 173 times and never wrong if added is in the set valued attribute wordsArticle and play is in the set valued attribute wordsArticle then class fromOnion this rule is correct 6 times and wrong once After cleaning Enlarge Image lines Also 10 CV error rate increases from 1 4 to 6 Different Subcategories of Economist Articles Motivations for trees and rules Often you can find a fairly accurate classifier which is small and easy to understand Sometimes this gives you useful insight into a problem or helps you debug a feature set Sometimes features interact in complicated ways Trees can find interactions e g sunny and humid that linear classifiers can t Again sometimes this gives you some insight into the problem Trees are very inexpensive at test time You don t always even need to compute all the features of an example You can even build classifiers that take this into account Sometimes that s important e g bloodPressure 100 vs MRIScan normal might have different costs to compute DECISION TREE LEARNING THE ALGORITHM S Most decision tree learning algorithms 1 Given dataset D return leaf y if all examples are in the same class y or nearly so pick the best split on the best attribute a a c1 or a c2 or a or a a or not a a in c1 ck or not split the data into D1 D2 Dk and recursively build trees for each subset 2 Prune the tree Most decision tree learning algorithms 1 Given dataset D return leaf y if all examples are in the same class y or nearly so pick the best split on the best attribute a a c1 or a c2 or a or a a or not a a in c1 ck or not split the data into D1 D2 Dk and recursively build trees for each subset H D 2 Prune the tree Popular splitting criterion try to lower entropy of the y labels on the resulting partition i e prefer splits that contain fewer labels or that have very skewed distributions of labels Pr D k Y yk log PrD Y yk Most decision tree learning algorithms Pruning a tree avoid overfitting by removing subtrees somehow 15 13 2 8 Most decision tree learning algorithms 1 Given dataset D return leaf y if all examples are in the same class y or nearly so pick the best split on the best attribute a a or a a or not a a c1 or a c2 or a in c1 ck or not split the data into D1 D2 Dk and recursively build trees for each subset 2 Prune the tree 15 Same idea 13 DECISION TREE LEARNING BREAKING IT DOWN Breaking down decision tree learning First how to classify assume everything is binary function prPos classifyTree T x if T is a leaf node with counts n p prPos p 1 p n 2 Laplace smoothing else j T splitAttribute if x j 0 then prPos classifyTree T leftSon x else prPos classifyTree T rightSon x Breaking down decision tree learning Reduced error pruning with information gain Split the data D 2 3 1 3 into Dgrow and Dprune Build the tree recursively with Dgrow T growTree Dgrow Prune the tree with Dprune T pruneTree Dprune T Return T Breaking down decision tree learning First divide conquer to build the tree with Dgrow function T growTree X Y if X 10 or allOneLabel Y then T leafNode Y 0 Y 1 counts for n p else for i 1 n for each attribute i ai X i column i of X gain i infoGain Y Y ai 0 Y ai 1 j argmax gain the best attribute aj X j T splitNode growTree X aj 0 Y aj 0 left son growTree X aj 1 Y aj 1 right son argmax gain Breaking down decision tree learning function e entropy Y n Y p0 Y 0 n p1 Y 1 n e p1 log p1 p2 log p2 Breaking down decision tree learning First how to build the tree with Dgrow function g infoGain Y leftY rightY n Y nLeft leftY nRight rightY g entropy Y nLeft n entropy leftY nRight n entropy rightY function e entropy Y n Y p0 Y 0 n p1 Y 1 n e p1 log p1 p2 log p2 Breaking down decision tree learning Reduced error pruning with information gain Split the data D 2 3 1 3 into Dgrow and Dprune Build the tree recursively with Dgrow T growTree Dgrow Prune the tree with Dprune T pruneTree Dprune Return T Breaking down decision tree learning Next how to prune the tree with Dprune Estimate the error rate of every subtree on Dprune Recursively traverse the tree Reduce error on the left right subtrees of T If T would have lower error if it were converted to a leaf convert T to a leaf We re using the fact that the examples for sibling subtrees are disjoint A decision tree Breaking down decision tree learning To estimate error rates classify the whole pruning set and keep some counts function classifyPruneSet T X Y T pruneN Y 0 T pruneP Y 1 if T is not a leaf …


View Full Document

CMU MLG 10601 - decision-trees

Loading Unlocking...
Login

Join to view decision-trees and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view decision-trees and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?