Some Common Nonlinear Learning Methods Learning methods we ve seen so far Linear classi iers na ve Bayes logistic regression perceptron support vector machines Linear regression ridge regression lasso regularizers Nonlinear classi iers neural networks On beyond linearity Decision tree learning What decision trees are How to learn them How to prune them Rule learning Nearest neighbor learning DECISION TREE LEARNING OVERVIEW Decision tree learning A decision tree Another format a set of rules if O sunny and H 70 then PLAY else if O sunny and H 70 then DON T PLAY else if O overcast then PLAY else if O rain and windy then DON T PLAY else if O rain and windy then PLAY One rule per leaf in the tree Simpler rule set if O sunny and H 70 then DON T PLAY else if O rain and windy then DON T PLAY else PLAY A regression tree Play 33 Play 24 Play 18 Play 48 Play 45m 45 60 40 Play 37 Play 5 Play 30m 45min Play 0m 0m 15m Play 0 Play 0m 0m Play 32 Play 20m 30m 45m Motivations for trees and rules Often you can ind a fairly accurate classi ier which is small and easy to understand Sometimes this gives you useful insight into a problem or helps you debug a feature set Sometimes features interact in complicated ways Trees can ind interactions e g sunny and humid that linear classi iers can t Again sometimes this gives you some insight into the problem Trees are very inexpensive at test time You don t always even need to compute all the features of an example You can even build classi iers that take this into account Sometimes that s important e g bloodPressure 100 vs MRIScan normal might have different costs to compute An example using rules On the Onion data Translation if enlarge is in the set valued attribute wordsArticle then class fromOnion this rule is correct 173 times and never wrong if added is in the set valued attribute wordsArticle and play is in the set valued attribute wordsArticle then class fromOnion this rule is correct 6 times and wrong once After cleaning Enlarge Image lines Also 10 CV error rate increases from 1 4 to 6 Different Subcategories of Economist Articles Motivations for trees and rules Often you can ind a fairly accurate classi ier which is small and easy to understand Sometimes this gives you useful insight into a problem or helps you debug a feature set Sometimes features interact in complicated ways Trees can ind interactions e g sunny and humid that linear classi iers can t Again sometimes this gives you some insight into the problem Trees are very inexpensive at test time You don t always even need to compute all the features of an example You can even build classi iers that take this into account Sometimes that s important e g bloodPressure 100 vs MRIScan normal might have different costs to compute DECISION TREE LEARNING THE ALGORITHM S Most decision tree learning algorithms 1 Given dataset D return leaf y if all examples are in the same class y or nearly so pick the best split on the best attribute a a c1 or a c2 or a or a a or not a a in c1 ck or not split the data into D1 D2 Dk and recursively build trees for each subset 2 Prune the tree Most decision tree learning algorithms 1 Given dataset D return leaf y if all examples are in the same class y or nearly so pick the best split on the best attribute a a c1 or a c2 or a or a a or not a a in c1 ck or not split the data into D1 D2 Dk and recursively build trees for each subset 2 Prune the tree H D Popular splitting criterion try to lower entropy of the y labels on the resulting partition i e prefer splits that contain fewer labels or that have very skewed distributions of labels Pr D k Y yk log PrD Y yk Most decision tree learning algorithms Pruning a tree avoid over itting by removing subtrees somehow 15 13 2 8 Most decision tree learning algorithms 1 Given dataset D return leaf y if all examples are in the same class y or nearly so pick the best split on the best attribute a a or a a or not a a c1 or a c2 or a in c1 ck or not split the data into D1 D2 Dk and recursively build trees for each subset 2 Prune the tree 15 Same idea 13 DECISION TREE LEARNING BREAKING IT DOWN Breaking down decision tree learning First how to classify assume everything is binary func on prPos classifyTree T x if T is a leaf node with counts n p prPos p 1 p n 2 Laplace smoothing else j T splitADribute if x j 0 then prPos classifyTree T leGSon x else prPos classifyTree T rightSon x Breaking down decision tree learning Reduced error pruning with information gain Split the data D 2 3 1 3 into Dgrow and Dprune Build the tree recursively with Dgrow T growTree Dgrow Prune the tree with Dprune T pruneTree Dprune T Return T Breaking down decision tree learning First divide conquer to build the tree with Dgrow func on T growTree X Y if X 10 or allOneLabel Y then T leafNode Y 0 Y 1 counts for n p else for i 1 n for each a ribute i ai X i column i of X gain i infoGain Y Y ai 0 Y ai 1 j argmax gain the best a ribute aj X j T splitNode growTree X aj 0 Y aj 0 leA son growTree X aj 1 Y aj 1 right son argmax gain Breaking down decision tree learning func on e entropy Y n Y p0 Y 0 n p1 Y 1 n e p1 log p1 p2 log p2 Breaking down decision tree learning First how to build the tree with Dgrow func on g infoGain Y leGY rightY n Y nLeG leGY nRight rightY g entropy Y nLeG n entropy leGY nRight n entropy rightY func on e entropy Y n Y p0 Y 0 n p1 Y 1 n e p1 log p1 p2 log p2 Breaking down decision tree learning Reduced error pruning with information gain Split the data D 2 3 1 3 into Dgrow and Dprune Build the tree recursively with Dgrow T growTree Dgrow Prune the tree with Dprune T pruneTree Dprune Return T Breaking down decision tree learning Next how to prune the tree with Dprune Estimate the error rate of every subtree on Dprune Recursively traverse the tree Reduce error on the left right subtrees of T If T would have lower error if it were converted to a leaf convert T to a leaf We re using the fact that the examples for sibling subtrees are disjoint A decision tree Breaking down decision tree learning To estimate error rates classify the whole pruning set and keep some counts …
View Full Document
Unlocking...