CMU CS 10701 - Lecture9

Decision Trees
Aarti Singh
Machine Learning 10-701 / 15-781, Oct 6, 2010

Learning a good prediction rule
- Goal: learn a mapping from inputs to outputs that is close to the best prediction rule.
- Hypothesis space (function class):
  - Parametric classes: Gaussian, binomial, etc.; conditionally independent class densities (Naïve Bayes); linear decision boundary (logistic regression).
  - Nonparametric classes: histograms, nearest neighbor, kernel estimators, and decision trees (today).
- Given training data, find a hypothesis (function) in the class that is close to the best prediction rule.

First
- What does a decision tree represent?
- Given a decision tree, how do we assign a label to a test point?

Decision Tree for Tax Fraud Detection
- The tree:
  - Refund? Yes -> predict NO. No -> test MarSt.
  - MarSt? Single or Divorced -> test TaxInc. Married -> predict NO.
  - TaxInc? Below 80K -> predict NO. Otherwise -> predict YES.
- Each internal node tests one feature Xi; each branch from a node selects one value for Xi; each leaf node predicts Y.
- Query data: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?
- Walking the query down the tree: Refund = No takes the branch to MarSt; MarSt = Married takes the branch to a NO leaf.
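The walk-through above can be sketched as plain Python conditionals. The tree structure and the 80K threshold follow the slide; the function name and record format are illustrative choices, not part of the lecture.

```python
def predict_cheat(record):
    """Walk the slide's tax-fraud decision tree for one query record."""
    if record["Refund"] == "Yes":
        return "No"                       # leaf: NO
    # Refund == "No": test marital status next
    if record["MaritalStatus"] == "Married":
        return "No"                       # leaf: NO
    # Single or Divorced: test taxable income against the 80K threshold
    if record["TaxableIncome"] < 80_000:
        return "No"                       # leaf: NO
    return "Yes"                          # leaf: YES

# The query from the slide: Refund = No, Married, income 80K
query = {"Refund": "No", "MaritalStatus": "Married", "TaxableIncome": 80_000}
print(predict_cheat(query))  # -> No
```

The query never reaches the income test: the Married branch ends at a NO leaf, which is why the slide assigns Cheat = No.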
- Assign Cheat = No.

Decision trees more generally
- Features can be discrete, continuous, or categorical.
- Each internal node tests some set of features Xi; each branch from a node selects a set of values for Xi; each leaf node predicts Y.

So far
- What does a decision tree represent?
- Given a decision tree, how do we assign a label to a test point?
Now
- How do we learn a decision tree from training data?
- What is the decision on each leaf?

How to learn a decision tree
- Top-down induction: ID3, C4.5, CART.

Which feature is best to split?
- Training data (8 examples):
  X1: T T T T F F F F
  X2: T F T F T F T F
  Y : T T T T T F F F
- Split on X1: X1 = T gives Y = 4 Ts, 0 Fs (absolutely sure); X1 = F gives Y = 1 T, 3 Fs (kind of sure).
- Split on X2: X2 = T gives Y = 3 Ts, 1 F (kind of sure); X2 = F gives Y = 2 Ts, 2 Fs (absolutely unsure).
- A split is good if we are more certain about the classification after it; a uniform distribution of labels is bad.
- Pick the attribute (feature) which yields maximum information gain, where H(Y) is the entropy of Y and H(Y | Xi) is the conditional entropy of Y given Xi. The feature which yields the maximum reduction in entropy provides the maximum information about Y.

Entropy
- Entropy of a random variable Y: H(Y) = - sum_y P(Y = y) log2 P(Y = y).
- For Y ~ Bernoulli(p): the uniform case p = 1/2 has maximum entropy; a deterministic Y has zero entropy. More uncertainty means more entropy.
- Information-theory interpretation: H(Y) is the expected number of bits needed to encode a randomly drawn value of Y under the most efficient code.

Andrew Moore's "entropy in a nutshell"
- Low entropy: the values (locations of soup) are sampled entirely from within the soup bowl.
- High entropy: the values (locations of soup) are unpredictable, sampled almost uniformly throughout the dining room.

Information Gain
- The advantage of an attribute is the decrease in uncertainty: the entropy of Y before the split, minus the entropy of Y after splitting on Xi, weighting each branch by the probability of following it.
- Information gain is the difference: IG(Xi) = H(Y) - H(Y | Xi).
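The split comparison above can be checked numerically. This is a minimal sketch, not the lecture's code; the dataset is the 8-example table from the slide, and the helper names are made up.

```python
import math

# Toy dataset from the slide: 8 examples, features X1, X2, label Y.
X1 = ["T", "T", "T", "T", "F", "F", "F", "F"]
X2 = ["T", "F", "T", "F", "T", "F", "T", "F"]
Y  = ["T", "T", "T", "T", "T", "F", "F", "F"]

def entropy(labels):
    """H(Y) = -sum_y p(y) log2 p(y) over the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count("T"), labels.count("F")) if c > 0)

def cond_entropy(feature, labels):
    """H(Y|X): entropy of each branch, weighted by the branch probability."""
    n = len(labels)
    h = 0.0
    for v in set(feature):
        branch = [y for x, y in zip(feature, labels) if x == v]
        h += (len(branch) / n) * entropy(branch)
    return h

ig1 = entropy(Y) - cond_entropy(X1, Y)  # information gain of splitting on X1
ig2 = entropy(Y) - cond_entropy(X2, Y)  # information gain of splitting on X2
print(f"IG(X1)={ig1:.3f}  IG(X2)={ig2:.3f}")
```

IG(X1) comes out around 0.55 bits and IG(X2) around 0.05 bits, matching the intuition that the "absolutely sure" X1 = T branch makes X1 the better split.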
- Maximum information gain corresponds to minimum conditional entropy.
- For the table above, the X2 = F branch leaves Y at 2 Ts, 2 Fs (maximal entropy), so splitting on X1 yields the larger information gain.

Expressiveness of decision trees
- Decision trees can express any function of the input features. E.g., for Boolean functions, each truth-table row becomes a path to a leaf.
- There is a decision tree which perfectly classifies any training set, with one path to a leaf for each example, but it won't generalize well to new examples; prefer to find more compact decision trees.

Decision trees and overfitting
- One training example per leaf overfits; we need a compact, pruned decision tree.

Bias-variance tradeoff
- A coarse partition of the feature space: large bias, small variance.
- A fine partition: small bias, large variance.

When to stop: many strategies for picking simpler trees
- Pre-pruning: fixed depth, or fixed number of leaves.
- Post-pruning via a chi-square test: convert the decision tree to a set of rules; eliminate variable-value conditions in rules that are independent of the label, using a chi-square test for independence; then simplify the rule set by eliminating unnecessary rules.
- Information criteria, e.g. MDL (Minimum Description Length).

Information criteria
- Penalize complex models by introducing a cost: minimize the negative log-likelihood (for regression or classification) plus a penalty that charges trees with more leaves.

MDL (Minimum Description Length)
- Penalize complex models based on their information content: the number of bits needed to describe f, its description length.
- Example: a binary decision tree with k leaves has 2k - 1 nodes, so 2k - 1 bits encode the tree structure, plus k bits encode the 0/1 label of each leaf. With 5 leaves, 9 bits encode the structure.
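The slide's MDL bit-counting can be written out as a tiny sketch; the function name is made up, and (as on the slide) the bits needed to encode the feature and threshold at internal nodes are ignored.

```python
def mdl_bits(k_leaves):
    """Description length of a binary decision tree with k leaves,
    following the slide's counting: a binary tree with k leaves has
    2k - 1 nodes, so 2k - 1 bits mark each node as internal or leaf,
    plus k bits for the 0/1 label at each leaf."""
    structure_bits = 2 * k_leaves - 1
    label_bits = k_leaves
    return structure_bits + label_bits

print(mdl_bits(5))  # 9 structure bits + 5 label bits = 14
```

For the slide's 5-leaf example this gives 9 structure bits, so the total description length is 14 bits once the leaf labels are included.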
How to assign the label at each leaf
- Classification: majority vote over the training examples at the leaf.
- Regression: fit a constant, a linear model, or a polynomial to the training examples at the leaf.

Regression trees
- Example: split on Num Children at 2; at each leaf, fit a constant, namely the average of the training responses at that leaf.

Connection between nearest neighbor, histogram classifiers, and decision trees
- All of these make local predictions: the label at a test point is determined by the training points lying in a local region around it, a cell of the partition in the case of histograms and decision trees.
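The "fit a constant" rule can be sketched for the slide's Num Children split. This assumes a threshold split at 2 as on the slide; the data values below are invented purely for illustration.

```python
# Minimal regression-tree sketch: one split on Num Children (< 2 vs >= 2),
# each leaf predicts the average of its training responses.
# (num_children, response) pairs, made up for this example:
data = [(0, 10.0), (1, 12.0), (2, 30.0), (3, 34.0), (4, 32.0)]

left  = [y for x, y in data if x < 2]    # leaf: Num Children < 2
right = [y for x, y in data if x >= 2]   # leaf: Num Children >= 2
leaf_pred = {
    "lt2": sum(left) / len(left),        # constant fit = leaf average
    "ge2": sum(right) / len(right),
}

def predict(num_children):
    """Route a query to its leaf and return that leaf's constant."""
    return leaf_pred["lt2"] if num_children < 2 else leaf_pred["ge2"]

print(predict(1), predict(3))  # -> 11.0 32.0
```

Each leaf's constant is exactly the mean of the responses that fall in it, which is the least-squares-optimal constant for that cell.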

