CMU CS 10701 - Decision Trees Boosting


Decision Trees (many possible refs, e.g., Mitchell Chapter 3) and Boosting (linked from class website: Schapire '01)

Decision Trees, Boosting
Machine Learning 10-701/15-781
Carlos Guestrin, Carnegie Mellon University
February 6th, 2006

Announcements
- Recitations stay on Thursdays, 5-6:30pm in Wean 5409. This week: Decision Trees and Boosting.
- Pittsburgh won the Super Bowl.

Linear separability
- A dataset is linearly separable iff there exists a separating hyperplane w such that:
  - w0 + Σi wi xi > 0 if x = (x1, ..., xn) is a positive example
  - w0 + Σi wi xi < 0 if x = (x1, ..., xn) is a negative example

Not linearly separable data
- Some datasets are not linearly separable.

Addressing non-linearly separable data, Option 1: non-linear features
- Choose non-linear features, e.g.:
  - typical linear features: w0 + Σi wi xi
  - example of non-linear features, degree-2 polynomials: w0 + Σi wi xi + Σij wij xi xj
- The classifier hw(x) is still linear in the parameters w:
  - usually easy to learn (closed form or convex/concave optimization)
  - data is linearly separable in higher-dimensional spaces
- More discussion later this semester.

Addressing non-linearly separable data, Option 2: non-linear classifier
- Choose a classifier hw(x) that is non-linear in the parameters w, e.g., decision trees, neural networks, nearest neighbor.
- More general than linear classifiers.
- But can often be harder to learn (non-convex/concave optimization required).
- But, but: often very useful.
- BTW: later this semester we'll see that these options are not that different.

A small dataset: Miles Per Gallon (MPG)
- Suppose we want to predict MPG (good/bad).
- [Table: a sample of the 40 records, with attributes cylinders (4/5/6/8), displacement, horsepower, weight and acceleration (low/medium/high), model year (70to74/75to78/79to83), and maker (america/asia/europe).]
- 40 records, from the UCI repository (thanks to Ross Quinlan).

A Decision Stump
- [Figure: a one-level tree splitting the MPG data on a single attribute.]

Recursion Step
- Take the original dataset and partition it according to the value of the attribute we split on:
  - records in which cylinders = 4
  - records in which cylinders = 5
  - records in which cylinders = 6
  - records in which cylinders = 8
- Then build a tree from each partition: from the records in which cylinders = 4, from those in which cylinders = 5, from those in which cylinders = 6, and from those in which cylinders = 8.

Second level of tree
- Recursively build a tree from the seven records in which there are four cylinders and the maker was based in Asia. Similar recursion in the other cases.

The final tree
- [Figure: the fully grown decision tree for the MPG data.]

Classification of a new example
- Classifying a test example: traverse the tree and report the leaf label.
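To make "traverse the tree and report the leaf label" concrete, here is a minimal sketch in Python. The nested-dict tree format, the classify helper, and the toy cylinders/maker tree are illustrative assumptions, not the tree actually learned on the MPG data.

```python
# A minimal sketch (not from the slides) of "traverse the tree and report the
# leaf label". Internal nodes are dicts naming the attribute to test and
# mapping each attribute value to a subtree; leaves are plain labels.

def classify(tree, example):
    """Walk from the root to a leaf, following the example's attribute values."""
    while isinstance(tree, dict):
        tree = tree["children"][example[tree["attribute"]]]
    return tree

# Hypothetical stump-plus-one-level tree in the spirit of the MPG example
# (the split values and leaf labels here are made up for illustration).
mpg_tree = {
    "attribute": "cylinders",
    "children": {
        4: {"attribute": "maker",
            "children": {"asia": "good", "america": "good", "europe": "bad"}},
        5: "bad",
        6: "bad",
        8: "bad",
    },
}

print(classify(mpg_tree, {"cylinders": 4, "maker": "asia"}))      # good
print(classify(mpg_tree, {"cylinders": 8, "maker": "america"}))   # bad
```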
Are all decision trees equal?
- Many trees can represent the same concept.
- But not all trees will have the same size! E.g., the concept (A and B) or (not A and C).

Learning decision trees is hard
- Learning the simplest (smallest) decision tree is an NP-complete problem [Hyafil & Rivest '76].
- Resort to a greedy heuristic:
  - start from an empty decision tree
  - split on the next best attribute (feature)
  - recurse

Choosing a good attribute
  X1  X2  Y
  T   T   T
  T   F   T
  T   T   T
  T   F   T
  F   T   T
  F   F   F
  F   T   F
  F   F   F

Measuring uncertainty
- Good split if we are more certain about the classification after the split:
  - deterministic is good (all true or all false)
  - a uniform distribution is bad
- For example, P(Y=A) = 1/2, P(Y=B) = 1/4, P(Y=C) = 1/8, P(Y=D) = 1/8 is less uncertain than P(Y=A) = 1/4, P(Y=B) = 1/4, P(Y=C) = 1/4, P(Y=D) = 1/4.

Entropy
- Entropy H(Y) of a random variable Y: H(Y) = -Σi P(Y = yi) log2 P(Y = yi).
- More uncertainty, more entropy!
- Information-theory interpretation: H(Y) is the expected number of bits needed to encode a randomly drawn value of Y (under the most efficient code).

Andrew Moore's Entropy in a nutshell
- Low entropy: the values (locations of soup) sampled entirely from within the soup bowl.
- High entropy: the values (locations of soup) unpredictable, almost uniformly sampled throughout our dining room.

Information gain
- Advantage of an attribute = decrease in uncertainty:
  - entropy of Y before you split: H(Y)
  - entropy after the split: weight the entropy of each branch by the probability of following that branch (i.e., the normalized number of records), Σv P(X = v) H(Y | X = v)
- Information gain is the difference: IG(X) = H(Y) - Σv P(X = v) H(Y | X = v). (Illustrated on the X1/X2/Y table above.)

Learning decision trees
- Start from an empty decision tree.
- Split on the next best attribute (feature): use, for example, information gain to select the attribute to split on.
- Recurse.

Information gain example
- Suppose we want to predict MPG: look at all the information gains [figure] and split on the best attribute, giving a decision stump [figure].

Base Case One
- Don't split a node if all matching records have the same output value.

Base Case Two
- Don't split a node if none of the attributes can create multiple non-empty children (no attributes can distinguish the records).

Base Cases
- Base Case One: if all records in the current data subset have the same output, then don't recurse.
- Base Case Two: if all records have exactly the same set of input attributes, then don't recurse.

Base Cases: an idea
- Proposed Base Case 3: if all attributes have zero information gain, then don't recurse.
- Is this a good idea?

The problem with Base Case 3
  a  b  y
  0  0  0
  0  1  1
  1  0  1
  1  1  0
- y = a XOR b.
- [Figure: the information gains, which are zero for both a and b.]
- [Figure: the resulting decision tree, a single leaf that never splits.]

If we omit Base Case 3: a b 0 0 ...
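The entropy and information-gain quantities above follow the standard definitions, so they are easy to check numerically. The sketch below (helper names are mine, not from the slides) reproduces the "choosing a good attribute" comparison on the X1/X2/Y table and the zero gains of the y = a XOR b example.

```python
# A small sketch of the entropy and information-gain computations described
# above (standard definitions; the helper names are illustrative).
from collections import Counter
from math import log2

def entropy(labels):
    """H(Y) = -sum_y P(y) log2 P(y), estimated from a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(records, attribute, target):
    """IG(X) = H(Y) - sum_v P(X=v) H(Y | X=v)."""
    labels = [r[target] for r in records]
    gain = entropy(labels)
    for value in set(r[attribute] for r in records):
        subset = [r[target] for r in records if r[attribute] == value]
        gain -= (len(subset) / len(records)) * entropy(subset)
    return gain

# The 8-record X1/X2/Y table from the "Choosing a good attribute" slide.
data = [{"X1": x1, "X2": x2, "Y": y} for x1, x2, y in [
    ("T", "T", "T"), ("T", "F", "T"), ("T", "T", "T"), ("T", "F", "T"),
    ("F", "T", "T"), ("F", "F", "F"), ("F", "T", "F"), ("F", "F", "F"),
]]
print(information_gain(data, "X1", "Y"))  # ~0.55: X1 is the better split
print(information_gain(data, "X2", "Y"))  # ~0.05

# The y = a XOR b table behind "the problem with Base Case 3": both
# attributes have zero information gain, even though y is perfectly learnable.
xor = [{"a": a, "b": b, "y": a ^ b} for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(information_gain(xor, "a", "y"), information_gain(xor, "b", "y"))  # 0.0 0.0
```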
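And a compact sketch of the greedy recursion with Base Cases One and Two, assuming the information_gain helper from the previous sketch is in scope; the nested-dict tree format matches the earlier classify sketch, and the function names are mine. Run on the XOR data, it shows that without Base Case 3 the greedy learner still recovers a perfect tree even though both attributes have zero gain at the root.

```python
# A sketch of the greedy heuristic described above ("split on the next best
# attribute, recurse"), with Base Cases One and Two. Assumes information_gain
# from the previous sketch is already defined.
from collections import Counter

def majority_label(records, target):
    """Most common output value among the records."""
    return Counter(r[target] for r in records).most_common(1)[0][0]

def learn_tree(records, attributes, target):
    labels = {r[target] for r in records}
    if len(labels) == 1:                      # Base Case One: all outputs equal
        return labels.pop()
    if not attributes or all(
        len({r[a] for r in records}) == 1 for a in attributes
    ):                                        # Base Case Two: nothing can distinguish
        return majority_label(records, target)

    # Greedy step: pick the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(records, a, target))
    children = {}
    for value in {r[best] for r in records}:
        subset = [r for r in records if r[best] == value]
        remaining = [a for a in attributes if a != best]
        children[value] = learn_tree(subset, remaining, target)
    return {"attribute": best, "children": children}

# On the XOR data, omitting "Base Case 3" still yields a perfect tree.
xor = [{"a": a, "b": b, "y": a ^ b} for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(learn_tree(xor, ["a", "b"], "y"))
```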

