UTD CS 6375 - Chapter 3

Decision Tree Learning

(Lecture slides for the textbook Machine Learning, (c) Tom M. Mitchell, McGraw-Hill, 1997. Read Chapter 3; recommended exercises 3.1, 3.4.)

- Decision tree representation
- ID3 learning algorithm
- Entropy, information gain
- Overfitting


Decision Tree for PlayTennis

    Outlook
      Sunny    -> Humidity
                    High   -> No
                    Normal -> Yes
      Overcast -> Yes
      Rain     -> Wind
                    Strong -> No
                    Weak   -> Yes


A Tree to Predict C-Section Risk

Learned from medical records of 1000 women. Negative examples are C-sections.

    [833+, 167-]  .83+ .17-
    Fetal_Presentation = 1: [822+, 116-]  .88+ .12-
    |  Previous_Csection = 0: [767+, 81-]  .90+ .10-
    |  |  Primiparous = 0: [399+, 13-]  .97+ .03-
    |  |  Primiparous = 1: [368+, 68-]  .84+ .16-
    |  |  |  Fetal_Distress = 0: [334+, 47-]  .88+ .12-
    |  |  |  |  Birth_Weight < 3349:  [201+, 10.6-]  .95+ .05-
    |  |  |  |  Birth_Weight >= 3349: [133+, 36.4-]  .78+ .22-
    |  |  |  Fetal_Distress = 1: [34+, 21-]  .62+ .38-
    |  Previous_Csection = 1: [55+, 35-]  .61+ .39-
    Fetal_Presentation = 2: [3+, 29-]  .11+ .89-
    Fetal_Presentation = 3: [8+, 22-]  .27+ .73-


Decision Trees

Decision tree representation:
- Each internal node tests an attribute
- Each branch corresponds to an attribute value
- Each leaf node assigns a classification

How would we represent:
- ∧, ∨, XOR
- (A ∧ B) ∨ (C ∧ ¬D ∧ E)
- M of N


When to Consider Decision Trees

- Instances describable by attribute-value pairs
- Target function is discrete valued
- Disjunctive hypothesis may be required
- Possibly noisy training data

Examples:
- Equipment or medical diagnosis
- Credit risk analysis
- Modeling calendar scheduling preferences


Top-Down Induction of Decision Trees

Main loop:
1. A <- the "best" decision attribute for the next node
2. Assign A as the decision attribute for node
3. For each value of A, create a new descendant of node
4. Sort training examples to leaf nodes
5. If training examples are perfectly classified, then STOP; else iterate over new leaf nodes

Which attribute is best?

    [29+, 35-] split on A1:  t -> [21+, 5-],   f -> [8+, 30-]
    [29+, 35-] split on A2:  t -> [18+, 33-],  f -> [11+, 2-]


Entropy

[Figure: plot of Entropy(S) as a function of p+, rising from 0 at p+ = 0 to its maximum of 1.0 at p+ = 0.5 and falling back to 0 at p+ = 1.0.]

- S is a sample of training examples
- p+ is the proportion of positive examples in S
- p- is the proportion of negative examples in S
- Entropy measures the impurity of S:

    Entropy(S) ≡ -p+ log2 p+ - p- log2 p-


Entropy (continued)

Entropy(S) = expected number of bits needed to encode the class (+ or -) of a randomly drawn member of S (under the optimal, shortest-length code).

Why? Information theory: an optimal-length code assigns -log2 p bits to a message having probability p. So the expected number of bits to encode + or - of a random member of S is

    p+ (-log2 p+) + p- (-log2 p-)

i.e.

    Entropy(S) ≡ -p+ log2 p+ - p- log2 p-


Information Gain

Gain(S, A) = expected reduction in entropy due to sorting on A:

    Gain(S, A) ≡ Entropy(S) - Σ_{v in Values(A)} (|S_v| / |S|) Entropy(S_v)

    [29+, 35-] split on A1:  t -> [21+, 5-],   f -> [8+, 30-]
    [29+, 35-] split on A2:  t -> [18+, 33-],  f -> [11+, 2-]
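
To make the two formulas above concrete, here is a small sketch (not part of the original slides) that computes Entropy(S) and Gain(S, A) directly from class counts and applies them to the [29+, 35-] example split on A1 and A2. The helper names entropy and gain_from_counts are ours, chosen for illustration, and the printed values are computed from the counts on the slide, not quoted from it.

    from math import log2

    def entropy(pos, neg):
        """Entropy(S) = -p+ log2 p+ - p- log2 p- for a sample with
        pos positive and neg negative examples (0 log 0 taken as 0)."""
        total = pos + neg
        result = 0.0
        for count in (pos, neg):
            if count:
                p = count / total
                result -= p * log2(p)
        return result

    def gain_from_counts(parent, children):
        """Gain(S, A) = Entropy(S) - sum over v of (|S_v|/|S|) * Entropy(S_v).
        parent is a (pos, neg) pair; children holds one (pos, neg) pair per value of A."""
        total = sum(parent)
        g = entropy(*parent)
        for pos, neg in children:
            g -= (pos + neg) / total * entropy(pos, neg)
        return g

    # The [29+, 35-] example from the slide: which of A1, A2 is the better split?
    S = (29, 35)
    gain_a1 = gain_from_counts(S, [(21, 5), (8, 30)])   # A1: t -> [21+, 5-], f -> [8+, 30-]
    gain_a2 = gain_from_counts(S, [(18, 33), (11, 2)])  # A2: t -> [18+, 33-], f -> [11+, 2-]
    print(f"Gain(S, A1) = {gain_a1:.3f}")   # roughly 0.27
    print(f"Gain(S, A2) = {gain_a2:.3f}")   # roughly 0.12, so A1 is the better split here
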
Training Examples

    Day  Outlook   Temperature  Humidity  Wind    PlayTennis
    D1   Sunny     Hot          High      Weak    No
    D2   Sunny     Hot          High      Strong  No
    D3   Overcast  Hot          High      Weak    Yes
    D4   Rain      Mild         High      Weak    Yes
    D5   Rain      Cool         Normal    Weak    Yes
    D6   Rain      Cool         Normal    Strong  No
    D7   Overcast  Cool         Normal    Strong  Yes
    D8   Sunny     Mild         High      Weak    No
    D9   Sunny     Cool         Normal    Weak    Yes
    D10  Rain      Mild         Normal    Weak    Yes
    D11  Sunny     Mild         Normal    Strong  Yes
    D12  Overcast  Mild         High      Strong  Yes
    D13  Overcast  Hot          Normal    Weak    Yes
    D14  Rain      Mild         High      Strong  No


Selecting the Next Attribute

Which attribute is the best classifier?

    S: [9+, 5-], Entropy(S) = 0.940

    Humidity:  High -> [3+, 4-], E = 0.985;   Normal -> [6+, 1-], E = 0.592
    Gain(S, Humidity) = 0.940 - (7/14)(0.985) - (7/14)(0.592) = 0.151

    Wind:  Weak -> [6+, 2-], E = 0.811;   Strong -> [3+, 3-], E = 1.00
    Gain(S, Wind) = 0.940 - (8/14)(0.811) - (6/14)(1.00) = 0.048

Partially learned tree, with Outlook chosen at the root:

    {D1, D2, ..., D14}: [9+, 5-]
    Outlook
      Sunny    -> {D1, D2, D8, D9, D11}:   [2+, 3-]
      Overcast -> {D3, D7, D12, D13}:      [4+, 0-]  -> Yes
      Rain     -> {D4, D5, D6, D10, D14}:  [3+, 2-]

Which attribute should be tested here? S_sunny = {D1, D2, D8, D9, D11}:

    Gain(S_sunny, Humidity)    = 0.970 - (3/5)(0.0) - (2/5)(0.0)              = 0.970
    Gain(S_sunny, Temperature) = 0.970 - (2/5)(0.0) - (2/5)(1.0) - (1/5)(0.0) = 0.570
    Gain(S_sunny, Wind)        = 0.970 - (2/5)(1.0) - (3/5)(0.918)            = 0.019


Hypothesis Space Search by ID3

[Figure: ID3 searches the space of decision trees, starting from the empty tree and greedily growing it by adding attribute tests (A1, A2, A3, A4, ...) one node at a time.]


Hypothesis Space Search by ID3 (continued)

- Hypothesis space is complete! (The target function is surely in there.)
- Outputs a single hypothesis (which one?) -- can't play 20 questions.
- No backtracking -- local minima...
- Statistically based search choices -- robust to noisy data.
- Inductive bias: approximately "prefer shortest tree".


Inductive Bias in ID3

Note: H is the power set of instances X. Unbiased? Not really...

- Preference for short trees, and for those with high-information-gain attributes near the root.
- Bias is a preference for some hypotheses, rather than a restriction of the hypothesis space H.
- Occam's razor: prefer the shortest hypothesis that fits the data.


Occam's Razor

Why prefer short hypotheses?

Argument in favor:
- Fewer short hypotheses than long hypotheses
  -> a short hypothesis that fits the data is unlikely to be a coincidence
  -> a long hypothesis that fits the data might be a coincidence

Argument opposed:
- There are many ways to define small sets of hypotheses (e.g., all trees with a prime number of nodes that use attributes beginning with "Z").
- What's so special about small sets based on the size of the hypothesis?


Overfitting in Decision Trees

Consider adding noisy training example #15: (Sunny, Hot, Normal, Strong, PlayTennis = No).
What effect on the earlier tree?

    Outlook
      Sunny    -> Humidity
                    High   -> No
                    Normal -> Yes
      Overcast -> Yes
      Rain     -> Wind
                    Strong -> No
                    Weak   -> Yes


Overfitting

Consider the error of hypothesis h over
- the training data: error_train(h)
- the entire distribution D of data: error_D(h)

Hypothesis h ∈ ...
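
As a companion to the ID3 main loop and the worked gains above, here is a minimal ID3 sketch (our own illustration, not the textbook's code) that greedily chooses the highest-gain attribute and recurses on each of its values; run on the 14 PlayTennis examples it reproduces the Outlook / Humidity / Wind tree shown earlier. Names such as id3, gain, and entropy are our own.

    from collections import Counter
    from math import log2

    def entropy(labels):
        """Entropy of a list of class labels."""
        n = len(labels)
        return -sum(c / n * log2(c / n) for c in Counter(labels).values())

    def gain(examples, attr, target):
        """Expected reduction in entropy from splitting examples on attr."""
        total = entropy([e[target] for e in examples])
        n = len(examples)
        for value in {e[attr] for e in examples}:
            subset = [e for e in examples if e[attr] == value]
            total -= len(subset) / n * entropy([e[target] for e in subset])
        return total

    def id3(examples, attributes, target="PlayTennis"):
        labels = [e[target] for e in examples]
        if len(set(labels)) == 1:                      # perfectly classified: leaf
            return labels[0]
        if not attributes:                             # no attributes left: majority leaf
            return Counter(labels).most_common(1)[0][0]
        best = max(attributes, key=lambda a: gain(examples, a, target))
        tree = {best: {}}
        for value in {e[best] for e in examples}:      # one descendant per value of best
            subset = [e for e in examples if e[best] == value]
            rest = [a for a in attributes if a != best]
            tree[best][value] = id3(subset, rest, target)
        return tree

    # The 14 PlayTennis training examples from the table above.
    fields = ["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]
    rows = [
        ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
        ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
        ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
        ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
        ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
        ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
        ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
    ]
    examples = [dict(zip(fields, r)) for r in rows]
    print(id3(examples, fields[:-1]))
    # {'Outlook': {'Overcast': 'Yes',
    #              'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}},
    #              'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}}}}   (branch order may vary)

Note that the sketch stops where the slides stop: it grows the tree until every leaf is pure, which is exactly the behavior the overfitting discussion above warns about.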

