
Decision Trees
• Decision tree representation
• ID3 learning algorithm
• Entropy, information gain
• Overfitting

Another Example Problem
[Figure: a set of positive examples and a set of negative examples]

A Decision Tree
[Figure: root node tests Type; the Car branch leads to a Doors test (2 → +, 4 → -), the Minivan branch leads to -, and the SUV branch leads to a Tires test (Blackwall → -, Whitewall → +)]

Decision Trees
Decision tree representation:
• Each internal node tests an attribute
• Each branch corresponds to an attribute value
• Each leaf node assigns a classification
How would you represent:
• ∧, ∨, XOR
• (A ∧ B) ∨ (C ∧ ¬D ∧ E)
• M of N

When to Consider Decision Trees
• Instances describable by attribute-value pairs
• Target function is discrete valued
• Disjunctive hypothesis may be required
• Possibly noisy training data
Examples:
• Equipment or medical diagnosis
• Credit risk analysis
• Modeling calendar scheduling preferences

Top-Down Induction of Decision Trees
Main loop:
1. A = the "best" decision attribute for the next node
2. Assign A as the decision attribute for the node
3. For each value of A, create a descendant of the node
4. Divide the training examples among the child nodes
5. If the training examples are perfectly classified, STOP; else iterate over the new leaf nodes
(a Python sketch of this loop appears at the end of these notes)

Which attribute is best?
• A1 splits [29+,35-] into [21+,5-] and [8+,30-]
• A2 splits [29+,35-] into [11+,2-] and [18+,33-]

Entropy
[Figure: Entropy(S) plotted against the probability of +; it is 0 at probability 0 and 1, and reaches its maximum of 1.0 at probability 0.5]
• S is a sample of training examples
• p+ is the proportion of positive examples in S
• p- is the proportion of negative examples in S
• Entropy measures the impurity of S:
  Entropy(S) ≡ -p+ log2 p+ - p- log2 p-

Entropy
Entropy(S) = expected number of bits needed to encode the class (+ or -) of a randomly drawn member of S (using an optimal, shortest-length code)
Why? Information theory: an optimal-length code assigns -log2 p bits to a message having probability p.
So the expected number of bits to encode + or - of a random member of S is
  p+ (-log2 p+) + p- (-log2 p-)
  Entropy(S) ≡ -p+ log2 p+ - p- log2 p-

Information Gain
Gain(S,A) = expected reduction in entropy due to sorting on A
  Gain(S,A) ≡ Entropy(S) - Σ (|Sv| / |S|) Entropy(Sv), summed over v ∈ Values(A)
For the [29+,35-] example above:
  Entropy([29+,35-]) = -(29/64) log2(29/64) - (35/64) log2(35/64) = 0.994
  Entropy([21+,5-]) = -(21/26) log2(21/26) - (5/26) log2(5/26) = 0.706
  Entropy([8+,30-]) = 0.742
  Gain(S,A1) = 0.994 - (26/64)(0.706) - (38/64)(0.742) = 0.266
  Entropy([11+,2-]) = 0.619
  Entropy([18+,33-]) = 0.937
  Gain(S,A2) = 0.994 - (13/64)(0.619) - (51/64)(0.937) = 0.121

Car Examples
Color   Type      Doors   Tires       Class
Red     SUV       2       Whitewall   +
Blue    Minivan   4       Whitewall   -
Green   Car       4       Whitewall   -
Red     Minivan   4       Blackwall   -
Green   Car       2       Blackwall   +
Green   SUV       4       Blackwall   -
Blue    SUV       2       Blackwall   -
Blue    Car       2       Whitewall   +
Red     SUV       2       Blackwall   -
Blue    Car       4       Blackwall   -
Green   SUV       4       Whitewall   +
Red     Car       2       Blackwall   +
Green   SUV       2       Blackwall   -
Green   Minivan   4       Whitewall   -
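The slides contain no code, so the following is a minimal Python sketch of the entropy and information gain formulas from the Information Gain slide, applied to the 14 car examples in the table above. The dataset encoding, the ATTRIBUTES index map, and the function names are illustrative choices, not from the lecture; running it reproduces the gains reported on the next two slides (Color 0.029, Type 0.200, Doors 0.152, Tires 0.048).

```python
from collections import Counter
from math import log2

# The 14 training examples from the Car Examples table: (Color, Type, Doors, Tires, Class)
EXAMPLES = [
    ("Red",   "SUV",     2, "Whitewall", "+"),
    ("Blue",  "Minivan", 4, "Whitewall", "-"),
    ("Green", "Car",     4, "Whitewall", "-"),
    ("Red",   "Minivan", 4, "Blackwall", "-"),
    ("Green", "Car",     2, "Blackwall", "+"),
    ("Green", "SUV",     4, "Blackwall", "-"),
    ("Blue",  "SUV",     2, "Blackwall", "-"),
    ("Blue",  "Car",     2, "Whitewall", "+"),
    ("Red",   "SUV",     2, "Blackwall", "-"),
    ("Blue",  "Car",     4, "Blackwall", "-"),
    ("Green", "SUV",     4, "Whitewall", "+"),
    ("Red",   "Car",     2, "Blackwall", "+"),
    ("Green", "SUV",     2, "Blackwall", "-"),
    ("Green", "Minivan", 4, "Whitewall", "-"),
]
# Column index of each attribute (the class label is the last field)
ATTRIBUTES = {"Color": 0, "Type": 1, "Doors": 2, "Tires": 3}

def entropy(examples):
    """Entropy(S) = -p+ log2 p+ - p- log2 p- (0 log 0 is taken to be 0)."""
    counts = Counter(ex[-1] for ex in examples)
    total = len(examples)
    return -sum((n / total) * log2(n / total) for n in counts.values() if n > 0)

def information_gain(examples, attribute):
    """Gain(S, A) = Entropy(S) - sum over v of |Sv|/|S| * Entropy(Sv)."""
    idx = ATTRIBUTES[attribute]
    total = len(examples)
    remainder = 0.0
    for value in {ex[idx] for ex in examples}:
        subset = [ex for ex in examples if ex[idx] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(examples) - remainder

if __name__ == "__main__":
    for attr in ATTRIBUTES:
        print(f"Gain(S, {attr}) = {information_gain(EXAMPLES, attr):.3f}")
    # Expected, matching the slides: Color 0.029, Type 0.200, Doors 0.152, Tires 0.048
```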
Selecting Root Attribute
S: [5+,9-], Entropy = 0.940
Color: Red → [2+,2-], Blue → [1+,3-], Green → [2+,4-]
  Gain(S,Color) = .940 - (4/14)1.0 - (4/14).811 - (6/14).918 = 0.029
Type: Car → [3+,2-], Minivan → [0+,3-], SUV → [2+,4-]
  Gain(S,Type) = .940 - (5/14).971 - (3/14)0.0 - (6/14).918 = 0.200

Selecting Root Attribute (cont)
Doors: 2 → [4+,3-], 4 → [1+,6-]
  Gain(S,Doors) = .940 - (7/14).985 - (7/14).592 = 0.152
Tires: Whitewall → [3+,3-], Blackwall → [2+,6-]
  Gain(S,Tires) = .940 - (6/14)1.0 - (8/14).811 = 0.048
Best attribute: Type (Gain = 0.200)

Selecting Next Attribute
With Type at the root, its branches give:
  S_Car: [3+,2-], E = 0.971 (?)
  S_Minivan: [0+,3-], E = 0.0 (leaf: -)
  S_SUV: [2+,4-], E = 0.918 (?)
  Gain(S_Car,Color) = .971 - (1/5)0.0 - (2/5)1.0 - (2/5)1.0 = .171
  Gain(S_Car,Doors) = .971 - (3/5)0.0 - (2/5)0.0 = .971
  Gain(S_Car,Tires) = .971 - (2/5)1.0 - (3/5).918 = .020
  Gain(S_SUV,Color) = .918 - (2/6)1.0 - (1/6)0.0 - (3/6).918 = .126
  Gain(S_SUV,Doors) = .918 - (4/6).811 - (2/6)1.0 = .044
  Gain(S_SUV,Tires) = .918 - (2/6)0.0 - (4/6)0.0 = .918

Resulting Tree
[Figure: root Type; Car → Doors (2 → +, 4 → -), Minivan → -, SUV → Tires (Blackwall → -, Whitewall → +)]

Hypothesis Space Search by ID3
[Figure: ID3 searches the space of decision trees, growing a single tree by adding attribute tests (A1, A2, A3, ...) one node at a time]

Hypothesis Space Search by ID3
• Hypothesis space is complete!
  – Target function is in there (but will we find it?)
• Outputs a single hypothesis (which one?)
  – Cannot play 20 questions
• No backtracking
  – Local minima possible
• Statistically-based search choices
  – Robust to noisy data
• Inductive bias: approximately "prefer shortest tree"

Inductive Bias in ID3
Note H is the power set of instances X
Unbiased? Not really…
• Preference for short trees, and for those with high information gain attributes near the root
• Bias is a preference for some hypotheses, rather than a restriction of hypothesis space H
• Occam's razor: prefer the shortest hypothesis that fits the data

Occam's Razor
Why prefer short hypotheses?
Argument in favor:
• Fewer short hypotheses than long hypotheses
• A short hypothesis that fits the data is unlikely to be a coincidence
• A long hypothesis that fits …
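Below is a rough Python sketch of the main loop from the Top-Down Induction of Decision Trees slide, continuing the earlier sketch (it reuses EXAMPLES, ATTRIBUTES, entropy, and information_gain defined there). The nested-dict tree encoding and the majority-class fallback when no attributes remain are assumptions of this sketch; the slide itself only says to stop when the training examples are perfectly classified. On the car data it recovers the tree shown on the Resulting Tree slide.

```python
def id3(examples, attributes):
    """Grow a decision tree top-down, choosing the highest-gain attribute at each node."""
    labels = [ex[-1] for ex in examples]
    # Stop: examples perfectly classified -> leaf with that class
    if len(set(labels)) == 1:
        return labels[0]
    # Stop (assumed fallback): no attributes left -> leaf with the majority class
    if not attributes:
        return max(set(labels), key=labels.count)
    # 1. A <- the "best" decision attribute for the next node
    best = max(attributes, key=lambda a: information_gain(examples, a))
    idx = ATTRIBUTES[best]
    node = {best: {}}
    # 3./4. One descendant per value of A, dividing the training examples among them
    for value in {ex[idx] for ex in examples}:
        subset = [ex for ex in examples if ex[idx] == value]
        node[best][value] = id3(subset, [a for a in attributes if a != best])
    return node

if __name__ == "__main__":
    tree = id3(EXAMPLES, list(ATTRIBUTES))
    print(tree)
    # Expected shape (the Resulting Tree slide):
    #   Type -> Minivan: "-",
    #           Car: split on Doors (2: "+", 4: "-"),
    #           SUV: split on Tires (Whitewall: "+", Blackwall: "-")
```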

