# CS 5751 Machine Learning, Chapter 3: Decision Tree Learning

## Decision Trees

- Decision tree representation
- ID3 learning algorithm
- Entropy, information gain
- Overfitting

## Another Example Problem

[Figure: an instance space containing positive examples and negative examples.]

## A Decision Tree

[Figure: a decision tree for the car data. The root tests Type. Minivan is classified -. Car is further tested on Doors (2: +, 4: -). SUV is further tested on Tires (Blackwall: -, Whitewall: +).]

## Decision Trees

Decision tree representation:

- Each internal node tests an attribute
- Each branch corresponds to an attribute value
- Each leaf node assigns a classification

How would you represent:

- ∧, ∨, XOR
- (A ∧ B) ∨ (C ∧ ¬D ∧ E)
- M of N

## When to Consider Decision Trees

- Instances describable by attribute-value pairs
- Target function is discrete valued
- Disjunctive hypothesis may be required
- Possibly noisy training data

Examples:

- Equipment or medical diagnosis
- Credit risk analysis
- Modeling calendar scheduling preferences

## Top-Down Induction of Decision Trees

Main loop:

1. A ← the "best" decision attribute for the next node
2. Assign A as the decision attribute for the node
3. For each value of A, create a descendant of the node
4. Divide the training examples among the child nodes
5.
If the training examples are perfectly classified, STOP; else iterate over the new leaf nodes

Which attribute is best?

[Figure: two candidate splits of S = [29+, 35-]. A1 yields children [21+, 5-] and [8+, 30-]; A2 yields children [18+, 33-] and [11+, 2-].]

## Entropy

- S = sample of training examples
- p₊ = proportion of positive examples in S
- p₋ = proportion of negative examples in S

Entropy measures the impurity of S:

Entropy(S) ≡ -p₊ log₂ p₊ - p₋ log₂ p₋

[Figure: Entropy(S) plotted against the probability of +; it is 0 at p₊ = 0 and p₊ = 1 and peaks at 1.0 when p₊ = 0.5.]

## Entropy

Entropy(S) = expected number of bits needed to encode the class (+ or -) of a randomly drawn member of S (using an optimal, shortest-length code)

Why? Information theory: an optimal-length code assigns -log₂ p bits to a message having probability p. So the expected number of bits to encode + or - of a random member of S is:

p₊(-log₂ p₊) + p₋(-log₂ p₋)

Entropy(S) ≡ -p₊ log₂ p₊ - p₋ log₂ p₋

## Information Gain

Gain(S, A) = expected reduction in entropy due to sorting on A:

Gain(S, A) ≡ Entropy(S) - Σ_{v ∈ Values(A)} (|Sᵥ| / |S|) Entropy(Sᵥ)

For the splits above (S = [29+, 35-]):

Entropy([29+, 35-]) = -(29/64) log₂(29/64) - (35/64) log₂(35/64) = 0.994
Entropy([21+, 5-]) = -(21/26) log₂(21/26) - (5/26) log₂(5/26) = 0.706
Entropy([8+, 30-]) = 0.742
Gain(S, A1) = 0.994 - (26/64)(0.706) - (38/64)(0.742) = 0.266

Entropy([18+, 33-]) = 0.937
Entropy([11+, 2-]) = 0.619
Gain(S, A2) = 0.994 - (51/64)(0.937) - (13/64)(0.619) = 0.121

## Car Examples

| Color | Type    | Doors | Tires     | Class |
|-------|---------|-------|-----------|-------|
| Red   | SUV     | 2     | Whitewall | +     |
| Blue  | Minivan | 4     | Whitewall | -     |
| Green | Car     | 4     | Whitewall | -     |
| Red   | Minivan | 4     | Blackwall | -     |
| Green | Car     | 2     | Blackwall | +     |
| Green | SUV     | 4     | Blackwall | -     |
| Blue  | SUV     | 2     | Blackwall | -     |
| Blue  | Car     | 2     | Whitewall | +     |
| Red   | SUV     | 2     | Blackwall | -     |
| Blue  | Car     | 4     | Blackwall | -     |
| Green | SUV     | 4     | Whitewall | +     |
| Red   | Car     | 2     | Blackwall | +     |
| Green | SUV     | 2     | Blackwall | -     |
| Green | Minivan | 4     | Whitewall | -     |

## Selecting the Root Attribute

S: [5+, 9-], E = 0.940

Color splits S into Red [2+, 2-], Blue [1+, 3-], Green [2+, 4-]:

Gain(S, Color) = 0.940 - (4/14)(1.0) - (4/14)(0.811) - (6/14)(0.918) =
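The entropy and information gain arithmetic above can be checked with a short script. This is a sketch, not from the slides: the function names `entropy` and `gain` and the `(pos, neg)` pair encoding are my own.

```python
import math

def entropy(pos, neg):
    """Entropy of a sample with `pos` positive and `neg` negative examples."""
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:  # 0 * log2(0) is taken to be 0
            p = count / total
            e -= p * math.log2(p)
    return e

def gain(parent, children):
    """Information gain: parent entropy minus size-weighted child entropies.

    `parent` and each element of `children` are (pos, neg) pairs.
    """
    total = sum(parent)
    g = entropy(*parent)
    for pos, neg in children:
        g -= (pos + neg) / total * entropy(pos, neg)
    return g

# The [29+, 35-] example from the slides:
print(round(entropy(29, 35), 3))                     # 0.994
print(round(gain((29, 35), [(21, 5), (8, 30)]), 3))  # A1: 0.266
print(round(gain((29, 35), [(18, 33), (11, 2)]), 3)) # A2: 0.121
```

As on the slides, A1 is the better split: it reduces the entropy of [29+, 35-] by 0.266 bits versus 0.121 for A2.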
0.029

Type splits S into Car [3+, 2-], Minivan [0+, 3-], SUV [2+, 4-]:

Gain(S, Type) = 0.940 - (5/14)(0.971) - (3/14)(0.0) - (6/14)(0.918) = 0.200

## Selecting the Root Attribute (cont.)

Gain(S, Doors) = 0.940 - (7/14)(0.985) - (7/14)(0.592) = 0.152

Gain(S, Tires) = 0.940 - (6/14)(1.0) - (8/14)(0.811) = 0.048

Best attribute: Type (Gain = 0.200)

## Selecting the Next Attribute

Splitting on Type leaves S_Car: [3+, 2-], E = 0.971; S_Minivan: [0+, 3-], E = 0.0 (a - leaf); S_SUV: [2+, 4-], E = 0.918. For the two impure children:

Gain(S_Car, Color) = 0.971 - (1/5)(0.0) - (2/5)(1.0) - (2/5)(1.0) = 0.171
Gain(S_Car, Doors) = 0.971 - (3/5)(0.0) - (2/5)(0.0) = 0.971
Gain(S_Car, Tires) = 0.971 - (2/5)(1.0) - (3/5)(0.918) = 0.020

Gain(S_SUV, Color) = 0.918 - (2/6)(1.0) - (1/6)(0.0) - (3/6)(0.918) = 0.126
Gain(S_SUV, Doors) = 0.918 - (4/6)(0.811) - (2/6)(1.0) = 0.044
Gain(S_SUV, Tires) = 0.918 - (2/6)(0.0) - (4/6)(0.0) = 0.918

## Resulting Tree

[Figure: the final tree. Type at the root; Minivan → -, Car → Doors (2: +, 4: -), SUV → Tires (Blackwall: -, Whitewall: +).]

## Hypothesis Space Search by ID3

[Figure: ID3's search through the space of decision trees, growing from the empty tree toward progressively more elaborate trees over attributes A1, A2, A3.]

## Hypothesis Space Search by ID3

- Hypothesis space is complete!
  - Target function is in there (but will we find it?)
- Outputs a single hypothesis (which one?)
  - Cannot play 20 questions
- No backtracking
  - Local minima possible
- Statistically-based search choices
  - Robust to noisy data
- Inductive bias: approximately "prefer shortest tree"

## Inductive Bias in ID3

Note H is the power set of instances X.

Unbiased? Not really...

- Preference for short trees, and for those with high information gain attributes near the root
- Bias is a preference for some hypotheses, rather than a restriction of hypothesis space H
- Occam's razor: prefer the shortest hypothesis that fits the data
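All of the gains above can be recomputed directly from the 14 car examples. In this sketch the `EXAMPLES` tuples and the `ATTRS` index map are my own encoding of the table; the printed numbers are the slide values.

```python
import math
from collections import Counter

# The 14 car examples from the table: (Color, Type, Doors, Tires, Class).
EXAMPLES = [
    ("Red", "SUV", "2", "Whitewall", "+"),
    ("Blue", "Minivan", "4", "Whitewall", "-"),
    ("Green", "Car", "4", "Whitewall", "-"),
    ("Red", "Minivan", "4", "Blackwall", "-"),
    ("Green", "Car", "2", "Blackwall", "+"),
    ("Green", "SUV", "4", "Blackwall", "-"),
    ("Blue", "SUV", "2", "Blackwall", "-"),
    ("Blue", "Car", "2", "Whitewall", "+"),
    ("Red", "SUV", "2", "Blackwall", "-"),
    ("Blue", "Car", "4", "Blackwall", "-"),
    ("Green", "SUV", "4", "Whitewall", "+"),
    ("Red", "Car", "2", "Blackwall", "+"),
    ("Green", "SUV", "2", "Blackwall", "-"),
    ("Green", "Minivan", "4", "Whitewall", "-"),
]
ATTRS = {"Color": 0, "Type": 1, "Doors": 2, "Tires": 3}

def entropy(examples):
    """Entropy of the class label (last field) over a list of examples."""
    total = len(examples)
    counts = Counter(ex[-1] for ex in examples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def gain(examples, attr):
    """Information gain of splitting `examples` on `attr`."""
    i = ATTRS[attr]
    g = entropy(examples)
    for value in {ex[i] for ex in examples}:
        subset = [ex for ex in examples if ex[i] == value]
        g -= len(subset) / len(examples) * entropy(subset)
    return g

for attr in ATTRS:  # root-level gains
    print(attr, round(gain(EXAMPLES, attr), 3))
# Color 0.029, Type 0.2, Doors 0.152, Tires 0.048 -> Type is best

suv = [ex for ex in EXAMPLES if ex[1] == "SUV"]
print(round(gain(suv, "Tires"), 3))  # 0.918: Tires perfectly splits the SUVs
```

Because `gain` takes any list of examples, the same function handles the second level of the tree: filtering to the SUV subset and splitting on Tires recovers the 0.918 gain from the slide.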
## Occam's Razor

Why prefer short hypotheses?

Argument in favor:

- There are fewer short hypotheses than long hypotheses
- A short hypothesis that fits the data is unlikely to be a coincidence
- A long hypothesis that fits the data is more likely to be a coincidence

Argument opposed:

- There are many ways to define small sets of hypotheses
- e.g., all trees with a prime number of nodes that use attributes beginning with "Z"
- What is so special about small sets based on the size of the hypothesis?

## Overfitting in Decision Trees

[Figure: the tree learned above (Type at the root; Minivan → -, Car → Doors, SUV → Tires).]

Consider adding a noisy training example:

<Green, SUV, 2, Blackwall> +

What happens to the decision tree?

## Overfitting

Consider the error of hypothesis h over the training data, error_train(h), and over the entire distribution D of data, error_D(h). Hypothesis h overfits the training data if there is an alternative hypothesis h' such that error_train(h) < error_train(h') and error_D(h) > error_D(h')
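To see what the noisy example does, the learned tree can be encoded and queried. The nested-dict encoding, the `classify` function, and the variable names are my own; the slides present the tree only as a diagram.

```python
# The decision tree learned from the car examples, as nested dicts:
# an internal node maps one attribute name to its branches; a leaf is a label.
TREE = {"Type": {
    "Car":     {"Doors": {"2": "+", "4": "-"}},
    "Minivan": "-",
    "SUV":     {"Tires": {"Whitewall": "+", "Blackwall": "-"}},
}}

def classify(tree, example):
    """Walk the tree; `example` maps attribute names to values."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[example[attr]]
    return tree

noisy = {"Color": "Green", "Type": "SUV", "Doors": "2", "Tires": "Blackwall"}
print(classify(TREE, noisy))  # '-'
```

The tree labels the noisy example '-', contradicting its '+' label, so ID3 run to a perfect fit would grow the Blackwall-SUV leaf into a further subtree just to accommodate one noisy point: a fit to the training data that is likely to hurt accuracy on the distribution D.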
