Missing Values with Decision Trees

• diagnosis = <fever, blood_pressure, …, blood_test = ?, …>
• Often, values are not available for all attributes during training or testing (e.g., medical diagnosis).
• Training: evaluate Gain(S, a) when some of the examples do not give a value for a.

Day  Outlook  Temperature  Humidity  Wind    PlayTennis
 1   Sunny    Hot          High      Weak    No
 2   Sunny    Hot          High      Strong  No
 8   Sunny    Mild         ???       Weak    No
 9   Sunny    Cool         Normal    Weak    Yes
11   Sunny    Mild         Normal    Strong  Yes


Missing Values

Splitting on Outlook gives:
  Sunny    (examples 1, 2, 8, 9, 11):  2+, 3-  → compare Gain(S_Sunny, Temperature) and Gain(S_Sunny, Humidity)
  Overcast (examples 3, 7, 12, 13):    4+, 0-  → Yes
  Rain     (examples 4, 5, 6, 10, 14): 3+, 2-  → ?

Day 8 = <Sunny, Mild, Humidity = ???, Weak, PlayTennis = No>. To handle the missing Humidity value, use:
A) the most common Humidity value at Sunny;
B) as (A), but counting only examples with PlayTennis = No;
C) count the example fractionally across the possible values (see the first sketch at the end of these notes).


Missing Values

• diagnosis = <fever, blood_pressure, …, blood_test = ?, …>
• Often, values are not available for all attributes during training or testing (e.g., medical diagnosis).
• Training: evaluate Gain(S, a) when some of the examples do not give a value for a.
• Testing: classify an example without knowing the value of a.


Missing Values

The learned tree:
  Outlook = Sunny    → Humidity:  High → No,   Normal → Yes
  Outlook = Overcast → Yes
  Outlook = Rain     → Wind:      Strong → No, Weak → Yes

Test example: Outlook = ???, Temp = Hot, Humidity = Normal, Wind = Strong, label = ??
• Blend by labels: follow all three Outlook branches; 1/3 Yes + 1/3 Yes + 1/3 No = Yes.
• Blend by probability: weight each branch by its probability, estimated from the training counts (a code sketch appears at the end of these notes).


Other Issues

• Attributes with different costs: change the information gain so that low-cost attributes are preferred.
• Alternative measures for selecting attributes: when attributes have different numbers of values, information gain tends to prefer those with many values.
• Oblique decision trees: decisions are not axis-parallel.
• Incremental decision tree induction: update an existing decision tree to account for new examples incrementally. (Maintain consistency?)


Decision Trees as Features

• Rather than using decision trees to represent the target function, it is becoming common to use small decision trees as features.
• When learning over a large number of features, learning decision trees is difficult and the resulting tree may be very large (overfitting).
• Instead, learn small decision trees with limited depth.
• Treat them as "experts": each is correct, but only on a small region of the domain. (Which DTs should we learn? The same ones every time?)
• Then learn another function, typically a linear function, over these trees as features. Boosting (but also other linear learners) is used on top of the small decision trees, treated as either Boolean or real-valued features. (A sketch follows at the end of these notes.)


Decision Trees - Summary

• Hypothesis space: contains all functions (!); variable size; deterministic; discrete and continuous attributes.
• Search algorithm: ID3 - eager, batch, constructive search. Extensions: missing values.
• Issues: What is the goal? When to stop? How to guarantee good generalization?
• Did not address: How are we doing? (Accuracy, complexity.)
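
A minimal Python sketch of option (C), fractional counting, on the Sunny subset from the table above. The data layout, the helper names, and the use of None for Day 8's missing Humidity are illustrative assumptions, not course code.

    import math

    def entropy(pos, neg):
        """Binary entropy of (possibly fractional) positive/negative counts."""
        if pos == 0 or neg == 0:
            return 0.0
        p = pos / (pos + neg)
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    # The Sunny subset from the slide; None marks Day 8's missing Humidity.
    sunny = [(1, "High", "No"), (2, "High", "No"), (8, None, "No"),
             (9, "Normal", "Yes"), (11, "Normal", "Yes")]

    # Among examples where Humidity is known, High and Normal each occur
    # 2/4 of the time, so Day 8 counts as half an example in each branch.
    known = [h for _, h, _ in sunny if h is not None]
    frac = {v: known.count(v) / len(known) for v in set(known)}

    # Accumulate (fractional) positive/negative counts per branch.
    counts = {v: [0.0, 0.0] for v in frac}
    for _, h, label in sunny:
        for v in counts:
            w = frac[v] if h is None else (1.0 if h == v else 0.0)
            counts[v][0 if label == "Yes" else 1] += w

    n = len(sunny)
    pos = sum(c[0] for c in counts.values())
    neg = sum(c[1] for c in counts.values())
    gain = entropy(pos, neg) - sum(
        (p + q) / n * entropy(p, q) for p, q in counts.values())
    print(f"Gain(S_Sunny, Humidity) with fractional counts = {gain:.3f}")

On this subset the gain comes out to about 0.61 bits. The same fractional weights can also follow the example down the tree if a child node is grown below the Humidity test.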
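
A minimal sketch of test-time blending for the learned tree above. The nested-tuple tree encoding and the branch weights (training fractions: 5/14 Sunny, 4/14 Overcast, 5/14 Rain) are assumptions for illustration; replacing the weights with uniform 1/3s reproduces the "blend by labels" computation on the slide.

    # A node is either a leaf label (str) or (attribute, {value: (weight, subtree)}).
    tree = ("Outlook", {
        "Sunny":    (5 / 14, ("Humidity", {"High":   (3 / 5, "No"),
                                           "Normal": (2 / 5, "Yes")})),
        "Overcast": (4 / 14, "Yes"),
        "Rain":     (5 / 14, ("Wind", {"Strong": (2 / 5, "No"),
                                       "Weak":   (3 / 5, "Yes")})),
    })

    def predict(node, example):
        """Return P(Yes); a missing attribute blends all of its branches."""
        if isinstance(node, str):                  # leaf
            return 1.0 if node == "Yes" else 0.0
        attr, branches = node
        value = example.get(attr)                  # None if missing
        if value is not None:
            return predict(branches[value][1], example)
        # Attribute missing: weight each branch by its estimated probability.
        return sum(w * predict(sub, example) for w, sub in branches.values())

    x = {"Temp": "Hot", "Humidity": "Normal", "Wind": "Strong"}  # no Outlook
    p = predict(tree, x)
    print(f"P(Yes) = {p:.3f} ->", "Yes" if p >= 0.5 else "No")   # 9/14 -> Yes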
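
A minimal sketch of the "small decision trees as features" idea, assuming scikit-learn and synthetic data (neither appears in the slides). Bootstrap-sampled depth-2 trees stand in for the "experts", and logistic regression plays the role of the linear function learned over their Boolean outputs; the slides' boosting variant would replace that last step.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=50, random_state=0)

    # Learn 25 shallow trees, each on a different bootstrap sample, so that
    # each is a reasonable "expert" on some small region of the domain.
    rng = np.random.default_rng(0)
    experts = []
    for _ in range(25):
        idx = rng.integers(0, len(X), len(X))          # bootstrap sample
        experts.append(DecisionTreeClassifier(max_depth=2).fit(X[idx], y[idx]))

    # Re-represent every example by the experts' Boolean predictions ...
    Z = np.column_stack([t.predict(X) for t in experts])

    # ... and learn a linear function over those tree features.
    linear = LogisticRegression().fit(Z, y)
    print("training accuracy over tree features:", linear.score(Z, y))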