UTD CS 6375 - Chapter 3

Decision Tree Learning
[read Chapter 3]
[recommended exercises 3.1, 3.4]
- Decision tree representation
- ID3 learning algorithm
- Entropy, information gain
- Overfitting

(Lecture slides for the textbook Machine Learning, (c) Tom M. Mitchell, McGraw Hill, 1997)

Decision Tree for PlayTennis

Outlook
| Sunny -> Humidity
| | High -> No
| | Normal -> Yes
| Overcast -> Yes
| Rain -> Wind
| | Strong -> No
| | Weak -> Yes

A Tree to Predict C-Section Risk

Learned from medical records of 1000 women; negative examples are C-sections.

[833+,167-] .83+ .17-
Fetal_Presentation = 1: [822+,116-] .88+ .12-
| Previous_Csection = 0: [767+,81-] .90+ .10-
| | Primiparous = 0: [399+,13-] .97+ .03-
| | Primiparous = 1: [368+,68-] .84+ .16-
| | | Fetal_Distress = 0: [334+,47-] .88+ .12-
| | | | Birth_Weight < 3349: [201+,10.6-] .95+ .05-
| | | | Birth_Weight >= 3349: [133+,36.4-] .78+ .22-
| | | Fetal_Distress = 1: [34+,21-] .62+ .38-
| Previous_Csection = 1: [55+,35-] .61+ .39-
Fetal_Presentation = 2: [3+,29-] .11+ .89-
Fetal_Presentation = 3: [8+,22-] .27+ .73-

Decision Trees

Decision tree representation:
- Each internal node tests an attribute
- Each branch corresponds to an attribute value
- Each leaf node assigns a classification
How would we represent AND, OR, XOR; (A ∧ B) ∨ (C ∧ ¬D ∧ E); M of N?

When to Consider Decision Trees

- Instances are describable by attribute-value pairs
- Target function is discrete valued
- Disjunctive hypothesis may be required
- Possibly noisy training data
Examples: equipment or medical diagnosis, credit risk analysis, modeling calendar scheduling preferences.

Top-Down Induction of Decision Trees

Main loop:
1. A <- the "best" decision attribute for the next node
2. Assign A as the decision attribute for node
3. For each value of A, create a new descendant of node
4. Sort training examples to leaf nodes
5. If the training examples are perfectly classified, then STOP; else iterate over the new leaf nodes

Which attribute is best?
  A1: [29+,35-] splits into t: [21+,5-] and f: [8+,30-]
  A2: [29+,35-] splits into t: [18+,33-] and f: [11+,2-]

Entropy

S is a sample of training examples; p_+ is the proportion of positive examples in S and p_- is the proportion of negative examples in S. Entropy measures the impurity of S:

  Entropy(S) ≡ - p_+ log2 p_+ - p_- log2 p_-

[Figure: Entropy(S) plotted against p_+; it is 0 at p_+ = 0 and p_+ = 1 and reaches its maximum of 1.0 at p_+ = 0.5]

Entropy

Entropy(S) = expected number of bits needed to encode the class (+ or -) of a randomly drawn member of S (under the optimal, shortest-length code).

Why? Information theory: an optimal-length code assigns -log2 p bits to a message having probability p. So the expected number of bits to encode + or - of a random member of S is

  p_+ (-log2 p_+) + p_- (-log2 p_-)

  Entropy(S) ≡ - p_+ log2 p_+ - p_- log2 p_-

Information Gain

Gain(S, A) = expected reduction in entropy due to sorting on A:

  Gain(S, A) ≡ Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)

(Compare the gains of A1 and A2 in the [29+,35-] example above.)
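To make these two definitions concrete, here is a minimal Python sketch. The slides give no code, so the function names (entropy, information_gain) and the dict-per-example representation are assumptions made for illustration; the final lines check the numbers against the [29+,35-] / A1 example above.

from collections import Counter
from math import log2

def entropy(labels):
    # Entropy(S) = - sum over classes c of p_c * log2(p_c)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    # Gain(S, A) = Entropy(S) - sum over v in Values(A) of |S_v|/|S| * Entropy(S_v)
    n = len(labels)
    partitions = {}                      # value of A -> labels of the examples with that value
    for x, y in zip(examples, labels):
        partitions.setdefault(x[attribute], []).append(y)
    return entropy(labels) - sum(len(sv) / n * entropy(sv) for sv in partitions.values())

# The [29+,35-] sample split by A1 into t: [21+,5-] and f: [8+,30-].
# (The slide gives A1 and A2 as separate splits, so each would be checked on its own.)
examples_a1 = [{"A1": "t"}] * 26 + [{"A1": "f"}] * 38
labels_a1 = ["+"] * 21 + ["-"] * 5 + ["+"] * 8 + ["-"] * 30
print(round(entropy(labels_a1), 3))                              # 0.994
print(round(information_gain(examples_a1, labels_a1, "A1"), 3))  # 0.266

Note that the gain only depends on the class counts within each attribute value, which is why the helper simply partitions the labels by value before re-applying the entropy formula.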
Training Examples

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

Selecting the Next Attribute

Which attribute is the best classifier for S = [9+,5-] (E = 0.940)?

Humidity: High -> [3+,4-] (E = 0.985), Normal -> [6+,1-] (E = 0.592)
  Gain(S, Humidity) = .940 - (7/14)(.985) - (7/14)(.592) = .151
Wind: Weak -> [6+,2-] (E = 0.811), Strong -> [3+,3-] (E = 1.00)
  Gain(S, Wind) = .940 - (8/14)(.811) - (6/14)(1.0) = .048

After splitting {D1, D2, ..., D14} [9+,5-] on Outlook:
  Sunny -> {D1,D2,D8,D9,D11} [2+,3-]: ?
  Overcast -> {D3,D7,D12,D13} [4+,0-]: Yes
  Rain -> {D4,D5,D6,D10,D14} [3+,2-]: ?

Which attribute should be tested at the Sunny node? With S_sunny = {D1,D2,D8,D9,D11}:
  Gain(S_sunny, Humidity) = .970 - (3/5)(0.0) - (2/5)(0.0) = .970
  Gain(S_sunny, Temperature) = .970 - (2/5)(0.0) - (2/5)(1.0) - (1/5)(0.0) = .570
  Gain(S_sunny, Wind) = .970 - (2/5)(1.0) - (3/5)(.918) = .019

Hypothesis Space Search by ID3

[Figure: ID3's simple-to-complex search through the space of decision trees, adding one attribute test (A1, A2, A3, A4, ...) to the growing tree at each step]

Hypothesis Space Search by ID3

- Hypothesis space is complete!
  - Target function surely in there...
- Outputs a single hypothesis (which one?)
  - Can't play 20 questions...
- No backtracking
  - Local minima...
- Statistically-based search choices
  - Robust to noisy data...
- Inductive bias: approximately "prefer the shortest tree"

Inductive Bias in ID3

Note that H is the power set of instances X. Unbiased? Not really...
- Preference for short trees, and for those with high-information-gain attributes near the root
- Bias is a preference for some hypotheses, rather than a restriction of the hypothesis space H
- Occam's razor: prefer the shortest hypothesis that fits the data

Occam's Razor

Why prefer short hypotheses?
Argument in favor: there are fewer short hypotheses than long ones, so a short hypothesis that fits the data is unlikely to be a coincidence, while a long hypothesis that fits the data might be.
Argument opposed: there are many ways to define small sets of hypotheses, e.g., all trees with a prime number of nodes that use attributes beginning with "Z". What's so special about small sets based on the size of the hypothesis?

Overfitting in Decision Trees

Consider adding ...
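The main loop in "Top-Down Induction of Decision Trees", applied to the training examples above, can be written as a short recursive procedure. The sketch below is one illustrative way to code it in Python (an assumption, not the course's or Mitchell's reference implementation); it reuses the entropy and information_gain helpers from the earlier sketch, reproduces the Gain(S, Humidity) and Gain(S, Wind) values quoted above, and picks Outlook at the root.

from collections import Counter   # entropy() and information_gain() come from the sketch above

# PlayTennis training examples D1-D14, exactly as in the table.
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
examples = [dict(zip(ATTRS, row[:4])) for row in ROWS]
labels = [row[4] for row in ROWS]

def id3(examples, labels, attributes):
    # Leaf: every example has the same class (step 5: perfectly classified -> STOP).
    if len(set(labels)) == 1:
        return labels[0]
    # Leaf: no attributes left to test -> majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 1: pick the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    # Steps 2-4: one branch per observed value of the chosen attribute, then recurse.
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    for value in sorted({x[best] for x in examples}):
        branch = [(x, y) for x, y in zip(examples, labels) if x[best] == value]
        branch_x, branch_y = [x for x, _ in branch], [y for _, y in branch]
        tree[best][value] = id3(branch_x, branch_y, remaining)
    return tree

print(round(information_gain(examples, labels, "Humidity"), 3))  # 0.152 (the slide's .151 uses rounded terms)
print(round(information_gain(examples, labels, "Wind"), 3))      # 0.048
print(id3(examples, labels, ATTRS))                              # Outlook chosen at the root

The nested dict it prints (Outlook at the root, Humidity under Sunny, Wind under Rain) is the same tree drawn on the "Decision Tree for PlayTennis" slide.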

