UT Dallas CS 6375 - 04 - Decision Trees



Decision Trees
The University of Texas at Dallas

Choosing the best Attribute?
• Fundamental principle underlying tree creation: simplicity
  – Occam's Razor: the simplest model that explains the data should be preferred
• Each node divides the data into subsets
  – Make each subset as pure as possible.

A better heuristic from Information Theory

High, Low Entropy
• "High Entropy"
  – X is drawn from a roughly uniform distribution
  – Flat histogram
  – Values sampled from it are less predictable
• "Low Entropy"
  – X is drawn from a varied distribution (peaks and valleys)
  – Histogram has many lows and highs
  – Values sampled from it are more predictable

Information Gain: Using Entropy to make decisions
• Read the Wikipedia articles on Entropy and Mutual Information

When do I play tennis?

Decision Tree

Is the decision tree correct?
• Let's check whether the split on the Wind attribute is correct.
• We need to show that the Wind attribute has the highest information gain.

When do I play tennis?
Wind attribute – 5 records match
• Note: calculate the entropy only on the examples that got "routed" into this branch of the tree (Outlook = Rain).

Calculation
• S = {D4, D5, D6, D10, D14}
• Entropy: H(S) = –(3/5) log2(3/5) – (2/5) log2(2/5) = 0.971
• Information Gain (a short verification sketch follows this slide):
  IG(S, Temp) = H(S) – H(S | Temp) = 0.01997
  IG(S, Humidity) = H(S) – H(S | Humidity) = 0.01997
  IG(S, Wind) = H(S) – H(S | Wind) = 0.971
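The information-gain numbers on the Calculation slide can be reproduced in a few lines of Python. This is an illustrative sketch rather than code from the lecture; it assumes the standard play-tennis training set, whose Outlook = Rain subset {D4, D5, D6, D10, D14} has the attribute values hard-coded below.

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy (base 2) of a list of class labels.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attr, target="Play"):
    # IG(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v)
    n = len(examples)
    gain = entropy([e[target] for e in examples])
    for v in {e[attr] for e in examples}:
        subset = [e[target] for e in examples if e[attr] == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Outlook = Rain subset of the standard play-tennis data (assumed values).
S = [
    {"Day": "D4",  "Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "Play": "Yes"},
    {"Day": "D5",  "Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},
    {"Day": "D6",  "Temp": "Cool", "Humidity": "Normal", "Wind": "Strong", "Play": "No"},
    {"Day": "D10", "Temp": "Mild", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},
    {"Day": "D14", "Temp": "Mild", "Humidity": "High",   "Wind": "Strong", "Play": "No"},
]

print(round(entropy([e["Play"] for e in S]), 3))        # 0.971
for attr in ("Temp", "Humidity", "Wind"):
    print(attr, round(information_gain(S, attr), 5))    # 0.01997, 0.01997, 0.97095
```

Wind alone separates the five examples perfectly, so it has the highest gain, confirming the split shown in the tree.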
Decision Trees: Hypothesis Spaces and Search Methods recap
• We search a variable-sized hypothesis space
  – We start with an empty tree and grow it as we build it
• The space is complete: all target concepts are included in it
• Local search: no backtracking
• Batch: at each step, we use all training examples to make a statistically based decision.

A deeper look at Gain Ratio
• What is the Split Information of Date?
  – A data set of size n in which every example has a different date
• What is the Split Information of an attribute "Weather = Snowing" in Texas?
  – It snows one day and is sunny or overcast on the others
• Heuristic (see the sketch after this slide)
  – First compute the Gain of each attribute
  – Apply Gain Ratio only to attributes whose Gain is above average.
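The Gain Ratio slide above poses its two questions without showing the formulas in this preview, so the sketch below uses the standard C4.5 definitions of Split Information and Gain Ratio (an assumption on my part, not code from the slides). It shows why a Date-like attribute is heavily penalized and why a nearly constant attribute such as "Weather = Snowing" in Texas makes the ratio unstable, which motivates the above-average-Gain heuristic.

```python
from collections import Counter
from math import log2

def split_information(values):
    # SplitInfo(S, A) = -sum_v |S_v|/|S| * log2(|S_v|/|S|),
    # where `values` lists A's value for every example in S.
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(gain, values):
    # GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)
    return gain / split_information(values)

# Date-like attribute: every one of n = 16 examples has a unique value,
# so SplitInfo = log2(16) = 4 and the ratio shrinks even when Gain is large.
dates = [f"day-{i}" for i in range(16)]
print(split_information(dates), gain_ratio(1.0, dates))   # 4.0, 0.25

# "Weather = Snowing" in Texas: one rare value, so SplitInfo is tiny and the
# ratio can blow up -- hence: use Gain Ratio only on above-average-Gain attributes.
snowing = ["snowing"] + ["clear"] * 15
print(round(split_information(snowing), 3))               # 0.337
```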
Sources of Overfitting
• Noise
• A small number of examples associated with each leaf
  – What if only one example is associated with a leaf? Can you believe it?
  – Coincidental regularities
• Generalization is the most important criterion
  – Your method should work well on examples you have not seen before.

Avoiding Overfitting
• Two approaches
  – Stop growing the tree when a data split is not statistically significant
  – Grow the tree fully, then post-prune
• Key issue: what is the correct tree size?
  – Divide the data into a training set and a validation set
    • Random noise in the two sets might be different
  – Apply a statistical test to estimate whether expanding a particular node is likely to produce an improvement beyond the training set
  – Add a complexity penalty
• Leaf nodes added because of coincidental regularities are likely to be pruned, because the same coincidences are unlikely to occur in the validation set.

Handling Missing Values
• Missing values: some attribute values in an example are missing
  – Example: patient data. You don't expect blood test results for everyone.
• Option: treat the missing value as just another value
• Option: ignore instances that have missing values
  – Problematic because you are throwing away important data, and data is scarce.

Handling Missing Values (continued)
• Probabilistic approach (sketched after this slide)
  – Assign a probability to each possible value of attribute A
  – Suppose P(A = 1) = 0.4 and P(A = 0) = 0.6 for instance x
  – A fractional 0.4 of the instance goes down the A = 1 branch and 0.6 down the A = 0 branch
  – Use the fractional instances to compute gain
• Classification
  – Predict the most probable classification
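Here is a minimal sketch of the fractional-instance idea from the probabilistic approach above (illustrative only; the probabilities 0.4 and 0.6 and the attribute name A come from the slide's example). Each example carries a weight, an example whose value of A is missing is sent down every branch with its weight scaled by that branch's probability, and entropy is computed from summed weights rather than raw counts.

```python
from collections import defaultdict
from math import log2

def weighted_entropy(pairs):
    # Entropy of (label, weight) pairs, using summed weights as counts.
    totals = defaultdict(float)
    for label, w in pairs:
        totals[label] += w
    n = sum(totals.values())
    return -sum((w / n) * log2(w / n) for w in totals.values() if w > 0)

def split_on(examples, attr, value_probs):
    # Route (features, label, weight) triples on `attr`.  An example with a
    # missing value goes down every branch with a probability-scaled weight.
    branches = defaultdict(list)
    for features, label, weight in examples:
        if features.get(attr) is None:                        # missing value
            for v, p in value_probs.items():
                branches[v].append((features, label, weight * p))
        else:
            branches[features[attr]].append((features, label, weight))
    return branches

# Toy data: the third instance is missing attribute A.
data = [({"A": 1}, "yes", 1.0), ({"A": 0}, "no", 1.0), ({"A": None}, "yes", 1.0)]
for v, branch in split_on(data, "A", {1: 0.4, 0: 0.6}).items():
    print(v, round(weighted_entropy([(lbl, w) for _, lbl, w in branch]), 3))
```

Gain is then computed exactly as before, only with weighted counts; at classification time the fractional votes reaching the leaves are combined and the most probable class is returned.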
Handling Continuous attributes
• Thresholding: turn the continuous attribute into a binary test against a threshold
• How do we select thresholds?
• Pick the threshold that has the highest gain!
• Sort the examples by attribute value and calculate the gain only at points where the classification changes from "yes" to "no" or from "no" to "yes"
  – This provably maximizes the information gain (see the sketch below).

  Attribute value:  40   48   60   72   80   90
  Class:            no   no   yes  yes  yes  no
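Below is a minimal sketch of the threshold-selection procedure just described, applied to the six values from the slide (the attribute's name is not given in this preview). Candidate thresholds are midpoints between consecutive sorted values whose class labels differ; the candidate with the highest information gain is chosen.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    # Sort by value; evaluate gain only where the classification changes.
    pairs = sorted(zip(values, labels))
    base, n, best = entropy(labels), len(pairs), (-1.0, None)
    for i in range(n - 1):
        if pairs[i][1] != pairs[i + 1][1]:                # class change
            t = (pairs[i][0] + pairs[i + 1][0]) / 2       # candidate threshold
            left = [lbl for v, lbl in pairs if v <= t]
            right = [lbl for v, lbl in pairs if v > t]
            gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
            best = max(best, (gain, t))
    return best

print(best_threshold([40, 48, 60, 72, 80, 90],
                     ["no", "no", "yes", "yes", "yes", "no"]))
# -> (0.459..., 54.0): of the two candidates (54 and 85), "value <= 54" has the higher gain
```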
Summary: Decision Trees
• Representation
• Tree growth
• Choosing the best attribute
• Overfitting and pruning
• Special cases: missing attributes and continuous attributes
• Many forms in practice: CART, ID3, C4.5
