Machine Learning 10601
Recitation 8
Oct 21, 2009
Oznur Tastan

Outline
• Tree representation
• Brief information theory
• Learning decision trees
• Bagging
• Random forests

Decision trees
• Non-linear classifier
• Easy to use
• Easy to interpret
• Susceptible to overfitting, but this can be avoided

Anatomy of a decision tree
[Figure: the "play tennis" tree. The root node tests Outlook (sunny / overcast / rain); the sunny branch tests Humidity (high -> No, normal -> Yes); overcast -> Yes; the rain branch tests Windy (true -> No, false -> Yes).]
• Each node is a test on one attribute.
• The branches are the possible values of that attribute.
• The leaves are the decisions.
• Moving down the tree, the sample size at each node gets smaller.

To 'play tennis' or not.
A new test example: (Outlook == rain) and (Windy == false).
Pass it down the tree -> the decision is Yes.

To 'play tennis' or not.
(Outlook == overcast) -> yes
(Outlook == rain) and (Windy == false) -> yes
(Outlook == sunny) and (Humidity == normal) -> yes

Decision trees
Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances:
(Outlook == overcast)
OR ((Outlook == rain) and (Windy == false))
OR ((Outlook == sunny) and (Humidity == normal))
=> yes, play tennis

Representation
Y = (A and B) or ((not A) and C)
[Figure: a tree computing Y. The root tests A; the A = 1 branch tests B (1 -> true, 0 -> false); the A = 0 branch tests C (1 -> true, 0 -> false).]

Same concept, different representation
[Figure: a different tree representing the same function Y, splitting on C first.]

Which attribute to select for splitting?
[Figure: a tree over 16 positive and 16 negative examples in which every split leaves each child with the same 50/50 class mix (8+/8-, then 4+/4-, then 2+/2-).]
This is bad splitting: the distribution of each class (not of the attribute values) is unchanged in every branch.

How do we choose the test?
Which attribute should be used as the test?
Intuitively, you would prefer the one that separates the training examples as much as possible.

Information Gain
Information gain is one criterion for deciding on the attribute.

Information
Imagine:
1. Someone is about to tell you your own name.
2. You are about to observe the outcome of a die roll.
3. You are about to observe the outcome of a coin flip.
4. You are about to observe the outcome of a biased coin flip.
Each situation has a different amount of uncertainty as to what outcome you will observe.

Information
Information: reduction in uncertainty (the amount of surprise in the outcome)
I(E) = log2( 1 / p(x) ) = -log2 p(x)
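As a quick sanity check of this definition, here is a minimal Python sketch; the coin and die probabilities are the ones used in the examples that follow.

```python
import math

def self_information(p):
    """Information (in bits) of observing an event with probability p."""
    return -math.log2(p)

print(self_information(1 / 2))  # a fair coin comes up heads -> 1.0 bit
print(self_information(1 / 6))  # a die shows a 6            -> ~2.58 bits
```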
• Observing that the outcome of a coin flip is heads: I = -log2(1/2) = 1
• Observing that the outcome of a die roll is 6: I = -log2(1/6) = 2.58
If the probability of an event is small and it nevertheless happens, the information is large.

Entropy
The expected amount of information when observing the output of a random variable X:
H(X) = E[I(X)] = Σ_i p(x_i) I(x_i) = -Σ_i p(x_i) log2 p(x_i)
If X can have 8 outcomes and all are equally likely:
H(X) = -Σ_i (1/8) log2(1/8) = 3 bits

Entropy
If there are k possible outcomes:
H(X) <= log2 k
Equality holds when all outcomes are equally likely.
The more the probability distribution deviates from uniformity, the lower the entropy.

Entropy, purity
Entropy measures the purity of a node:
compare a node with 4+ / 4- to a node with 8+ / 0-.
The second distribution is less uniform, its entropy is lower, and the node is purer.

Conditional entropy
H(X) = -Σ_i p(x_i) log2 p(x_i)
H(X|Y) = Σ_j p(y_j) H(X | Y = y_j)
       = -Σ_j Σ_i p(y_j) p(x_i | y_j) log2 p(x_i | y_j)

Information gain
IG(X,Y) = H(X) - H(X|Y)
The reduction in uncertainty about X from knowing Y.
Information gain: (information before the split) - (information after the split)

Example
Attributes X1, X2 and label Y:
X1   X2   Y   Count
T    T    +   2
T    F    +   2
F    T    -   5
F    F    +   1

IG(X1,Y) = H(Y) - H(Y|X1)
H(Y) = -(5/10) log2(5/10) - (5/10) log2(5/10) = 1
H(Y|X1) = P(X1=T) H(Y|X1=T) + P(X1=F) H(Y|X1=F)
        = (4/10)(-1 log2 1 - 0 log2 0) + (6/10)(-(5/6) log2(5/6) - (1/6) log2(1/6))
        = 0.39
IG(X1,Y) = 1 - 0.39 = 0.61
Which one do we choose, X1 or X2?

Which one do we choose?
IG(X1,Y) = 0.61
IG(X2,Y) = 0.12
Pick X1: pick the variable which provides the most information gain about Y.

Recurse on branches
Split the data on X1 and recurse: one branch gets the examples with X1 = T, the other branch gets the examples with X1 = F.

Caveats
• The number of possible values influences the information gain.
• The more possible values, the higher the gain (the more likely it is to form small but pure partitions).

Purity (diversity) measures
– Gini (population diversity)
– Information gain
– Chi-square test

Overfitting
• A decision tree can perfectly fit any training data.
• Zero bias, high variance.
Two approaches:
1. Stop growing the tree when further splitting the data does not yield an improvement.
2. Grow a full tree, then prune it by eliminating nodes.

Bagging
• Bagging (bootstrap aggregation) is a technique for reducing the variance of an estimated prediction function.
• For classification, a committee of trees each casts a vote for the predicted class.

Bootstrap
The basic idea: randomly draw datasets with replacement from the training data, each sample the same size as the original training set.

Bagging
[Figure: from a training set of N examples with M features, create bootstrap samples.]

Random Forest Classifier
[Figure: construct a decision tree on each bootstrap sample (N examples, M features).]

Random Forest Classifier
[Figure: the resulting forest of trees; take the majority vote over their predictions.]

Bagging
Z = {(x1, y1), (x2, y2), ..., (xN, yN)}
Draw bootstrap samples Z*b, b = 1, ..., B; f^*b(x) denotes the prediction at input x when bootstrap sample b is used for training.
http://www-stat.stanford.edu/~hastie/Papers/ESLII.pdf (Chapter 8.7)

Bagging: a simulated example
Generated ...
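A minimal sketch of the bagging procedure just described: draw B bootstrap samples of size N with replacement, fit one tree per sample, and take the majority vote. It assumes scikit-learn's DecisionTreeClassifier for the base trees; the toy arrays at the bottom are made up purely for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_trees(X, y, B=25, seed=0):
    """Fit B decision trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)  # sample N indices with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def majority_vote(trees, X):
    """Each tree casts a vote; return the most common class per example."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (B, n_examples)
    return np.apply_along_axis(
        lambda col: np.bincount(col).argmax(), 0, votes.astype(int))

# Toy usage (labels 0/1; data is purely illustrative)
X = np.array([[0, 1], [1, 1], [1, 0], [0, 0], [1, 1], [0, 1]])
y = np.array([0, 1, 1, 0, 1, 0])
forest = bagged_trees(X, y, B=10)
print(majority_vote(forest, X))
```

A random forest additionally restricts each split to a random subset of the M features, which is roughly what DecisionTreeClassifier's max_features argument controls.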


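Returning to the split criterion from earlier in the recitation, here is a short sketch of entropy and information gain computed from columns of values; the attribute and label arrays below are hypothetical, not the table from the slides.

```python
from collections import Counter
import math

def entropy(labels):
    """H(Y) = -sum_i p(y_i) log2 p(y_i), estimated from a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(attribute, labels):
    """IG(X, Y) = H(Y) - H(Y | X), as used for IG(X1, Y) above."""
    n = len(labels)
    h_cond = 0.0
    for value in set(attribute):
        subset = [y for x, y in zip(attribute, labels) if x == value]
        h_cond += (len(subset) / n) * entropy(subset)
    return entropy(labels) - h_cond

# Hypothetical attribute/label columns, just to show the call pattern
outlook = ['sunny', 'sunny', 'overcast', 'rain', 'rain', 'overcast']
play    = ['no',    'yes',   'yes',      'yes',  'no',   'yes']
print(information_gain(outlook, play))
```

Because the probabilities are estimated from observed counts, no term in the sum is ever zero, so the 0 * log 0 = 0 convention never needs special handling here.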