Mt Holyoke CS 341 - Decision tree algorithms - D608896

Home> Schools> Mount Holyoke College> Cognitive Science (CS) > CS 341> Decision tree algorithms

Mt Holyoke CS 341 - Decision tree algorithms

Course Cs 341- Natural Language Processing

Pages 5

Download Save

Unformatted text preview:

1Data MiningData MiningCS 341, Spring 2007CS 341, Spring 2007Lecture 8: Decision tree Lecture 8: Decision tree algorithmsalgorithms© Prentice Hall 2Jackknife Estimator: Example 1Jackknife Estimator: Example 1nnEstimate of mean for X={xEstimate of mean for X={x1, x, x2, x, x3,}, n =3, g=3, ,}, n =3, g=3, m=1, m=1, θθθθθθθθ= = µ µ µ µ µ µ µ µ= (= (xx1+ x+ x2+ x+ x3)/3nnθθθθθθθθ11= (= (xx2+ x+ x3)/2, )/2, θθθθθθθθ22= (= (xx1+ x+ x3)/2, )/2, θθθθθθθθ11= (= (xx1+ x+ x2)/2, )/2, nnθθθθθθθθ_ = (_ = (θθθθθθθθ11+ + θθθθθθθθ2 2 + + θθθθθθθθ22)/3 )/3 nnθθθθθθθθQ Q = g= gθθθθθθθθ--(g(g--1) 1) θθθθθθθθ_= 3_= 3θθθθθθθθ--(3(3--1) 1) θθθθθθθθ_= (_= (xx1 + x+ x2+ x+ x3)/3)/3nnIn this case, the Jackknife Estimator is the In this case, the Jackknife Estimator is the same as the usual estimator.same as the usual estimator.© Prentice Hall 3Jackknife Estimator: Example 2Jackknife Estimator: Example 2nnEstimate of variance for X={1, 4, 4}, n =3, g=3, Estimate of variance for X={1, 4, 4}, n =3, g=3, m=1, m=1, θθθθθθθθ= = σ σ σ σ σ σ σ σ2 2 nn σ σ σ σ σ σ σ σ22= ((1= ((1--3)3)22+(4+(4--3)3)22+(4+(4--3)3)22)/3 = 2)/3 = 2nnθθθθθθθθ11= = ((4((4--4)4)22+ (4+ (4--4)4)22) /2 = 0) /2 = 0,,nnθθθθθθθθ22= = 2.252.25, , θθθθθθθθ33= = 2.252.25nnθθθθθθθθ_ = _ = ((θθ11+ + θθ2 2 + + θθ22)/3 = 1.5)/3 = 1.5nnθθθθθθθθQ Q = = ggθθ--(g(g--1) 1) θθ_= 3_= 3θθ--(3(3--1) 1) θθ__=3(2)=3(2)--2(1.5)=32(1.5)=3nnIn this case, the Jackknife Estimator is In this case, the Jackknife Estimator is different from the usual estimator.different from the usual estimator.© Prentice Hall 4Jackknife Estimator: Jackknife Estimator: Example 2(contExample 2(cont’’d)d)nnIn general, apply the Jackknife technique In general, apply the Jackknife technique to the biased estimator to the biased estimator σ σ σ σ σ σ σ σ22 σ σ σ σ σ σ σ σ2 2 = = ΣΣΣΣΣΣΣΣ(x(xii––x )x )2 2 / n/ nnnthen the jackknife estimator is sthen the jackknife estimator is s22 ss2 2 = = ΣΣΣΣΣΣΣΣ(x(xii––x )x )2 2 / (n / (n --1)1) Which is known to be unbiased for Which is known to be unbiased for σ σ22© Prentice Hall 5Review: DistanceReview: Distance--based Algorithmsbased AlgorithmsnnPlace items in class to which they are Place items in class to which they are ““closestclosest””..nnSimilarity measures or distance Similarity measures or distance measuresmeasuresnnSimple approachSimple approachnnK Nearest NeighborsK Nearest NeighborsnnDecision Tree issues, pros and consDecision Tree issues, pros and cons© Prentice Hall 6Classification Using Decision Classification Using Decision TreesTreesnnPartitioning based:Partitioning based:Divide search Divide search space into rectangular regions.space into rectangular regions.nnTuple placed into class based on the Tuple placed into class based on the region within which it falls.region within which it falls.nnDT approaches differ in how the tree is DT approaches differ in how the tree is built: built: DT InductionDT InductionnnInternal nodes associated with attribute Internal nodes associated with attribute and arcs with values for that attribute.and arcs with values for that attribute.nnAlgorithms: ID3, C4.5, CARTAlgorithms: ID3, C4.5, CART2© Prentice Hall 7Decision TreeDecision TreeGiven: Given: ––D = {tD = {t11, , ……, , ttnn} where } where ttii=<t=<ti1i1, , ……, , ttihih> > ––Database schema contains {ADatabase schema contains {A11, A, A22, , ……, A, Ahh}}––Classes C={CClasses C={C11, , ……., C., Cmm}}Decision or Classification TreeDecision or Classification Treeis is a tree a tree associated with D such thatassociated with D such that––Each internal node is labeled with attribute, AEach internal node is labeled with attribute, Aii––Each arc is labeled with predicate which can Each arc is labeled with predicate which can be applied to attribute at parentbe applied to attribute at parent––Each leaf node is labeled with a class, Each leaf node is labeled with a class, CCjj© Prentice Hall 8DT InductionDT Induction© Prentice Hall 9Decision Tree Induction is often based on Decision Tree Induction is often based on Information TheoryInformation TheorySoSo© Prentice Hall 10InformationInformation© Prentice Hall 11DT Induction DT Induction nnWhen all the marbles in the bowl are When all the marbles in the bowl are mixed up, little information is given. mixed up, little information is given. nnWhen the marbles in the bowl are all When the marbles in the bowl are all from one class and those in the other from one class and those in the other two classes are on either side, more two classes are on either side, more information is given.information is given.Use this approach with DT Induction !Use this approach with DT Induction !© Prentice Hall 12Information/EntropyInformation/EntropynnGiven probabilities pGiven probabilities p11, p, p22, .., , .., ppsswhose sum is 1, whose sum is 1, EntropyEntropyis is defined as:defined as:nnEntropy measures the amount of randomness or surprise or Entropy measures the amount of randomness or surprise or uncertainty. uncertainty. ––Its value is between 0 and 1.Its value is between 0 and 1.––Reaches the maximum when all the probabilities are the same.Reaches the maximum when all the probabilities are the same.nnGoal in classificationGoal in classification––no surpriseno surprise––entropy = 0entropy = 03© Prentice Hall 13EntropyEntropylog (1/p) H(p,1-p)© Prentice Hall 14ID3ID3nnCreates tree using information theory Creates tree using information theory concepts and tries to reduce expected concepts and tries to reduce expected number of comparison..number of comparison..nnID3 chooses split attribute with the highest ID3 chooses split attribute with the highest information gain:information gain:––Information gain: the difference between how Information gain: the difference between how much information is needed to make a correct much information is needed to make a correct classification before the split versus how much classification before the split versus how much information is needed after the split.information is needed after the split.© Prentice Hall 15Height Example DataHeight Example DataN a m e G e n d e r H e ig h t O u tp u t 1 O u t p u t2 K ris tin a F 1 .6 m S h o rt M e d iu

View Full Document


School:
Email:
New Password:
Confirm Password:

Mt Holyoke CS 341 - Decision tree algorithms

Sign up for free to view:

Please select your school