LEHIGH CSE 335 - Induction of Decision Trees

Induction of Decision Trees (IDT)
CSE 335/435

Resources:
– Main: Artificial Intelligence: A Modern Approach (Russell and Norvig; chapter "Learning from Examples")
– Alternatives:
– http://www.dmi.unict.it/~apulvirenti/agd/Qui86.pdf
– http://www.cse.unsw.edu.au/~billw/cs9414/notes/ml/06prop/id3/id3.html
– http://www.aaai.org/AITopics/html/expert.html (article: Think About It: Artificial Intelligence & Expert Systems)
– http://www.aaai.org/AITopics/html/trees.html

Contents: Learning: The Big Picture; Example, Training Sets in IDT; Induction of Decision Trees; Example; Example of a Decision Tree; Example of a Decision Tree (II); Optimality Criterion for Decision Trees; Inductive Learning; Induction; Induction of Decision Trees: A Greedy Algorithm; IDT: Example; IDT: Example (cont'd); IDT: Example (cont'd); IDT: Some Issues; How Well Does IDT Work?; How Well Does IDT Work? (II); Selection of a Good Attribute: Information Gain Theory; Selection of a Good Attribute: Information Gain Theory (II); Let's Play Twenty Questions; What Is Happening? (Constitutive Rules); Same Principle Operates for Online Version; Selection of a Good Attribute: Information Gain Theory (III); Selection of a Good Attribute; Selection of a Good Attribute (II); Selection of a Good Attribute (III); Parts I and II Due: Monday!; Construction of Optimal Decision Trees is NP-Complete.

Learning: The Big Picture
• Two forms of learning:
– Supervised: the input and output of the learning component can be perceived. Example: classification tasks.
– Unsupervised: there is no hint about the correct answers of the learning component. Example: finding data clusters.

Example, Training Sets in IDT
• Each row in the table (i.e., each entry for the Boolean function) is an example.
• All rows together form the training set.
• If the classification of an example is "yes", we say that the example is positive; otherwise we say that it is negative (this is called Boolean or binary classification).
• The algorithm we are going to present can easily be extended to non-Boolean classification problems, that is, problems in which there are 3 or more possible classes. Example of such problems?

Induction of Decision Trees
• Objective: find a concise decision tree that agrees with the examples.
• "Concise" instead of "optimal", because constructing an optimal tree is NP-complete.
• The guiding principle we are going to use is Ockham's razor: the most likely hypothesis is the simplest one that is consistent with the examples.
• So we will perform heuristic approximations to the problem, aiming at good solutions that are not necessarily optimal.
• Sometimes the algorithm does generate an optimal solution (for the simple restaurant example it does find an optimal one).

Example
[The slide shows the restaurant training table; the figure is not preserved in this text preview.]

Example of a Decision Tree
[The slide shows a fragment of a tree built by testing attributes in an arbitrary order (Bar, Fri, Hun, Pat, Alt, ...); the figure is not preserved in this preview.]
Possible algorithm:
1. Pick an attribute A randomly.
2. Make a child for every possible value of A.
3. Repeat step 1 for every child until all attributes are exhausted.
4. Label the leaves according to the cases.
Problem: the resulting tree could be very long.

Example of a Decision Tree (II)
[The slide shows the restaurant decision tree rooted at Patrons?, with further tests on WaitEstimate?, Alternate?, Reservation?, Bar?, Fri/Sat?, Hungry?, and Raining?; the figure is not preserved in this preview.]
Nice: the resulting tree is optimal.

Optimality Criterion for Decision Trees
• We want to reduce the average number of questions that are asked. But how do we measure this for a tree T?
• How about using the height: T is better than T' if height(T) < height(T')? Doesn't work: it is easy to show a counterexample where height(T) = height(T') but T asks fewer questions on average than T'.
• Better: use the average path length, APL(T), of the tree T. Let L1, ..., Ln be the n leaves of a decision tree T. Then:
  APL(T) = (height(L1) + height(L2) + … + height(Ln)) / n
  (A small numeric illustration appears at the end of this preview.)
• Optimality criterion: a decision tree T is optimal if (1) T has the lowest APL and (2) T is consistent with the input table.
Homework.

Inductive Learning
• An example has the form (x, f(x)).
• Inductive task: given a collection of examples for a function f, called the training set, return a function h that approximates f (h is called the hypothesis).
[The slide plots the examples together with two candidate hypotheses and notes the presence of noise/error; the figure is not preserved in this preview.]
• There is no way to know which of these two hypotheses is a better approximation of f. A preference of one over the other is called a bias.

Induction
Ex'ple  Bar  Fri  Hun  Pat   Type    Res  wait
x1      no   no   yes  some  french  yes  yes
x4      no   yes  yes  full  thai    no   yes
x5      no   yes  no   full  french  yes  no
x6–x11  (values not shown in this preview)
• Databases: given a pattern, which data match this pattern? (pattern → data)
• Induction: given data, which pattern matches these data? (data → pattern)

Induction of Decision Trees: A Greedy Algorithm
Algorithm:
1. Initially all examples are in the same group.
2. Select the attribute that makes the most difference (i.e., for each value of the attribute, most of the examples are either positive or negative).
3. Group the examples according to each value of the selected attribute.
4. Repeat the procedure within each group (recursive call).
(A code sketch of this procedure appears at the end of this preview.)

IDT: Example
Let's compare two candidate attributes: Patrons and Type. Which is the better attribute?
• Patrons: none → x7(-), x11(-); some → x1(+), x3(+), x6(+), x8(+); full → x4(+), x12(+), x2(-), x5(-), x9(-), x10(-).
• Type: french → x1(+), x5(-); italian → x6(+), x10(-); thai → x4(+), x8(+), x2(-), x11(-); burger → x3(+), x12(+), x7(-), x9(-).

IDT: Example (cont'd)
Patrons is selected as the root: the "none" branch is labeled no, the "some" branch yes, and for the "full" branch we select the best candidate for discerning between x4(+), x12(+), x2(-), x5(-), x9(-), x10(-). The slide splits this group on Hungry: no → x5(-), x9(-); yes → x4(+), x12(+), x2(-), x10(-).

IDT: Example (cont'd)
By continuing in the same manner we obtain the tree:
[Patrons? none → no; some → yes; full → Hungry?. Hungry? no → no; yes → Type?. Type? french → yes; italian → no; burger → yes; thai → Fri/Sat?. Fri/Sat? no → no; yes → yes. The tree figure itself is not preserved in this preview.]

IDT: Some Issues
• Sometimes we arrive at a node with no examples. This means that the corresponding combination of attribute values has not been observed. We assign as its value the majority vote of its parent.
• Sometimes we arrive at a node with both positive and negative examples and no attributes left. This means there is noise in the data. We again assign as its value the majority vote of the examples.

How Well Does IDT Work?
This means: how well does h approximate f? Empirical evidence:
1. Collect a large set of examples.
2. Divide it into two sets: the training set and the test set.
3. Measure the percentage of examples in the test set that are classified correctly.
4. Repeat steps 1–3 for training sets of different, randomly selected sizes.
Next slide shows the sketch of a resulting …
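The text preview ends above. To make a few of the ideas concrete, here are some small hedged sketches. First, the average-path-length criterion: the leaf heights below are made up for illustration and are not taken from the slides; they only show how two trees of equal height can differ in APL.

```python
# A minimal numeric illustration of the APL criterion; height(Li) is the length
# of the path from the root to leaf Li. Leaf heights here are hypothetical.

def average_path_length(leaf_heights):
    """APL(T) = (height(L1) + ... + height(Ln)) / n over the n leaves of T."""
    return sum(leaf_heights) / len(leaf_heights)

# Two trees of height 3: the second resolves some examples closer to the root,
# so it asks fewer questions on average even though the heights are equal.
print(average_path_length([3] * 8))        # 3.0  (all leaves at depth 3)
print(average_path_length([1, 2, 3, 3]))   # 2.25 (same height, lower APL, preferred)
```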
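Second, the greedy induction algorithm. The sketch below follows steps 1–4 above. Since the preview cuts off before the "Information Gain Theory" slides, the attribute-scoring function uses the standard entropy-based information gain as an assumption about what those slides define; the attribute names and data rows are illustrative placeholders in the spirit of the restaurant table, not the actual examples x1–x12.

```python
# A sketch of the greedy decision-tree induction algorithm. Attribute selection uses
# entropy-based information gain -- an assumption, since the information-gain slides
# are not part of this preview. The data rows below are illustrative placeholders.
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(examples, attribute, target):
    """Reduction in entropy obtained by splitting the examples on `attribute`."""
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy([e[target] for e in examples]) - remainder

def induce_tree(examples, attributes, target="wait", parent_majority=None):
    if not examples:                          # no examples observed for this branch:
        return parent_majority                # use the majority vote of the parent
    labels = [e[target] for e in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:                 # all examples agree: make a leaf
        return labels[0]
    if not attributes:                        # no attributes left (noisy data):
        return majority                       # majority vote of the examples
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    remaining = [a for a in attributes if a != best]
    return {best: {value: induce_tree([e for e in examples if e[best] == value],
                                      remaining, target, majority)
                   for value in {e[best] for e in examples}}}

# Hypothetical rows in the spirit of the restaurant example (not the real x1..x12):
data = [
    {"Patrons": "some", "Hungry": "yes", "Type": "french", "wait": "yes"},
    {"Patrons": "some", "Hungry": "no",  "Type": "burger", "wait": "yes"},
    {"Patrons": "none", "Hungry": "yes", "Type": "burger", "wait": "no"},
    {"Patrons": "full", "Hungry": "yes", "Type": "thai",   "wait": "yes"},
    {"Patrons": "full", "Hungry": "no",  "Type": "burger", "wait": "no"},
]
print(induce_tree(data, ["Patrons", "Hungry", "Type"]))
# e.g. {'Patrons': {'some': 'yes', 'none': 'no', 'full': {'Hungry': {'yes': 'yes', 'no': 'no'}}}}
# (branch order may vary)
```

On this toy data, Patrons has the highest gain and becomes the root, mirroring the slides' comparison of Patrons versus Type.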
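Finally, the four-step empirical evaluation procedure can be turned into a short harness. This is a minimal sketch that reuses induce_tree from the previous block; the trial count, training-set sizes, and data set are placeholders, and each size must leave a non-empty test set. Plotting the returned accuracies against training-set size would give the kind of learning-curve sketch the (truncated) next slide refers to.

```python
# A minimal sketch of the evaluation procedure: split into training and test sets,
# learn a tree on the training set, measure test accuracy, and repeat for growing
# training-set sizes. Assumes induce_tree from the previous sketch is in scope.
import random

def classify(tree, example):
    while isinstance(tree, dict):                  # internal node: follow the matching branch
        attribute, branches = next(iter(tree.items()))
        tree = branches.get(example[attribute])    # unseen value -> None -> counted as wrong
    return tree                                    # leaf: predicted class

def accuracy(tree, test_set, target="wait"):
    return sum(classify(tree, e) == e[target] for e in test_set) / len(test_set)

def learning_curve(examples, attributes, sizes, trials=20, target="wait"):
    """Average test-set accuracy over random splits, for each training-set size."""
    curve = {}
    for size in sizes:
        scores = []
        for _ in range(trials):
            shuffled = random.sample(examples, len(examples))
            train, test = shuffled[:size], shuffled[size:]
            tree = induce_tree(train, attributes, target=target)
            scores.append(accuracy(tree, test, target))
        curve[size] = sum(scores) / trials
    return curve
```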

