
Slide titles:
Slide 1
Information Gain Formula
The IDT Example
The IDT Example (II)
Induction: Fielded Applications
Classifying Credit Card Applications (from (Aha, 1996))
Reduce Process Delays of Rotogravure Printers
Slide 8
When to Consider Decision Trees
Induction: Advantages
Induction: Limitations
Induction: Discussion

Sources:
– Chapter 3, Lenz et al. book: Case-based Reasoning Technology
– www.aic.nrl.navy.mil/~aha/research/applications.html

Information Gain Formula

Patrons?
  none: x7(-), x11(-)
  some: x1(+), x3(+), x6(+), x8(+)
  full: x4(+), x12(+), x2(-), x5(-), x9(-), x10(-)

Gain(A) = I(p/(p+n), n/(p+n)) - Remainder(A)

Remainder(A) = p(A,1) I(p1/(p1+n1), n1/(p1+n1))
             + p(A,2) I(p2/(p2+n2), n2/(p2+n2))
             + p(A,3) I(p3/(p3+n3), n3/(p3+n3))

Here p and n are the numbers of positive and negative examples, pi and ni are the corresponding counts in the branch for the i-th value of A, and p(A,i) is the fraction of examples that fall in that branch. Remainder(A) is the standard expected value formula.

The IDT Example

Patrons?
  none: x7(-), x11(-)
  some: x1(+), x3(+), x6(+), x8(+)
  full: x4(+), x12(+), x2(-), x5(-), x9(-), x10(-)

Gain(Patrons) = 1 - ((2/12) I(0,1) + (4/12) I(1,0) + (6/12) I(2/6,4/6)) = 0.541

The IDT Example (II)

Type?
  french: x1(+), x5(-)
  italian: x6(+), x10(-)
  burger: x3(+), x12(+), x7(-), x9(-)
  thai: x4(+), x8(+), x2(-), x11(-)

Gain(Type) = 1 - ((2/12) I(1/2,1/2) + (2/12) I(1/2,1/2) + (4/12) I(2/4,2/4) + (4/12) I(2/4,2/4)) = 0

Thus Patrons is a better choice than Type.

Induction: Fielded Applications

1. Westinghouse: Transforming uranium gas
2. Hartford Steam Boiler: Transformer diagnosis
3. Steel Works Jesenice: Oil/lubricant properties
4. American Express UK: Credit card applicants
5. Siemens (BMT): Equipment configuration
6. USAF school: Thallium diagnosis
7. Boeing (Gold-digger): Manufacturing flaws
8. R.R. Donnelley and Sons (APOS): Banding
9. Enichem (Enigma): Troubleshooting motor pumps
10. Palomar Observatory (SKICAT): Astronomical cataloging
11. Continuum (Shopping): WWW shopping
…

Classifying Credit Card Applications (from (Aha, 1996))

[Diagram labels: Credit card application; Accept?; Borderline?; yes (10% of 104); no; Induced Rule System]

• American Express UK
• Problem: Expert accuracy was below average (48%)
• Evaluation: system was iteratively refined with experts
• 18 attributes (age, years of residence, etc.)
• Improved accuracy: 75%+

Reduce Process Delays of Rotogravure Printers

• Problem: Banding often appears on chrome cylinders, causing a shutdown or costly replacement of cylinders
• Cause unknown
• Use of an inductive process to predict the setting of control parameters (e.g., ink viscosity)
• Rules were posted on the shop floor
• Gain: less downtime and lower replacement costs

Development Cycle of IDT Applications (adapted from (Langley, 1995))

1. Problem formulation
2. Data collection
3. Induction of decision trees/rules
4. Evaluation of decision trees/rules
5. Fielding and acceptance
6. Maintenance

When to Consider Decision Trees

• Examples are describable by attribute-value pairs
• Target function is discrete valued
• Disjunctive hypothesis might be required
• Possible noise in data

Some functions are not amenable to representation with decision trees, e.g. the parity function (returns true if the input has an even number of 1's).

Induction: Advantages

• Building a decision tree is a straightforward process
• The information gain measure is built on a sound basis
• During consultation, only a few tests are necessary before a classification is obtained
• For industrial applications, the consultation system can be delivered in a runtime system

Induction: Limitations

• DTs are not incremental: they cannot be modified at runtime
• Consultation system is static
• Handling of unknown values for attributes is problematic
• The inductive approach cannot distinguish between various classes of users (e.g., experts vs non
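
The Gain and Remainder formulas from the Information Gain slides can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names `information` and `gain` are hypothetical, and I(...) is assumed to be the usual entropy in bits (base-2 logarithm, with 0 log 0 taken as 0). Each attribute is described by the (positive, negative) example counts of its branches, read off the Patrons and Type splits.

```python
from math import log2


def information(*probs):
    """I(p1, ..., pk): entropy in bits; zero-probability terms contribute 0."""
    return -sum(p * log2(p) for p in probs if p > 0)


def gain(branches):
    """Information gain of an attribute.

    branches: list of (positives, negatives) counts, one pair per attribute value.
    """
    p = sum(pos for pos, _ in branches)
    n = sum(neg for _, neg in branches)
    total = p + n
    # Remainder(A): expected information still needed after splitting on A.
    remainder = sum(
        (pos + neg) / total * information(pos / (pos + neg), neg / (pos + neg))
        for pos, neg in branches
        if pos + neg > 0
    )
    return information(p / total, n / total) - remainder


# Branch counts taken from the slides: (positives, negatives) per value.
patrons = [(0, 2), (4, 0), (2, 4)]          # none, some, full
type_ = [(1, 1), (1, 1), (2, 2), (2, 2)]    # french, italian, burger, thai

print(round(gain(patrons), 3))  # about 0.541, as on the slide
print(gain(type_))              # about 0, up to floating-point rounding
```

Under these assumptions the sketch agrees with the slide values: splitting on Patrons gives a gain of about 0.541, while splitting on Type gives essentially no gain.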



LEHIGH CSE 335 - Discussion
