New version page

SMU CSE 8331 - Lecture Notes

This preview shows page 1-2-3 out of 8 pages.

View Full Document
View Full Document

End of preview. Want to read all 8 pages?

Upload your study docs or become a GradeBuddy member to access this document.

View Full Document
Unformatted text preview:

0018-9162/03/$17.00 © 2003 IEEE22 ComputerCOMPUTING PRACTICESPublished by the IEEE Computer SocietyData Mining for Very Busy PeopleFor 21st-century businesses, the problem isnot accessing data but ignoring irrelevantdata. Most modern businesses can elec-tronically access mountains of data suchas transactions for the past two years orthe state of their assembly line. The trick is effec-tively using the available data. In practice, this meanssummarizing large data sets to find the “pearls inthe dust”—that is, the data that really matters. In the data mining community, “learning the least”is an uncommon goal. Most data miners are zealoushunters seeking detailed summaries and generatingextensive and lengthy descriptions. The “DataMining and Treatment Learning” sidebar discussessome work in this area. Here, we take a differentapproach and assume that busy people don’t need—or can’t use—complex models. Rather, they wantonly the data they need to achieve the most benefits. Instead of finding extensive descriptions of things,the TAR2 “treatment learner” is a data mining tool(http://menzies.us/rx.html) that hunts for a minimaldifference set between things.1A list of essential dif-ferences is easier to read and understand thandetailed descriptions. Overly elaborate models cancomplicate, not clarify, a situation. Cognitive sci-entists and researchers studying human decisionmaking note that people often use simple modelsrather than intricate ones.2Because it learns smallermodels, TAR2 provides better support for real-world decision making than standard data miners.TAR2: A SIMPLER, SHORTER RULEFigure 1 shows a typical decision tree, generatedfrom the housing database at the University of Cali-fornia, Irvine Machine Learning Repository (www.ics.uci.edu/~mlearn/MLRepository.html). Eachbranch describes identifying factors for houses ofhigh, medium-high, medium-low, and low qualityusing seven attributes: • age: proportion of houses built before 1940•b: information on the suburb’s racial makeup• dis: weighted distances to five employment centers• lstat: living standard• nox: nitric oxides concentration• ptratio: parent-teacher ratio at local schools•rm:number of roomsTo compare Figure 1 to TAR2’s output, we firstconvert the tree to TAR2 rule format. We generatedthe tree using the Waikato Environment forKnowledge Analysis J4.8 algorithm (www.cs.waikato.ac.nz/~ml/). Although there are manyTim MenziesWest VirginiaUniversityYing HuUniversity of British Columbia To meet the needs of busy people whoonly want to know enough to achievethe most benefits, the TAR2 treatmentlearner generates easy-to-read andimmediately useful data mining rules.November 2003 23Data mining uses techniques from sta-tistics and artificial intelligence to reducelarge sets of examples to small, under-standable patterns. Decision tree learningMany data mining methods generatedecision trees—trees whose leaves are clas-sifications and whose branches are con-junctions of features that lead to those clas-sifications. One way to learn a decisiontree is to split the example set into subsetsbased on some attribute value test. Theprocess then repeats recursively on the sub-sets, with each splitter value becoming asubtree root. Splitting stops when a subsetgets so small that further splitting is super-fluous or a subset contains examples withonly one classification.A good split decreases the percentage ofdifferent classifications in a subset, ensur-ing that subsequent learning generatessmaller subtrees by requiring less furthersplitting to sort out the subsets. Variousschemes for finding good splits exist.1,2Association rule learning Association rule learners such as Apri-ori3find attributes commonly occurringtogether in a training set. No attribute canappear on both sides of the associationLHS → RHS—that is, LHS ∩ RHS = 0/. The rule LHS → RHS holds in the ex-ample set with confidence c if c percentof the examples containing LHS also con-tain RHS: c = |LHS ∪ RHS| × 100/|LHS|.The rule LHS → RHS has support s in theexample set if s percent of the examplescontain LHS ∪ RHS: s = |LHS ∪ RHS| ×100/|D|, where |D| is the number of exam-ples. Association rule learners return ruleswith high confidence (for example, c > 90percent). Rejecting associations with low sup-port first can cull the search for associa-tions. We can view association rule learn-ers as generalizations of decision treelearning: Decision tree learners restrictthe RHS of rules to one class attribute,whereas association rule learners can addany number of attributes to the RHS. Association rule learning variantsContrast set learning is a variant ofassociation rule learning. Instead of find-ing rules that describe the current situa-tion, contrast set learners like Stucco4findrules that differ meaningfully in their dis-tribution across groups. For example, inStucco, an analyst could ask, “What arethe differences between people with PhDand bachelor’s degrees?”Weighted class learning is another vari-ant. Association rule learners such as Min-wal5assign weights to classes to focus thelearning on issues of interest to a particu-lar audience. TAR2 is a weighted contrastset learner that finds rules associatingattribute values with changes to the classdistributions. TAR2’s design is simpler thanmany other learners because it assumes thesmall treatment effect—that is, treatmentstypically use only a few attributes.Other machine learning researchershave also discovered that schemes usingonly a subset of the available attributescan generate effective theories. For exam-ple, learners using many attributes per-formed only moderately better thanRobert Holte’s 1R machine learner, whichwas restricted to a single attribute.6TAR2 does not use the 1R techniquebecause our results show that the best treat-ments can require more than one attribute.WrappersRon Kohavi and George John wrappedtheir learners in a preprocessor that useda heuristic search to grow subsets of theavailable attributes from size 1.7At eachstep, the wrapper called a learner to findthe accuracy of the model learned fromthe current subset. Subset growth stoppedwhen adding new attributes didn’timprove accuracy. On average, theirexperiments showed that up to 83 percentof a domain’s attributes could be ignoredwith only a minimal loss of accuracy. TAR2 does not use this techniquebecause using wrappers to select relevantfeatures can be prohibitively slow as


View Full Document
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?