Association Rule
By Kenneth Leung

Data Mining
The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases, and using it to make crucial business decisions.
In short: making decisions based on previous experience or observation.

Association Rule Mining
Formal: finding interesting associations and/or correlation relationships among large sets of data items. Association rules show attribute-value conditions that occur frequently together in a given dataset.
Informal: an "if-then" relationship. If this happens, what is most likely to happen next?
Example: Obesity => Diabetes

Market Basket Analysis
A typical and widely used example of association rule mining.
Example:
Data are collected using bar-code scanners in supermarkets. Each record consists of all items in a single purchase transaction.
Managers want to know whether certain groups of items are consistently purchased together.
They can use this data to adjust store layouts (placing items optimally with respect to each other), and for cross-selling, promotions, catalog design, and identifying customer segments based on buying patterns.

Famous & Interesting Finding
Beer & Diapers
“A number of convenience store clerks noticed that men often bought beer at the same time they bought diapers. The store mined its receipts and proved the clerks' observations correct. So, the store began stocking diapers next to the beer coolers, and sales skyrocketed.”

Why Beer and Diapers?
Moms are stressed out by their naughty babies, and they need some beer for relief?
Diaper boxes are handy for holding old beer bottles.
Very environmentally friendly, and easy to handle.

Two Certainty Indices
These determine whether a rule is good.
Support of an association rule (AR): the percentage of all transactions that contain both X and Y (X and Y are two items).
Confidence of an AR: the ratio of the number of transactions that contain both X and Y to the number that contain X.
For both indices, the higher the value, the more reliable the rule.

Example: Support
A supermarket has 100,000 transactions.
2,000 of the 100,000 transactions include beer.
800 of those 2,000 transactions also contain diapers.
Support for the rule “beer -> diapers” is 800 transactions, i.e. 800/100,000 = 0.008, or 0.8%.

Example: Confidence
A supermarket has 100,000 transactions.
2,000 of the 100,000 transactions include beer.
800 of those 2,000 transactions also contain diapers.
Confidence for the rule “beer -> diapers” is 800/2,000 = 0.4, or 40%.

Full Example from Wiki
Rules:
1. {Cold, Raining} => No
2. {Calm, Dry} => Yes
3. {Dry} => No
4. {Windy} => No

Evaluation:
1. {Cold, Raining} => No
Support: 2/5 = 40%
Confidence: 2/2 = 100%
=> Good
2. {Calm, Dry} => Yes
Support: 2/5 = 40%
Confidence: 2/2 = 100%
=> Good
3. {Dry} => No
Support: 1/5 = 20%
Confidence: 1/3 = 33.3%
=> Bad
4. {Windy} => No
Support: 0/5 = 0%
Confidence: 1/1 = 100%
=> Bad

References
http://www.resample.com/xlminer/help/Assocrules/associationrules_intro.htm
http://en.wikipedia.org/wiki/Association_rule_learning
Dr Sin-Min Lee’s lecture
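Worked sketch: support and confidence in code
The two certainty indices defined in the slides can be computed directly from a list of transactions. The snippet below is only a minimal sketch: the function names `support` and `confidence` and the five toy baskets are hypothetical and were made up for illustration; they are not part of the original slides. The last two lines simply redo the beer -> diapers arithmetic from the slide counts (100,000 transactions, 2,000 with beer, 800 with both).

```python
def support(transactions, antecedent, consequent):
    """Fraction of all transactions containing every item of X and Y."""
    if not transactions:
        return 0.0
    both = set(antecedent) | set(consequent)
    hits = sum(1 for t in transactions if both <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Transactions containing X and Y, divided by those containing X."""
    x = set(antecedent)
    both = x | set(consequent)
    n_x = sum(1 for t in transactions if x <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    # If X never occurs, the ratio is undefined; this sketch returns 0.0.
    return n_both / n_x if n_x else 0.0


# Hypothetical toy baskets (illustrative only, not data from the slides).
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "milk"},
    {"milk", "bread"},
    {"beer", "bread"},
]

toy_support = support(baskets, {"beer"}, {"diapers"})        # 2 of 5 baskets
toy_confidence = confidence(baskets, {"beer"}, {"diapers"})  # 2 of 4 beer baskets

# The slide example, using only the counts given there.
slide_support = 800 / 100_000    # 0.008, i.e. 0.8%
slide_confidence = 800 / 2_000   # 0.4, i.e. 40%
```

On the toy baskets, beer appears in four of five transactions and beer with diapers in two, so the rule beer -> diapers has support 0.4 and confidence 0.5. Returning 0.0 when the antecedent never occurs is a design choice of this sketch, not something the slides specify.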