SBU CSE 634 - Mining association rules in large databases


Slide outline:
  CSE 634 Data Mining Techniques
  References
  Overview
  Basic Concepts of Association Rule Mining
  Association Model: Problem Statement
  Rule Measures: Support & Confidence
  Support & Confidence: An Example
  Types of Association Rule Mining
  The Apriori Algorithm: Basics
  The Apriori Algorithm in a Nutshell
  The Apriori Algorithm: Pseudo-code
  The Apriori Algorithm: Example
  Step 1: Generating 1-itemset Frequent Pattern
  Step 2: Generating 2-itemset Frequent Pattern [Cont.]
  Step 3: Generating 3-itemset Frequent Pattern [Cont.]
  Step 4: Generating 4-itemset Frequent Pattern
  Step 5: Generating Association Rules from Frequent Itemsets [Cont.]
  Methods to Improve Apriori's Efficiency
  Mining Frequent Patterns Without Candidate Generation
  FP-Growth Method: An Example
  FP-Growth Method: Construction of FP-Tree
  Mining the FP-Tree by Creating Conditional (sub-)pattern Bases
  FP-Tree Example Continued
  Why Frequent Pattern Growth Fast?
  Association & Correlation
  Correlation Concepts [Cont.]
  Correlation Rules
  Summary
  Questions?

CSE 634 Data Mining Techniques
Mining Association Rules in Large Databases
Prateek Duble (105301354)
Course Instructor: Prof. Anita Wasilewska
State University of New York, Stony Brook

References
• Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber
• Presentation slides of Prof. Anita Wasilewska
• Presentation slides of the course book
• "An Effective Hash-Based Algorithm for Mining Association Rules" (Apriori algorithm) by J. S. Park, M. S. Chen & P. S. Yu, SIGMOD Conference, 1995
• "Mining Frequent Patterns without Candidate Generation" (FP-tree method) by J. Han, J. Pei, Y. Yin & R. Mao, SIGMOD Conference, 2000

Overview
• Basic Concepts of Association Rule Mining
• The Apriori Algorithm (mining single-dimensional boolean association rules)
• Methods to Improve Apriori's Efficiency
• Frequent-Pattern Growth (FP-Growth) Method
• From Association Analysis to Correlation Analysis
• Summary

Basic Concepts of Association Rule Mining
• Association rule mining: finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
• Applications: basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
• Examples:
  Rule form: "Body → Head [support, confidence]"
  buys(x, "diapers") → buys(x, "beers") [0.5%, 60%]
  major(x, "CS") ^ takes(x, "DB") → grade(x, "A") [1%, 75%]

Association Model: Problem Statement
• I = {i1, i2, ..., in} is a set of items.
• J = P(I) is the set of all subsets of I; the elements of J are called itemsets.
• A transaction T is a subset of I.
• A database is a set of transactions.
• An association rule is an implication of the form X → Y, where X and Y are disjoint subsets of I (elements of J).
• Problem: find the rules whose support and confidence are greater than a user-specified minimum support and minimum confidence.

Rule Measures: Support & Confidence
• Simple formulas:
  confidence(A → B) = (# tuples containing both A and B) / (# tuples containing A) = P(B | A) = P(A ∪ B) / P(A)
  support(A → B) = (# tuples containing both A and B) / (total # of tuples) = P(A ∪ B)
• What do they actually mean? Find all rules X ∧ Y → Z with minimum confidence and support:
  support s: the probability that a transaction contains {X, Y, Z}
  confidence c: the conditional probability that a transaction containing {X, Y} also contains Z

Support & Confidence: An Example
• Let minimum support be 50% and minimum confidence 50%; then for the slide's example transactions we have (a short code sketch of these calculations follows):
  A → C (support 50%, confidence 66.6%)
  C → A (support 50%, confidence 100%)
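The two formulas above translate directly into a few lines of Python. The sketch below is only an illustration: the slide's transaction table is not included in this text preview, so a small hypothetical set of four transactions is used, chosen so that it reproduces the quoted figures for A → C (50%, 66.6%) and C → A (50%, 100%). The function names support and confidence are ours, not the presenter's.

    # Hedged sketch of the support/confidence definitions above.
    transactions = [            # hypothetical stand-in for the slide's table
        {"A", "B", "C"},
        {"A", "C"},
        {"A", "D"},
        {"B", "E", "F"},
    ]

    def support(itemset, transactions):
        """Fraction of transactions containing every item in itemset."""
        itemset = set(itemset)
        return sum(1 for t in transactions if itemset <= t) / len(transactions)

    def confidence(antecedent, consequent, transactions):
        """support(antecedent with consequent) / support(antecedent) = P(consequent | antecedent)."""
        both = set(antecedent) | set(consequent)
        return support(both, transactions) / support(antecedent, transactions)

    print(support({"A", "C"}, transactions))       # 0.5   -> 50% support for A -> C and C -> A
    print(confidence({"A"}, {"C"}, transactions))  # 0.666 -> A -> C holds with 66.6% confidence
    print(confidence({"C"}, {"A"}, transactions))  # 1.0   -> C -> A holds with 100% confidence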
Types of Association Rule Mining
• Boolean vs. quantitative associations (based on the types of values handled):
  buys(x, "SQLServer") ^ buys(x, "DMBook") → buys(x, "DBMiner") [0.2%, 60%]
  age(x, "30..39") ^ income(x, "42..48K") → buys(x, "PC") [1%, 75%]
• Single-dimensional vs. multi-dimensional associations (see the examples above).
• Single-level vs. multiple-level analysis: what brands of beers are associated with what brands of diapers?
• Various extensions:
  Correlation and causality analysis (association does not necessarily imply correlation or causality).
  Constraints enforced, e.g., do small sales (sum < 100) trigger big buys (sum > 1,000)?

The Apriori Algorithm: Basics
• The Apriori algorithm is an influential algorithm for mining frequent itemsets for boolean association rules.
• Key concepts:
  • Frequent itemsets: the sets of items that have minimum support (the frequent k-itemsets are denoted Lk).
  • Apriori property: any subset of a frequent itemset must be frequent.
  • Join operation: to find Lk, a set of candidate k-itemsets is generated by joining Lk-1 with itself.

The Apriori Algorithm in a Nutshell
• Find the frequent itemsets, i.e., the sets of items that have minimum support.
  A subset of a frequent itemset must also be a frequent itemset; e.g., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets (see the pruning sketch after this slide).
  Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets).
• Use the frequent itemsets to generate association rules.
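The Apriori property is what makes pruning possible: a candidate k-itemset can be discarded as soon as any of its (k-1)-subsets is missing from Lk-1, without ever counting its support. Below is a minimal sketch of that test; the helper name has_infrequent_subset echoes the naming commonly used for this check in the textbook's pseudocode, but the code itself is only illustrative.

    from itertools import combinations

    def has_infrequent_subset(candidate, frequent_prev):
        """True if some (k-1)-subset of candidate is not a frequent (k-1)-itemset.

        candidate     : sorted tuple of k items
        frequent_prev : set of sorted (k-1)-item tuples (L_{k-1})
        """
        k = len(candidate)
        return any(sub not in frequent_prev for sub in combinations(candidate, k - 1))

    # If {A,B} and {A,C} are frequent but {B,C} is not, {A,B,C} is pruned immediately.
    L2 = {("A", "B"), ("A", "C")}
    print(has_infrequent_subset(("A", "B", "C"), L2))  # True -> prune the candidate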
The Apriori Algorithm: Pseudo-code
• Join step: Ck is generated by joining Lk-1 with itself.
• Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.
• Pseudo-code (Ck: candidate itemsets of size k; Lk: frequent itemsets of size k):

    L1 = {frequent items};
    for (k = 1; Lk != ∅; k++) do begin
        Ck+1 = candidates generated from Lk;
        for each transaction t in database do
            increment the count of all candidates in Ck+1 that are contained in t;
        Lk+1 = candidates in Ck+1 with min_support;
    end
    return ∪k Lk;
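As a companion to the pseudo-code, here is one straightforward, self-contained reading of it in Python. It is a sketch under assumptions, not the presenter's implementation: transactions are sets of items, min_support is an absolute count, and itemsets are kept as sorted tuples so the join step can merge two frequent k-itemsets that agree on their first k-1 items.

    from itertools import combinations

    def apriori(transactions, min_support):
        """Level-wise frequent-itemset mining following the pseudo-code above.

        transactions : list of sets of items
        min_support  : minimum support given as an absolute transaction count
        Returns a dict mapping each frequent itemset (sorted tuple) to its count.
        """
        # L1: frequent 1-itemsets.
        counts = {}
        for t in transactions:
            for item in t:
                counts[(item,)] = counts.get((item,), 0) + 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        all_frequent = dict(frequent)

        k = 1
        while frequent:
            # Join step: merge two L_k itemsets that share their first k-1 items.
            prev = sorted(frequent)
            candidates = set()
            for i in range(len(prev)):
                for j in range(i + 1, len(prev)):
                    if prev[i][:k - 1] == prev[j][:k - 1]:
                        cand = tuple(sorted(set(prev[i]) | set(prev[j])))
                        # Prune step: every k-subset of the candidate must be frequent.
                        if all(sub in frequent for sub in combinations(cand, k)):
                            candidates.add(cand)

            # Scan the database, counting candidates contained in each transaction.
            counts = {c: 0 for c in candidates}
            for t in transactions:
                for c in candidates:
                    if set(c) <= t:
                        counts[c] += 1

            frequent = {c: n for c, n in counts.items() if n >= min_support}
            all_frequent.update(frequent)
            k += 1

        return all_frequent

    # Tiny demo with a hypothetical 4-transaction database and a 50% (count 2) threshold.
    data = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
    print(apriori(data, min_support=2))
    # frequent itemsets: {A}: 3, {B}: 2, {C}: 2, {A,C}: 2 (dict order may vary)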

