SBU CSE 634 - Mining association rules in large databases - D2508734

Home> Schools> Stony Brook University> Computer Science (CSE) > CSE 634> Mining association rules in large databases

SBU CSE 634 - Mining association rules in large databases

Course Cse 634- Data Mining Concepts And Techn

Pages 41

Download Save

Unformatted text preview:

CSE 634 Data Mining TechniquesReferencesOverviewBasic Concepts of Association Rule MiningAssociation Model: Problem StatementRule Measures: Support & ConfidenceSupport & Confidence : An ExampleTypes of Association Rule MiningSlide 9The Apriori Algorithm: BasicsThe Apriori Algorithm in a NutshellThe Apriori Algorithm : Pseudo codeThe Apriori Algorithm: ExampleStep 1: Generating 1-itemset Frequent PatternStep 2: Generating 2-itemset Frequent PatternStep 2: Generating 2-itemset Frequent Pattern [Cont.]Step 3: Generating 3-itemset Frequent PatternStep 3: Generating 3-itemset Frequent Pattern [Cont.]Step 4: Generating 4-itemset Frequent PatternStep 5: Generating Association Rules from Frequent ItemsetsStep 5: Generating Association Rules from Frequent Itemsets [Cont.]Slide 22Slide 23Methods to Improve Apriori’s EfficiencySlide 25Mining Frequent Patterns Without Candidate GenerationFP-Growth Method : An ExampleFP-Growth Method: Construction of FP-TreeSlide 29Mining the FP-Tree by Creating Conditional (sub) pattern basesFP-Tree Example ContinuedSlide 32Why Frequent Pattern Growth Fast ?Slide 34Association & CorrelationCorrelation ConceptsCorrelation Concepts [Cont.]Slide 38Correlation RulesSummaryQuestions ?CSE 634Data Mining TechniquesMining Association Rules in Large DatabasesPrateek Duble(105301354)Course Instructor: Prof. Anita WasilewskaState University of New York, Stony BrookState University of New York, Stony Brook 2ReferencesData Mining: Concepts & Techniques by Jiawei Han and Micheline KamberPresentation Slides of Prof. Anita WasilewskaPresentation Slides of the Course Book.“An Effective Hah Based Algorithm for Mining Association Rules” (Apriori Algorithm) by J.S. Park, M.S. Chen & P.S.Yu , SIGMOD Conference, 1995. “Mining Frequent Patterns without candidate generation” (FP-Tree Method) by J. Han, J. Pei , Y. Yin & R. Mao , SIGMOD Conference, 2000.State University of New York, Stony Brook 3OverviewBasic Concepts of Association Rule MiningThe Apriori Algorithm (Mining single dimensional boolean association rules)Methods to Improve Apriori’s EfficiencyFrequent-Pattern Growth (FP-Growth) MethodFrom Association Analysis to Correlation AnalysisSummaryState University of New York, Stony Brook 4Basic Concepts of Association Rule MiningAssociation Rule MiningFinding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.ApplicationsBasket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.ExamplesRule form: “Body  Head [support, confidence]”.buys(x, “diapers”)  buys(x, “beers”) [0.5%, 60%]major(x, “CS”) ^ takes(x, “DB”)  grade(x, “A”) [1%, 75%]State University of New York, Stony Brook 5Association Model: Problem StatementI ={i1, i2, ...., in} a set of items J = P(I ) set of all subsets of the set of items, elements of J are called itemsetsTransaction T: T is subset of IData Base: set of transactionsAn association rule is an implication of the form : X-> Y, where X, Y are disjoint subsets of I (elements of J )Problem: Find rules that have support and confidence greater that user-specified minimum support and minimun confidenceState University of New York, Stony Brook 6Rule Measures: Support & Confidence Simple Formulas:Confidence (AB) = #tuples containing both A & B / #tuples containing A = P(B|A) = P(A U B ) / P (A) Support (AB) = #tuples containing both A & B/ total number of tuples = P(A U B)  What do they actually mean ?Find all the rules X & Y  Z with minimum confidence and supportsupport, s, probability that a transaction contains {X, Y, Z}confidence, c, conditional probability that a transaction having {X, Y} also contains ZState University of New York, Stony Brook 7Support & Confidence : An ExampleLet minimum support 50%, and minimum confidence 50%, then we have,A  C (50%, 66.6%)C  A (50%, 100%)State University of New York, Stony Brook 8Types of Association Rule MiningBoolean vs. quantitative associations (Based on the types of values handled)buys(x, “SQLServer”) ^ income(x, “DMBook”)  buys(x, “DBMiner”) [0.2%, 60%]age(x, “30..39”) ^ income(x, “42..48K”) buys(x, “PC”) [1%, 75%]Single dimension vs. multiple dimensional associations (see ex. Above)Single level vs. multiple-level analysisWhat brands of beers are associated with what brands of diapers?Various extensionsCorrelation, causality analysisAssociation does not necessarily imply correlation or causalityConstraints enforcedE.g., small sales (sum < 100) trigger big buys (sum > 1,000)?State University of New York, Stony Brook 9OverviewBasic Concepts of Association Rule MiningThe Apriori Algorithm (Mining single dimensional boolean association rules)Methods to Improve Apriori’s EfficiencyFrequent-Pattern Growth (FP-Growth) MethodFrom Association Analysis to Correlation AnalysisSummaryState University of New York, Stony Brook 10The Apriori Algorithm: BasicsThe Apriori Algorithm is an influential algorithm for mining frequent itemsets for boolean association rules.Key Concepts :•Frequent Itemsets: The sets of item which has minimum support (denoted by Li for ith-Itemset).•Apriori Property: Any subset of frequent itemset must be frequent.•Join Operation: To find Lk , a set of candidate k-itemsets is generated by joining Lk-1 with itself.State University of New York, Stony Brook 11The Apriori Algorithm in a NutshellFind the frequent itemsets: the sets of items that have minimum supportA subset of a frequent itemset must also be a frequent itemseti.e., if {AB} is a frequent itemset, both {A} and {B} should be a frequent itemsetIteratively find frequent itemsets with cardinality from 1 to k (k-itemset)Use the frequent itemsets to generate association rules.State University of New York, Stony Brook 12The Apriori Algorithm : Pseudo codeJoin Step: Ck is generated by joining Lk-1with itselfPrune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemsetPseudo-code:Ck: Candidate itemset of size kLk : frequent itemset of size kL1 = {frequent items};for (k = 1; Lk !=; k++) do begin Ck+1 = candidates generated from Lk; for each transaction t in database do

View Full Document


School:
Email:
New Password:
Confirm Password:

SBU CSE 634 - Mining association rules in large databases

Sign up for free to view:

Please select your school