DOC PREVIEW
BU CS 565 - Mining Association Rules in Large Databases

This preview shows page 1-2-3-23-24-25-26-47-48-49 out of 49 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Mining Association Rules in Large DatabasesAssociation rules • Given a set of transactions D, find rules that will predict the occurrence of an item (or a set of items) based on the occurrences of other items in the transactionMarket-Basket transactionsTID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke Examples of association rules{Diaper} {Beer},{Milk, Bread} {Diaper,Coke},{Beer, Bread} {Milk},An even simpler concept: frequent itemsets• Given a set of transactions D, find combination of items that occur frequentlyMarket-Basket transactionsTID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke Examples of frequent itemsets{Diaper, Beer},{Milk, Bread} {Beer, Bread, Milk},Lecture outline• Task 1: Methods for finding all frequent itemsets efficiently • Task 2: Methods for finding association rules efficientlyDefinition: Frequent Itemset• Itemset– A set of one or more items• E.g.: {Milk, Bread, Diaper}– k-itemset• An itemset that contains k items• Support count ( )– Frequency of occurrence of an itemset(number of transactions it appears)– E.g. ({Milk, Bread,Diaper}) = 2 • Support– Fraction of the transactions in which an itemset appears– E.g. s({Milk, Bread, Diaper}) = 2/5• Frequent Itemset– An itemset whose support is greater than or equal to a minsup thresholdTID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, CokeWhy do we want to find frequent itemsets?• Find all combinations of items that occur together• They might be interesting (e.g., in placement of items in a store )• Frequent itemsets are only positive combinations (we do not report combinations that do not occur frequently together)• Frequent itemsets aims at providing a summary for the dataFinding frequent sets• Task: Given a transaction database D and a minsup threshold find all frequent itemsets and the frequency of each set in this collection• Stated differently: Count the number of times combinations of attributes occur in the data. If the count of a combination is above minsup report it.• Recall: The input is a transaction database D where every transaction consists of a subset of items from some universe IHow many itemsets are there? nullAB AC AD AE BC BD BE CD CE DEA B C D EABC ABD ABE ACD ACE ADE BCD BCE BDE CDEABCD ABCE ABDE ACDE BCDEABCDEGiven d items, there are 2dpossible itemsetsWhen is the task sensible and feasible?• If minsup = 0, then all subsets of I will be frequent and thus the size of the collection will be very large• This summary is very large (maybe larger than the original input) and thus not interesting• The task of finding all frequent sets is interesting typically only for relatively large values of minsupA simple algorithm for finding all frequent itemsets ??nullAB AC AD AE BC BD BE CD CE DEA B C D EABC ABD ABE ACD ACE ADE BCD BCE BDE CDEABCD ABCE ABDE ACDE BCDEABCDEBrute-force algorithm for finding all frequent itemsets?• Generate all possible itemsets (lattice of itemsets)– Start with 1-itemsets, 2-itemsets,...,d-itemsets• Compute the frequency of each itemset from the data– Count in how many transactions each itemset occurs• If the support of an itemset is above minsup report it as a frequent itemsetBrute-force approach for finding all frequent itemsets• Complexity?– Match every candidate against each transaction – For M candidates and N transactions, the complexity is~ O(NMw) => Expensive since M = 2d!!!Speeding-up the brute-force algorithm• Reduce the number of candidates (M)– Complete search: M=2d– Use pruning techniques to reduce M• Reduce the number of transactions (N)– Reduce size of N as the size of itemset increases– Use vertical-partitioning of the data to apply the mining algorithms• Reduce the number of comparisons (NM)– Use efficient data structures to store the candidates or transactions– No need to match every candidate against every transactionReduce the number of candidates• Apriori principle (Main observation):– If an itemset is frequent, then all of its subsets must also be frequent• Apriori principle holds due to the following property of the support measure:– The support of an itemset never exceeds the support of its subsets– This is known as the anti-monotone property of support)()()(:, YsXsYXYXExampleTID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke s(Bread) > s(Bread, Beer)s(Milk) > s(Bread, Milk)s(Diaper, Beer) > s(Diaper, Beer, Coke)Found to be InfrequentnullAB AC AD AE BC BD BE CD CE DEA B C D EABC ABD ABE ACD ACE ADE BCD BCE BDE CDEABCD ABCE ABDE ACDE BCDEABCDEIllustrating the Apriori principlenullAB AC AD AE BC BD BE CD CE DEA B C D EABC ABD ABE ACD ACE ADE BCD BCE BDE CDEABCD ABCE ABDE ACDE BCDEABCDEPruned supersetsIllustrating the Apriori principleItem CountBread 4Coke 2Milk 4Beer 3Diaper 4Eggs 1Itemset Count{Bread,Milk} 3{Bread,Beer} 2{Bread,Diaper} 3{Milk,Beer} 2{Milk,Diaper} 3{Beer,Diaper} 3Itemset Count {Bread,Milk,Diaper} 3 Items (1-itemsets)Pairs (2-itemsets)(No need to generatecandidates involving Cokeor Eggs)Triplets (3-itemsets)minsup = 3/5If every subset is considered, 6C1+ 6C2+ 6C3= 41With support-based pruning,6 + 6 + 1 = 13Exploiting the Apriori principle1. Find frequent 1-items and put them to Lk(k=1)2. Use Lkto generate a collection of candidate itemsets Ck+1with size (k+1)3. Scan the database to find which itemsets in Ck+1are frequent and put them into Lk+14. If Lk+1is not empty k=k+1 Goto step 2R. Agrawal, R. Srikant: "Fast Algorithms for Mining Association Rules", Proc. of the 20th Int'l Conference on Very Large Databases, 1994.The Apriori algorithmCk: Candidate itemsets of size kLk: frequent itemsets of size kL1= {frequent 1-itemsets};for (k = 2; Lk!= ; k++) Ck+1= GenerateCandidates(Lk)for each transaction t in database do increment count of candidates in Ck+1that are contained in tendforLk+1= candidates in Ck+1with support ≥min_supendforreturnkLk;GenerateCandidates• Assume the items in Lkare listed in an order (e.g., alphabetical)• Step 1: self-joining Lk(IN SQL)insert into Ck+1select p.item1, p.item2, …, p.itemk, q.itemkfrom Lkp, Lkqwhere p.item1=q.item1, …, p.itemk-1=q.itemk-1,


View Full Document

BU CS 565 - Mining Association Rules in Large Databases

Documents in this Course
Load more
Download Mining Association Rules in Large Databases
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Mining Association Rules in Large Databases and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Mining Association Rules in Large Databases 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?