APRIORI Algorithm
Professor Anita Wasilewska
Book slides

Contents
• The Apriori Algorithm: Basics
• The Apriori Algorithm in a Nutshell
• The Apriori Algorithm: Pseudo code
• The Apriori Algorithm: Example
• Step 1: Generating 1-itemset Frequent Pattern
• Step 2: Generating 2-itemset Frequent Pattern
• Step 3: Generating 3-itemset Frequent Pattern
• Step 4: Generating 4-itemset Frequent Pattern
• Step 5: Generating Association Rules from Frequent Itemsets
• Methods to Improve Apriori's Efficiency
• Mining Frequent Patterns Without Candidate Generation
• FP-Growth Method: An Example
• FP-Growth Method: Construction of FP-Tree
• Mining the FP-Tree by Creating Conditional (sub)pattern bases
• FP-Tree Example Continued
• Why Frequent Pattern Growth Is Fast

The Apriori Algorithm: Basics
The Apriori Algorithm is an influential algorithm for mining frequent itemsets for Boolean association rules.
Key Concepts:
• Frequent Itemsets: The sets of items that have minimum support (denoted by Li for the i-th itemset).
• Apriori Property: Any subset of a frequent itemset must itself be frequent.
• Join Operation: To find Lk, a set of candidate k-itemsets is generated by joining Lk-1 with itself.

The Apriori Algorithm in a Nutshell
• Find the frequent itemsets: the sets of items that have minimum support.
– A subset of a frequent itemset must also be a frequent itemset; i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets.
– Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets).
• Use the frequent itemsets to generate association rules.

The Apriori Algorithm: Pseudo code
• Join Step: Ck is generated by joining Lk-1 with itself.
• Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.
• Pseudo-code:
    Ck : candidate itemset of size k
    Lk : frequent itemset of size k

    L1 = {frequent items};
    for (k = 1; Lk != ∅; k++) do begin
        Ck+1 = candidates generated from Lk;
        for each transaction t in database do
            increment the count of all candidates in Ck+1 that are contained in t
        Lk+1 = candidates in Ck+1 with min_support
    end
    return ∪k Lk;

The Apriori Algorithm: Example
• Consider a database, D, consisting of 9 transactions.
• Suppose the minimum support count required is 2 (i.e. min_sup = 2/9 ≈ 22%).
• Let the minimum confidence required be 70%.
• We first find the frequent itemsets using the Apriori algorithm.
• Then, association rules are generated using the minimum support and minimum confidence.

TID    List of Items
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3

Step 1: Generating 1-itemset Frequent Pattern
• In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1. Scan D for the count of each candidate:

C1
Itemset   Sup. Count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2

• The set of frequent 1-itemsets, L1, consists of the candidate 1-itemsets satisfying minimum support. Comparing each candidate's support count with the minimum support count, every candidate qualifies, so L1 = C1.

Step 2: Generating 2-itemset Frequent Pattern
• To discover the set of frequent 2-itemsets, L2, the algorithm uses L1 Join L1 to generate a candidate set of 2-itemsets, C2.
• Next, the transactions in D are scanned and the support count for each candidate itemset in C2 is accumulated (as shown in the table below).
• The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate 2-itemsets in C2 having minimum support.
• Note: We haven't used the Apriori Property yet.

C2 (candidates generated from L1):
{I1, I2}, {I1, I3}, {I1, I4}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}, {I3, I4}, {I3, I5}, {I4, I5}

C2 with support counts (after scanning D):
Itemset    Sup. Count
{I1, I2}   4
{I1, I3}   4
{I1, I4}   1
{I1, I5}   2
{I2, I3}   4
{I2, I4}   2
{I2, I5}   2
{I3, I4}   0
{I3, I5}   1
{I4, I5}   0
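The join-and-scan step above can be sketched in Python. This is a minimal illustration, not the slides' own code: the transaction sets mirror the example table, and names such as `min_sup_count` are ours.

```python
from itertools import combinations

# Transaction database D from the example table.
D = {
    "T100": {"I1", "I2", "I5"},
    "T200": {"I2", "I4"},
    "T300": {"I2", "I3"},
    "T400": {"I1", "I2", "I4"},
    "T500": {"I1", "I3"},
    "T600": {"I2", "I3"},
    "T700": {"I1", "I3"},
    "T800": {"I1", "I2", "I3", "I5"},
    "T900": {"I1", "I2", "I3"},
}

L1 = ["I1", "I2", "I3", "I4", "I5"]               # frequent 1-itemsets
C2 = [frozenset(p) for p in combinations(L1, 2)]  # L1 join L1: 10 candidates

# Scan D once, accumulating the support count of each candidate 2-itemset.
support = {c: sum(1 for t in D.values() if c <= t) for c in C2}

min_sup_count = 2  # min_sup = 2/9
L2 = {c for c, n in support.items() if n >= min_sup_count}

for c in sorted(C2, key=sorted):
    print(sorted(c), support[c], "frequent" if c in L2 else "pruned")
```

Running this reproduces the table: six of the ten candidates (e.g. {I1, I2} with count 4) reach the minimum support count and survive into L2.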
Comparing each candidate's support count with the minimum support count gives the frequent 2-itemsets:

L2
Itemset    Sup. Count
{I1, I2}   4
{I1, I3}   4
{I1, I5}   2
{I2, I3}   4
{I2, I4}   2
{I2, I5}   2

Step 3: Generating 3-itemset Frequent Pattern
• The generation of the set of candidate 3-itemsets, C3, involves use of the Apriori Property.
• To find C3, we compute L2 Join L2: C3 = L2 Join L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.
• Now the Join step is complete, and the Prune step is used to reduce the size of C3. The Prune step helps avoid heavy computation due to a large Ck.
• Based on the Apriori property that all subsets of a frequent itemset must also be frequent, we can determine that the latter four candidates cannot possibly be frequent. How?
• For example, take {I1, I2, I3}. Its 2-item subsets are {I1, I2}, {I1, I3} and {I2, I3}. Since all 2-item subsets of {I1, I2, I3} are members of L2, we keep {I1, I2, I3} in C3.
• Now take {I2, I3, I5}, which shows how the pruning is performed. Its 2-item subsets are {I2, I3}, {I2, I5} and {I3, I5}. BUT {I3, I5} is not a member of L2 and hence is not frequent, violating the Apriori Property. Thus we must remove {I2, I3, I5} from C3.
• Therefore, after checking every member of the Join result for pruning, C3 = {{I1, I2, I3}, {I1, I2, I5}}.
• Now the transactions in D are scanned to determine L3, consisting of those candidate 3-itemsets in C3 having minimum support. Both candidates have support count 2, so L3 = C3:

L3
Itemset        Sup. Count
{I1, I2, I3}   2
{I1, I2, I5}   2

Step 4: Generating 4-itemset Frequent Pattern
• The algorithm uses L3 Join L3 to generate a candidate set of 4-itemsets, C4.
Although the join results in {{I1, I2, I3, I5}}, this itemset is pruned since its subset {I2, I3, I5} is not frequent.
• Thus, C4 = ∅, and the algorithm terminates, having found all of the frequent itemsets. This completes the Apriori Algorithm.
• What's next? These frequent itemsets will be used to generate strong association rules (strong association rules satisfy both minimum support and minimum confidence).

Step 5: Generating Association Rules from Frequent Itemsets
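The extract breaks off at the rule-generation step, so as a hedged sketch of what that step computes: for each frequent itemset l and each nonempty proper subset s, the rule s → (l − s) is strong when its confidence, support_count(l) / support_count(s), meets the 70% threshold. The function names and structure below are ours, not the slides'.

```python
from itertools import combinations

# Transaction database D from the worked example.
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def support_count(itemset):
    return sum(1 for t in D if itemset <= t)

# Collect the support counts of all frequent itemsets (min support count 2).
# Brute-force enumeration is used here just to recover the L1/L2/L3 found above.
frequent = {}
for size in (1, 2, 3):
    for combo in combinations(["I1", "I2", "I3", "I4", "I5"], size):
        c = frozenset(combo)
        n = support_count(c)
        if n >= 2:
            frequent[c] = n

def strong_rules(frequent, min_conf=0.7):
    """For every frequent itemset l and nonempty proper subset s,
    keep the rule s -> (l - s) whose confidence meets min_conf."""
    rules = []
    for l, n in frequent.items():
        if len(l) < 2:
            continue
        for k in range(1, len(l)):
            for s in map(frozenset, combinations(l, k)):
                conf = n / frequent[s]   # subsets of l are frequent (Apriori)
                if conf >= min_conf:
                    rules.append((sorted(s), sorted(l - s), conf))
    return rules

rules = strong_rules(frequent)
for lhs, rhs, conf in sorted(rules):
    print(f"{lhs} -> {rhs}  (conf = {conf:.0%})")
```

With min_conf = 70%, six strong rules survive, all with 100% confidence, e.g. {I5} → {I1, I2} (confidence 2/2) and {I4} → {I2} (confidence 2/2); candidates such as {I1, I2} → {I5} (confidence 2/4 = 50%) are rejected.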