Data Mining (Extra Credits)

Given Data (from the textbook)
- T100: {M, O, N, K, E, Y}
- T200: {D, O, N, K, E, Y}
- T300: {M, A, K, E}
- T400: {M, U, C, K, Y}
- T500: {C, O, K, I, E}
- Min_sup = 60% (minimum support)
- Min_conf = 80% (minimum confidence)
- Let the classes be E (Item 11) and Y (Item 10).

Transactional Database
Item numbering: M = 1, D = 2, C = 3, O = 4, A = 5, U = 6, N = 7, K = 8, I = 9, Y = 10, E = 11; Y and E are the class attributes.

        M   D   C   O   A   U   N   K   I   Y   E
T100    +   -   -   +   -   -   +   +   -   +   +
T200    -   +   -   +   -   -   +   +   -   +   +
T300    +   -   -   -   +   -   -   +   -   -   +
T400    +   -   +   -   -   +   -   +   -   +   -
T500    -   -   +   +   -   -   -   +   +   -   +

Minimum Support Count
- Total number of transactions = 5
- Minimum support count = 60% of 5 = 3

Frequent 1-Itemsets
Candidate 1-itemsets (support counts): M 3, D 1, C 2, O 3, A 1, U 1, N 2, K 5, E 4, Y 3, I 1
Frequent 1-itemsets (support count >= 3): M 3, O 3, K 5, E 4, Y 3

Frequent 2-Itemsets
Candidate 2-itemsets (support counts):
{M,O} 1, {M,K} 3, {M,E} 2, {M,Y} 2, {O,K} 3, {O,E} 3, {O,Y} 2, {K,E} 4, {K,Y} 3, {E,Y} 2
The Apriori principle is used for pruning; since every 1-item subset is frequent, no candidate 2-itemset is pruned at this level.
Frequent 2-itemsets (support count >= 3): {M,K} 3, {O,K} 3, {O,E} 3, {K,E} 4, {K,Y} 3

Frequent 3-Itemsets
Candidate 3-itemsets (support counts): {O,K,E} 3, {K,E,Y} 2
The Apriori principle prunes {K,E,Y}, because its subset {E,Y} is not frequent.
Frequent 3-itemsets (support count >= 3): {O,K,E} 3
No candidate 4-itemsets can be generated.

All Frequent Itemsets
- M: 3, O: 3, K: 5, E: 4, Y: 3
- {M,K}: 3, {O,K}: 3, {O,E}: 3, {K,E}: 4, {K,Y}: 3
- {O,K,E}: 3

Association Rules
- Confidence(X -> Y) = P(Y | X) = support_count(X ∪ Y) / support_count(X)
- Only the frequent 2- and 3-itemsets are used, because 1-itemsets cannot form association rules.
- The classes are Y (Item 10) and E (Item 11), so we are interested in rules of the form A -> Y and A -> E.

Association Rules for Classification
Rule_No   Rule          Confidence   Confidence (%)
R1        O -> E        3/3          100%
R2        K -> E        4/5          80%
R3        K -> Y        3/5          60%
R4        {O,K} -> E    3/3          100%
Since our classes are E and Y, rule R3 cannot be included at a minimum confidence of 80%, so the confidence threshold is lowered to 60% to include R3.

Selected Rules
Rule_No   Rule          [Support, Confidence]
R1        O -> E        [60%, 100%]
R2        K -> E        [80%, 80%]
R3        K -> Y        [60%, 60%]
R4        {O,K} -> E    [60%, 100%]

Test Data
        M   D   C   O   A   U   N   K   I   Y   E
T100    +   -   -   -   +   -   -   +   -   +   -
T200    +   -   -   +   -   -   +   +   +   -   +
T300    -   -   +   +   +   -   -   -   -   +   -
T400    +   -   +   +   -   +   -   -   -   -   +
T500    -   +   +   -   -   -   -   +   +   -   +

Applying the Selected Rules to the Test Data
- T100 satisfies the rule K -> Y [Success]
- T200 satisfies the rule {O,K} -> E [Success]
- T300 contains {C, O, A} and its actual class is Y; the only rule that fires, R1 (O -> E), predicts E [Failure]
- T400 satisfies the rule O -> E [Success]
- T500 satisfies the rule K -> E [Success]
- Predictive accuracy = 4/5 = 80%
- Error rate = 1/5 = 20%

Thank You
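
Appendix: the frequent-itemset counts above can be cross-checked with a short Apriori sketch in Python. This is a minimal illustration, not the implementation used for the assignment; the names (transactions, MIN_SUP_COUNT, support_count) are chosen for this example. Note that it prunes {K,E,Y} by the Apriori principle before counting, so its support of 2 never has to be computed.

from itertools import combinations

# Training transactions from the "Given Data" slide.
transactions = [
    {"M", "O", "N", "K", "E", "Y"},   # T100
    {"D", "O", "N", "K", "E", "Y"},   # T200
    {"M", "A", "K", "E"},             # T300
    {"M", "U", "C", "K", "Y"},        # T400
    {"C", "O", "K", "I", "E"},        # T500
]
MIN_SUP_COUNT = 3  # 60% of 5 transactions

def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

# Level 1: frequent 1-itemsets.
items = {item for t in transactions for item in t}
frequent = [frozenset([i]) for i in sorted(items)
            if support_count(frozenset([i])) >= MIN_SUP_COUNT]
all_frequent = list(frequent)

k = 2
while frequent:
    # Candidate generation: unions of frequent (k-1)-itemsets of size k,
    # then Apriori pruning: drop candidates with an infrequent (k-1)-subset.
    prev = set(frequent)
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    candidates = [c for c in candidates
                  if all(frozenset(s) in prev for s in combinations(c, k - 1))]
    # Keep candidates that meet the minimum support count.
    frequent = [c for c in candidates if support_count(c) >= MIN_SUP_COUNT]
    all_frequent.extend(frequent)
    k += 1

for itemset in all_frequent:
    print(sorted(itemset), support_count(itemset))

Running this should list exactly the eleven frequent itemsets shown under "All Frequent Itemsets".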
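
The rule confidences can be checked the same way from the formula confidence(A -> c) = support_count(A ∪ {c}) / support_count(A). Again a small sketch with illustrative names (sup_count, rules), restated so it runs on its own:

# Confidence and support check for the four classification rules.
transactions = [
    {"M", "O", "N", "K", "E", "Y"}, {"D", "O", "N", "K", "E", "Y"},
    {"M", "A", "K", "E"}, {"M", "U", "C", "K", "Y"}, {"C", "O", "K", "I", "E"},
]

def sup_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

# (rule name, antecedent itemset, predicted class item)
rules = [("R1", frozenset("O"), "E"), ("R2", frozenset("K"), "E"),
         ("R3", frozenset("K"), "Y"), ("R4", frozenset("OK"), "E")]

for name, antecedent, cls in rules:
    confidence = sup_count(antecedent | {cls}) / sup_count(antecedent)
    support = sup_count(antecedent | {cls}) / len(transactions)
    print(f"{name}: {sorted(antecedent)} -> {cls}  "
          f"support = {support:.0%}, confidence = {confidence:.0%}")

The printed pairs should match the [Support, Confidence] values on the "Selected Rules" slide.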
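
Finally, the test-data evaluation. Following the slides, a test transaction counts as a Success when at least one selected rule fires on it (its antecedent is contained in the transaction) and predicts the transaction's actual class; this sketch mirrors that counting rather than committing to a fixed rule-priority order, and its variable names (test_data, selected_rules) are illustrative.

# Reproduce the 80% predictive accuracy on the test data.
test_data = [
    ("T100", {"M", "A", "K", "Y"}, "Y"),
    ("T200", {"M", "O", "N", "K", "I", "E"}, "E"),
    ("T300", {"C", "O", "A", "Y"}, "Y"),
    ("T400", {"M", "C", "O", "U", "E"}, "E"),
    ("T500", {"D", "C", "K", "I", "E"}, "E"),
]

# Selected rules as (antecedent, predicted class).
selected_rules = [({"O"}, "E"), ({"K"}, "E"), ({"K"}, "Y"), ({"O", "K"}, "E")]

successes = 0
for tid, items, actual in test_data:
    # Success if some rule fires and agrees with the actual class.
    hit = any(ant <= items and cls == actual for ant, cls in selected_rules)
    successes += hit
    print(f"{tid}: {'Success' if hit else 'Failure'}")

accuracy = successes / len(test_data)
print(f"Predictive accuracy = {accuracy:.0%}, error rate = {1 - accuracy:.0%}")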