CSE 634/590 Data Mining
Extra Credit
Submitted by: Moieed Ahmed (106867769)

Original Data

Student   Grade    Income   Buys
CS        High     Low      Milk
CS        High     High     Bread
Math      Low      Low      Bread
CS        Medium   High     Milk
Math      Low      Low      Bread

Converted Data

Each attribute-value pair is mapped to an item: Student=CS (I1), Student=Math (I2), Grade=High (I3), Grade=Medium (I4), Grade=Low (I5), Income=High (I6), Income=Low (I7), Buys=Milk (I8), Buys=Bread (I9).

      I1   I2   I3   I4   I5   I6   I7   I8   I9
T1    +    -    +    -    -    -    +    +    -
T2    +    -    +    -    -    +    -    -    +
T3    -    +    -    -    +    -    +    -    +
T4    +    -    -    +    -    +    -    +    -
T5    -    +    -    -    +    -    +    -    +

Generating the 1-itemset Frequent Pattern

Let the minimum support count be 2. Since we have 5 records, the minimum support is 2/5 = 40%. Let the minimum confidence required be 70%.

Scan D for the count of each candidate to obtain C1, then compare each candidate's support count with the minimum support count to obtain L1.

C1:
Itemset   Support Count
{I1}      3
{I2}      2
{I3}      2
{I4}      1
{I5}      2
{I6}      2
{I7}      3
{I8}      2
{I9}      3

L1 ({I4} is dropped, since its count falls below 2):
Itemset   Support Count
{I1}      3
{I2}      2
{I3}      2
{I5}      2
{I6}      2
{I7}      3
{I8}      2
{I9}      3

Generating the 2-itemset Frequent Pattern

Generate the C2 candidates from L1, scan D for the count of each candidate, and compare each count with the minimum support count to obtain L2. (Strictly, the pairs containing I4 need not be generated, since {I4} is not in L1; they are kept here as on the original slides, and all of them fail the support test anyway.)

C2:
Itemset    Support Count
{I1,I2}    0
{I1,I3}    2
{I1,I4}    1
{I1,I5}    0
{I1,I6}    2
{I1,I7}    1
{I1,I8}    2
{I1,I9}    1
{I2,I3}    0
{I2,I4}    0
{I2,I5}    2
{I2,I6}    0
{I2,I7}    2
{I2,I8}    0
{I2,I9}    2
{I3,I4}    0
{I3,I5}    0
{I3,I6}    1
{I3,I7}    1
{I3,I8}    1
{I3,I9}    1
{I4,I5}    0
{I4,I6}    1
{I4,I7}    0
{I4,I8}    1
{I4,I9}    0
{I5,I6}    0
{I5,I7}    2
{I5,I8}    0
{I5,I9}    2
{I6,I7}    0
{I6,I8}    1
{I6,I9}    0
{I7,I8}    1
{I7,I9}    2
{I8,I9}    0

L2:
Itemset    Support Count
{I1,I3}    2
{I1,I6}    2
{I1,I8}    2
{I2,I5}    2
{I2,I7}    2
{I2,I9}    2
{I5,I7}    2
{I5,I9}    2
{I7,I9}    2

Joining and Pruning

1. The join step: to find Lk, a set of candidate k-itemsets is generated by joining Lk-1 with itself. This set of candidates is denoted Ck (Lk: frequent k-itemsets; Ck: candidate k-itemsets). For example, to arrive at C3 we join L2 with itself: {I2,I5} and {I2,I7} join to give {I2,I5,I7}, {I2,I5} and {I2,I9} give {I2,I5,I9}, and {I1,I3} and {I1,I6} give {I1,I3,I6}.

2. The prune step: Ck is a superset of Lk; its members may or may not be frequent, but every candidate whose count is no less than the minimum support count is frequent by definition and therefore belongs to Lk. Ck, however, can be huge, so any candidate with an infrequent (k-1)-subset is pruned before D is scanned. Thus {I2,I5,I7} and {I2,I5,I9} from the join step are retained, since they have minimum support, but {I1,I3,I6} is discarded: its subset {I3,I6} is not in L2, and its own support count (1) falls short of the required 2.
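The join, prune, and counting steps can be condensed into a short program. Below is a minimal sketch, not taken from the slides: the transaction literals mirror the converted data above, while the function names (support_count, join_and_prune, apriori) are my own. Running it prints L1 through L4 exactly as derived in the tables.

```python
from itertools import combinations

# Converted data: one frozenset of items per transaction (same 5 rows as above).
TRANSACTIONS = [
    frozenset({"I1", "I3", "I7", "I8"}),  # CS,   High,   Low,  Milk
    frozenset({"I1", "I3", "I6", "I9"}),  # CS,   High,   High, Bread
    frozenset({"I2", "I5", "I7", "I9"}),  # Math, Low,    Low,  Bread
    frozenset({"I1", "I4", "I6", "I8"}),  # CS,   Medium, High, Milk
    frozenset({"I2", "I5", "I7", "I9"}),  # Math, Low,    Low,  Bread
]
MIN_SUPPORT_COUNT = 2  # 2/5 = 40%, the threshold chosen above

def support_count(itemset, transactions):
    """Scan D and count the transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def join_and_prune(prev_frequent, k):
    """Join step: union pairs from L(k-1) that yield a k-itemset.
    Prune step: drop any candidate with an infrequent (k-1)-subset,
    so {I1,I3,I6} is eliminated here without rescanning the data."""
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            if len(union) == k and all(
                frozenset(s) in prev_frequent
                for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates

def apriori(transactions, min_count):
    """Return {k: Lk} for every non-empty level."""
    items = {i for t in transactions for i in t}
    level = {frozenset({i}) for i in items
             if support_count(frozenset({i}), transactions) >= min_count}
    k, frequent = 1, {}
    while level:
        frequent[k] = level
        k += 1
        level = {c for c in join_and_prune(level, k)
                 if support_count(c, transactions) >= min_count}
    return frequent

if __name__ == "__main__":
    for k, itemsets in apriori(TRANSACTIONS, MIN_SUPPORT_COUNT).items():
        print(f"L{k}:", sorted(sorted(s) for s in itemsets))
```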
Generating the 3-itemset Frequent Pattern

Generate the C3 candidates from L2 and scan D for the count of each candidate:

C3:
Itemset       Support Count
{I2,I5,I7}    2
{I2,I5,I9}    2
{I2,I7,I9}    2
{I5,I7,I9}    2

Comparing each candidate's support count with the minimum support count, all four candidates survive:

L3:
Itemset       Support Count
{I2,I5,I7}    2
{I2,I5,I9}    2
{I2,I7,I9}    2
{I5,I7,I9}    2

Generating the 4-itemset Frequent Pattern

Generate the C4 candidates from L3, scan D for the count of the single candidate, and compare it with the minimum support count:

C4:
Itemset          Support Count
{I2,I5,I7,I9}    2

L4:
Itemset          Support Count
{I2,I5,I7,I9}    2

Generating Association Rules by Classification

When mining association rules for use in classification, we are only interested in rules of the form

(p1 ^ p2 ^ ... ^ pl) → (Aclass = C)

where the antecedent is a conjunction of items p1, p2, ..., pl (l ≤ n, with n the number of items) and the consequent is a class label C. In our example Aclass is either I8 or I9 on the right-hand side; that is, we predict whether a student with the given characteristics buys Milk or Bread.

Let the minimum confidence required be 70%. Considering l = {I2,I5,I7,I9}, its nonempty proper subsets are {I2}, {I5}, {I7}, {I9}, {I2,I5}, {I2,I7}, {I2,I9}, {I5,I7}, {I5,I9}, {I7,I9}, {I2,I5,I7}, {I2,I5,I9}, {I2,I7,I9}, {I5,I7,I9}.

R1: I2 ^ I5 ^ I7 → I9 [40%, 100%]
  Confidence = sc{I2,I5,I7,I9} / sc{I2,I5,I7} = 2/2 = 100%. R1 is selected.

Considering the 3-itemset frequent patterns:

R2: I5 ^ I7 → I9 [40%, 100%]
  Confidence = sc{I5,I7,I9} / sc{I5,I7} = 2/2 = 100%. R2 is selected.

R3: I2 ^ I7 → I9 [40%, 100%]
  Confidence = sc{I2,I7,I9} / sc{I2,I7} = 2/2 = 100%. R3 is selected.

R4: I2 ^ I5 → I9 [40%, 100%]
  Confidence = sc{I2,I5,I9} / sc{I2,I5} = 2/2 = 100%. R4 is selected.

Considering the 2-itemset frequent patterns:

R5: I5 → I9 [40%, 100%]
  Confidence = sc{I5,I9} / sc{I5} = 2/2 = 100%. R5 is selected.

R6: I2 → I9 [40%, 100%]
  Confidence = sc{I2,I9} / sc{I2} = 2/2 = 100%. R6 is selected.

R7: I7 → I9 [40%, 66.7%]
  Confidence = sc{I7,I9} / sc{I7} = 2/3 = 66.7%. R7 is rejected at the 70% threshold.

R8: I1 → I8 [40%, 66.7%]
  Confidence = sc{I1,I8} / sc{I1} = 2/3 = 66.7%. R8 is rejected at the 70% threshold.

List of Selected Rules by Classification

I2 ^ I5 ^ I7 → I9 [40%, 100%]
I2 ^ I5 → I9 [40%, 100%]
I2 ^ I7 → I9 [40%, 100%]
I5 ^ I7 → I9 [40%, 100%]
I5 → I9 [40%, 100%]
I2 → I9 [40%, 100%]

We reduce the confidence threshold to 66% so that a rule with I8 on the R.H.S. is available; this also readmits R7:

I7 → I9 [40%, 66.7%]
I1 → I8 [40%, 66.7%]

Test Data

Student   Grade    Income   Buys
Math      Low      Low      Bread
CS        Low      Low      Milk
Math      Low      Low      Milk
Math      Low      Low      Bread
CS        Medium   High     Milk

• First tuple: matched by I2 ^ I5 ^ I7 → I9 [Success]. The tuple is correctly classified: the Math student with low grade and low income buys Bread.
• Second tuple: matched by I1 → I8 [Success]. The tuple is correctly classified: the CS student buys Milk.
• Third tuple: would need I2 ^ I5 ^ I7 → I8 [Error]. Every selected rule that applies predicts I9, not I8, so the tuple is misclassified.
• Fourth tuple: matched by I2 ^ I7 → I9 [Success]. The tuple is correctly classified: the Math student with low grade and low income buys Bread.
• Fifth tuple: matched by I1 → I8 [Success]. The tuple is correctly classified: the CS student buys Milk.

Hence we have 80% predictive accuracy (4 of 5 test tuples) and 20% error.
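The confidence test used for R1 through R8 is easy to mechanize. The sketch below is my own code, not the slides': sc and class_rules are assumed helper names, and the frequent itemsets are entered by hand for brevity. With min_conf = 0.70 it prints R1 through R6; lowering the threshold to 0.66 readmits I7 → I9 and I1 → I8, reproducing the final rule list.

```python
TRANSACTIONS = [
    frozenset({"I1", "I3", "I7", "I8"}),
    frozenset({"I1", "I3", "I6", "I9"}),
    frozenset({"I2", "I5", "I7", "I9"}),
    frozenset({"I1", "I4", "I6", "I8"}),
    frozenset({"I2", "I5", "I7", "I9"}),
]
CLASS_ITEMS = {"I8", "I9"}  # Buys=Milk, Buys=Bread: the only allowed consequents

def sc(itemset):
    """Support count of `itemset` over the training transactions."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)

def class_rules(frequent_itemsets, min_conf):
    """For each frequent itemset holding a class item, emit the rule
    (itemset - class) => class when its confidence reaches min_conf."""
    for itemset in frequent_itemsets:
        for c in itemset & CLASS_ITEMS:
            antecedent = itemset - {c}
            if antecedent:
                conf = sc(itemset) / sc(antecedent)
                if conf >= min_conf:
                    support = sc(itemset) / len(TRANSACTIONS)
                    yield antecedent, c, support, conf

if __name__ == "__main__":
    # The class-bearing frequent itemsets found by the Apriori pass above.
    frequent = [frozenset(s) for s in (
        {"I2", "I5", "I7", "I9"}, {"I2", "I5", "I9"}, {"I2", "I7", "I9"},
        {"I5", "I7", "I9"}, {"I2", "I9"}, {"I5", "I9"}, {"I7", "I9"},
        {"I1", "I8"},
    )]
    for ant, c, sup, conf in class_rules(frequent, min_conf=0.66):
        print(f"{' ^ '.join(sorted(ant))} -> {c}  [{sup:.0%}, {conf:.1%}]")
```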
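The slides do not fix a conflict-resolution order among matching rules; a tuple is scored as a success when some selected rule that applies to it predicts its actual class. The sketch below encodes the test tuples and that lenient scoring (both encodings are mine, not from the slides) and reproduces the 4/5 = 80% accuracy. Note the design choice this exposes: a deterministic classifier such as CBA's highest-confidence-first would predict Bread for the second tuple and score lower.

```python
# Selected rules as (antecedent, class) pairs; my own encoding of the list above.
RULES = [
    (frozenset({"I2", "I5", "I7"}), "I9"),
    (frozenset({"I2", "I5"}), "I9"),
    (frozenset({"I2", "I7"}), "I9"),
    (frozenset({"I5", "I7"}), "I9"),
    (frozenset({"I2"}), "I9"),
    (frozenset({"I5"}), "I9"),
    (frozenset({"I7"}), "I9"),
    (frozenset({"I1"}), "I8"),
]

# Test tuples converted to item sets, paired with the true class item.
TEST = [
    (frozenset({"I2", "I5", "I7"}), "I9"),  # Math, Low,    Low  -> Bread
    (frozenset({"I1", "I5", "I7"}), "I8"),  # CS,   Low,    Low  -> Milk
    (frozenset({"I2", "I5", "I7"}), "I8"),  # Math, Low,    Low  -> Milk
    (frozenset({"I2", "I5", "I7"}), "I9"),  # Math, Low,    Low  -> Bread
    (frozenset({"I1", "I4", "I6"}), "I8"),  # CS,   Medium, High -> Milk
]

def correctly_covered(features, true_label):
    """True when some selected rule applies to the tuple and predicts its
    actual class -- the lenient scoring used in the slide walkthrough."""
    return any(antecedent <= features and label == true_label
               for antecedent, label in RULES)

if __name__ == "__main__":
    hits = sum(correctly_covered(x, y) for x, y in TEST)
    print(f"accuracy = {hits}/{len(TEST)} = {hits / len(TEST):.0%}")  # 4/5 = 80%
```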

