CSE 634/590 Data Mining
Extra Credit
Submitted by: Moieed Ahmed (106867769)

Original Data

Student   Grade    Income   Buys
CS        High     Low      Milk
CS        High     High     Bread
Math      Low      Low      Bread
CS        Medium   High     Milk
Math      Low      Low      Bread

Converted Data

Each attribute-value pair is encoded as a binary item:
Student=CS (I1), Student=Math (I2), Grade=High (I3), Grade=Medium (I4),
Grade=Low (I5), Income=High (I6), Income=Low (I7), Buys=Milk (I8), Buys=Bread (I9)

        I1   I2   I3   I4   I5   I6   I7   I8   I9
Row 1:  +    -    +    -    -    -    +    +    -
Row 2:  +    -    +    -    -    +    -    -    +
Row 3:  -    +    -    -    +    -    +    -    +
Row 4:  +    -    -    +    -    +    -    +    -
Row 5:  -    +    -    -    +    -    +    -    +

Generating 1-Itemset Frequent Pattern

Let the minimum support count be 2.

Scan D for the count of each candidate to obtain C1:

Item Set   Support Count
{I1}       3
{I2}       2
{I3}       2
{I4}       1
{I5}       2
{I6}       2
{I7}       3
{I8}       2
{I9}       3

Compare each candidate's support count with the minimum support count; {I4} falls below it and is dropped, leaving L1:

Item Set   Support Count
{I1}       3
{I2}       2
{I3}       2
{I5}       2
{I6}       2
{I7}       3
{I8}       2
{I9}       3
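The C1-to-L1 step above can be sketched in Python. This is a minimal illustration over the five converted records; the variable and function names are mine, not from the slides.

```python
# The five training records from the converted table, each represented
# as the set of binary items (I1..I9) it contains.
records = [
    {"I1", "I3", "I7", "I8"},  # CS,   High,   Low,  Milk
    {"I1", "I3", "I6", "I9"},  # CS,   High,   High, Bread
    {"I2", "I5", "I7", "I9"},  # Math, Low,    Low,  Bread
    {"I1", "I4", "I6", "I8"},  # CS,   Medium, High, Milk
    {"I2", "I5", "I7", "I9"},  # Math, Low,    Low,  Bread
]

MIN_SUPPORT = 2  # minimum support count used by the slides

def support_count(itemset):
    """Number of records containing every item of the itemset."""
    return sum(1 for r in records if itemset <= r)

# C1: every single item is a candidate; L1 keeps those meeting the
# minimum support count (here {I4}, with a count of 1, is dropped).
items = sorted(set().union(*records))
c1 = {frozenset([i]): support_count(frozenset([i])) for i in items}
l1 = {s: c for s, c in c1.items() if c >= MIN_SUPPORT}
```

The same `support_count` helper serves every later pass over D; only the candidate sets change.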
Since we have 5 records, a support count of 2 corresponds to a minimum support of 2/5 = 40%. Let the minimum confidence required be 70%.

Generating 2-Itemset Frequent Pattern

Generate C2 candidates from L1 by pairing its frequent items (pairs involving I4 are excluded, since I4 is not in L1):

{I1,I2}, {I1,I3}, {I1,I5}, {I1,I6}, {I1,I7}, {I1,I8}, {I1,I9},
{I2,I3}, {I2,I5}, {I2,I6}, {I2,I7}, {I2,I8}, {I2,I9},
{I3,I5}, {I3,I6}, {I3,I7}, {I3,I8}, {I3,I9},
{I5,I6}, {I5,I7}, {I5,I8}, {I5,I9},
{I6,I7}, {I6,I8}, {I6,I9},
{I7,I8}, {I7,I9},
{I8,I9}

Scan D for the count of each candidate (C2):

Item Set   Support Count
{I1,I2}    0
{I1,I3}    2
{I1,I5}    0
{I1,I6}    2
{I1,I7}    1
{I1,I8}    2
{I1,I9}    1
{I2,I3}    0
{I2,I5}    2
{I2,I6}    0
{I2,I7}    2
{I2,I8}    0
{I2,I9}    2
{I3,I5}    0
{I3,I6}    1
{I3,I7}    1
{I3,I8}    1
{I3,I9}    1
{I5,I6}    0
{I5,I7}    2
{I5,I8}    0
{I5,I9}    2
{I6,I7}    0
{I6,I8}    1
{I6,I9}    1
{I7,I8}    1
{I7,I9}    2
{I8,I9}    0

Compare each candidate's support count with the minimum support count to obtain L2:

Item Set   Support Count
{I1,I3}    2
{I1,I6}    2
{I1,I8}    2
{I2,I5}    2
{I2,I7}    2
{I2,I9}    2
{I5,I7}    2
{I5,I9}    2
{I7,I9}    2

1. The join step: to find Lk, a set of candidate k-itemsets is generated by joining Lk-1 with itself; this candidate set is denoted Ck (Lk = frequent k-itemsets, Ck = candidates). For example, joining {I2,I5} with {I2,I7} and {I2,I9} from L2 yields the candidates {I2,I5,I7} and {I2,I5,I9}; joining {I1,I3} with {I1,I6} yields the candidate {I1,I3,I6}.

2. The prune step: Ck is a superset of Lk, so its members may or may not be frequent, but every candidate whose count is no less than the minimum support count is frequent by definition and therefore belongs to Lk.
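The join and prune steps can be sketched as follows. This is a minimal illustration, not the deck's code: `join_and_prune` is my name, and the join here is a simplified one (any two (k-1)-itemsets whose union has size k) rather than the prefix-based join of full Apriori, with the subset-based prune doing the filtering.

```python
from itertools import combinations

def join_and_prune(prev_frequent, k):
    """Generate Ck from L(k-1): join any two (k-1)-itemsets whose union
    has size k, then prune every candidate that has an infrequent
    (k-1)-subset (one not present in L(k-1))."""
    prev = list(prev_frequent)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            union = prev[i] | prev[j]
            if len(union) == k:
                candidates.add(union)
    return {
        c for c in candidates
        if all(frozenset(s) in prev_frequent for s in combinations(c, k - 1))
    }

# L2 from the table above:
l2 = {frozenset(s) for s in (
    {"I1", "I3"}, {"I1", "I6"}, {"I1", "I8"},
    {"I2", "I5"}, {"I2", "I7"}, {"I2", "I9"},
    {"I5", "I7"}, {"I5", "I9"}, {"I7", "I9"},
)}

c3 = join_and_prune(l2, 3)
# {I1,I3,I6} is pruned because its subset {I3,I6} is not in L2; the
# survivors are the four 3-itemsets over {I2, I5, I7, I9}.
```

Running the same routine on L3 yields the single 4-itemset candidate {I2,I5,I7,I9}.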
Ck, however, can be huge, which is why pruning matters. {I2,I5,I7} and {I2,I5,I9} from the join step are retained since they meet the minimum support, but {I1,I3,I6} is discarded: its subset {I3,I6} is not in L2, so it cannot meet the required support count.

Joining and Pruning

C3 after joining L2 with itself and pruning:

Item Set
{I2,I5,I7}
{I2,I5,I9}
{I2,I7,I9}
{I5,I7,I9}

Generating 3-Itemset Frequent Pattern

Scan D for the count of each C3 candidate:

Item Set      Support Count
{I2,I5,I7}    2
{I2,I5,I9}    2
{I2,I7,I9}    2
{I5,I7,I9}    2

All four candidates meet the minimum support count, so L3 = C3:

Item Set      Support Count
{I2,I5,I7}    2
{I2,I5,I9}    2
{I2,I7,I9}    2
{I5,I7,I9}    2

Generating 4-Itemset Frequent Pattern

Generate C4 candidates from L3:

Item Set
{I2,I5,I7,I9}

Scan D for the count of each candidate and compare it with the minimum support count to obtain L4:

Item Set         Support Count
{I2,I5,I7,I9}    2

Generating Association Rules by Classification

When mining association rules for use in classification, we are only interested in rules of the form

(p1 ^ p2 ^ ... ^ pl) => Aclass = C

where the rule antecedent is a conjunction of items p1, p2, ..., pl (l <= n) associated with a class label C. In our example, Aclass is either I8 or I9 on the right-hand side; that is, we predict whether a student with the given characteristics buys Milk or Bread.

Let the minimum confidence required be 70%. Considering l = {I2,I5,I7,I9}, its nonempty subsets are {I2}, {I5}, {I7}, {I9}, {I2,I5}, {I2,I7}, {I2,I9}, {I5,I7}, {I5,I9}, {I7,I9}, {I2,I5,I7}, {I2,I5,I9}, {I2,I7,I9}, {I5,I7,I9}.

R1: I2 ^ I5 ^ I7 => I9 [40%, 100%]
    Confidence = sc{I2,I5,I7,I9} / sc{I2,I5,I7} = 2/2 = 100%. R1 is selected.

Considering the 3-itemset frequent patterns:

R2: I5 ^ I7 => I9 [40%, 100%]
    Confidence = sc{I5,I7,I9} / sc{I5,I7} = 2/2 = 100%. R2 is selected.
R3: I2 ^ I7 => I9 [40%, 100%]
    Confidence = sc{I2,I7,I9} / sc{I2,I7} = 2/2 = 100%. R3 is selected.
R4: I2 ^ I5 => I9 [40%, 100%]
    Confidence = sc{I2,I5,I9} / sc{I2,I5} = 2/2 = 100%. R4 is selected.

Considering the 2-itemset frequent patterns:

R5: I5 => I9 [40%, 100%]
    Confidence = sc{I5,I9} / sc{I5} = 2/2 =
100%. R5 is selected.
R6: I2 => I9 [40%, 100%]
    Confidence = sc{I2,I9} / sc{I2} = 2/2 = 100%. R6 is selected.
R7: I7 => I9 [40%, 66.7%]
    Confidence = sc{I7,I9} / sc{I7} = 2/3 = 66.7%. R7 falls below the 70% threshold; it is kept only under the relaxed 66% threshold introduced below.
R8: I1 => I8 [40%, 66.7%]
    Confidence = sc{I1,I8} / sc{I1} = 2/3 = 66.7%. R8 is rejected at the 70% threshold.

List of Selected Rules by Classification

I2 ^ I5 ^ I7 => I9 [40%, 100%]
I2 ^ I5 => I9 [40%, 100%]
I2 ^ I7 => I9 [40%, 100%]
I5 ^ I7 => I9 [40%, 100%]
I5 => I9 [40%, 100%]
I2 => I9 [40%, 100%]

We reduce the confidence threshold to 66% so that a rule with I8 on the right-hand side (and the I7 rule) can be included:

I7 => I9 [40%, 66.7%]
I1 => I8 [40%, 66.7%]

Test Data

Student   Grade    Income   Buys
Math      Low      Low      Bread
CS        Low      Low      Milk
Math      Low      Low      Milk
Math      High     Low      Bread
CS        Medium   High     Milk

- First tuple: matches I2 ^ I5 ^ I7 => I9 [Success]. The tuple is correctly classified: the Math student with low grade and low income buys bread.
- Second tuple: matches I1 => I8 [Success]. The tuple is correctly classified: the CS student buys milk.
- Third tuple: matches I2 ^ I5 ^ I7 => I9, which predicts Bread, but the student actually buys Milk [Error]. The tuple is misclassified.
- Fourth tuple: matches I2 ^ I7 => I9 [Success]. The tuple is correctly classified: the Math student with low income buys bread.
- Fifth tuple: matches I1 => I8 [Success]. The tuple is correctly classified: the CS student buys milk.

Hence we have 4/5 = 80% predictive accuracy and a 20% error rate.
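The confidence filtering and the test-set evaluation can be sketched together in Python. This is a minimal illustration with my own names; note that the deck never specifies how conflicts between overlapping rules are resolved (for the CS/Low/Low tuple, both I5 ^ I7 => I9 and I1 => I8 match), so the rule ordering below, with the I1 => I8 rule first, is an assumption chosen purely to reproduce the per-tuple choices above.

```python
# Training records as sets of the binary items I1..I9.
records = [
    {"I1", "I3", "I7", "I8"},  # CS,   High,   Low,  Milk
    {"I1", "I3", "I6", "I9"},  # CS,   High,   High, Bread
    {"I2", "I5", "I7", "I9"},  # Math, Low,    Low,  Bread
    {"I1", "I4", "I6", "I8"},  # CS,   Medium, High, Milk
    {"I2", "I5", "I7", "I9"},  # Math, Low,    Low,  Bread
]

def support_count(itemset):
    """Number of training records containing every item in itemset."""
    return sum(1 for r in records if itemset <= r)

def confidence(antecedent, class_item):
    """Confidence of the rule antecedent => class_item."""
    return support_count(antecedent | {class_item}) / support_count(antecedent)

# Spot-check R1 and R8 from the slides:
assert confidence({"I2", "I5", "I7"}, "I9") == 1.0  # R1: 2/2
assert round(confidence({"I1"}, "I8"), 2) == 0.67   # R8: 2/3

# Selected rules as (antecedent, predicted class item). Placing the
# I1 => I8 rule first is a hypothetical conflict-resolution choice,
# not something the slides state.
rules = [
    ({"I1"}, "I8"),
    ({"I2", "I5", "I7"}, "I9"),
    ({"I2", "I5"}, "I9"),
    ({"I2", "I7"}, "I9"),
    ({"I5", "I7"}, "I9"),
    ({"I5"}, "I9"),
    ({"I2"}, "I9"),
    ({"I7"}, "I9"),
]

def classify(features):
    """Return the class item of the first rule whose antecedent matches."""
    for antecedent, cls in rules:
        if antecedent <= features:
            return cls
    return None  # no rule fires

# Test tuples as (feature items, actual class item).
test = [
    ({"I2", "I5", "I7"}, "I9"),  # Math, Low,    Low  -> Bread
    ({"I1", "I5", "I7"}, "I8"),  # CS,   Low,    Low  -> Milk
    ({"I2", "I5", "I7"}, "I8"),  # Math, Low,    Low  -> Milk (misclassified)
    ({"I2", "I3", "I7"}, "I9"),  # Math, High,   Low  -> Bread
    ({"I1", "I4", "I6"}, "I8"),  # CS,   Medium, High -> Milk
]

correct = sum(1 for f, actual in test if classify(f) == actual)
accuracy = correct / len(test)  # 4 of 5 tuples classified correctly
```

Only the third tuple is misclassified (the rules predict Bread for a Math/Low/Low student, who actually buys Milk), giving the 80% accuracy reported above.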