CSE 634/590 Data Mining
Submitted by: Moieed Ahmed

Training data D:

Student  Grade   Income  Buys
CS       High    Low     Milk
CS       High    High    Bread
Math     Low     Low     Bread
CS       Medium  High    Milk
Math     Low     Low     Bread

Item encoding: Student=CS (I1), Student=Math (I2), Grade=High (I3), Grade=Medium (I4), Grade=Low (I5), Income=High (I6), Income=Low (I7), Buys=Milk (I8), Buys=Bread (I9).

Boolean (market-basket) form of D:

        I1  I2  I3  I4  I5  I6  I7  I8  I9
Row 1    +   -   +   -   -   -   +   +   -
Row 2    +   -   +   -   -   +   -   -   +
Row 3    -   +   -   -   +   -   +   -   +
Row 4    +   -   -   +   -   +   -   +   -
Row 5    -   +   -   -   +   -   +   -   +

Let the minimum support count be 2. Since we have 5 records, the minimum support is 2/5 = 40%. Let the minimum confidence required be 70%.

Generating 1-itemset Frequent Patterns

Scan D for the count of each candidate (C1), then compare each candidate's support count with the minimum support count to obtain L1.

C1:
Itemset  Support count
{I1}     3
{I2}     2
{I3}     2
{I4}     1
{I5}     2
{I6}     2
{I7}     3
{I8}     2
{I9}     3

L1 (support count >= 2; {I4} is dropped):
Itemset  Support count
{I1}     3
{I2}     2
{I3}     2
{I5}     2
{I6}     2
{I7}     3
{I8}     2
{I9}     3

Generating 2-itemset Frequent Patterns

Generate the C2 candidates from L1, scan D for the count of each candidate, and compare each count with the minimum support count.

C2 candidates with support counts (strictly, the pairs containing I4 need not be generated at all, since {I4} is not in L1):

{I1,I2} 0   {I1,I3} 2   {I1,I4} 1   {I1,I5} 0   {I1,I6} 2   {I1,I7} 1
{I1,I8} 2   {I1,I9} 1   {I2,I3} 0   {I2,I4} 0   {I2,I5} 2   {I2,I6} 0
{I2,I7} 2   {I2,I8} 0   {I2,I9} 2   {I3,I4} 0   {I3,I5} 0   {I3,I6} 1
{I3,I7} 1   {I3,I8} 1   {I3,I9} 1   {I4,I5} 0   {I4,I6} 1   {I4,I7} 0
{I4,I8} 1   {I4,I9} 0   {I5,I6} 0   {I5,I7} 2   {I5,I8} 0   {I5,I9} 2
{I6,I7} 0   {I6,I8} 1   {I6,I9} 0   {I7,I8} 1   {I7,I9} 2   {I8,I9} 0

L2 (support count >= 2):
Itemset   Support count
{I1,I3}   2
{I1,I6}   2
{I1,I8}   2
{I2,I5}   2
{I2,I7}   2
{I2,I9}   2
{I5,I7}   2
{I5,I9}   2
{I7,I9}   2

Generating 3-itemset Frequent Patterns

1. The join step: to find Lk, a set of candidate k-itemsets (denoted Ck) is generated by joining Lk-1 with itself. For example, joining {I2,I5} with {I2,I7} and with {I2,I9} from L2 yields the candidates {I2,I5,I7} and {I2,I5,I9}, and joining {I1,I3} with {I1,I6} yields the candidate {I1,I3,I6}.
2. The prune step: Ck is a superset of Lk, that is, its members may or may not be frequent. All candidates having a count no less than the minimum support count are frequent by definition, and therefore belong to Lk. Ck, however, can be huge, so any candidate containing a (k-1)-subset that is not in Lk-1 is removed before D is scanned. (A code sketch of this level-wise join-and-prune procedure follows.)
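The level-wise procedure above (scan D, compare with the minimum support count, join, prune) can be summarized in a short sketch. This is a minimal illustration, assuming Python; the transaction encoding and the helper names (support_count, apriori, MIN_COUNT) are ours rather than from the slides, and the join is the simplified "merge any two frequent k-itemsets that differ in one item" variant.

```python
from itertools import combinations

# Transactions from the boolean table above, as sets of item codes.
D = [
    {"I1", "I3", "I7", "I8"},  # CS,   High,   Low,  Milk
    {"I1", "I3", "I6", "I9"},  # CS,   High,   High, Bread
    {"I2", "I5", "I7", "I9"},  # Math, Low,    Low,  Bread
    {"I1", "I4", "I6", "I8"},  # CS,   Medium, High, Milk
    {"I2", "I5", "I7", "I9"},  # Math, Low,    Low,  Bread
]
MIN_COUNT = 2  # minimum support count (2/5 = 40%)

def support_count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def apriori(transactions, min_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    items = sorted({i for t in transactions for i in t})
    level = {frozenset([i]) for i in items}  # C1
    frequent = {}
    k = 1
    while level:
        # Scan D for the count of each candidate; keep those meeting min_count.
        counts = {c: support_count(c, transactions) for c in level}
        Lk = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(Lk)
        # Join step: merge frequent k-itemsets that differ in exactly one item.
        candidates = set()
        for a, b in combinations(Lk, 2):
            union = a | b
            if len(union) == k + 1:
                # Prune step: every k-subset of the candidate must be frequent.
                if all(frozenset(s) in Lk for s in combinations(union, k)):
                    candidates.add(union)
        level = candidates
        k += 1
    return frequent

freq = apriori(D, MIN_COUNT)
for itemset, count in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)  # ends with ['I2', 'I5', 'I7', 'I9'] 2
```

Run on the five training records, this should reproduce the L1 through L4 tables, including the 4-itemset {I2,I5,I7,I9} with support count 2.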
Applying the two steps to L2: the prune step keeps {I2,I5,I7} and {I2,I5,I9}, since every 2-subset of each is in L2, but discards {I1,I3,I6}, because its 2-subset {I3,I6} is not frequent.

Generate the C3 candidates from L2 and scan D for the count of each candidate:

C3:
Itemset       Support count
{I2,I5,I7}    2
{I2,I5,I9}    2
{I2,I7,I9}    2
{I5,I7,I9}    2

Comparing with the minimum support count, all four candidates are frequent, so
L3 = { {I2,I5,I7}, {I2,I5,I9}, {I2,I7,I9}, {I5,I7,I9} }.

Generating 4-itemset Frequent Pattern

Generate the C4 candidates from L3: C4 = { {I2,I5,I7,I9} }. Scanning D gives this candidate a support count of 2, which meets the minimum support count, so
L4 = { {I2,I5,I7,I9} }.

Generating Association Rules for Classification

• When mining association rules for use in classification, we are interested only in rules of the form (p1 ∧ p2 ∧ ... ∧ pl) → Aclass = C, where the rule antecedent is a conjunction of items p1, p2, ..., pl (l ≤ n) associated with a class label C.
• In our example Aclass is the Buys attribute, so the right-hand side must be I8 or I9; that is, the rules predict whether a student with the given characteristics buys milk or bread.
• Let the minimum confidence required be 70%.
• Considering l = {I2,I5,I7,I9}, its nonempty proper subsets are {I2}, {I5}, {I7}, {I9}, {I2,I5}, {I2,I7}, {I2,I9}, {I5,I7}, {I5,I9}, {I7,I9}, {I2,I5,I7}, {I2,I5,I9}, {I2,I7,I9}, {I5,I7,I9}.

• R1: I2 ∧ I5 ∧ I7 → I9 [40%, 100%]
  ◦ Confidence = sc{I2,I5,I7,I9} / sc{I2,I5,I7} = 2/2 = 100%
  ◦ R1 is selected.

Considering the 3-itemset frequent patterns:

• R2: I5 ∧ I7 → I9 [40%, 100%]
  ◦ Confidence = sc{I5,I7,I9} / sc{I5,I7} = 2/2 = 100%
  ◦ R2 is selected.
• R3: I2 ∧ I7 → I9 [40%, 100%]
  ◦ Confidence = sc{I2,I7,I9} / sc{I2,I7} = 2/2 = 100%
  ◦ R3 is selected.
• R4: I2 ∧ I5 → I9 [40%, 100%]
  ◦ Confidence = sc{I2,I5,I9} / sc{I2,I5} = 2/2 = 100%
  ◦ R4 is selected.

Considering the 2-itemset frequent patterns:

• R5: I5 → I9 [40%, 100%]
  ◦ Confidence = sc{I5,I9} / sc{I5} = 2/2 = 100%
  ◦ R5 is selected.
• R6: I2 → I9 [40%, 100%]
  ◦ Confidence = sc{I2,I9} / sc{I2} = 2/2 = 100%
  ◦ R6 is selected.
• R7: I7 → I9 [40%, 66.67%]
  ◦ Confidence = sc{I7,I9} / sc{I7} = 2/3 ≈ 66.67%
  ◦ R7 falls below the 70% threshold and is rejected.
• R8: I1 → I8 [40%, 66.67%]
  ◦ Confidence = sc{I1,I8} / sc{I1} = 2/3 ≈ 66.67%
  ◦ R8 is rejected.

Selected rules at 70% minimum confidence:
• I2 ∧ I5 ∧ I7 → I9 [40%, 100%]
• I2 ∧ I5 → I9 [40%, 100%]
• I2 ∧ I7 → I9 [40%, 100%]
• I5 ∧ I7 → I9 [40%, 100%]
• I5 → I9 [40%, 100%]
• I2 → I9 [40%, 100%]

We lower the minimum confidence to 66% to include I8 on the right-hand side; this also admits I7 → I9:
• I1 → I8 [40%, 66.67%]
• I7 → I9 [40%, 66.67%]

Classifying the Test Data

Test data:

Student  Grade   Income  Buys
Math     Low     Low     Bread
CS       Low     Low     Milk
Math     Low     Low     Milk
Math     High    Low     Bread
CS       Medium  High    Bread

• First tuple: can be written as I2 ∧ I5 ∧ I7 → I9 [Success]. The tuple is correctly classified: the Math student with a low grade and low income buys bread.
• Second tuple: can be written as I1 → I8 [Success]. The tuple is covered by the lower-confidence rule I1 → I8 and is correctly classified: the CS student buys milk.
• Third tuple: can be written as I2 ∧ I5 ∧ I7 → I8 [Error]. The rule predicts bread (I9), but this student actually buys milk, so the tuple is misclassified.
• Fourth tuple: can be written as I2 ∧ I7 → I9 [Success]. The tuple is correctly classified: the Math student with low income buys bread.
• Fifth tuple: can be written as I1 → I9 [Success]. The tuple is correctly classified: the CS student buys bread.

Hence we obtain 80% predictive accuracy and a 20% error rate on the test data.
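As a companion sketch (same assumptions and illustrative names as before, reusing `freq` and `D` from the earlier snippet), the rule-selection step can be written as follows: for every frequent itemset that contains a class item (I8 or I9), form the rule antecedent → class and keep it when its confidence sc(antecedent ∪ {class}) / sc(antecedent) meets the threshold.

```python
CLASS_ITEMS = {"I8", "I9"}   # Buys=Milk, Buys=Bread
MIN_CONFIDENCE = 0.70        # 70%; lowered to 0.66 on the second pass above

def class_rules(freq, n_transactions, min_conf):
    """Return (antecedent, class item, support, confidence) tuples."""
    rules = []
    for itemset, count in freq.items():
        for cls in CLASS_ITEMS & itemset:
            antecedent = itemset - {cls}
            if not antecedent or antecedent & CLASS_ITEMS:
                continue  # need a non-empty antecedent of non-class items
            confidence = count / freq[antecedent]  # sc(X ∪ {c}) / sc(X)
            if confidence >= min_conf:
                rules.append((antecedent, cls, count / n_transactions, confidence))
    return rules

for ant, cls, sup, conf in class_rules(freq, len(D), MIN_CONFIDENCE):
    print(f"{sorted(ant)} -> {cls}  [support {sup:.0%}, confidence {conf:.0%}]")
# e.g. ['I2', 'I5', 'I7'] -> I9  [support 40%, confidence 100%]
```

Lowering MIN_CONFIDENCE to 0.66 additionally admits I1 → I8 and I7 → I9, matching the lowered-threshold rule list above.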