SBU CSE 634 - Classification by Association rules - Example Problem

CSE 634/590 Data Mining
Extra Credit: Classification by Association Rules: Example Problem
Muhammad Asiful Islam, SBID: 106506983

Contents: Original Data; Converted Data; Generating 1-itemset Frequent Pattern; Generating 2-itemset Frequent Pattern; Join Operation; Generating 3-itemset Frequent Pattern; Generating 4-itemset Frequent Pattern; Generating Association Rules from Frequent Itemsets; Classification Rules; Test Data Set; Thank You.

Original Data

  Outlook   Humidity  Wind    PlayTennis
  Sunny     High      Weak    No
  Sunny     High      Strong  No
  Overcast  Normal    Weak    Yes
  Rain      High      Weak    Yes
  Rain      Normal    Weak    Yes
  Rain      High      Strong  No
  Overcast  Normal    Strong  Yes
  Sunny     High      Weak    No
  Rain      Normal    Weak    Yes
  Sunny     Normal    Strong  Yes
  Overcast  High      Strong  Yes
  Overcast  Normal    Weak    Yes
  Rain      High      Strong  No

Converted Data

Each attribute-value pair becomes a binary item: Outlook=Sunny (I1), Outlook=Overcast (I2), Outlook=Rain (I3), Humidity=High (I4), Humidity=Normal (I5), Wind=Weak (I6), Wind=Strong (I7), PlayTennis=Yes (I8), PlayTennis=No (I9). The 13 rows then become:

  I1  I2  I3  I4  I5  I6  I7  I8  I9
   +   -   -   +   -   +   -   -   +
   +   -   -   +   -   -   +   -   +
   -   +   -   -   +   +   -   +   -
   -   -   +   +   -   +   -   +   -
   -   -   +   -   +   +   -   +   -
   -   -   +   +   -   -   +   -   +
   -   +   -   -   +   -   +   +   -
   +   -   -   +   -   +   -   -   +
   -   -   +   -   +   +   -   +   -
   +   -   -   -   +   -   +   +   -
   -   +   -   +   -   -   +   +   -
   -   +   -   -   +   +   -   +   -
   -   -   +   +   -   -   +   -   +

Generating 1-itemset Frequent Pattern

Let the minimum support count be 4, so min_sup = 4/13 = 30.76%, and let the minimum confidence required be 70%.

Scan D for the count of each candidate in C1, then compare each candidate's support count with the minimum support count:

  Itemset  Support Count
  {I1}     4
  {I2}     4
  {I3}     5
  {I4}     7
  {I5}     6
  {I6}     7
  {I7}     6
  {I8}     8
  {I9}     5

Every candidate meets the minimum support count, so L1 = C1.

Generating 2-itemset Frequent Pattern

Generate the C2 candidates from L1 (all 36 pairs of frequent 1-itemsets) and scan D for the count of each candidate:

  Itemset  Count    Itemset  Count    Itemset  Count    Itemset  Count
  {I1,I2}  0        {I2,I4}  1        {I3,I7}  2        {I5,I7}  2
  {I1,I3}  0        {I2,I5}  3        {I3,I8}  3        {I5,I8}  6
  {I1,I4}  3        {I2,I6}  2        {I3,I9}  2        {I5,I9}  0
  {I1,I5}  1        {I2,I7}  2        {I4,I5}  0        {I6,I7}  0
  {I1,I6}  2        {I2,I8}  4        {I4,I6}  3        {I6,I8}  5
  {I1,I7}  2        {I2,I9}  0        {I4,I7}  4        {I6,I9}  2
  {I1,I8}  1        {I3,I4}  3        {I4,I8}  2        {I7,I8}  3
  {I1,I9}  3        {I3,I5}  2        {I4,I9}  5        {I7,I9}  3
  {I2,I3}  0        {I3,I6}  3        {I5,I6}  4        {I8,I9}  0

Comparing each support count with the minimum support count leaves L2:

  Itemset  Support Count
  {I2,I8}  4
  {I4,I7}  4
  {I4,I9}  5
  {I5,I6}  4
  {I5,I8}  6
  {I6,I8}  5

Join Operation

L2 = {{I2,I8}, {I4,I7}, {I4,I9}, {I5,I6}, {I5,I8}, {I6,I8}}. To find C3, we compute L2 join L2, similar to the natural join operation in a database.

Join operation illustrated: consider the itemset {I2,I8} from L2 and search L2 for other itemsets containing I2 or I8; this gives the itemsets {I5,I8} and {I6,I8}. The join works like joining two database tuples on the common column value I8:

  {I2,I8} join {I5,I8} = {I2,I5,I8}

So {I2,I8} joined with {I5,I8} yields {I2,I5,I8}, and {I2,I8} joined with {I6,I8} yields {I2,I6,I8}. Applying the same technique to every itemset in L2, we get:

  C3 = L2 join L2 = {{I2,I5,I8}, {I2,I6,I8}, {I4,I7,I9}, {I5,I6,I8}}

Generating 3-itemset Frequent Pattern

Use the Apriori property: all subsets of a frequent itemset must also be frequent. For example, take {I5,I6,I8}. Its 2-item subsets are {I5,I6}, {I5,I8}, and {I6,I8}; since all of them are members of L2, we keep {I5,I6,I8} in C3.

Now take {I2,I5,I8}, which shows how the pruning is performed. Its 2-item subsets are {I2,I5}, {I2,I8}, and {I5,I8}. But {I2,I5} is not a member of L2 and hence is not frequent, violating the Apriori property, so we must remove {I2,I5,I8} from C3. After checking every member of the join result this way, C3 = {{I5,I6,I8}}.

Scanning D for the count of this candidate gives a support count of 4, which meets the minimum support count, so L3 = {{I5,I6,I8}}.

Generating 4-itemset Frequent Pattern

The algorithm uses L3 join L3 to generate a candidate set of 4-itemsets, C4. This yields C4 = φ, and the algorithm terminates, having found all of the frequent itemsets. This completes the Apriori algorithm. In total we have:

  L = {{I1}, {I2}, {I3}, {I4}, {I5}, {I6}, {I7}, {I8}, {I9}, {I2,I8}, {I4,I7}, {I4,I9}, {I5,I6}, {I5,I8}, {I6,I8}, {I5,I6,I8}}

Next, these frequent itemsets will be used to generate strong association rules (rules that satisfy both minimum support and minimum confidence). To generate classification rules, we need to consider only the frequent itemsets that contain the class item I8 or I9.
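As a sanity check, the whole mining phase above fits in a short program. The following Python sketch is not from the original slides: the transaction list D (transcribed from the Converted Data table) and the helper names support_count, join, and prune are my own. The join is implemented as illustrated above (union two frequent k-itemsets that share all but one item), and prune applies the Apriori property just discussed. With a minimum support count of 4 it reproduces L1, L2, L3, and the empty C4.

from itertools import combinations

# The 13 training transactions, transcribed from the Converted Data table.
D = [
    {"I1", "I4", "I6", "I9"}, {"I1", "I4", "I7", "I9"},
    {"I2", "I5", "I6", "I8"}, {"I3", "I4", "I6", "I8"},
    {"I3", "I5", "I6", "I8"}, {"I3", "I4", "I7", "I9"},
    {"I2", "I5", "I7", "I8"}, {"I1", "I4", "I6", "I9"},
    {"I3", "I5", "I6", "I8"}, {"I1", "I5", "I7", "I8"},
    {"I2", "I4", "I7", "I8"}, {"I2", "I5", "I6", "I8"},
    {"I3", "I4", "I7", "I9"},
]
MIN_SUP_COUNT = 4  # min_sup = 4/13 = 30.76%

def support_count(itemset, transactions):
    # Scan D: count the transactions containing every item of `itemset`.
    return sum(1 for t in transactions if itemset <= t)

def join(Lk):
    # L_k join L_k: union two frequent k-itemsets that share k-1 items,
    # as in the illustration ({I2,I8} + {I5,I8} -> {I2,I5,I8}).
    k = len(next(iter(Lk)))
    return {a | b for a, b in combinations(Lk, 2) if len(a | b) == k + 1}

def prune(Ck, Lk):
    # Apriori property: every k-item subset of a candidate must be in L_k.
    return {c for c in Ck
            if all(frozenset(s) in Lk for s in combinations(c, len(c) - 1))}

# Level 1: keep every single item whose support count reaches 4.
items = sorted(set().union(*D))
L = {frozenset([i]) for i in items
     if support_count({i}, D) >= MIN_SUP_COUNT}
frequent = set(L)

# Levels 2, 3, ...: join, prune, scan D, until no candidates survive.
while L:
    candidates = prune(join(L), L)
    L = {c for c in candidates if support_count(c, D) >= MIN_SUP_COUNT}
    frequent |= L

for s in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(s), support_count(s, D))

Running it prints the 16 itemsets of L above with their support counts, ending with {I5, I6, I8} at count 4.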
Generating Association Rules from Frequent Itemsets

Consider only the frequent itemsets that contain I8 or I9: {I2,I8}, {I4,I9}, {I5,I8}, {I6,I8}, {I5,I6,I8}. For each, form a rule with the class item as the consequent and keep it if its confidence reaches the required 70%:

  Consider {I2,I8}. R1: I2 → I8. Confidence = sc{I2,I8}/sc{I2} = 4/4 = 100%. Selected.
  Consider {I4,I9}. R2: I4 → I9. Confidence = sc{I4,I9}/sc{I4} = 5/7 = 71.42%. Selected.
  Consider {I5,I8}. R3: I5 → I8. Confidence = sc{I5,I8}/sc{I5} = 6/6 = 100%. Selected.
  Consider {I6,I8}. R4: I6 → I8. Confidence = sc{I6,I8}/sc{I6} = 5/7 = 71.42%. Selected.
  Consider {I5,I6,I8}. R5: I5 ^ I6 → I8. Confidence = sc{I5,I6,I8}/sc{I5,I6} = 4/4 = 100%. Selected.

Classification Rules

Rule (A → B) [support, confidence]:

  R1: I2 → I8        [30.76%, 100%]
  R2: I4 → I9        [38.46%, 71.42%]
  R3: I5 → I8        [46.15%, 100%]
  R4: I6 → I8        [38.46%, 71.42%]
  R5: I5 ^ I6 → I8   [30.76%, 100%]

Alternatively:

  1. Outlook = Overcast → PlayTennis = Yes                  [30.76%, 100%]
  2. Humidity = High → PlayTennis = No                      [38.46%, 71.42%]
  3. Humidity = Normal → PlayTennis = Yes                   [46.15%, 100%]
  4. Wind = Weak → PlayTennis = Yes                         [38.46%, 71.42%]
  5. Humidity = Normal AND Wind = Weak → PlayTennis = Yes   [30.76%, 100%]

Test Data Set

  Outlook   Humidity  Wind    PlayTennis
  Overcast  Normal    Strong  Yes
  Rain      Normal    Weak    Yes
  Rain      Normal    Strong  No
  Sunny     Normal    Weak    Yes
  Sunny     High      Strong  No

Given this test data, let's classify each example and determine the accuracy:

  1: Outlook = Overcast, so by rule 1, PlayTennis = Yes. Correctly classified. (Rule 3 also applies and agrees.)
  2: Humidity = Normal AND Wind = Weak, so by rule 5, PlayTennis = Yes. Correctly classified.
  3: Humidity = Normal, so by rule 3, PlayTennis = Yes, but the actual class is No. Incorrectly classified.
  4: Humidity = Normal AND Wind = Weak, so by rule 5, PlayTennis = Yes. Correctly classified.
  5: Humidity = High, so by rule 2, PlayTennis = No. Correctly classified.

Four of the five test examples are classified correctly, so the accuracy on this test set is 4/5 = 80%.
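The rule-generation and testing steps can be scripted the same way. The sketch below continues the one above, reusing its D, frequent, and support_count. The rule representation and the tie-breaking in classify, which prefers more specific and then more confident rules (mirroring how test example 2 is matched by rule 5 rather than rule 3), are assumptions of mine rather than anything the slides prescribe.

# Continues the Apriori sketch above: reuses D, frequent, support_count.
CLASS_ITEMS = {"I8", "I9"}  # PlayTennis = Yes / No
MIN_CONF = 0.70

# Build candidate rules of the form antecedent -> single class item.
rules = []  # (antecedent, class_item, support, confidence)
for s in frequent:
    cls = s & CLASS_ITEMS
    antecedent = frozenset(s - CLASS_ITEMS)
    if len(cls) != 1 or not antecedent:
        continue  # skip itemsets without exactly one class item
    conf = support_count(s, D) / support_count(antecedent, D)
    if conf >= MIN_CONF:
        rules.append((antecedent, next(iter(cls)),
                      support_count(s, D) / len(D), conf))

def classify(example):
    # First matching rule wins; prefer longer antecedents, then higher
    # confidence (so test example 2 hits R5 before R3). My assumption.
    for ant, cls, _sup, _conf in sorted(rules,
                                        key=lambda r: (-len(r[0]), -r[3])):
        if ant <= example:
            return cls
    return None

# The five test rows, converted to items the same way as the training data.
test = [
    ({"I2", "I5", "I7"}, "I8"), ({"I3", "I5", "I6"}, "I8"),
    ({"I3", "I5", "I7"}, "I9"), ({"I1", "I5", "I6"}, "I8"),
    ({"I1", "I4", "I7"}, "I9"),
]
correct = sum(classify(x) == expected for x, expected in test)
print(f"accuracy = {correct}/{len(test)}")  # 4/5 = 80%

This recovers the five rules R1 through R5 with the confidences computed above and reproduces the 4/5 accuracy, including the one misclassified test example.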

