SBU CSE 634 - Association Analysis (chapter 6) (83 pages)


Association Analysis (chapter 6)
Professor Anita Wasilewska
Lecture Notes

Association Rules Mining: An Introduction

This is an intuitive (more or less) introduction. It contains:
an explanation of the main ideas: frequent item sets, association rules, how we construct the association rules, and how we judge the goodness of the rules;
an example of an intuitive run of the Apriori Algorithm and of association rules generation;
a discussion of the relationship between association and correlation analysis.

What Is Association Mining?

Association rule mining: finding frequent patterns, called associations, among sets of items or objects in transaction databases, relational databases, and other information repositories.
Applications: basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.

Association Rules

Rule general form: Body $\Rightarrow$ Head [support, confidence]
Rule predicate form:
buys(x, "diapers") $\Rightarrow$ buys(x, "beer") [0.5%, 60%]
major(x, "CS") $\wedge$ takes(x, "DB") $\Rightarrow$ grade(x, "A") [1%, 75%]
Rule attribute form: Diapers $\Rightarrow$ beer [1%, 75%]

Association Analysis: Basic Concepts

Given a database of transactions, where each transaction is a list of items, find all rules that associate the presence of one set of items with that of another set of items.
Example: 98% of people who purchase tires and auto accessories also get automotive services done.

Association Model

$I = \{i_1, i_2, \dots, i_n\}$ is a set of items.
$J = \mathcal{P}(I)$ is the set of all subsets of the set of items; elements of $J$ are called itemsets.
Transaction $T$: $T$ is a subset of the set $I$ of items.
Database: a set of transactions.
An association rule is an implication of the form $X \Rightarrow Y$, where $X$, $Y$ are disjoint subsets of the set of items $I$ (elements of $J$).
Problem: find rules that have support and confidence greater than the user-specified minimum support and minimum confidence.

How we find the rules (1): Apriori Algorithm

Apriori Algorithm, First Step: we find all frequent item sets.
An item set is frequent if it has support greater than or equal to a fixed minimum support.
We fix the minimum support (usually low).
Rule generation from the frequent item sets is a separate problem, and our book doesn't really talk much about it.

How we find the rules (2): Apriori Algorithm

In order to calculate the frequent item sets efficiently, that is, the 1-item sets (one-element item sets), 2-item sets (two-element item sets), 3-item sets (three-element item sets), etc., we use a principle called the Apriori Principle (hence the name Apriori Algorithm):
ANY SUBSET OF A FREQUENT ITEMSET IS A FREQUENT ITEMSET.

How we find the rules (3): Apriori Process

The Apriori Algorithm stops after the First Step.
The Second Step in the Apriori Process (item sets generation AND rules generation) is the rule generation: from the frequent item sets we calculate a set of strong rules.
Strong rules: rules with at least the minimum support (low) and the minimum confidence (high).
The Apriori Process is then finished.

How we find the rules (4): Apriori Process

The Apriori Process problem is: how do we form the association rules $A \Rightarrow B$ from the frequent item sets?
Remark: $A$, $B$ are disjoint subsets of the set $I$ of items in general, and in particular of the 2-frequent, 3-frequent, etc. item sets generated by the Apriori Algorithm.

How we find the rules (5)

1-frequent item set $\{i_1\}$: no rule.
2-frequent item set $\{i_1, i_2\}$: there are two rules, $\{i_1\} \Rightarrow \{i_2\}$ and $\{i_2\} \Rightarrow \{i_1\}$. We also write them as $i_1 \Rightarrow i_2$ and $i_2 \Rightarrow i_1$.
We decide which rule we accept by calculating its support (greater than the minimum support) and its confidence (greater than the minimum confidence).

How we find the rules (6)

3-frequent item set $\{i_1, i_2, i_3\}$: the rules, by definition, are of the form $A \Rightarrow B$, where $A$ and $B$ are disjoint subsets of $\{i_1, i_2, i_3\}$, i.e. we have to find all subsets $A$, $B$ of $\{i_1, i_2, i_3\}$ such that $A \cup B = \{i_1, i_2, i_3\}$ and $A \cap B = \emptyset$.
For example, let $A = \{i_1, i_2\}$ and $B = \{i_3\}$. The rule is $\{i_1, i_2\} \Rightarrow \{i_3\}$, and we write it in the form $i_1 \wedge i_2 \Rightarrow i_3$, or milk $\wedge$ bread $\Rightarrow$ vodka if item $i_1$ is milk, item $i_2$ is bread, and item $i_3$ is vodka.

How we find the rules (7)

Another choice for $A$ and $B$ is, for example, $A = \{i_1\}$ and $B = \{i_2, i_3\}$. The rule is $\{i_1\} \Rightarrow \{i_2, i_3\}$, and we write it in the form $i_1 \Rightarrow i_2 \wedge i_3$, or milk $\Rightarrow$ bread $\wedge$ vodka if item $i_1$ is milk, item $i_2$ is bread, and item $i_3$ is vodka.
REMEMBER: we have to cover all the choices for $A$ and $B$. Which rule we accept is decided by calculating its support (greater than the minimum support) and its confidence (greater than the minimum confidence).

Confidence and Support (1)

Confidence: the rule $X \Rightarrow Y$ holds in the database $D$ with confidence $c$ if $c$% of the transactions in $D$ that contain $X$ also contain $Y$.
Support: the rule $X \Rightarrow Y$ has support $s$ in $D$ if $s$% of the transactions contain $X \cup Y$.

Support and Confidence (2)

[Figure: diagram with regions "Customer buys diaper", "Customer buys beer", and "Customer buys both"]
Find all the rules $X \wedge Y \Rightarrow Z$ with minimum confidence and support.
Support $s$: the probability that a transaction contains $X \cup Y \cup Z$.
Confidence $c$: the conditional probability that a transaction containing $X \cup Y$ also contains $Z$.

Support definition

The support of a rule $A \Rightarrow B$ in the database $D$ of transactions is given by the formula (where sc = support count):

$$\mathrm{Support}(A \Rightarrow B) = P(A \cup B) = \frac{sc(A \cup B)}{|D|}$$

Frequent item sets: sets of items with support $\geq$ MINIMAL support.
We (the user) fix the MIN support (usually low) and the MIN confidence (high).

Confidence definition

The confidence of a rule $A \Rightarrow B$ in the database $D$ of transactions is given by the formula (where sc = support count):

$$\mathrm{Conf}(A \Rightarrow B) = P(B \mid A) = \frac{sc(A \cup B) / |D|}{sc(A) / |D|} = \frac{P(A \cup B)}{P(A)} = \frac{sc(A \cup B)}{sc(A)}$$

Example

Let us consider a database $D = \{T_1, T_2, \dots, T_9\}$, where (we write $k$ for item $i_k$):
T1 = {1, 2, 5}
T2 = {2, 4}
T3 = {2, 3}
T4 = {1, 2, 4}
T5 = {1, 3}
T6 = {2, 3}
T7 = {1, 3}
T8 = {1, 2, 3, 5}
T9 = {1, 2, 3}

To find association rules we follow these steps:
STEP 1: Count the occurrences of items in D.
STEP 2: Fix the minimum support (usually low).
STEP 3: Calculate the frequent 1-item sets.
STEP 4: Calculate the frequent 2-item sets.
STEP 5: Calculate the frequent 3-item sets.
STOP when there are no more frequent item sets. This is the end of the Apriori Algorithm phase.

Example (continued)

STEP 6: Fix the minimum confidence (usually high).
STEP 7: Generate the strong rules (support $\geq$ min support and confidence $\geq$ min confidence).
END of the rules generation phase.
Let us now calculate all the steps of our process for the database D. We represent our transactional database as a relational database (a table) and put the occurrences of items as an extra row at the bottom.

Example (continued)

STEP 1: item occurrences (sc) in a table (+ marks an item that occurs in the transaction, 0 one that does not)

item  1  2  3  4  5
T1    +  +  0  0  +
T2    0  +  0  +  0
T3    0  +  +  0  0
T4    +  +  0  +  0
T5    +  0  +  0  0
T6    0  +  +  0  0
T7    +  0  +  0  0
T8    +  +  +  0  +
T9    +  +  +  0  0
sc    6  7  6  2  2

Example (continued) …
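
The seven steps of the example can be sketched in Python on the nine transactions T1 to T9. This is only an illustrative sketch, not the lecture's own code. The function names (`support_count`, `frequent_itemsets`, `strong_rules`) are hypothetical, and the thresholds used at the bottom (a minimum support count of 2, i.e. support 2/9, and a minimum confidence of 0.7) are assumptions, since the preview does not state which values the lecture fixes.

```python
from itertools import combinations

# Transactions T1..T9 from the example (k stands for item i_k).
D = [
    {1, 2, 5}, {2, 4}, {2, 3}, {1, 2, 4}, {1, 3},
    {2, 3}, {1, 3}, {1, 2, 3, 5}, {1, 2, 3},
]


def support_count(itemset, transactions):
    """sc(itemset): number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


def frequent_itemsets(transactions, min_count):
    """Steps 1-5: level-wise search for all frequent item sets.

    Returns a dict mapping each frequent item set (frozenset) to its sc.
    """
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if support_count({i}, transactions) >= min_count]
    frequent = {}
    k = 1
    while level:  # STOP when there are no more frequent item sets
        for s in level:
            frequent[s] = support_count(s, transactions)
        k += 1
        # Join: unions of two frequent (k-1)-item sets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune (Apriori Principle): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        level = [c for c in candidates
                 if support_count(c, transactions) >= min_count]
    return frequent


def strong_rules(frequent, n_transactions, min_conf):
    """Steps 6-7: rules A => B with A, B non-empty and disjoint, A u B frequent."""
    rules = []
    for itemset, sc_ab in frequent.items():
        for r in range(1, len(itemset)):
            for a in combinations(sorted(itemset), r):
                a = frozenset(a)
                conf = sc_ab / frequent[a]        # sc(A u B) / sc(A)
                if conf >= min_conf:
                    support = sc_ab / n_transactions  # sc(A u B) / |D|
                    rules.append((set(a), set(itemset - a), support, conf))
    return rules


# Assumed thresholds (not given in the source): min support count 2, min confidence 0.7.
freq = frequent_itemsets(D, min_count=2)
rules = strong_rules(freq, len(D), min_conf=0.7)
for a, b, support, conf in rules:
    print(f"{sorted(a)} => {sorted(b)}  support={support:.2f}  confidence={conf:.2f}")
```

The lookup `frequent[a]` in `strong_rules` relies on the Apriori Principle: every non-empty subset of a frequent item set is itself frequent, so its support count has already been stored by `frequent_itemsets`.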

