DOC PREVIEW
Purdue CS 59000 - Association Rules, Sequential Associations

This preview shows page 1-2-3-4-5 out of 14 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS 590M Fall 2001: Security Issues in Data MiningWhy Association Rules?Formal DefinitionSample: Market BasketTypes of associationsHistorical Association Rule LearningDatabase community contribution: Market Basket Association RulesDatabase community contribution: Market Basket Association RulesA-Priori AlgorithmSlide 10Slide 11Frequent episodes for sequential associationsFrequent Episodes: DefinitionApplications/Issues in SecurityCS 590M Fall 2001: Security Issues in Data MiningLecture 5: Association Rules,Sequential AssociationsWhy Association Rules?•Understand attributes, not entities•Discover relationships that–Show some dependency between attributes–Are “interesting”•Give an understanding of the data spaceFormal Definition•Data:–Items={i1,…,in}–Transactions T={t1,…,tm} where ti = {ij1, …, ijk}•Support: Given AI,supp(A) = |{t  T | t  A}| / |T|•Goal: Find rules AB with support ≥ s and confidence ≥ c where:–A, B  I, A  B = –s = supp(A  B), c = supp(A  B) / supp(A)Sample: Market Basket ITHardware Auto Clothing FurnishingsPaper goodsGroceryt0- - + - + -t1+ + - - - -t2+ - - - + -t3- - + + + -t4+ + - - + -t5- - - - + +t6- - + + - -t7+ + - - + -t8- - + - + +t9- - - - + -Types of associations•Machine-learning base: classification / decision rules–Entities independent, unordered–Find rules leading to target class–To get rule sets, re-run for all classes as targets•Market-basket–Collection of related entities with same key–Can be modeled as independent entities, sparse data•Sequential–Like market basket, but group by distance rather than same keyHistorical Association Rule Learning•Decision tree converted to rules–ID3, as discussed in previous lecture•Direct production of decision rules–CN2, others•Problem: Algorithms don’t scale well to many practical problemsDatabase community contribution: Market Basket Association Rules •Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, pages 207--216, Washington, D.C., May 1993.•Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499Database community contribution: Market Basket Association Rules•Practical problems often have sparse data–Many attributes, few items per transaction•Goal is typically search for high support–High support = broad impact–High confidence not crucial (as opposed to classification)•Very Large data sets(main-memory algorithms impractical)A-Priori Algorithm•Observation: if A has support s, theni  A, supp(i) ≥ s•Gives bottom-up algorithm–Find single items with support ≥ s–Just look at transaction subsets with those items for pairs–RecurseA-Priori Algorithm•First, generate all large itemsets–Sets X  I such that supp(X) ≥ s (threshold)–Captures “supp(A  B) ≥ s” part of problem•Second, find high-confidence rules that are subsets of X–B = Xi , A = X-B–To find confidence, need supp(A)But A will be in all large itemsets – don’t need to go back to the database!A-Priori AlgorithmL1 = {large 1-itemsets};for ( k = 2; Lk-1  ; k++ ) Ck = select p.i1, p.iY, …, p.ik-1, q.ik-1from Lk-1 p, Lk-1 qwhere p.i1 = q.i1, …, p.ik-2 = q.ik-2 transactions t  TCt = subset(Ck, t); // Candidates contained in t candidates c  Ct: c.count++; Lk = {c  Ck | c.count  minsup}Answer = k Lk;Frequent episodes for sequential associations•Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo: Discovering Frequent Episodes in Sequences. In First International Conference on Knowledge Discovery and Data Mining (KDD'95), 210 - 215, Montreal, Canada, August 1995. AAAI Press. •Instead of transaction, items grouped by sliding window in time•Same basic idea as A-PrioriFrequent Episodes: Definition•Event types E•Event (A,t) where A in E•Sequence S=((A1,t1),…,(An,tn))•Frequent episode F = (Ai, …, Aj) where tl, tm such that t1tl<…<tm  tntm-tl  window:–count( ((Ai,tl), …, (Aj, tm)) )  supportApplications/Issues in Security•Frequent episodes in intrusion detection data–What does this tell us?•Preventing the discovery of associations–Known items to protect–What if we don’t know what we want to


View Full Document

Purdue CS 59000 - Association Rules, Sequential Associations

Documents in this Course
Lecture 4

Lecture 4

42 pages

Lecture 6

Lecture 6

38 pages

Load more
Download Association Rules, Sequential Associations
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Association Rules, Sequential Associations and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Association Rules, Sequential Associations 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?