DOC PREVIEW
BU CS 565 - Reducing the collection of itemsets

This preview shows page 1-2-16-17-18-34-35 out of 35 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Reducing the collection of itemsets: alternative representations and combinatorial problemsToo many frequent itemsets• If {a1, …, a100} is a frequent itemset, then there are1.27*1030 frequent sub-patterns!• There should be some more condensed way to describe the data1210010021001100100Frequent itemsets maybe too many to be helpful• If there are many and large frequent itemsetsenumerating all of them is costly.• We may be interested in finding the boundaryfrequent patterns.• Question: Is there a good definition of such boundary?all itemsempty setFrequent itemsetsNon-frequent itemsetsborderBorders of frequent itemsets• Itemset X is more specific than itemset Y if X superset of Y (notation: Y<X). Also, Y is more general than X (notation: X>Y)• The Border: Let S be a collection of frequent itemsets and Pthe lattice of itemsets. The border Bd(S) of S consists of all itemsets X such that all more general itemsets than X are in Sand no pattern more specific than X is in S. then with allfor and , then with allfor )(SWWXPWPYXYPYPXSBdPositive and negative border• Border• Positive border: Itemsets in the border that are also frequent (belong in S)• Negative border: Itemsets in the border that are not frequent (do not belong in S) then with allfor and , then with allfor )(SWWXPWSYXYPYPXSBd then with allfor )( SYYXPYSXSBd  then with allfor \)( SYXYPYSPXSBd Examples with borders• Consider a set of items from the alphabet: {A,B,C,D,E} and the collection of frequent sets S = {{A},{B},{C},{E},{A,B},{A,C},{A,E},{C,E},{A,C,E}}• The negative border of collection S isBd-(S) = {{D},{B,C},{B,E}}• The positive border of collection S isBd+(S) = {{A,B},{A,C,E}}Descriptive power of the borders• Claim: A collection of frequent sets S can be fully described using only the positive border (Bd+(S)) or only the negative border (Bd-(S)).Maximal patternsFrequent patterns without proper frequent super patternMaximal Frequent ItemsetnullAB AC AD AE BC BD BE CD CE DEA B C D EABC ABD ABE ACD ACE ADE BCD BCE BDE CDEABCD ABCE ABDE ACDE BCDEABCDEBorderInfrequent ItemsetsMaximal ItemsetsAn itemset is maximal frequent if none of its immediate supersets is frequentMaximal patterns• The set of maximal patterns is the same as the positive border• Descriptive power of maximal patterns:– Knowing the set of all maximal patterns allows us to reconstruct the set of all frequent itemsets!!– We can only reconstruct the set not the actual frequenciesClosed patterns• An itemset is closed if none of its immediate supersets has the same support as the itemsetTID Items1 {A,B}2 {B,C,D}3 {A,B,C,D}4 {A,B,D}5 {A,B,C,D}Itemset Support{A} 4{B} 5{C} 3{D} 4{A,B} 4{A,C} 2{A,D} 3{B,C} 3{B,D} 4{C,D} 3Itemset Support{A,B,C} 2{A,B,D} 3{A,C,D} 2{B,C,D} 3{A,B,C,D} 2Maximal vs Closed ItemsetsTID Items1 ABC2 ABCD3 BCE4 ACDE5 DEnullAB AC AD AE BC BD BE CD CE DEA B C D EABC ABD ABE ACD ACE ADE BCD BCE BDE CDEABCD ABCE ABDE ACDE BCDEABCDE124 1231234245 34512 124 24412323243445122244 423 424Transaction IdsNot supported by any transactionsMaximal vs Closed Frequent ItemsetsnullAB AC AD AE BC BD BE CD CE DEA B C D EABC ABD ABE ACD ACE ADE BCD BCE BDE CDEABCD ABCE ABDE ACDE BCDEABCDE124 1231234245 34512 124 24412323243445122244 423 424Minimum support = 2# Closed = 9# Maximal = 4Closed and maximalClosed but not maximalWhy are closed patterns interesting?• s({A,B}) = s(A), i.e., conf({A}{B}) = 1• We can infer that for every itemset X , s(A U {X}) = s({A,B} U X)• No need to count the frequencies of sets X U {A,B} from the database!• If there are lots of rules with confidence 1, then a significant amount of work can be saved– Very useful if there are strong correlations between the items and when the transactions in the database are similarWhy closed patterns are interesting?• Closed patterns and their frequencies alone are sufficient representation for all the frequencies of all frequent patterns• Proof: Assume a frequent itemset X:– X is closed  s(X) is known – X is not closed s(X) = max {s(Y) | Y is closed and X subset of Y}Maximal vs Closed sets• Knowing all maximal patterns (and their frequencies) allows us to reconstruct the set of frequent patterns• Knowing all closed patterns and their frequencies allows us to reconstruct the set of all frequent patterns and their frequenciesFrequentItemsetsClosedFrequentItemsetsMaximalFrequentItemsetsA more algorithmic approach to reducing the collection of frequent itemsetsPrototype problems: Covering problems• Setting: – Universe of N elements U = {U1,…,UN}– A set of n sets S = {s1,…,sn}– Find a collection C of sets in S (C subset of S) such thatUcєCc contains many elements from U• Example:– U: set of documents in a collection– si: set of documents that contain term ti– Find a collection of terms that cover most of the documentsPrototype covering problems• Set cover problem: Find a small collection C of sets from Ssuch that all elements in the universe U are covered by some set in C• Best collection problem: find a collection C of k sets from Ssuch that the collection covers as many elements from the universe U as possible• Both problems are NP-hard• Simple approximation algorithms with provable properties are available and very useful in practiceSet-cover problem• Universe of N elements U = {U1,…,UN}• A set of n sets S = {s1,…,sn} such that Uisi=U• Question: Find the smallest number of sets from S to form collection C (C subset of S) such thatUcєCc=U• The set-cover problem is NP-hard (what does this mean?)Trivial algorithm• Try all subcollections of S• Select the smallest one that covers all the elements in U• The running time of the trivial algorithm is O(2|S||U|)• This is way too slowGreedy algorithm for set cover• Select first the largest-cardinality set s from S• Remove the elements from s from U• Recompute the sizes of the remaining sets in S• Go back to the first stepAs an algorithm• X = U• C = {}• while X is not empty do– For all sєS let as=|s intersection X|– Let s be such that asis maximal– C = C U {s}– X = X\ sHow can this go wrong?• No global consideration of how good or bad a selected set is going to beHow good is the greedy algorithm?• Consider a minimization problem– In our case we want to minimize the cardinality of set C• Consider an instance I, and cost a*(I) of the optimal solution– a*(I): is the minimum


View Full Document

BU CS 565 - Reducing the collection of itemsets

Documents in this Course
Load more
Download Reducing the collection of itemsets
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Reducing the collection of itemsets and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Reducing the collection of itemsets 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?