UIC CS 583 - Chapter 6 Post-Processing and Rule Interestingness in Data Mining


School name University of Illinois at Chicago

Course CS 583 - Data Mining and Text Mining

Pages 29


Chapter 6
Post-Processing and Rule Interestingness in Data Mining
UIC - CS 594, B. Liu

Introduction

- The goal of data mining is to discover useful or interesting rules for the user.
- Most of the research focuses on finding rules of various types: classification rules, association rules, sequential rules, etc.
- In many applications it is all too easy to generate a large number of rules (thousands or more), but most of them are of no interest to the user, i.e., they are obvious, redundant, or useless.
- Thus, it is difficult for the user to analyze them and to identify those that are truly interesting.
- Automated and/or interactive techniques are needed to help the user.
- Identifying interesting rules is hard:
1. Different people are interested in different things.
2. Even the same person is interested in different things at different times.

Rule Interestingness

The interestingness of a rule is measured using two classes of measures: objective measures and subjective measures.
- Objective measures involve analyzing the rule’s structure and the underlying data. Such measures include accuracy, significance, support, confidence, etc. (e.g., Quinlan 1992; Agrawal and Srikant 1994).
- There are many subjective interestingness measures, such as novelty, relevance, usefulness, and timeliness.
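As a small illustration of two of the objective measures named above, the sketch below computes support and confidence for a single rule. It assumes the usual association-rule definitions (support is the fraction of all records that satisfy both sides of the rule; confidence is the fraction of records satisfying the left-hand side that also satisfy the right-hand side). The chapter's own definitions may differ in detail. The function names and the toy loan records are hypothetical and are not taken from the course material.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def matches(record: Record, conditions: Dict[str, str]) -> bool:
    """Return True if the record satisfies every attribute = value condition."""
    return all(record.get(attr) == value for attr, value in conditions.items())


def support_and_confidence(
    records: List[Record],
    antecedent: Dict[str, str],
    consequent: Dict[str, str],
) -> Tuple[float, float]:
    """Compute (support, confidence) of the rule antecedent -> consequent.

    support    = records matching both sides / all records
    confidence = records matching both sides / records matching the antecedent
    """
    n_total = len(records)
    n_antecedent = sum(matches(r, antecedent) for r in records)
    n_both = sum(
        matches(r, antecedent) and matches(r, consequent) for r in records
    )
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical toy data, invented purely for illustration.
records = [
    {"Job": "yes", "Own_house": "yes", "Loan": "approved"},
    {"Job": "yes", "Own_house": "no", "Loan": "approved"},
    {"Job": "yes", "Own_house": "no", "Loan": "rejected"},
    {"Job": "no", "Own_house": "no", "Loan": "rejected"},
]

# Rule under test: Job = yes -> Loan = approved
support, confidence = support_and_confidence(
    records, {"Job": "yes"}, {"Loan": "approved"}
)
print(f"support = {support:.2f}, confidence = {confidence:.2f}")
# prints: support = 0.50, confidence = 0.67
```

Both numbers are computed from the data alone, which is what makes them objective: they say nothing about whether the rule would surprise a particular user.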
Among the subjective measures, essentially two are used:
- Unexpectedness (Silberschatz and Tuzhilin 1995; Liu and Hsu 1996): A rule is interesting if it “surprises” the user.
- Actionability (Piatetsky-Shapiro and Matheus 1994): A rule is interesting if the user can do something with it to his/her advantage.

We focus on unexpectedness in this chapter.

A Naïve Solution is not Sufficient

One may say that:
1. The user specifies the types of rules that he/she wants to see.
2. A system then generates or retrieves those matching rules.

This, however, is not sufficient for two important reasons:
1. The user typically does not know what interests him/her exactly and completely.
2. Unexpected (or novel) rules are not within the user’s concept space, and are thus difficult to specify.

The key point (Liu and Hsu 1996): What is unexpected depends on the user’s existing knowledge of the domain.

Main Issues in Finding Unexpected Rules

Knowledge acquisition
- How to obtain the user’s existing knowledge.

The user may know a great deal, but it is hard, if not impossible, for him/her to tell what he/she knows (precisely and completely). To deal with this problem, expert systems research suggests the following:
- Allow interactive and incremental discovery or identification of interesting rules. Through such interactions, the user will be able to provide more preferences and existing knowledge about the domain, and to find more interesting rules.
- Actively stimulate the user, or suggest to him/her what he/she might have forgotten.

Granularity of knowledge
- User knowledge can be divided into levels (Liu, Hsu and Chen 1997). Some aspects of the knowledge can be quite vague, while other aspects can be quite precise.
Precise knowledge (PK): The user believes that a specific rule exists in the data.
E.g., in a loan application domain, the user believes that if one’s monthly salary is over $5,000, one will be granted a loan with a probability of 90%.

Fuzzy knowledge (FK): The user is less sure about the details of the rule.
E.g., the user may believe that if one’s monthly salary is around $5,000 or more, one should be granted a loan. He/she may not be sure that it is exactly $5,000, and is also not sure about the probability.

General impressions (GI): The user simply has some vague feelings.
E.g., the user may feel that having a higher monthly salary increases one’s chance of obtaining a loan, but has no idea by how much or what the probability is.

- This division of knowledge is important because it determines how a data mining system can use the knowledge, and also whether it can make use of all possible types of user knowledge to discover interesting rules.

Knowledge representation
- How to represent the user knowledge?
- It is common to use the same syntax as the discovered rules, because when the user is mining a particular type of rules, his/her existing concepts are typically also of the same type (Liu and Hsu 1996).

Post-analysis vs. incorporating user knowledge in the mining algorithm

Finding interesting rules can be done using the following two methods:

1. Incorporate the user knowledge in the mining algorithm to discover only the interesting rules.
Advantage: it focuses the search of the mining algorithm on only the interesting rules.
Disadvantage: it suffers from the knowledge acquisition problem. User interaction with the system is difficult because it is not efficient to rerun a mining algorithm whenever the user remembers another piece of knowledge.

2. Post-analyze the discovered rules using user preferences and knowledge to identify interesting rules.
Advantage: the mining algorithm is only run once to discover all rules.
The user then interactively analyzes the rules to identify the interesting ones. This also helps to solve the knowledge acquisition problem.
Disadvantage: it may generate too many rules initially.

- In general, the first approach is ideal if the user is absolutely sure what types of rules are interesting.
- However, if the user does not have all the specific rules in mind to look for, the second approach will be more appropriate.
- In many applications, an integrated approach is the preferred choice.

Template-Based Approach

The most straightforward method for selecting
