USC CSCI 599 - DM3_computer99

Cover Feature

Constraint-Based, Multidimensional Data Mining

Integrating both constraint-based and multidimensional mining into one framework provides an interactive, exploratory environment for effective and efficient data analysis and mining.

Jiawei Han, Simon Fraser University
Laks V.S. Lakshmanan, Concordia University and Indian Institute of Technology, Bombay
Raymond T. Ng, University of British Columbia

0018-9162/99/$10.00 © 1999 IEEE (Computer)

Although many data-mining methodologies and systems have been developed in recent years, we contend that, by and large, present mining models lack human involvement, particularly in the form of guidance and user control. We believe that data mining is most effective when the computer does what it does best, like searching large databases or counting, and users do what they do best, like specifying the current mining session’s focus. This division of labor is best achieved through constraint-based mining, in which the user provides constraints that guide a search [1].

Mining can also be improved by employing a multidimensional, hierarchical view of the data. Current data warehouse systems have provided a fertile ground for systematic development of this multidimensional mining [2]. Together, constraint-based and multidimensional techniques can provide a more ad hoc, query-driven process that exploits the semantics of data more effectively than the processes supported by current stand-alone data-mining systems.

AD HOC AND QUERY DRIVEN

An ad hoc and query-driven data-mining system can be more effective because it better fits queries to the user’s intentions. It can make the process of inferring knowledge more efficient by letting a query optimizer deliver high-performance, interactive mining systems that encourage exploratory mining and analysis.

Such a data-mining system incorporates two capabilities, which also distinguish it from a statistical-analysis program or a machine-learning system [3]. First, it should offer an ad hoc mining query language, which is a high-level declarative language comparable to the Structured Query Language (SQL) for relational database management systems. Such a declarative mining language lets users express

• the part of the database to be mined (called the minable view [1]),
• the type of pattern/rule to be mined, and
• the properties that the patterns should satisfy.

These properties should include not only numerical constraints on statistical properties (like support, confidence, and correlation), but also those based on attribute domains, classes, and aggregates [1], such as “I.type = ‘snacks’ and avg(I.price) < 100.”

Second, a data-mining system should support efficient processing and optimization of mining queries by providing a sophisticated mining-query optimizer. Such an optimizer exploits the various constraints stated in the user-specified mining query and their properties to generate access plans that guarantee a level of performance commensurate with the constraints in the query.

CONSTRAINTS: ESSENTIALS FOR AD HOC DATA MINING

We divide constraints into five categories:

• Knowledge type constraints specify the type of knowledge to be mined, such as concept description, association, classification, prediction, clustering, or anomaly. This constraint, unlike other constraints, is usually specified at the beginning of a query because different types of knowledge can require different constraints at later stages.
• Data constraints specify the set of data relevant to the mining task. We often specify such constraints in a form similar to that of an SQL query and process them in query processing.
• Dimension/level constraints confine the dimension(s) or level(s) of data to be examined in a database or a data warehouse. Such constraints follow the model of a multidimensional database and demonstrate the spirit of multidimensional mining. Thus, multidimensional mining can be smoothly incorporated within the framework of constraint-based mining.
• Rule constraints specify concrete constraints on the rules to be mined.
• Interestingness constraints specify what ranges of a measure associated with discovered patterns are useful or interesting from a statistical point of view.

The following example illustrates these five constraints at work. Suppose there is a sales multidimensional database with four interrelated relations

• sales (customer_name, item_name, transaction_id),
• lives (customer_name, district, city),
• item (item_name, category, price), and
• transaction (transaction_id, day, month, year),

where lives, item, and transaction are three dimension tables. These tables are linked to the sales table via three keys: customer_name, item_name, and transaction_id.

“Find the sales of what cheap items (with the sum of the prices less than $100) that may promote the sales of what expensive items (with the minimum price of $500) in the same category for Vancouver customers in 1998” is an association mining query. It is expressed in a data mining query language (DMQL [1]) as shown in Figure 1a.

This mining query may allow the generation of association rules like those shown in Figure 1b. The rules mean that if a customer in Vancouver bought Census_CD and MS Office 97, there is a 68 percent probability that the customer will also buy MS SQL Server. The rule further indicates that 1.5 percent of all the customers fulfilled all the criteria.

In this query, the knowledge type constraint is association. The data constraint is lives(C, _, “Vancouver”). The dimension constraints cover all three dimensions (lives, item, and transaction) because the query involves all of them.

The levels are more confined. For lives, we consider only customer_name, since city = “Vancouver” is used only in the selection; for item, we consider the levels item_name and category, since they are used in the query; and for transaction, we consider only transaction_id, since day and month are not referenced and year is used only in the selection. Rule constraints include most portions of the where and having clauses, such as S.year = 1998, T.year = 1998, I.category = J.category, sum(I.price) < 100, and min(J.price) ≥ 500. Finally, there are two interestingness constraints (thresholds), min_support = 0.01 and min_confidence = 0.5.

Knowledge type constraints and data constraints can be applied before data mining. That is, they are not intertwined with the mining process itself. After applying these constraints, a mining process may first mine all of the possible rules before applying the remaining three categories of constraints and
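
The rule constraints and interestingness thresholds of the example query can be illustrated with a minimal Python sketch. It is not the article's DMQL query or its mining algorithm; it only checks one candidate rule against the stated constraints and computes its support and confidence. The item names come from the example, but the prices, the "software" category, the four baskets, and all function names are hypothetical. The sketch also assumes that I.category = J.category means every item on both sides shares one category, and that the baskets have already been filtered by the data constraint (Vancouver customers, 1998).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    category: str
    price: float


def satisfies_rule_constraints(antecedent, consequent):
    """Check the example's rule constraints for a candidate rule I => J.

    Assumed reading: all items share one category, sum(I.price) < 100,
    and min(J.price) >= 500.
    """
    if not antecedent or not consequent:
        return False
    categories = {item.category for item in antecedent + consequent}
    return (
        len(categories) == 1
        and sum(item.price for item in antecedent) < 100
        and min(item.price for item in consequent) >= 500
    )


def support_and_confidence(baskets, antecedent, consequent):
    """Return (support, confidence) of I => J over a list of item-name sets."""
    lhs = {item.name for item in antecedent}
    both = lhs | {item.name for item in consequent}
    n_lhs = sum(1 for basket in baskets if lhs <= basket)
    n_both = sum(1 for basket in baskets if both <= basket)
    support = n_both / len(baskets) if baskets else 0.0
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


# Hypothetical toy data: prices, category, and baskets are made up.
census_cd = Item("Census_CD", "software", 40.0)
ms_office = Item("MS Office 97", "software", 50.0)
sql_server = Item("MS SQL Server", "software", 900.0)

antecedent = [census_cd, ms_office]
consequent = [sql_server]

baskets = [
    {"Census_CD", "MS Office 97", "MS SQL Server"},
    {"Census_CD", "MS Office 97"},
    {"Census_CD", "MS Office 97", "MS SQL Server"},
    {"MS SQL Server"},
]

MIN_SUPPORT = 0.01     # interestingness thresholds from the example query
MIN_CONFIDENCE = 0.5

rule_ok = satisfies_rule_constraints(antecedent, consequent)
support, confidence = support_and_confidence(baskets, antecedent, consequent)
interesting = rule_ok and support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE
```

Here the rule constraints are tested per candidate rule, while the interestingness thresholds are tested against counts over the whole set of baskets, which is one way to see why the two categories are listed separately.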

