SJSU CS 157A - Data Mining


Data Mining
By: Thai Hoa Nguyen Pham

Data Mining
Define Data Mining
Classification
Association
Clustering

Define Data Mining
Also known as KDD (Knowledge Discovery in Databases).
Data mining is the semiautomatic process of analyzing data to find useful patterns.
Why semiautomatic? The data must be preprocessed manually, and the results must be postprocessed manually.

Examples of Data Mining
A simple example is a clothing retail store: a data mining system could list the customers who often buy t-shirts during the summer season.
Another example is the urban legend that Walmart used data mining to find a correlation between customers buying beer and customers buying baby diapers, and then placed the two aisles close together to increase profits.

Classification
Suppose the items in a database are already assigned to classes. A problem arises when a new item needs to be added: its class is unknown, so some method is needed to find the right class for it.
Classification rules solve this problem.

Example of a rule
∀ P, P.degree = masters and P.income > 75,000 => P.credit = excellent
∀ P, P.degree = bachelors and P.income < 50,000 => P.credit = bad

Decision Tree Classifiers
A widely used technique for classification.
Each internal node holds a function or predicate.
Each leaf node has an associated class.

Example of Decision Tree Classifiers
[Figure: decision tree with its root, functions, and classes labeled]
Internal nodes (functions) are inside the boxes: degree (the root) and income.
Leaf nodes (associated classes) are the four circles: bad, average, good, excellent.

Association
An example of an association for beer and diapers would be:
Beer => Diapers
This association means that customers who buy beer often buy diapers, too.

Association Rules
Support: a measure of what fraction of the population satisfies both the antecedent and the consequent. For example, for the association
milk => screwdrivers
the support is the fraction of all purchases that include both milk and screwdrivers. An association with higher support deserves more attention than one with lower support.

Association Rule 2
Confidence: a measure of how often the consequent is true when the antecedent is true.
bread => milk
For example, if the association above had a confidence of 50 percent, then 50 percent of the purchases that include bread also include milk; the remaining purchases that include bread do not include milk.

Clustering
Clustering refers to finding clusters of points in the given data and grouping them into subsets.
Widely used clustering techniques: hierarchical clustering, in either its agglomerative or its divisive form.

Types of Clustering
Hierarchical: clustering that builds a hierarchy of nested clusters, with smaller clusters contained in larger ones.
Agglomerative: starts by building small clusters, then progressively merges them into larger clusters.
Divisive: begins with the whole set and successively divides it into smaller clusters.

Example of agglomerative hierarchical clustering
An example of agglomerative clustering: the separate elements of a set merge at each internal node until the last merge, “abcdef”, is reached.

Other types of mining
Text Mining: applies data mining techniques to textual documents. For example, one tool forms clusters on the pages that users have visited. If a user asks for sites containing the keyword “Japan”, a list of the sites that use the keyword “Japan” the most will appear.
Data Visualization: helps users examine large volumes of data and detect patterns visually. Instead of presenting problems through text, visual displays can use maps and charts with a color-coding scheme to pinpoint where a problem is.

Example of Text Mining
This example shows what happens when a user searches for “Japan”. The points closer to the center of the circle have more information on Japan. We can think of the points as websites or research articles.

Example of Data-visualization
We could say a number of things about this example: the map could depict poverty levels, or which states grow the most apples.
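
Returning to the support and confidence measures defined under Association Rules, the following is a minimal sketch of how they could be computed for a rule such as bread => milk. The basket data and the function names `support` and `confidence` are made up for illustration; they do not come from the slides.

```python
# Hypothetical purchase data: each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread"},
    {"beer", "diapers"},
    {"beer", "diapers", "bread"},
    {"milk"},
]


def support(baskets, antecedent, consequent):
    """Fraction of all baskets that contain both sides of the rule."""
    both = antecedent | consequent
    return sum(1 for b in baskets if both <= b) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Among baskets containing the antecedent, the fraction that also
    contain the consequent."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0  # the rule never applies, so report zero confidence
    both = antecedent | consequent
    return sum(1 for b in with_antecedent if both <= b) / len(with_antecedent)


print(support(baskets, {"bread"}, {"milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
```

With this sample data, 2 of the 8 baskets contain both bread and milk (support 0.25), and 2 of the 5 baskets that contain bread also contain milk (confidence 0.4). The two measures differ only in the denominator: all purchases for support, purchases containing the antecedent for confidence.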

