SBU CSE 634 - Data Mining Primitives, Languages and System Architecture

Data Mining Primitives, Languages and System Architecture
Sources/References
Content
Introduction
What is Data Mining???
Architecture of a typical data mining system
Slide 7
Data Mining Primitives
Task relevant data
Example
Kind of knowledge to be mined
Slide 12
Background knowledge
Concept hierarchies (1)
Slide 15
Concept hierarchies (2)
Schema hierarchies
Set-grouping hierarchies
Operation-derived hierarchies
Rule-based hierarchies
Interestingness measure (1)
Interestingness measure (2)
Interestingness measure (3)
Presentation and visualization
Data mining query languages
Data mining query languages (2)
DMQL
DMQL-Syntax for task-relevant data specification
Example
Syntax for Kind of Knowledge to be Mined
Syntax for Kind of Knowledge to be Mined (2)
Syntax for concept hierarchy specification
Syntax for concept hierarchy specification (2)
Syntax for interestingness measure specification
Syntax for pattern presentation and visualization specification
Architectures of Data Mining System
Cont.
Paper Discussion
Abstract
Lydia Text Analysis System
Block Diagram of Lydia System
Process Involved
News Analysis with Lydia
Juxtaposition Analysis
Cont.
Spatial Analysis
Slide 47
Temporal Analysis
How the paper is related to DM?
Application
What is GIS???
How does a GIS work?
Geological Survey (USGS) Digital Line Graph (DLG) of roads.
Digital Line Graph of rivers.
Data capture
Data integration
Mapmaking
What is special about GIS??
Cont.
Cont.
Data Output
The future of GIS
How is it related to DM?
Slide 64

Data Mining Primitives, Languages and System Architecture
CSE 634 - Data Mining Concepts and Techniques
Professor Anita Wasilewska
Presented by
Sushma Devendrappa - 105526184
Swathi Kothapalli - 105531380

Sources/References
- Data Mining: Concepts and Techniques - Jiawei Han and Micheline Kamber, 2003
- Handbook of Data Mining and Knowledge Discovery - Willi Klosgen and Jan M. Zytkow, 2002
- Lydia: A System for Large-Scale News Analysis - String Processing and Information Retrieval: 12th International Conference, SPIRE 2005, Buenos Aires, Argentina, November 2-4, 2005
- Information Retrieval: Data Structures and Algorithms - W. Frakes and R. Baeza-Yates, 1992
- Geographical Information System - http://erg.usgs.gov/isb/pubs/gis_poster/

Content
- Data mining primitives
- Languages
- System architecture
- Application - Geographical information system (GIS)
- Paper - Lydia: A System for Large-Scale News Analysis

Introduction
- Motivation: the need to extract useful information and knowledge from a large amount of data (the data explosion problem).
- Data mining tools perform data analysis and may uncover important data patterns, contributing greatly to business strategies, knowledge bases, and scientific and medical research.

What is Data Mining???
- Data mining refers to extracting or "mining" knowledge from large amounts of data. It is also referred to as Knowledge Discovery in Databases.
- It is a process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses, or other information repositories.

Architecture of a typical data mining system
- Graphical user interface
- Pattern evaluation
- Data mining engine
- Database or data warehouse server
- Database, data warehouse
- Knowledge base
- Data cleansing, data integration, filtering

Misconception: Data mining systems can autonomously dig out all of the valuable knowledge from a given large database without human intervention. Without user intervention, the system would uncover a large set of patterns that may even surpass the size of the database.
Hence, user interaction is required. This communication between the user and the system is provided by a set of data mining primitives.

Data Mining Primitives
Data mining primitives define a data mining task, which can be specified in the form of a data mining query.
- Task-relevant data
- Kind of knowledge to be mined
- Background knowledge
- Interestingness measure
- Presentation and visualization of discovered patterns

Task relevant data
- The portion of the data to be investigated.
- Attributes of interest (relevant attributes) can be specified.
- Initial data relation
- Minable view

Example
If a data mining task is to study associations between items frequently purchased at AllElectronics by customers in Canada, the task-relevant data can be specified by providing the following information:
- Name of the database or data warehouse to be used (e.g., AllElectronics_db)
- Names of the tables or data cubes containing relevant data (e.g., item, customer, purchases and items_sold)
- Conditions for selecting the relevant data (e.g., retrieve data pertaining to purchases made in Canada for the current year)
- The relevant attributes or dimensions (e.g., name and price from the item table, and income and age from the customer table)

Kind of knowledge to be mined
- It is important to specify the knowledge to be mined, as this determines the data mining function to be performed.
- Kinds of knowledge include concept description, association, classification, prediction and clustering.
- Users can also provide pattern templates.
- Pattern templates are also called metapatterns, metarules, or metaqueries.

Example
A user studying the buying habits of AllElectronics customers may choose to mine association rules of the form:
    P(X: customer, W) ^ Q(X, Y) => buys(X, Z)
Association rules matching this metarule include the following, each annotated with [support, confidence]:
    age(X, "30...39") ^ income(X, "40K...49K") => buys(X, "VCR") [2.2%, 60%]
    occupation(X, "student") ^ age(X, "20...29") => buys(X, "computer") [1.4%, 70%]

Background knowledge
- Background knowledge is information about the domain to be mined.
- A concept hierarchy is a powerful form of background knowledge.
- Four major types of concept hierarchies:
  - schema hierarchies
  - set-grouping hierarchies
  - operation-derived hierarchies
  - rule-based hierarchies

Concept hierarchies (1)
- A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level (more general) concepts.
- It allows data to be mined at multiple levels of abstraction.
- It lets users view data from different perspectives, giving further insight into the relationships.

Example (location)
all (Level 0)
  Canada (Level 1)
    British Columbia (Level 2)
      Victoria, Vancouver (Level 3)
    Ontario (Level 2)
      Toronto, Ottawa (Level 3)
  USA (Level 1)
    New York (Level 2)
      New York, Buffalo (Level 3)
    Illinois (Level 2)
      Chicago (Level 3)

Concept hierarchies (2)
Rolling up - generalization of data
- Allows data to be viewed at more meaningful and explicit abstractions.
- Makes the data easier to understand.
- Compresses the data.
- Requires fewer input/output operations.
Drilling down - specialization of data
- Concept values replaced by lower
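
The location hierarchy and the rolling-up operation described above can be sketched in Python. This is only an illustration and not part of the slides: the city-level counts are hypothetical, and the names HIERARCHY, build_paths and roll_up are assumptions chosen for this sketch.

```python
from collections import Counter

# Location concept hierarchy from the example:
# all (Level 0) > country (Level 1) > province/state (Level 2) > city (Level 3)
HIERARCHY = {
    "Canada": {
        "British Columbia": ["Victoria", "Vancouver"],
        "Ontario": ["Toronto", "Ottawa"],
    },
    "USA": {
        "New York": ["New York", "Buffalo"],
        "Illinois": ["Chicago"],
    },
}


def build_paths(hierarchy):
    """Map each city to its path (Level 0, Level 1, Level 2, Level 3)."""
    paths = {}
    for country, regions in hierarchy.items():
        for region, cities in regions.items():
            for city in cities:
                paths[city] = ("all", country, region, city)
    return paths


CITY_PATHS = build_paths(HIERARCHY)


def roll_up(city_counts, level):
    """Generalize city-level counts to the given hierarchy level (0 to 3)."""
    rolled = Counter()
    for city, count in city_counts.items():
        rolled[CITY_PATHS[city][level]] += count
    return dict(rolled)


# Hypothetical city-level purchase counts, used only for illustration.
sales_by_city = {
    "Vancouver": 120,
    "Victoria": 30,
    "Toronto": 200,
    "Ottawa": 50,
    "New York": 300,
    "Buffalo": 40,
    "Chicago": 150,
}

print(roll_up(sales_by_city, 2))
# {'British Columbia': 150, 'Ontario': 250, 'New York': 340, 'Illinois': 150}
print(roll_up(sales_by_city, 1))
# {'Canada': 400, 'USA': 490}
```

Each roll-up step returns fewer rows than the level below it, which is the compression the slide refers to. Drilling down goes the other way and is only possible if the finer-grained data (here, sales_by_city) has been kept.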

