UH COSC 6340 - Chapter 8 Cluster Analysis - D2759930

School: University of Houston

Course: COSC 6340 - Database Systems

Pages: 98

Slide titles (full deck):
- Data Mining: Concepts and Techniques (Slides for Textbook, Chapter 8)
- Chapter 8. Cluster Analysis
- General Applications of Clustering
- Examples of Clustering Applications
- What Is Good Clustering?
- Requirements of Clustering in Data Mining
- Data Structures
- Measure the Quality of Clustering
- Type of data in clustering analysis
- Interval-valued variables
- Similarity and Dissimilarity Between Objects
- Similarity and Dissimilarity Between Objects (Cont.)
- Binary Variables
- Dissimilarity between Binary Variables
- Nominal Variables
- Ordinal Variables
- Ratio-Scaled Variables
- Variables of Mixed Types
- Major Clustering Approaches
- Partitioning Algorithms: Basic Concept
- The K-Means Clustering Method
- Comments on the K-Means Method
- Variations of the K-Means Method
- The K-Medoids Clustering Method
- PAM (Partitioning Around Medoids) (1987)
- PAM Clustering: Total swapping cost $TC_{ih} = \sum_j C_{jih}$
- CLARA (Clustering Large Applications) (1990)
- CLARANS (“Randomized” CLARA) (1994)
- Hierarchical Clustering
- AGNES (Agglomerative Nesting)
- DIANA (Divisive Analysis)
- More on Hierarchical Clustering Methods
- BIRCH (1996)
- CF Tree
- CURE (Clustering Using REpresentatives)
- Drawbacks of Distance-Based Method
- CURE: The Algorithm
- Data Partitioning and Clustering
- CURE: Shrinking Representative Points
- Clustering Categorical Data: ROCK
- ROCK: Algorithm
- CHAMELEON
- Overall Framework of CHAMELEON
- Density-Based Clustering Methods
- Density-Based Clustering: Background
- Density-Based Clustering: Background (II)
- DBSCAN: Density Based Spatial Clustering of Applications with Noise
- DBSCAN: The Algorithm
- OPTICS: A Cluster-Ordering Method (1999)
- OPTICS: Some Extension from DBSCAN
- DENCLUE: using density functions
- DENCLUE: Technical Essence
- Gradient: The steepness of a slope
- Density Attractor
- Center-Defined and Arbitrary
- Grid-Based Clustering Method
- STING: A Statistical Information Grid Approach
- STING: A Statistical Information Grid Approach (2)
- STING: A Statistical Information Grid Approach (3)
- WaveCluster (1998)
- What Is Wavelet (2)?
- Quantization
- Transformation
- CLIQUE (Clustering In QUEst)
- CLIQUE: The Major Steps
- Strength and Weakness of CLIQUE
- Model-Based Clustering Methods
- COBWEB Clustering Method
- More on Statistical-Based Clustering
- Other Model-Based Clustering Methods
- Self-organizing feature maps (SOMs)
- What Is Outlier Discovery?
- Outlier Discovery: Statistical Approaches
- Outlier Discovery: Distance-Based Approach
- Outlier Discovery: Deviation-Based Approach
- Problems and Challenges
- Constraint-Based Clustering Analysis
- Summary
- References (1)
- References (2)
- http://www.cs.sfu.ca/~han

Data Mining: Concepts and Techniques (Slides for Textbook, Chapter 8)
©Jiawei Han and Micheline Kamber
Intelligent Database Systems Research Lab
School of Computing Science, Simon Fraser University, Canada
http://www.cs.sfu.ca
January 15, 2019 | Data Mining: Concepts and Techniques

Chapter 8. Cluster Analysis
- What is Cluster Analysis?
- Types of Data in Cluster Analysis
- A Categorization of Major Clustering Methods
- Partitioning Methods
- Hierarchical Methods
- Density-Based Methods
- Grid-Based Methods
- Model-Based Clustering Methods
- Outlier Analysis
- Summary

General Applications of Clustering
- Pattern Recognition
- Spatial Data Analysis
  - create thematic maps in GIS by clustering feature spaces
  - detect spatial clusters and explain them in spatial data mining
- Image Processing
- Economic Science (especially market research)
- WWW
  - document classification
  - clustering Weblog data to discover groups of similar access patterns

Examples of Clustering Applications
- Marketing: help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs.
- Land use: identify areas of similar land use in an earth-observation database.
- Insurance: identify groups of motor insurance policy holders with a high average claim cost.
- City planning: identify groups of houses according to their house type, value, and geographical location.
- Earthquake studies: observed earthquake epicenters should be clustered along continental faults.

What Is Good Clustering?
- A good clustering method will produce high-quality clusters with
  - high intra-class similarity
  - low inter-class similarity
- The quality of a clustering result depends on both the similarity measure used by the method and its implementation.
- The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.

Requirements of Clustering in Data Mining
- Scalability
- Ability to deal with different types of attributes
- Discovery of clusters with arbitrary shape
- Minimal requirements for domain knowledge to determine input parameters
- Ability to deal with noise and outliers
- Insensitivity to the order of input records
- Ability to handle high dimensionality
- Incorporation of user-specified constraints
- Interpretability and usability

Data Structures
- Data matrix (two modes): n objects (rows) described by p variables (columns)

$$
\begin{bmatrix}
x_{11} & \cdots & x_{1f} & \cdots & x_{1p} \\
\vdots &        & \vdots &        & \vdots \\
x_{i1} & \cdots & x_{if} & \cdots & x_{ip} \\
\vdots &        & \vdots &        & \vdots \\
x_{n1} & \cdots & x_{nf} & \cdots & x_{np}
\end{bmatrix}
$$

- Dissimilarity matrix (one mode): n by n, holding the dissimilarity d(i, j) between every pair of objects; only the lower triangle is written out, since d(i, i) = 0 and d(i, j) = d(j, i)

$$
\begin{bmatrix}
0      &        &        &        &   \\
d(2,1) & 0      &        &        &   \\
d(3,1) & d(3,2) & 0      &        &   \\
\vdots & \vdots & \vdots & \ddots &   \\
d(n,1) & d(n,2) & \cdots & \cdots & 0
\end{bmatrix}
$$

Measure the Quality of Clustering
- Dissimilarity/similarity metric: similarity is expressed in terms of a distance function d(i, j), which is typically a metric.
- There is a separate “quality” function that measures the “goodness” of a cluster.
- The definitions of distance functions are usually very different for interval-scaled, Boolean, categorical, ordinal, and ratio variables.
- Weights should be associated with different variables based on applications and data semantics.
- It is hard to define …

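The criterion on the What Is Good Clustering? slide (high intra-class similarity, low inter-class similarity) can be checked numerically for a given assignment of objects to clusters. The sketch below assumes Euclidean distance as the dissimilarity and summarizes a clustering by the mean pairwise distance within clusters and between clusters. This is one simple choice made for illustration, not the separate "quality" function the deck refers to, and `intra_inter_distances` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def _mean(values: List[float]) -> float:
    # Average of a list; 0.0 when empty (e.g. every point in one cluster).
    return sum(values) / len(values) if values else 0.0


def intra_inter_distances(points: List[Sequence[float]],
                          labels: List[int]) -> Tuple[float, float]:
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    A small intra-cluster distance means high intra-class similarity;
    a large inter-cluster distance means low inter-class similarity.
    """
    if len(points) != len(labels):
        raise ValueError("need exactly one cluster label per point")
    intra: List[float] = []
    inter: List[float] = []
    # Visit every unordered pair of points exactly once.
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])  # Euclidean distance (Python 3.8+)
        if labels[i] == labels[j]:
            intra.append(d)
        else:
            inter.append(d)
    return _mean(intra), _mean(inter)


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
    print(intra_inter_distances(pts, [0, 0, 1, 1]))  # two tight, well-separated groups
    print(intra_inter_distances(pts, [0, 1, 0, 1]))  # groups that mix far-apart points
```

With the labels `[0, 0, 1, 1]` the intra-cluster mean is 1.0 and the inter-cluster mean is about 10.02. The mixed labelling `[0, 1, 0, 1]` gives an intra-cluster mean of 10.0, larger than its inter-cluster mean of about 5.52, so the first labelling fits the slide's criterion far better.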