Unformatted text preview:

DATA MINING Spatial ClusteringNearest NeighborNearest Neighbor AlgorithmPAMSlide 5PAM Cost CalculationPAM AlgorithmBIRCHClustering FeatureBIRCH AlgorithmImprove ClustersDBSCANDBSCAN Density ConceptsDensity ConceptsDBSCAN AlgorithmCURECURE ApproachCURE AlgorithmCURE for Large DatabasesComparison of Clustering Techniques© Prentice Hall 1DATA MININGDATA MININGSpatial ClusteringSpatial ClusteringMargaret H. DunhamMargaret H. DunhamDepartment of Computer Science and EngineeringDepartment of Computer Science and EngineeringSouthern Methodist UniversitySouthern Methodist UniversityCompanion slides for the text by Dr. M.H.Dunham, Companion slides for the text by Dr. M.H.Dunham, Data Mining, Data Mining, Introductory and Advanced TopicsIntroductory and Advanced Topics, Prentice Hall, 2002., Prentice Hall, 2002.© Prentice Hall 2Nearest NeighborNearest NeighborItems are iteratively merged into the Items are iteratively merged into the existing clusters that are closest.existing clusters that are closest.IncrementalIncrementalThreshold, t, used to determine if items Threshold, t, used to determine if items are added to existing clusters or a new are added to existing clusters or a new cluster is created.cluster is created.© Prentice Hall 3Nearest Neighbor AlgorithmNearest Neighbor Algorithm© Prentice Hall 4PAMPAMPartitioning Around Medoids (PAM) Partitioning Around Medoids (PAM) (K-Medoids)(K-Medoids)Handles outliers well.Handles outliers well.Ordering of input does not impact results.Ordering of input does not impact results.Does not scale well.Does not scale well.Each cluster represented by one item, Each cluster represented by one item, called the called the medoid.medoid.Initial set of k medoids randomly chosen.Initial set of k medoids randomly chosen.© Prentice Hall 5PAMPAM© Prentice Hall 6PAM Cost CalculationPAM Cost CalculationAt each step in algorithm, medoids are At each step in algorithm, medoids are changed if the overall cost is improved.changed if the overall cost is improved.CCjihjih – cost change for an item t – cost change for an item tjj associated associated with swapping medoid twith swapping medoid tii with non-medoid t with non-medoid thh..© Prentice Hall 7PAM AlgorithmPAM Algorithm© Prentice Hall 8BIRCHBIRCHBalanced Iterative Reducing and Balanced Iterative Reducing and Clustering using HierarchiesClustering using HierarchiesIncremental, hierarchical, one scanIncremental, hierarchical, one scanSave clustering information in a tree Save clustering information in a tree Each entry in the tree contains Each entry in the tree contains information about one clusterinformation about one clusterNew nodes inserted in closest entry in New nodes inserted in closest entry in treetree© Prentice Hall 9Clustering FeatureClustering Feature CT Triple: (N,LS,SS)CT Triple: (N,LS,SS)–N: Number of points in clusterN: Number of points in cluster–LS: Sum of points in the clusterLS: Sum of points in the cluster–SS: Sum of squares of points in the clusterSS: Sum of squares of points in the clusterCF TreeCF Tree–Balanced search treeBalanced search tree–Node has CF triple for each childNode has CF triple for each child–Leaf node represents cluster and has CF value Leaf node represents cluster and has CF value for each subcluster in it.for each subcluster in it.–Subcluster has maximum diameterSubcluster has maximum diameter© Prentice Hall 10BIRCH AlgorithmBIRCH Algorithm© Prentice Hall 11Improve ClustersImprove Clusters© Prentice Hall 12DBSCANDBSCANDensity Based Spatial Clustering of Density Based Spatial Clustering of Applications with NoiseApplications with NoiseOutliers will not effect creation of cluster.Outliers will not effect creation of cluster.InputInput–MinPts MinPts – minimum number of points in – minimum number of points in clustercluster–EpsEps – for each point in cluster there must – for each point in cluster there must be another point in it less than this distance be another point in it less than this distance away.away.© Prentice Hall 13DBSCAN Density ConceptsDBSCAN Density ConceptsEps-neighborhood:Eps-neighborhood: Points within Eps distance Points within Eps distance of a point.of a point.Core point:Core point: Eps-neighborhood dense enough Eps-neighborhood dense enough (MinPts)(MinPts)Directly density-reachable:Directly density-reachable: A point p is directly A point p is directly density-reachable from a point q if the distance density-reachable from a point q if the distance is small (Eps) and q is a core point.is small (Eps) and q is a core point.Density-reachable:Density-reachable: A point si density- A point si density-reachable form another point if there is a path reachable form another point if there is a path from one to the other consisting of only core from one to the other consisting of only core points.points.© Prentice Hall 14Density ConceptsDensity Concepts© Prentice Hall 15DBSCAN AlgorithmDBSCAN Algorithm© Prentice Hall 16CURECUREClustering Using RepresentativesClustering Using RepresentativesUse many points to represent a cluster Use many points to represent a cluster instead of only oneinstead of only onePoints will be well scatteredPoints will be well scattered© Prentice Hall 17CURE ApproachCURE Approach© Prentice Hall 18CURE AlgorithmCURE Algorithm© Prentice Hall 19CURE for Large DatabasesCURE for Large Databases© Prentice Hall 20Comparison of Clustering Comparison of Clustering


View Full Document

SMU CSE 8331 - Spatial Clustering

Download Spatial Clustering
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Spatial Clustering and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Spatial Clustering 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?