FSU CIS 5930r - Cluster Analysis


Cluster Analysis

- What is Cluster Analysis?
- Types of Data in Cluster Analysis
- A Categorization of Major Clustering Methods
- Partitioning Methods
- Hierarchical Methods
- Density-Based Methods
- Grid-Based Methods
- Model-Based Clustering Methods
- Outlier Analysis
- Summary

Hierarchical Clustering

- Uses the distance matrix as the clustering criterion.
- Does not require the number of clusters k as an input, but needs a termination condition.

[Figure: agglomerative clustering (AGNES) merges objects a, b, c, d, e over Steps 0-4 into a single cluster; divisive clustering (DIANA) runs the same steps in reverse order.]

AGNES (Agglomerative Nesting)

- Implemented in statistical analysis packages, e.g., S-Plus.
- Uses the single-link method and the dissimilarity matrix.
- Merges the objects (clusters) that have the least dissimilarity.
- Continues in a non-descending fashion: successive merge distances never decrease.
- Eventually all objects belong to the same cluster.
- Single-link: at each step, merge the pair of clusters (C1, C2) connected by the shortest single link between their objects, i.e., the pair minimizing min_{p ∈ C1, q ∈ C2} dist(p, q).

[Figure: three 10 x 10 scatter plots showing successive single-link merges.]

A Dendrogram Shows How the Clusters Are Merged Hierarchically

- Decomposes the data objects into several levels of nested partitionings (a tree of clusters), called a dendrogram.
- A clustering of the data objects is obtained by cutting the dendrogram at the desired level; each connected component then forms a cluster.
- E.g., level 1 gives 4 clusters: {a,b}, {c}, {d}, {e}; level 2 gives 3 clusters: {a,b}, {c}, {d,e}; level 3 gives 2 clusters: {a,b}, {c,d,e}; etc.

[Figure: dendrogram over objects a, b, c, d, e with cut levels 1 through 4.]
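The single-link rule and dendrogram cutting are easy to try out with SciPy's hierarchical-clustering routines. The sketch below is illustrative only: the five 2-D points are invented to loosely mirror the a-e example above and are not from the slides.

```python
# Minimal sketch of single-link (AGNES-style) agglomerative clustering
# with SciPy. The five 2-D points are made up for illustration.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

points = np.array([[1.0, 1.0], [1.5, 1.0],   # a, b (close together)
                   [5.0, 5.0],               # c
                   [8.0, 8.0], [8.5, 8.5]])  # d, e (close together)

# Condensed pairwise dissimilarity matrix, then the single-link merge tree.
dists = pdist(points)
merges = linkage(dists, method='single')

# Each row of `merges` records one merge: (cluster_i, cluster_j,
# merge distance, new cluster size). Merge distances are non-descending.
print(merges)

# "Cutting the dendrogram": ask for 3 clusters, analogous to level 2.
labels = fcluster(merges, t=3, criterion='maxclust')
print(labels)   # a,b share a label; d,e share a label; c stands alone
```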
DIANA (Divisive Analysis)

- Implemented in statistical analysis packages, e.g., S-Plus.
- Inverse order of AGNES: starts with all objects in one cluster and splits.
- Eventually each node forms a cluster on its own.

[Figure: three 10 x 10 scatter plots showing successive divisive splits.]

More on Hierarchical Clustering Methods

- Major weaknesses of agglomerative clustering methods:
  - They do not scale well: time complexity is at least O(n²), where n is the total number of objects.
  - They can never undo what was done previously.
- Integration of hierarchical with distance-based clustering:
  - BIRCH (1996): uses a CF-tree and incrementally adjusts the quality of sub-clusters.
  - CURE (1998): selects well-scattered points from the cluster and then shrinks them towards the center of the cluster by a specified fraction.
  - CHAMELEON (1999): hierarchical clustering using dynamic modeling.

BIRCH (1996)

- BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies, by Zhang, Ramakrishnan, and Livny (SIGMOD'96).
- Incrementally constructs a CF (Clustering Feature) tree, a hierarchical data structure for multiphase clustering:
  - Phase 1: scan the DB to build an initial in-memory CF tree (a multi-level compression of the data that tries to preserve its inherent clustering structure).
  - Phase 2: use an arbitrary clustering algorithm to cluster the leaf nodes of the CF-tree.
- Scales linearly: finds a good clustering with a single scan and improves the quality with a few additional scans.
- Weaknesses: handles only numeric data, is sensitive to the order of the data records, and performs poorly when clusters are not spherical.

Clustering Feature Vector

- Clustering Feature: CF = (N, LS, SS), where
  - N: number of data points;
  - LS: linear sum of the N points, Σ_{i=1..N} X_i;
  - SS: square sum of the N points, Σ_{i=1..N} X_i².
- Example: the five points (3,4), (2,6), (4,5), (4,7), (3,8) have CF = (5, (16,30), 244).

Some Characteristics of CFVs

- Two CFVs can be aggregated: given CF1 = (N1, LS1, SS1) and CF2 = (N2, LS2, SS2), the cluster formed by combining the two has CF = (N1 + N2, LS1 + LS2, SS1 + SS2).
- The centroid and radius can both be computed from the CF:
  - centroid (the center of the cluster): x0 = (Σ_{i=1..N} x_i) / N = LS / N;
  - radius (the average distance between an object and the centroid): R = sqrt( Σ_{i=1..N} (x_i − x0)² / N ).
- Other statistical features can be derived from the CF as well.

CF-Tree in BIRCH

- A CF tree is a height-balanced tree that stores the clustering features for a hierarchical clustering.
- A nonleaf node in the tree has descendants ("children"); the nonleaf nodes store the sums of the CFs of their children.
- A CF tree has two parameters:
  - branching factor: the maximum number of children per node;
  - threshold T: the maximum radius of the sub-clusters stored at the leaf nodes.

[Figure: a CF tree (a multiway tree, like the B-tree) with a root and non-leaf nodes holding (CF_i, child_i) entries, and leaf nodes holding CF entries chained by prev/next pointers.]

CF-Tree Construction

- Scan through the database once; for each object, insert it into the CF-tree as follows (a sketch of the leaf step appears below):
  - At each nonleaf level, choose the sub-tree whose centroid is closest to the object.
  - In a leaf page, choose a cluster that can absorb the object (new radius < T).
  - If no cluster can absorb it, create a new cluster.
  - Update the upper-level CF entries on the path back to the root.
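To make the CF arithmetic concrete, here is a minimal Python sketch of the CF bookkeeping and the leaf step of insertion, assuming 2-D points; the names `CF` and `insert_into_leaf` are illustrative, not from the BIRCH paper. It reproduces the slide's example, CF = (5, (16,30), 244).

```python
# Minimal sketch of BIRCH-style CF bookkeeping for 2-D points.
# Class and function names are illustrative, not from the BIRCH paper.
import math

class CF:
    """Clustering Feature: CF = (N, LS, SS)."""
    def __init__(self, n=0, ls=(0.0, 0.0), ss=0.0):
        self.n, self.ls, self.ss = n, ls, ss

    @classmethod
    def of_point(cls, p):
        # A single point contributes N = 1, LS = p, SS = ||p||^2.
        return cls(1, p, p[0] ** 2 + p[1] ** 2)

    def merged(self, other):
        # Additivity: the CFs of two combined clusters add component-wise.
        return CF(self.n + other.n,
                  (self.ls[0] + other.ls[0], self.ls[1] + other.ls[1]),
                  self.ss + other.ss)

    def centroid(self):
        # x0 = LS / N
        return (self.ls[0] / self.n, self.ls[1] / self.n)

    def radius(self):
        # R = sqrt( sum ||x_i - x0||^2 / N ) = sqrt( SS/N - ||x0||^2 ),
        # by expanding the square sum around the centroid.
        cx, cy = self.centroid()
        return math.sqrt(max(self.ss / self.n - (cx * cx + cy * cy), 0.0))

def insert_into_leaf(leaf_cfs, point, threshold):
    """Leaf step of CF-tree insertion: absorb the point into the closest
    sub-cluster if the new radius stays below T, else start a new one."""
    entry = CF.of_point(point)
    best = None
    for cf in leaf_cfs:
        cx, cy = cf.centroid()
        d = math.hypot(point[0] - cx, point[1] - cy)
        if best is None or d < best[0]:
            best = (d, cf)
    if best is not None:
        candidate = best[1].merged(entry)
        if candidate.radius() < threshold:            # absorb
            leaf_cfs[leaf_cfs.index(best[1])] = candidate
            return leaf_cfs
    leaf_cfs.append(entry)                            # new sub-cluster
    return leaf_cfs

# Reproduce the slide's example: five points -> CF = (5, (16,30), 244).
points = [(3, 4), (2, 6), (4, 5), (4, 7), (3, 8)]
total = CF()
for p in points:
    total = total.merged(CF.of_point(p))
print(total.n, total.ls, total.ss)       # 5 (16, 30) 244
print(total.centroid(), total.radius())  # (3.2, 6.0) 1.6
```

Note the radius never touches the raw points: N, LS, and SS are sufficient, which is what lets BIRCH compress the data into a CF tree in a single scan.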


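For completeness: scikit-learn ships a BIRCH implementation, so the CF-tree machinery above rarely needs to be hand-rolled. Below is a hedged usage sketch on synthetic numeric data; the threshold and branching_factor values are illustrative choices, not recommendations from the slides.

```python
# Hedged usage sketch of scikit-learn's BIRCH implementation.
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)
# Three synthetic numeric blobs (BIRCH handles numeric data only).
data = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
                  for c in [(0, 0), (5, 5), (0, 5)]])

# threshold plays the role of T (max sub-cluster radius at the leaves);
# branching_factor caps the number of CF children per node; n_clusters
# applies a global clustering step to the leaf sub-clusters (Phase 2).
model = Birch(threshold=0.5, branching_factor=50, n_clusters=3)
labels = model.fit_predict(data)
print(np.bincount(labels))   # roughly 100 points per cluster
```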