Hierarchical clusteringDavid M. BleiCOS424Princeton UniversityFebruary 28, 2008D. Blei Clustering 02 1 / 21Hierarchical clustering•Hierarchical clustering is a widely used data analysis tool.•The idea is to build a binary tree of the data that successivelymerges similar groups of points•Visualizing this tree provides a useful summary of the dataD. Blei Clustering 02 2 / 21Hierarchical clustering•Hierarchical clustering is a widely used data analysis tool.•The idea is to build a binary tree of the data that successivelymerges similar groups of points•Visualizing this tree provides a useful summary of the dataD. Blei Clustering 02 2 / 21Hierarchical clustering•Hierarchical clustering is a widely used data analysis tool.•The idea is to build a binary tree of the data that successivelymerges similar groups of points•Visualizing this tree provides a useful summary of the dataD. Blei Clustering 02 2 / 21Hierarchical clusering vs. k-means•Recall that k-means or k-medoids requires•A number of clusters k•An initial assignment of data to clusters•A distance measure between data d(xn, xm)•Hierarchical clustering only requires a measure of similarity betweengroups of data points.D. Blei Clustering 02 3 / 21Hierarchical clusering vs. k-means•Recall that k-means or k-medoids requires•A number of clusters k•An initial assignment of data to clusters•A distance measure between data d(xn, xm)•Hierarchical clustering only requires a measure of similarity betweengroups of data points.D. Blei Clustering 02 3 / 21Hierarchical clusering vs. k-means•Recall that k-means or k-medoids requires•A number of clusters k•An initial assignment of data to clusters•A distance measure between data d(xn, xm)•Hierarchical clustering only requires a measure of similarity betweengroups of data points.D. Blei Clustering 02 3 / 21Hierarchical clusering vs. k-means•Recall that k-means or k-medoids requires•A number of clusters k•An initial assignment of data to clusters•A distance measure between data d(xn, xm)•Hierarchical clustering only requires a measure of similarity betweengroups of data points.D. Blei Clustering 02 3 / 21Hierarchical clusering vs. k-means•Recall that k-means or k-medoids requires•A number of clusters k•An initial assignment of data to clusters•A distance measure between data d(xn, xm)•Hierarchical clustering only requires a measure of similarity betweengroups of data points.D. Blei Clustering 02 3 / 21Agglomerative clustering•We will talk about agglomerative clustering.•Algorithm:1 Place each data point into its own singleton group2 Repeat: iteratively merge the two closest groups3 Until: all the data are merged into a single clusterD. Blei Clustering 02 4 / 21Agglomerative clustering•We will talk about agglomerative clustering.•Algorithm:1 Place each data point into its own singleton group2 Repeat: iteratively merge the two closest groups3 Until: all the data are merged into a single clusterD. Blei Clustering 02 4 / 21Agglomerative clustering•We will talk about agglomerative clustering.•Algorithm:1 Place each data point into its own singleton group2 Repeat: iteratively merge the two closest groups3 Until: all the data are merged into a single clusterD. Blei Clustering 02 4 / 21Agglomerative clustering•We will talk about agglomerative clustering.•Algorithm:1 Place each data point into its own singleton group2 Repeat: iteratively merge the two closest groups3 Until: all the data are merged into a single clusterD. Blei Clustering 02 4 / 21Agglomerative clustering•We will talk about agglomerative clustering.•Algorithm:1 Place each data point into its own singleton group2 Repeat: iteratively merge the two closest groups3 Until: all the data are merged into a single clusterD. Blei Clustering 02 4 / 21Example●●●●●●●●●●●●●●●●●●●●●●●●●0 20 40 60 80−20 0 20 40 60 80DataD. Blei Clustering 02 5 / 21Example●●●●●●●●●●●●●●●●●●●●●●●●●0 20 40 60 80−20 0 20 40 60 80iteration 001V1V2D. Blei Clustering 02 5 / 21Example●●●●●●●●●●●●●●●●●●●●●●●●●0 20 40 60 80−20 0 20 40 60 80iteration 002V1V2D. Blei Clustering 02 5 / 21Example●●●●●●●●●●●●●●●●●●●●●●●●●0 20 40 60 80−20 0 20 40 60 80iteration 003V1V2D. Blei Clustering 02 5 / 21Example●●●●●●●●●●●●●●●●●●●●●●●●●0 20 40 60 80−20 0 20 40 60 80iteration 004V1V2D. Blei Clustering 02 5 / 21Example●●●●●●●●●●●●●●●●●●●●●●●●●0 20 40 60 80−20 0 20 40 60 80iteration 005V1V2D. Blei Clustering 02 5 / 21Example●●●●●●●●●●●●●●●●●●●●●●●●●0 20 40 60 80−20 0 20 40 60 80iteration 006V1V2D. Blei Clustering 02 5 / 21Example●●●●●●●●●●●●●●●●●●●●●●●●●0 20 40 60 80−20 0 20 40 60 80iteration 007V1V2D. Blei Clustering 02 5 / 21Example●●●●●●●●●●●●●●●●●●●●●●●●●0 20 40 60 80−20 0 20 40 60 80iteration 008V1V2D. Blei Clustering 02 5 / 21Example●●●●●●●●●●●●●●●●●●●●●●●●●0 20 40 60 80−20 0 20 40 60 80iteration 009V1V2D. Blei Clustering 02 5 / 21Example●●●●●●●●●●●●●●●●●●●●●●●●●0 20 40 60 80−20 0 20 40 60 80iteration 010V1V2D. Blei Clustering 02 5 / 21Example●●●●●●●●●●●●●●●●●●●●●●●●●0 20 40 60 80−20 0 20 40 60 80iteration 011V1V2D. Blei Clustering 02 5 / 21Example●●●●●●●●●●●●●●●●●●●●●●●●●0 20 40 60 80−20 0 20 40 60 80iteration 012V1V2D. Blei Clustering 02 5 / 21Example●●●●●●●●●●●●●●●●●●●●●●●●●0 20 40 60 80−20 0 20 40 60 80iteration 013V1V2D. Blei Clustering 02 5 / 21Example●●●●●●●●●●●●●●●●●●●●●●●●●0 20 40 60 80−20 0 20 40 60 80iteration 014V1V2D. Blei Clustering 02 5 / 21Example●●●●●●●●●●●●●●●●●●●●●●●●●0 20 40 60 80−20 0 20 40 60 80iteration 015V1V2D. Blei Clustering 02 5 / 21Example●●●●●●●●●●●●●●●●●●●●●●●●●0 20 40 60 80−20 0 20 40 60 80iteration 016V1V2D.
View Full Document