Unsupervised Learning: A Gentle Introduction to Clustering Analysis
Pengyu Hong

How Many Different Types of Shapes?

Features
Extract features to represent objects:
- Corners
- Perimeter/Area
- Noise

Data Distribution
[Figure: histograms of Number of Corners and of Perimeter/Area, one series per shape (shape 1 to shape 5)]
[Figure: scatter plot of Number of Corners against Perimeter/Area for shape 1 to shape 5]

Clustering Analysis
- Clustering is one of the most important unsupervised learning processes. It organizes objects into groups whose members are similar in some way.
- Clustering finds structure in a collection of unlabeled data.
- A cluster is a collection of objects that are similar to one another and dissimilar to the objects belonging to other clusters.

Key Components of Clustering Analysis
- Data representation: $\vec{x} = (\text{num\_corner}, \text{perimeter/area})^T$
- Similarity measurement
- Clustering method

Similarity Measurements: Pearson Correlation
Two profiles (vectors) $\vec{x} = (x_1, \ldots, x_p)^T$ and $\vec{y} = (y_1, \ldots, y_p)^T$:

\[
C_{pearson}(\vec{x}, \vec{y}) = \frac{\sum_{i=1}^{p} (x_i - m_x)(y_i - m_y)}{\sqrt{\left[\sum_{i=1}^{p} (x_i - m_x)^2\right] \left[\sum_{i=1}^{p} (y_i - m_y)^2\right]}}
\]

\[
m_x = \frac{1}{p} \sum_{n=1}^{p} x_n, \qquad m_y = \frac{1}{p} \sum_{n=1}^{p} y_n
\]

\[
-1 \le C_{pearson}(\vec{x}, \vec{y}) \le +1
\]

Similarity Measurements: Pearson Correlation (Trend Similarity)
With $\vec{b} = 0.5\,\vec{a}$ and $\vec{c} = \vec{a} + 0.2$:
- $C_{pearson}(\vec{a}, \vec{b}) = 1$
- $C_{pearson}(\vec{a}, \vec{c}) = 1$
- $C_{pearson}(\vec{b}, \vec{c}) = 1$

Similarity Measurements: Euclidean Distance
For $\vec{x} = (x_1, \ldots, x_p)^T$ and $\vec{y} = (y_1, \ldots, y_p)^T$:

\[
d(\vec{x}, \vec{y}) = \sqrt{\sum_{n=1}^{p} (x_n - y_n)^2}
\]

Similarity Measurements: Euclidean Distance (Absolute Difference)
With $\vec{b} = 0.5\,\vec{a}$ and $\vec{c} = \vec{a} + 0.2$:
- $d(\vec{a}, \vec{b}) = 2.8025$
- $d(\vec{a}, \vec{c}) = 1.5875$
- $d(\vec{b}, \vec{c}) = 3.2211$

Similarity Measurements: Cosine Correlation
For $\vec{x} = (x_1, \ldots, x_p)^T$ and $\vec{y} = (y_1, \ldots, y_p)^T$:

\[
C_{cosine}(\vec{x}, \vec{y}) = \frac{\sum_{i=1}^{p} x_i y_i}{\|\vec{x}\| \, \|\vec{y}\|}
\]

This is the cosine of the angle between the two vectors, so

\[
-1 \le C_{cosine}(\vec{x}, \vec{y}) \le +1
\]

Similarity Measurements: Cosine Correlation (Trend + Mean Distance)
With $\vec{b} = 0.5\,\vec{a}$ and $\vec{c} = \vec{a} + 0.2$:
- $C_{cosine}(\vec{a}, \vec{b}) = 1$
- $C_{cosine}(\vec{a}, \vec{c}) = 0.9622$
- $C_{cosine}(\vec{b}, \vec{c}) = 0.9622$

Similarity Measurements: Comparison
With $\vec{b} = 0.5\,\vec{a}$ and $\vec{c} = \vec{a} + 0.2$:

Measure       (a, b)    (a, c)    (b, c)
C_pearson     1         1         1
d             2.8025    1.5875    3.2211
C_cosine      1         0.9622    0.9622

Similarity Measurements: Similar?

Measure       (a, b)    (a, c)    (b, c)
C_pearson     0.1175    0.1244    0.1779
d             0.0279    0.0255    0.0236
C_cosine      0.7544    0.8092    0.844

Hierarchical Clustering
[Figure: dendrogram over the samples $\vec{x}_1, \ldots, \vec{x}_8$]

Hierarchical Clustering (Cont.)
- Multilevel clustering: level 1 has n clusters, level n has one cluster.
- Agglomerative HC: starts with singleton clusters and merges clusters.
- Divisive HC: starts with one cluster containing all samples and splits clusters.

Merge Which Pair of Clusters?
[Figure: the samples $\vec{x}_1, \ldots, \vec{x}_8$]

Nearest Neighbor Method
Starts with n nodes (n is the size of our sample), merges the 2 most similar nodes at each step, and stops when the desired number of clusters is reached.

Single Linkage
Dissimilarity between two clusters C1 and C2 = minimum dissimilarity between the members of the two clusters.

Complete Linkage
Dissimilarity between two clusters C1 and C2 = maximum dissimilarity between the members of the two clusters.

Average Linkage
Dissimilarity between two clusters C1 and C2 = average of the distances over all pairs of objects, one from each cluster.

Centroid (Average Group Linkage)
Dissimilarity between two clusters C1 and C2 = distance between the two cluster means.

Matlab Demo: Clustering

Clustering Result → Classifier
[Figure: scatter plot of Number of Corners against Perimeter/Area for shape 1 to shape 5]

\[
P(\text{shape} \mid \text{num\_corner}, \text{perimeter/area})
= \frac{P(\text{shape}, \text{num\_corner}, \text{perimeter/area})}{P(\text{num\_corner}, \text{perimeter/area})}
= \frac{P(\text{num\_corner}, \text{perimeter/area} \mid \text{shape}) \, P(\text{shape})}{\sum_{\text{shape}} P(\text{num\_corner}, \text{perimeter/area} \mid \text{shape}) \, P(\text{shape})}
\]

\[
= \frac{P(\text{num\_corner} \mid \text{shape}) \, P(\text{perimeter/area} \mid \text{shape}) \, P(\text{shape})}{\sum_{\text{shape}} P(\text{num\_corner} \mid \text{shape}) \, P(\text{perimeter/area} \mid \text{shape}) \, P(\text{shape})}
\]

The last step treats the two features as conditionally independent given the shape.

Clustering Result → Classifier (Cont.)

\[
P(\text{shape} \mid \text{num\_corner}, \text{perimeter/area})
= \frac{P(\text{num\_corner} \mid \text{shape}) \, P(\text{perimeter/area} \mid \text{shape}) \, P(\text{shape})}{\sum_{\text{shape}} P(\text{num\_corner} \mid \text{shape}) \, P(\text{perimeter/area} \mid \text{shape}) \, P(\text{shape})}
\]

The terms are $P(\text{shape})$, $P(\text{num\_corner} \mid \text{shape})$, and $P(\text{perimeter/area} \mid \text{shape})$.
[Figure: per-shape histograms of Number of Corners and of Perimeter/Area for shape 1 to shape 5]
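To make the similarity measurements and the linkage rules from these slides concrete, here is a minimal Python sketch. It is not the Matlab demo from the lecture. The function names (`pearson`, `euclidean`, `cosine`, `cluster_dissimilarity`, `agglomerative`) and the toy data are assumptions chosen for illustration, and the naive pairwise search is written for readability rather than speed.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - mx) ** 2 for xi in x) *
                    sum((yi - my) ** 2 for yi in y))
    return num / den


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def cosine(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return dot / (norm_x * norm_y)


def cluster_dissimilarity(c1, c2, dist, linkage):
    """Dissimilarity between two clusters under the chosen linkage."""
    pairs = [dist(p, q) for p in c1 for q in c2]
    if linkage == "single":      # minimum over all cross-cluster pairs
        return min(pairs)
    if linkage == "complete":    # maximum over all cross-cluster pairs
        return max(pairs)
    if linkage == "average":     # mean over all cross-cluster pairs
        return sum(pairs) / len(pairs)
    raise ValueError("unknown linkage: " + linkage)


def agglomerative(points, k, dist=euclidean, linkage="single"):
    """Start from singleton clusters and merge the closest pair
    until only k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_dissimilarity(clusters[i], clusters[j],
                                          dist, linkage)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]          # toy profile (assumed data)
    b = [0.5 * v for v in a]          # b = 0.5a
    c = [v + 0.2 for v in a]          # c = a + 0.2
    print(pearson(a, b), pearson(a, c))
    print(euclidean(a, b), euclidean(a, c))
    print(cosine(a, b), cosine(a, c))
    print(agglomerative([(0, 0), (0, 1), (5, 5), (5, 6)], k=2))
```

Running the script on a scaled and a shifted copy of the same profile shows the behaviour the slides describe: Pearson correlation ignores both scaling and shifting, cosine correlation ignores scaling only, and Euclidean distance reacts to both.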