Brandeis CS 101A - Unsupervised Learning

Unsupervised Learning: A Gentle Introduction to Clustering Analysis
Pengyu Hong

How Many Different Types of Shapes?

Features
• Extract features to represent objects
  – Number of corners
  – Perimeter / Area
• Noise

Data Distribution
[Figures: plots of the two features, Perimeter / Area and Number of Corners, for shape 1 through shape 5.]

Clustering Analysis
• Clustering is one of the most important unsupervised learning processes: it organizes objects into groups whose members are similar in some way.
• Clustering finds structure in a collection of unlabeled data.
• A cluster is a collection of objects that are similar to one another and dissimilar to the objects belonging to other clusters.
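
As a rough illustration of the two features named on the Features slide, the sketch below computes the number of corners and the perimeter / area ratio for a shape. It assumes each shape is given as a simple polygon with a list of (x, y) vertices; the slides do not say how the features were actually extracted, so `polygon_features` is a hypothetical helper, not the lecture's method.

```python
import math


def polygon_features(vertices):
    """Return [num_corner, perimeter / area] for a simple polygon.

    Assumption: `vertices` lists the (x, y) corners in order around the
    boundary. This is an illustrative helper, not taken from the slides.
    """
    n = len(vertices)
    perimeter = 0.0
    twice_signed_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        perimeter += math.hypot(x2 - x1, y2 - y1)
        twice_signed_area += x1 * y2 - x2 * y1  # shoelace formula term
    area = abs(twice_signed_area) / 2.0
    return [n, perimeter / area]


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
right_triangle = [(0, 0), (4, 0), (0, 3)]
print(polygon_features(unit_square))
print(polygon_features(right_triangle))
```

Note that perimeter / area changes with the size of the shape, which is one reason real measurements of the same shape type scatter rather than coincide.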
Key Components of Clustering Analysis
• Similarity measurement
• Clustering method
• Data representation

$$\vec{x} = \begin{bmatrix} \text{num\_corner} \\ \text{perimeter/area} \end{bmatrix}$$

Similarity Measurements
• Pearson Correlation

Two profiles (vectors):

$$\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix} \quad \text{and} \quad \vec{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_p \end{bmatrix}$$

$$C_{pearson}(\vec{x}, \vec{y}) = \frac{\sum_{i=1}^{p} (x_i - m_x)(y_i - m_y)}{\sqrt{\left[\sum_{i=1}^{p} (x_i - m_x)^2\right]\left[\sum_{i=1}^{p} (y_i - m_y)^2\right]}}$$

$$m_x = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad m_y = \frac{1}{N} \sum_{n=1}^{N} y_n$$

where N is the number of components in each profile (N = p here).

+1 ≥ Pearson Correlation ≥ –1

• Pearson Correlation: Trend Similarity

For $\vec{b} = 0.5\,\vec{a}$ and $\vec{c} = \vec{a} - 0.2$:

$$C_{pearson}(\vec{a}, \vec{b}) = 1, \quad C_{pearson}(\vec{a}, \vec{c}) = 1, \quad C_{pearson}(\vec{b}, \vec{c}) = 1$$

• Euclidean Distance

$$d(\vec{x}, \vec{y}) = \sqrt{\sum_{n=1}^{p} (x_n - y_n)^2}$$

• Euclidean Distance: Absolute Difference

For the same $\vec{a}$, $\vec{b}$, $\vec{c}$:

$$d(\vec{a}, \vec{b}) = 2.8025, \quad d(\vec{a}, \vec{c}) = 1.5875, \quad d(\vec{b}, \vec{c}) = 3.2211$$

• Cosine Correlation

$$C_{cosine}(\vec{x}, \vec{y}) = \frac{\sum_{i=1}^{p} x_i y_i}{\|\vec{x}\| \times \|\vec{y}\|}$$

+1 ≥ Cosine Correlation ≥ –1, with +1 reached when $\vec{x} = \vec{y}$ and –1 when $\vec{x} = -\vec{y}$. The cosine correlation is the cosine of the angle between the two vectors.

• Cosine Correlation: Trend + Mean Distance

For the same $\vec{a}$, $\vec{b}$, $\vec{c}$:

$$C_{cosine}(\vec{a}, \vec{b}) = 1, \quad C_{cosine}(\vec{a}, \vec{c}) = 0.9622, \quad C_{cosine}(\vec{b}, \vec{c}) = 0.9622$$

Similarity Measurements: Comparison

Example 1 ($\vec{b} = 0.5\,\vec{a}$, $\vec{c} = \vec{a} - 0.2$):

Pair      Pearson    Euclidean    Cosine
(a, b)    1          2.8025       1
(a, c)    1          1.5875       0.9622
(b, c)    1          3.2211       0.9622

Example 2 (three profiles shown in a figure on the slide). Similar?

Pair      Pearson    Euclidean    Cosine
(a, b)    –0.1175    0.0279       0.7544
(a, c)    0.1244     0.0255       0.8092
(b, c)    0.1779     0.0236       0.844

Hierarchical Clustering
[Figure: dendrogram over the samples $\vec{x}_1, \dots, \vec{x}_8$.]

Hierarchical Clustering (Cont.)
• Multilevel clustering: level 1 has n clusters → level n has one cluster.
• Agglomerative HC: starts with singleton clusters and merges clusters.
• Divisive HC: starts with one cluster containing all samples and splits clusters.

Merge which pair of clusters?
Nearest Neighbor Method: starts with n nodes (n is the size of our sample), merges the 2 most similar nodes at each step, and stops when the desired number of clusters is reached.

• Single Linkage: dissimilarity between two clusters C1 and C2 = minimum dissimilarity between the members of the two clusters.
• Complete Linkage: dissimilarity between two clusters C1 and C2 = maximum dissimilarity between the members of the two clusters.
• Average Linkage: dissimilarity between two clusters C1 and C2 = average distance over all pairs of objects (one from each cluster).
• Centroid (Average Group) Linkage: dissimilarity between two clusters C1 and C2 = distance between the two cluster means.

Matlab Demo: Clustering

Clustering Result → Classifier
[Figure: scatter plot of Perimeter / Area against Number of Corners for shape 1 through shape 5.]

$$P(\text{shape} \mid \text{num\_corner}, \text{perimeter/area}) = \frac{P(\text{num\_corner}, \text{perimeter/area}, \text{shape})}{P(\text{num\_corner}, \text{perimeter/area})}$$

$$= \frac{P(\text{num\_corner}, \text{perimeter/area} \mid \text{shape})\, P(\text{shape})}{\sum_{\text{shape}} P(\text{num\_corner}, \text{perimeter/area}, \text{shape})}$$

$$= \frac{P(\text{num\_corner} \mid \text{shape})\, P(\text{perimeter/area} \mid \text{shape})\, P(\text{shape})}{\sum_{\text{shape}} P(\text{num\_corner} \mid \text{shape})\, P(\text{perimeter/area} \mid \text{shape})\, P(\text{shape})}$$

The last step treats the two features as conditionally independent given the shape.

Clustering Result → Classifier (Cont.)

$$P(\text{shape} \mid \text{num\_corner}, \text{perimeter/area}) = \frac{P(\text{num\_corner} \mid \text{shape})\, P(\text{perimeter/area} \mid \text{shape})\, P(\text{shape})}{\sum_{\text{shape}} P(\text{num\_corner} \mid \text{shape})\, P(\text{perimeter/area} \mid \text{shape})\, P(\text{shape})}$$

[Figures: distributions of Perimeter / Area and Number of Corners for shape 1 through shape 5. One is labeled P(num_corner | shape); the second label is cut off in the source.]
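
Returning to the similarity measurements and linkage rules above, the following is a minimal sketch of how they fit together in agglomerative clustering. It uses only the Python standard library. The function names (`pearson`, `euclidean`, `cosine`, `cluster_dissimilarity`, `agglomerative`) and the sample data are illustrative assumptions, not taken from the slides or from the Matlab demo, and the brute-force pair search is chosen for clarity rather than speed.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - mx) ** 2 for xi in x)
                    * sum((yi - my) ** 2 for yi in y))
    return num / den


def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def cosine(x, y):
    """Cosine of the angle between two vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return dot / (math.sqrt(sum(v * v for v in x))
                  * math.sqrt(sum(v * v for v in y)))


def cluster_dissimilarity(c1, c2, dist, linkage):
    """Dissimilarity between two clusters under the chosen linkage."""
    pairwise = [dist(u, v) for u in c1 for v in c2]
    if linkage == "single":      # minimum over all cross-cluster pairs
        return min(pairwise)
    if linkage == "complete":    # maximum over all cross-cluster pairs
        return max(pairwise)
    if linkage == "average":     # mean over all cross-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError("unknown linkage: " + linkage)


def agglomerative(points, k, dist=euclidean, linkage="single"):
    """Start from singletons; merge the closest pair until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_dissimilarity(clusters[i], clusters[j], dist, linkage)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical profiles following the slides' construction b = 0.5a, c = a - 0.2.
a = [1.0, 2.0, 4.0, 3.0]
b = [0.5 * v for v in a]
c = [v - 0.2 for v in a]
print(pearson(a, b), pearson(a, c), cosine(a, b), cosine(a, c), euclidean(a, c))

# Hypothetical 2-D samples, grouped into three clusters.
samples = [(0, 0), (1, 0), (10, 10), (10, 12), (30, 0)]
print(agglomerative(samples, k=3, linkage="single"))
```

Scaling or shifting a profile leaves the Pearson correlation at 1, while the cosine correlation drops below 1 under a shift and the Euclidean distance reacts to both, which matches the pattern in the slides' first example. Centroid linkage is left out of the sketch; it would compare cluster means instead of pairwise distances.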

