Pitt CS 2750 - Clustering

CS 2750 Machine Learning
Lecture 17a: Clustering
Milos Hauskrecht, [email protected], Sennott Square

Clustering
• Groups together "similar" instances in the data sample.
• Basic clustering problem:
  – Distribute the data into k different groups such that data points similar to each other are in the same group.
  – Similarity between data points is defined in terms of some distance metric (which can be chosen).
• Clustering is useful for:
  – Similarity/dissimilarity analysis: analyze which data points in the sample are close to each other.
  – Dimensionality reduction: high-dimensional data are replaced with a group (cluster) label.

Clustering example
• We see data points and want to partition them into groups.
• Note that this is a problem different from density estimation!
[Figure: scatter plot of unlabeled 2-D data points]

Clustering example
• We see data points and want to partition them into groups.
• Points close to each other (e.g., in terms of Euclidean distance) go in the same group.
[Figure: the same scatter plot with nearby points grouped together]

Clustering example
• A set of patient cases.
• We want to partition them into groups based on similarities.

  Patient #    Age   Sex   Heart rate   Blood pressure   …
  Patient 1    55    M     85           125/80
  Patient 2    62    M     87           130/85
  Patient 3    67    F     80           126/86
  Patient 4    65    F     90           130/90
  Patient 5    70    M     84           135/85

Clustering algorithms
• Partitioning algorithms:
  – K-means algorithm: suitable only when data points have continuous values; groups are defined in terms of cluster centers (also called means).
  – Refinement of the method to categorical values: K-medoids.
• Probabilistic methods (with EM):
  – Latent variable models: the class (cluster) is represented by a latent (hidden) variable value.
  – Examples: mixture of Gaussians, Naïve Bayes with a hidden class.
• Hierarchical methods:
  – Agglomerative
  – Divisive

K-means
K-means algorithm:
• Initialize randomly the k values of the means (centers).
• Repeat two steps until there is no change in the means:
  – Partition the data according to the current set of means (using the similarity measure).
  – Move the means to the center of the data in the current partition.
• Stop when the means no longer change.
Properties:
• Minimizes the sum of squared center-to-point distances over all clusters.
• The algorithm always converges (to a local optimum).

K-means example
[Figure: three snapshots of the 2-D data showing how the assignments and means evolve]

K-means algorithm
• Properties:
  – Converges to centers minimizing the sum of squared center-to-point distances (still a local optimum).
  – The result is sensitive to the initial values of the means.
• Advantages:
  – Simplicity.
  – Generality: it can work with more than one distance measure.
• Drawbacks:
  – Can perform poorly when clusters overlap.
  – Lack of robustness to outliers.
  – Suitable only for attributes (features) with continuous values, so that cluster means can be computed.
(A minimal code sketch of the algorithm is shown below.)

Probabilistic (EM-based) algorithms
• Latent variable models. Examples: Naïve Bayes with a hidden class, mixture of Gaussians.
• Partitioning: a data point belongs to the class with the highest posterior.
• Advantages:
  – Good performance on overlapping regions.
  – Robustness to outliers.
  – Data attributes can have different types of values.
• Drawbacks:
  – EM is computationally expensive and can take time to converge.
  – The density model should be given in advance.
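The K-means procedure described above maps almost line for line into NumPy. The following is a minimal sketch rather than code from the lecture: the function name `kmeans`, the default arguments, and the toy two-blob data are all made up for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means: alternate partitioning and mean updates until the means stop changing."""
    rng = np.random.default_rng(seed)
    # Initialize the k means by picking k distinct data points at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Partition step: assign each point to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of the points assigned to it.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):  # no change in the means -> stop
            break
        centers = new_centers
    return centers, labels

# Toy example (made up): two well-separated Gaussian blobs, clustered with k = 2.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
centers, labels = kmeans(X, k=2)
print(centers)
```

Because the objective has local optima and the result depends on the initial means, the sketch is typically run several times with different seeds and the partition with the smallest sum of squared distances is kept.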
Hierarchical clustering
• Uses an arbitrary similarity/dissimilarity measure.
• Typical similarity measures d(a, b):
  – Purely real-valued data points: Euclidean, Manhattan, Minkowski distances.
  – Purely binary-valued data: number of matching values.
  – Purely categorical data: number of matching values.
  – Combination of real-valued and categorical attributes: a weighted-sum approach.

Hierarchical clustering: approach
• Compute the dissimilarity matrix for all pairs of points (using a standard or other distance measure).
• Construct clusters greedily:
  – Agglomerative approach: merge pairs of clusters in a bottom-up fashion, starting from singleton clusters.
  – Divisive approach: split clusters in a top-down fashion, starting from one complete cluster.
• Stop the greedy construction when some criterion is satisfied, e.g., a fixed number of clusters.

Cluster merging
• Construction of clusters through the greedy agglomerative approach:
  – Merge pairs of clusters in a bottom-up fashion, starting from singleton clusters.
  – Merge clusters based on cluster distances, defined in terms of point distances. Examples (a short code sketch of these three distances appears at the end of these notes):
    Min distance:  $d_{\min}(C_i, C_j) = \min_{p \in C_i,\, q \in C_j} \lVert p - q \rVert$
    Max distance:  $d_{\max}(C_i, C_j) = \max_{p \in C_i,\, q \in C_j} \lVert p - q \rVert$
    Mean distance: $d_{\mathrm{mean}}(C_i, C_j) = \Bigl\lVert \frac{1}{|C_i|} \sum_{p \in C_i} p \;-\; \frac{1}{|C_j|} \sum_{q \in C_j} q \Bigr\rVert$

Hierarchical clustering example
[Figure: scatter plot of the 2-D example data]

Hierarchical clustering example
[Figure: the same data together with the resulting dendrogram]

Hierarchical clustering
• Advantage:
  – Smaller computational cost; avoids scanning all possible clusterings.
• Disadvantage:
  – The greedy choice fixes the order in which clusters are merged and cannot be repaired.
  – Partial solution: combine hierarchical clustering with iterative algorithms like k-means.

CS 2750 Machine Learning
Lecture 17b: Non-parametric density estimation
Milos Hauskrecht, [email protected], Sennott Square

Density estimation
• Parametric density estimation methods:
  – The form of the density is known.
  – We need to estimate a fixed set of parameters.
  – Examples: Gaussian distribution, exponential distribution.
• Non-parametric density estimation:
  – The form of the density is not known.
  – All examples are used in the estimate; every example acts as a parameter.
  – The representation grows with the number of examples N.
  – Examples: histogram, k-nearest neighbor (a short sketch follows below).

Density estimation
• Semi-parametric methods:
  – A compromise between parametric and non-parametric techniques.
  – The form of the density is restricted, but there is some flexibility in its complexity, e.g., through a variable number of mixture components.
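As a concrete illustration of the two non-parametric estimators mentioned above (histogram and k-nearest neighbor), here is a minimal 1-D sketch. The function names, the bin count, the choice of k, and the toy two-component data are assumptions made for this example and are not taken from the lecture.

```python
import numpy as np

def histogram_density(x_grid, data, n_bins=20):
    """Histogram density estimate: count per bin divided by N * bin width."""
    counts, edges = np.histogram(data, bins=n_bins)
    widths = np.diff(edges)
    dens = counts / (len(data) * widths)
    # Look up which bin each query point falls into; points outside the range get density 0.
    idx = np.clip(np.searchsorted(edges, x_grid, side="right") - 1, 0, n_bins - 1)
    inside = (x_grid >= edges[0]) & (x_grid <= edges[-1])
    return np.where(inside, dens[idx], 0.0)

def knn_density(x_grid, data, k=10):
    """k-NN density estimate in 1-D: p(x) ~ k / (N * 2 * r_k(x)),
    where r_k(x) is the distance from x to its k-th nearest sample."""
    d = np.abs(x_grid[:, None] - data[None, :])  # distances from each grid point to all samples
    r_k = np.sort(d, axis=1)[:, k - 1]           # distance to the k-th nearest sample
    return k / (len(data) * 2.0 * r_k)

# Toy example (made up): a sample drawn from two Gaussian components, evaluated on a grid.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 1.0, 300)])
grid = np.linspace(-5, 5, 200)
print(histogram_density(grid, data)[:5])
print(knn_density(grid, data)[:5])
```

Both estimators keep the whole sample around: the histogram stores one count per bin, while the k-NN estimate consults all N points for every query, which is why the representation grows with N.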


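Finally, the three cluster-distance (linkage) definitions from the cluster merging slide can be written directly in NumPy. This is a minimal sketch with made-up example clusters, not the lecture's implementation; clusters are assumed to be arrays of points of shape (m, d).

```python
import numpy as np

def d_min(Ci, Cj):
    """Min (single-link) distance: smallest point-to-point distance between the clusters."""
    return np.min(np.linalg.norm(Ci[:, None, :] - Cj[None, :, :], axis=2))

def d_max(Ci, Cj):
    """Max (complete-link) distance: largest point-to-point distance between the clusters."""
    return np.max(np.linalg.norm(Ci[:, None, :] - Cj[None, :, :], axis=2))

def d_mean(Ci, Cj):
    """Mean distance: distance between the two cluster means."""
    return np.linalg.norm(Ci.mean(axis=0) - Cj.mean(axis=0))

# Toy example (made up): two small 2-D clusters.
Ci = np.array([[0.0, 0.0], [1.0, 0.0]])
Cj = np.array([[4.0, 0.0], [5.0, 1.0]])
print(d_min(Ci, Cj), d_max(Ci, Cj), d_mean(Ci, Cj))
```

An agglomerative pass would repeatedly evaluate one of these distances for every pair of current clusters and merge the closest pair, starting from singleton clusters, which is the greedy bottom-up construction described above.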
