Unsupervised Clustering

- Training samples are not labeled.
- We may not know the number of classes, the a priori probabilities $P(\omega_i)$, or the state-conditional probabilities $p(x \mid \omega_i)$.
- Automatic discovery of structure in the data.
- Intuitively, objects in the same class stick together and form clusters.

Unsupervised Clustering (cont.)

- Goal: locating groups (clusters) having similar measurements.
- Given $n$ unlabelled samples $x_1, x_2, \ldots, x_n$, partition them into $c$ clusters $\aleph_1, \aleph_2, \ldots, \aleph_c$.

Similarity Measure

- Need a similarity measure $s(x, x')$, e.g., the distance between $x$ and $x'$:
  - Minkowski metric: $d(x, y) = \bigl(\sum_{k=1}^{d} |x_k - y_k|^q\bigr)^{1/q}$, $q \ge 1$
  - Mahalanobis distance: $d(x, y) = (x - y)^t \Sigma^{-1} (x - y)$
  - Normalized inner product: $s(x, y) = \frac{x^t y}{\|x\|\,\|y\|}$
  - Commonality (for binary features): $s(x, y) = x \cdot y$, the number of attributes shared

Similarity Measure (cont.)

- How to set the threshold?
  - Too large: all samples are assigned to a single class.
  - Too small: each sample ends up in its own class.
- How to properly weight each feature? (How do the brightness of a fish and the length of a fish relate?)
- How to scale (or normalize) different measures?

Axis Scaling; Axis Scaling (cont.); Threshold: [figure-only slides]

Criteria for Clustering

- A criterion function for clustering is the sum of squared errors
  $J = \sum_{i=1}^{c} \sum_{x \in \aleph_i} \|x - m_i\|^2$, where $m_i = \frac{1}{n_i} \sum_{x \in \aleph_i} x$.
- A related criterion uses the distances from the samples in a cluster to the samples outside it:
  $J = \sum_{i=1}^{c} \frac{1}{n_i} \sum_{x \in \aleph_i} \sum_{x' \notin \aleph_i} \|x - x'\|^2$.

Criteria for Clustering (cont.)

- Within-group and between-group variance: minimize the within-group scatter and maximize the between-group scatter, e.g. through the trace or determinant of the appropriate scatter matrix.
  - trace: square of the scattering radius
  - determinant: square of the scattering volume
- $S_W = \sum_{i=1}^{c} \sum_{x \in \aleph_i} (x - m_i)(x - m_i)^t$, with $m_i = \frac{1}{n_i} \sum_{x \in \aleph_i} x$
- $S_B = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^t$, with $m = \frac{1}{n} \sum_{x} x$

Pathology: when class sizes differ

- With $J = \sum_{i=1}^{c} \sum_{x \in \aleph_i} \|x - m_i\|^2$:
  - Relaxed constraint: allow the large class to grow into the smaller class.
  - Stringent constraint: disallow the large class and split it.

K-Means Algorithm (fixed # of clusters)

1. Arbitrarily pick the cluster centers and assign each sample to its nearest center.
2. Compute the sample mean of each cluster.
3. Reassign every sample to the cluster with the nearest mean.
4. If any assignment changed, repeat from step 2; otherwise stop.

(A code sketch of this loop appears after the iterative version below.)

K-Means

- In some sense, K-means is a simplified version of Case II of learning parametric forms, where the class posterior is
  $\hat{P}(\omega_i \mid x_k, \hat{\theta}) = \frac{p(x_k \mid \omega_i, \hat{\theta}_i)\, \hat{P}(\omega_i)}{\sum_{j=1}^{c} p(x_k \mid \omega_j, \hat{\theta}_j)\, \hat{P}(\omega_j)}$.
- Instead of allowing fractional membership, K-means gives the whole sample to the class whose mean is closest:
  $\hat{P}(\omega_i \mid x_k, \hat{\theta}) = \begin{cases} 1 & \text{if class } i \text{ has the closest mean} \\ 0 & \text{otherwise} \end{cases}$

Fuzzy K-Means

- In Matlab, K-means (called c-means there) is found in the Fuzzy Logic Toolbox.
- It implements fuzzy K-means, in which the membership function is not restricted to 0 or 1 but can be fractional.

Iterative K-Means

- Iterative refinement of the clusters to minimize $J = \sum_{i=1}^{c} \sum_{x \in \aleph_i} \|x - m_i\|^2$.
- Each step moves a single sample from one cluster to another. Moving $\hat{x}$ from $\aleph_i$ to $\aleph_j$:
  $m_i^* = m_i - \frac{\hat{x} - m_i}{n_i - 1}$, $\quad m_j^* = m_j + \frac{\hat{x} - m_j}{n_j + 1}$
  $J_i^* = J_i - \frac{n_i}{n_i - 1} \|\hat{x} - m_i\|^2$, $\quad J_j^* = J_j + \frac{n_j}{n_j + 1} \|\hat{x} - m_j\|^2$
- The transfer is advantageous when $\frac{n_i}{n_i - 1} \|\hat{x} - m_i\|^2 > \frac{n_j}{n_j + 1} \|\hat{x} - m_j\|^2$.

Algorithm:
1. Select an initial partition of the $n$ samples into clusters and compute $J$ and $m_1, \ldots, m_c$.
2. Select the next candidate sample $\hat{x}$ (say $\hat{x} \in \aleph_i$).
3. If the current cluster has more than one sample, compute
   $\rho_j = \frac{n_j}{n_j + 1} \|\hat{x} - m_j\|^2$ for $j \ne i$ and $\rho_i = \frac{n_i}{n_i - 1} \|\hat{x} - m_i\|^2$.
4. Transfer $\hat{x}$ to $\aleph_k$ if $\rho_k \le \rho_j$ for all $j$.
5. Update $J$, $m_i$, and $m_k$.
6. If $J$ has not changed in $n$ attempts, stop; otherwise go back to step 2.
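To make the basic loop concrete, here is a minimal NumPy sketch of the batch K-means algorithm described on the K-Means Algorithm slide above (pick centers, assign each sample to the nearest mean, recompute the means, stop when no assignment changes). It does not implement the sample-by-sample transfer of the iterative version; the function name, argument names, and the random initialization are illustrative choices, not from the slides.

```python
import numpy as np

def k_means(X, c, max_iter=100, seed=0):
    """Minimal batch K-means. X has shape (n, d); c is the number of clusters."""
    rng = np.random.default_rng(seed)
    # Arbitrarily pick c distinct samples as the initial cluster centers.
    means = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Assign every sample to the cluster whose mean is closest (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)  # shape (n, c)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no sample changed cluster, so the partition has converged
        labels = new_labels
        # Recompute the sample mean of each non-empty cluster.
        for i in range(c):
            if np.any(labels == i):
                means[i] = X[labels == i].mean(axis=0)
    return labels, means
```

With `labels` and `means` in hand, the criterion $J$ above is simply `((X - means[labels]) ** 2).sum()`, which can be used to compare runs started from different initializations.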
Hierarchical Clustering

- K-means assumes a "flat" description of the data.
- Hierarchical descriptions are more frequently used in practice.

Hierarchical Clustering (cont.)

- Samples in the same cluster at one level remain in the same cluster at all higher levels.
- [dendrogram figure: samples $x_1, \ldots, x_6$ merged over levels 1 to 5]

Hierarchical Clustering Options

- Bottom-up: agglomerative.
- Top-down: divisive.
- Basic agglomerative procedure:
  1. Let $c = n$ and $\aleph_i = \{x_i\}$, $i = 1, \ldots, n$.
  2. If $c \le \hat{c}$, stop.
  3. Find the nearest pair of distinct clusters, say $\aleph_i$ and $\aleph_j$.
  4. Merge $\aleph_i$ and $\aleph_j$, delete $\aleph_j$, and decrement $c$ by one.
  5. Go to step 2.

Hierarchical Clustering Procedure

- At a certain step we have $c$ clusters $\aleph_i$, with $n_i = |\aleph_i|$ and $m_i = \frac{1}{n_i} \sum_{x \in \aleph_i} x$.
- Merging clusters $i$ and $j$ gives
  $\aleph_{ij} = \aleph_i \cup \aleph_j$, $\quad n_{ij} = n_i + n_j = |\aleph_{ij}|$, $\quad m_{ij} = \frac{1}{n_{ij}} \sum_{x \in \aleph_{ij}} x = \frac{n_i m_i + n_j m_j}{n_i + n_j}$.

Hierarchical Clustering Procedure (cont.)

- The criterion function then increases by
  $\Delta E_{ij} = \sum_{x \in \aleph_{ij}} \|x - m_{ij}\|^2 - \sum_{x \in \aleph_i} \|x - m_i\|^2 - \sum_{x \in \aleph_j} \|x - m_j\|^2 = n_i \|m_i - m_{ij}\|^2 + n_j \|m_j - m_{ij}\|^2 = \frac{n_i n_j}{n_i + n_j} \|m_i - m_j\|^2$.
- The pair $i^*, j^*$ to merge is chosen so that
  $\langle i^*, j^* \rangle = \arg\min_{i,j} \frac{n_i n_j}{n_i + n_j} \|m_i - m_j\|^2 = \arg\min_{i,j} d(\aleph_i, \aleph_j)$.
- Merging continues until no pair remains whose merge increases the criterion function by less than a certain preset threshold. (A code sketch of this merge step appears at the end of these notes.)

Criteria Function

- In fact, the criterion function can be chosen in many different ways, resulting in different clustering behaviors:
  - Nearest neighbor (elongated classes): $d_{\min}(\aleph_i, \aleph_j) = \min_{x \in \aleph_i,\, y \in \aleph_j} \|x - y\|$
  - Furthest neighbor (compact classes): $d_{\max}(\aleph_i, \aleph_j) = \max_{x \in \aleph_i,\, y \in \aleph_j} \|x - y\|$
- Example: start from every sample in its own cluster and merge according to some criterion function until two clusters remain. [figures]

In Reality

- With $n$ objects, the distance matrix is $n \times n$.
- For large databases it is computationally expensive to compute and store this matrix.
- Solution: store only the $k$ nearest clusters in the distance matrix; the complexity is then $k \times n$.

Divisive Clustering

- Less often used.
- One has to be careful: the criterion function usually decreases monotonically (the samples become purer at each step).
- The natural grouping is the one where a large drop in impurity occurs. [plot: impurity vs. iteration]

ISODATA Algorithm

- A tried-and-true clustering algorithm.
- Dynamically updates the number of clusters.
- Can go in both the top-down (split) and bottom-up (merge) directions.

Notation:
- $T$: threshold on the number of samples in a cluster
- $N_D$: approximate (desired) number of clusters
- $\sigma_s$: maximum spread for splitting
- $D_m$: maximum distance for merging
- $N_{\max}$: maximum number of clusters that can be merged

Algorithm:
1. Cluster the existing data into $N_c$ clusters, but eliminate any data and classes with fewer than $T$ members and decrease $N_c$ accordingly. Exit when the classification of the samples no longer changes.
2. If the iteration is odd: split clusters whose samples are sufficiently disjoint, increasing $N_c$. If any clusters have been split, go to step 1.
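As referenced above, here is a minimal sketch of the bottom-up (agglomerative) procedure, assuming NumPy: every sample starts in its own cluster, and at each step the pair with the smallest increase $\Delta E_{ij} = \frac{n_i n_j}{n_i + n_j}\|m_i - m_j\|^2$ is merged until $\hat{c}$ clusters remain. It naively recomputes all pairwise costs at every merge rather than maintaining the $n \times n$ distance matrix discussed under "In Reality"; the function and variable names are illustrative, not from the slides.

```python
import numpy as np

def agglomerate(X, c_target):
    """Bottom-up clustering of X (shape (n, d)) until c_target clusters remain.
    Each step merges the pair whose merge increases the squared-error criterion least."""
    clusters = [[i] for i in range(len(X))]  # start: every sample is its own cluster
    while len(clusters) > c_target:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ni, nj = len(clusters[i]), len(clusters[j])
                mi = X[clusters[i]].mean(axis=0)
                mj = X[clusters[j]].mean(axis=0)
                # Increase in J caused by merging clusters i and j:
                #   dE = ni * nj / (ni + nj) * ||mi - mj||^2
                dE = ni * nj / (ni + nj) * np.sum((mi - mj) ** 2)
                if best is None or dE < best[0]:
                    best = (dE, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]                          # delete cluster j, decrementing the count
    return clusters
```

Replacing `dE` with the $d_{\min}$ or $d_{\max}$ of the "Criteria Function" slide turns the same loop into nearest-neighbor or furthest-neighbor clustering, and recording the merge order reproduces the dendrogram levels sketched earlier.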