DOC PREVIEW
UCSB CS 290 - Unsupervised Clustering

This preview shows page 1-2-3-18-19-37-38-39 out of 39 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Unsupervised ClusteringUnsupervised ClusteringPR , ANN, & ML2Unsupervised Clustering Training samples are not labeled May not know how many classes a prior probability  state-conditional probability Automatic discovery of structures  Intuitively, objects in the same class stick together and form clustersPi()ϖpxi(|)ϖPR , ANN, & ML3 Locating groups (clusters) having similar measurementsGivenxxxunlabelledpartition in to C clustersnCℵ=ℵ ℵ ℵ{,,...,}(), ,...,1 212Unsupervised Clustering (cont)PR , ANN, & ML4Similarity Measure Need a similarity measurement s(x,x’) (e.g., distance between x and x’ )ddddq| dtqqkkdkyxyxyxyxyxyxΣyxyxyxyx⋅=⋅=−−=≥−=−=∑),(features)binary (for y Commonalit),(productinner Normalized)()(),(Distance sMahalanobi1,)|(),(Metric Minkowski111PR , ANN, & ML5Similarity Measure (cont.) How to set the threshold?  Too large all samples assigned into a single class Too small each sample in its own class How to properly weight each feature? How do brightness of a fish and length of a fish relate? How to scale (or normalize) different measures?PR , ANN, & ML6Axis ScalingPR , ANN, & ML7Axis Scaling (cont.)PR , ANN, & ML8PR , ANN, & ML9ThresholdPR , ANN, & ML10Criteria for Clustering A criterion function for clustering∑∑∑ℵ∈= ℵ∈=−=iixiici xinJ xmmx1||12∑ ∑ ∑= ℵ∈ ℵ∉−=ci x xii inJ1 '2|'|1xxPR , ANN, & ML11 Within and Between group variance minimize and maximize the trace and determinant of the appropriate scatter matrix trace: square of the scattering radius determinant: square of the scattering volume∑∑∑∑∑ℵ∈=ℵ∈= ℵ∈=−−==−−=xcitiBxiici xtWnnniixmmmmmSxmmxmxSiiii1))((1))((11Criteria for Clustering (cont.)PR , ANN, & ML12Pathology: when class sizes differ∑∑∑ℵ∈= ℵ∈=−=iixiici xinJ xmmx1||12 Relaxed constraint Allow the large class to grow into smaller class Stringent constraint Disallow large class and split itPR , ANN, & ML13K-Means Algorithm(fixed # of clusters) Arbitrarily pick N cluster centers, assign samples to nearest center Compute sample mean of each cluster Reassign samples to clusters with the nearest mean (for all samples) Repeat if there are changes, otherwise stopPR , ANN, & ML14PR , ANN, & ML15PR , ANN, & ML16PR , ANN, & ML17K-Means In some sense, it is a simplified version of the case II of learning parametric form ∑==cjjjjkiiikkiwPwxpwPwxpxwP1)(),|()(),|(),|())))))θθθ In stead of multiple membership, give the sample to whichever class whose mean is closest to it =otherwisexwPki0mean classclosest with class1),|(θ))PR , ANN, & ML18Fuzzy K-Means In Matlab, you can find k-mean (it is called c-mean) under fuzzy logic toolbox It implements fuzzy k-mean  Where membership function is not 1 or 0, but can be fractionalPR , ANN, & ML19Iterative K-means Iterative refinement of clusters to minimize Each step: move a sample from one to another∑∑ ∑ℵ∈= ℵ∈=−=iixiici xinJ xmmx1||122222ji|ˆ|1|ˆ|1|ˆ|11ˆ|ˆ|11ˆ to from xˆjjjiiijjjjjjjjjiiiiiiiiinnnnnnJJnnnJJnmxmxmxmxmmmxmxmm−+>−−−++=+−+=∗−−−=−−−=∗ℵℵ∗∗PR , ANN, & ML201. Select an initial partition of the n samples into clusters and compute 2. Select the next candidate sample3. If the current cluster is larger than 1, then4. Transfer 5. Update 6. If J has not changed in n attempts, stop, otherwise go back to 2. $x≠−+=−−=ijnnijnnjjjjjjj22|ˆ|1|ˆ|1mxmxρ$xtoifforalljk k jℵ≤ρρJmmi k,,Jmmc,,...,1PR , ANN, & ML21Hierarchical Clustering K-means assume a “flat” data description Hierarchical descriptions are more frequentlyPR , ANN, & ML22Hierarchical clustering (cont.) Samples in the same cluster will remain so at a higher levelx1x2x3x4x5x612345levelPR , ANN, & ML23 Bottom-up: agglomerate Top-down: divisive.to.Gobycdecrement,anddelete,and.Mergeandclusters,distinctofpairnearestthe.Find stopc,c.If },...,x{xand n c.Letjjijin251ˆ43ˆ2ˆ11ℵℵℵℵℵ≤=ℵ=Hierarchical clustering OptionsPR , ANN, & ML24 At a certain step, we have c classes∑ℵ∈==ℵixiiiinn xm1,||Merge two clusters i and j)(11,||jjiijixjiijjiijjiijnnnnxnnmnnijmm ++=+=+=ℵℵ∪ℵ=ℵ∑ℵ∈Hierarchical clustering ProcedurePR , ANN, & ML25 Then certain criteria function will increase 2222222||)(||||||jijijiijjijjiixjxixijijnnnnnnnnEjiijmmmmmmxmxmx−+=+−+=−−−−−=∆∑∑∑ℵ∈ℵ∈ℵ∈),(minarg||minarg,,2,**jijijijijijidnnnnji ℵℵ=−+>=< mmI*,j* chosen in such a way Until no such pair exist that cause an increase in the criteria function less than certain preset thresholdHierarchical clustering Procedure (cont.)PR , ANN, & ML26 In fact, the criteria function can be chosen in many different ways, resulting in different clustering behaviors)(max),(classes)(compact neighbor Furthest)(min),(classes) (elongatedneighbor Nearest,max,minyx,yx,jijiyxjiyxjiddℵ∈ℵ∈ℵ∈ℵ∈=ℵℵ−=ℵℵ−Criteria FunctionPR , ANN, & ML27•Starting from every sample in a cluster•Merge them according to some criteria function•Until two clusters existPR , ANN, & ML28PR , ANN, & ML29PR , ANN, & ML30In Reality With n objects, the distance matrix is n x n For large databases, it is computationally expensive to compute and store the matrix Solution: storing only k nearest clusters in the distance matrix: complexity is k x nPR , ANN, & ML31Divisive Clustering Less often used Have to be careful that criterion function usually decreases monotonically (the samples become purer each time) Natural grouping is the one where a large drop in impurity occursiterationimpurityPR , ANN, & ML32ISODATA Algorithm A tried-and-true clustering algorithm  Dynamically update # of clusters Can go both top-down (split) and bottom-up (merge) directionsPR , ANN, & ML33Notation:merged be can that clusters ofnumber maximummergingfor distance maximumsplittingfor spread maximumclusters ofnumber (desired) eapproximatclustera in samples ofnumber on thresholdmaxNDNTmsDσPR , ANN, & ML34Algorithm 1. Cluster the existing data into Nc clusters but eliminate any data and classes with fewer than T members, decrease Nc. Exit when classification of samples has not changed 2. If iteration odd and  Split clusters whose samples are sufficiently disjoint, increase Nc If any clusters have been split, go to


View Full Document

UCSB CS 290 - Unsupervised Clustering

Download Unsupervised Clustering
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Unsupervised Clustering and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Unsupervised Clustering 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?