Unsupervised Clustering

- Training samples are not labeled.
- We may not know the number of classes, the a priori probabilities $P(\omega_i)$, or the state-conditional probabilities $p(x \mid \omega_i)$.
- Automatic discovery of structure in the data.
- Intuitively, objects in the same class stick together and form clusters.

Unsupervised Clustering (cont.)

- Goal: locating groups (clusters) having similar measurements.
- Given $n$ unlabelled samples $x_1, x_2, \ldots, x_n$, partition them into $c$ clusters $\aleph_1, \aleph_2, \ldots, \aleph_c$.

Similarity Measure

- Need a similarity measure $s(x, x')$, e.g., the distance between $x$ and $x'$:
  - Minkowski metric: $d(x, y) = \bigl(\sum_{k=1}^{d} |x_k - y_k|^q\bigr)^{1/q}$, $q \ge 1$
  - Mahalanobis distance: $d(x, y) = (x - y)^t \Sigma^{-1} (x - y)$
  - Normalized inner product: $s(x, y) = \frac{x^t y}{\|x\|\,\|y\|}$
  - Commonality (for binary features): $s(x, y) = x \cdot y$, the number of attributes shared

Similarity Measure (cont.)

- How to set the threshold?
  - Too large: all samples are assigned to a single class.
  - Too small: each sample ends up in its own class.
- How to properly weight each feature? (How do the brightness of a fish and the length of a fish relate?)
- How to scale (or normalize) different measures?

Axis Scaling; Axis Scaling (cont.); Threshold: [figure-only slides]

Criteria for Clustering

- A criterion function for clustering is the sum of squared errors
  $J = \sum_{i=1}^{c} \sum_{x \in \aleph_i} \|x - m_i\|^2$, where $m_i = \frac{1}{n_i} \sum_{x \in \aleph_i} x$.
- A related criterion uses the distances from the samples in a cluster to the samples outside it:
  $J = \sum_{i=1}^{c} \frac{1}{n_i} \sum_{x \in \aleph_i} \sum_{x' \notin \aleph_i} \|x - x'\|^2$.

Criteria for Clustering (cont.)

- Within-group and between-group variance: minimize the within-group scatter and maximize the between-group scatter, e.g. through the trace or determinant of the appropriate scatter matrix.
  - trace: square of the scattering radius
  - determinant: square of the scattering volume
- $S_W = \sum_{i=1}^{c} \sum_{x \in \aleph_i} (x - m_i)(x - m_i)^t$, with $m_i = \frac{1}{n_i} \sum_{x \in \aleph_i} x$
- $S_B = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^t$, with $m = \frac{1}{n} \sum_{x} x$

Pathology: when class sizes differ

- With $J = \sum_{i=1}^{c} \sum_{x \in \aleph_i} \|x - m_i\|^2$:
  - Relaxed constraint: allow the large class to grow into the smaller class.
  - Stringent constraint: disallow the large class and split it.

K-Means Algorithm (fixed # of clusters)

1. Arbitrarily pick the cluster centers and assign each sample to its nearest center.
2. Compute the sample mean of each cluster.
3. Reassign every sample to the cluster with the nearest mean.
4. If any assignment changed, repeat from step 2; otherwise stop.

(A code sketch of this loop appears after the iterative version below.)

K-Means

- In some sense, K-means is a simplified version of Case II of learning parametric forms, where the class posterior is
  $\hat{P}(\omega_i \mid x_k, \hat{\theta}) = \frac{p(x_k \mid \omega_i, \hat{\theta}_i)\, \hat{P}(\omega_i)}{\sum_{j=1}^{c} p(x_k \mid \omega_j, \hat{\theta}_j)\, \hat{P}(\omega_j)}$.
- Instead of allowing fractional membership, K-means gives the whole sample to the class whose mean is closest:
  $\hat{P}(\omega_i \mid x_k, \hat{\theta}) = \begin{cases} 1 & \text{if class } i \text{ has the closest mean} \\ 0 & \text{otherwise} \end{cases}$

Fuzzy K-Means

- In Matlab, K-means (called c-means there) is found in the Fuzzy Logic Toolbox.
- It implements fuzzy K-means, in which the membership function is not restricted to 0 or 1 but can be fractional.

Iterative K-Means

- Iterative refinement of the clusters to minimize $J = \sum_{i=1}^{c} \sum_{x \in \aleph_i} \|x - m_i\|^2$.
- Each step moves a single sample from one cluster to another. Moving $\hat{x}$ from $\aleph_i$ to $\aleph_j$:
  $m_i^* = m_i - \frac{\hat{x} - m_i}{n_i - 1}$, $\quad m_j^* = m_j + \frac{\hat{x} - m_j}{n_j + 1}$
  $J_i^* = J_i - \frac{n_i}{n_i - 1} \|\hat{x} - m_i\|^2$, $\quad J_j^* = J_j + \frac{n_j}{n_j + 1} \|\hat{x} - m_j\|^2$
- The transfer is advantageous when $\frac{n_i}{n_i - 1} \|\hat{x} - m_i\|^2 > \frac{n_j}{n_j + 1} \|\hat{x} - m_j\|^2$.

Algorithm:
1. Select an initial partition of the $n$ samples into clusters and compute $J$ and $m_1, \ldots, m_c$.
2. Select the next candidate sample $\hat{x}$ (say $\hat{x} \in \aleph_i$).
3. If the current cluster has more than one sample, compute
   $\rho_j = \frac{n_j}{n_j + 1} \|\hat{x} - m_j\|^2$ for $j \ne i$ and $\rho_i = \frac{n_i}{n_i - 1} \|\hat{x} - m_i\|^2$.
4. Transfer $\hat{x}$ to $\aleph_k$ if $\rho_k \le \rho_j$ for all $j$.
5. Update $J$, $m_i$, and $m_k$.
6. If $J$ has not changed in $n$ attempts, stop; otherwise go back to step 2.
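To make the basic loop concrete, here is a minimal NumPy sketch of the batch K-means algorithm described on the K-Means Algorithm slide above (pick centers, assign each sample to the nearest mean, recompute the means, stop when no assignment changes). It does not implement the sample-by-sample transfer of the iterative version; the function name, argument names, and the random initialization are illustrative choices, not from the slides.

```python
import numpy as np

def k_means(X, c, max_iter=100, seed=0):
    """Minimal batch K-means. X has shape (n, d); c is the number of clusters."""
    rng = np.random.default_rng(seed)
    # Arbitrarily pick c distinct samples as the initial cluster centers.
    means = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Assign every sample to the cluster whose mean is closest (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)  # shape (n, c)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no sample changed cluster, so the partition has converged
        labels = new_labels
        # Recompute the sample mean of each non-empty cluster.
        for i in range(c):
            if np.any(labels == i):
                means[i] = X[labels == i].mean(axis=0)
    return labels, means
```

With `labels` and `means` in hand, the criterion $J$ above is simply `((X - means[labels]) ** 2).sum()`, which can be used to compare runs started from different initializations.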
Hierarchical Clustering

- K-means assumes a "flat" description of the data.
- Hierarchical descriptions are more frequently used in practice.

Hierarchical Clustering (cont.)

- Samples in the same cluster at one level remain in the same cluster at all higher levels.
- [dendrogram figure: samples $x_1, \ldots, x_6$ merged over levels 1 to 5]

Hierarchical Clustering Options

- Bottom-up: agglomerative.
- Top-down: divisive.
- Basic agglomerative procedure:
  1. Let $c = n$ and $\aleph_i = \{x_i\}$, $i = 1, \ldots, n$.
  2. If $c \le \hat{c}$, stop.
  3. Find the nearest pair of distinct clusters, say $\aleph_i$ and $\aleph_j$.
  4. Merge $\aleph_i$ and $\aleph_j$, delete $\aleph_j$, and decrement $c$ by one.
  5. Go to step 2.

Hierarchical Clustering Procedure

- At a certain step we have $c$ clusters $\aleph_i$, with $n_i = |\aleph_i|$ and $m_i = \frac{1}{n_i} \sum_{x \in \aleph_i} x$.
- Merging clusters $i$ and $j$ gives
  $\aleph_{ij} = \aleph_i \cup \aleph_j$, $\quad n_{ij} = n_i + n_j = |\aleph_{ij}|$, $\quad m_{ij} = \frac{1}{n_{ij}} \sum_{x \in \aleph_{ij}} x = \frac{n_i m_i + n_j m_j}{n_i + n_j}$.

Hierarchical Clustering Procedure (cont.)

- The criterion function then increases by
  $\Delta E_{ij} = \sum_{x \in \aleph_{ij}} \|x - m_{ij}\|^2 - \sum_{x \in \aleph_i} \|x - m_i\|^2 - \sum_{x \in \aleph_j} \|x - m_j\|^2 = n_i \|m_i - m_{ij}\|^2 + n_j \|m_j - m_{ij}\|^2 = \frac{n_i n_j}{n_i + n_j} \|m_i - m_j\|^2$.
- The pair $i^*, j^*$ to merge is chosen so that
  $\langle i^*, j^* \rangle = \arg\min_{i,j} \frac{n_i n_j}{n_i + n_j} \|m_i - m_j\|^2 = \arg\min_{i,j} d(\aleph_i, \aleph_j)$.
- Merging continues until no pair remains whose merge increases the criterion function by less than a certain preset threshold. (A code sketch of this merge step appears at the end of these notes.)

Criteria Function

- In fact, the criterion function can be chosen in many different ways, resulting in different clustering behaviors:
  - Nearest neighbor (elongated classes): $d_{\min}(\aleph_i, \aleph_j) = \min_{x \in \aleph_i,\, y \in \aleph_j} \|x - y\|$
  - Furthest neighbor (compact classes): $d_{\max}(\aleph_i, \aleph_j) = \max_{x \in \aleph_i,\, y \in \aleph_j} \|x - y\|$
- Example: start from every sample in its own cluster and merge according to some criterion function until two clusters remain. [figures]

In Reality

- With $n$ objects, the distance matrix is $n \times n$.
- For large databases it is computationally expensive to compute and store this matrix.
- Solution: store only the $k$ nearest clusters in the distance matrix; the complexity is then $k \times n$.

Divisive Clustering

- Less often used.
- One has to be careful: the criterion function usually decreases monotonically (the samples become purer at each step).
- The natural grouping is the one where a large drop in impurity occurs. [plot: impurity vs. iteration]

ISODATA Algorithm

- A tried-and-true clustering algorithm.
- Dynamically updates the number of clusters.
- Can go in both the top-down (split) and bottom-up (merge) directions.

Notation:
- $T$: threshold on the number of samples in a cluster
- $N_D$: approximate (desired) number of clusters
- $\sigma_s$: maximum spread for splitting
- $D_m$: maximum distance for merging
- $N_{\max}$: maximum number of clusters that can be merged

Algorithm:
1. Cluster the existing data into $N_c$ clusters, but eliminate any data and classes with fewer than $T$ members and decrease $N_c$ accordingly. Exit when the classification of the samples no longer changes.
2. If the iteration is odd: split clusters whose samples are sufficiently disjoint, increasing $N_c$. If any clusters have been split, go to step 1.
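As referenced above, here is a minimal sketch of the bottom-up (agglomerative) procedure, assuming NumPy: every sample starts in its own cluster, and at each step the pair with the smallest increase $\Delta E_{ij} = \frac{n_i n_j}{n_i + n_j}\|m_i - m_j\|^2$ is merged until $\hat{c}$ clusters remain. It naively recomputes all pairwise costs at every merge rather than maintaining the $n \times n$ distance matrix discussed under "In Reality"; the function and variable names are illustrative, not from the slides.

```python
import numpy as np

def agglomerate(X, c_target):
    """Bottom-up clustering of X (shape (n, d)) until c_target clusters remain.
    Each step merges the pair whose merge increases the squared-error criterion least."""
    clusters = [[i] for i in range(len(X))]  # start: every sample is its own cluster
    while len(clusters) > c_target:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ni, nj = len(clusters[i]), len(clusters[j])
                mi = X[clusters[i]].mean(axis=0)
                mj = X[clusters[j]].mean(axis=0)
                # Increase in J caused by merging clusters i and j:
                #   dE = ni * nj / (ni + nj) * ||mi - mj||^2
                dE = ni * nj / (ni + nj) * np.sum((mi - mj) ** 2)
                if best is None or dE < best[0]:
                    best = (dE, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]                          # delete cluster j, decrementing the count
    return clusters
```

Replacing `dE` with the $d_{\min}$ or $d_{\max}$ of the "Criteria Function" slide turns the same loop into nearest-neighbor or furthest-neighbor clustering, and recording the merge order reproduces the dendrogram levels sketched earlier.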