DOC PREVIEW
BU CS 565 - Clustering IV

This preview shows page 1-2-3-23-24-25-26-46-47-48 out of 48 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Slide 1OutlineGeneral form of impossibility resultsComputational task: clusteringAxiom 1: Scale invarianceAxiom 2: RichnessAxiom 3: ConsistencySlide 8ExamplesExamples (cont.)Centroid-based clustering and consistencyk-centroid clustering and the consistency axiomImpossibility theoremImpossibility theorem (proof sketch)What does an impossibility result really meanSlide 16Density-Based Clustering MethodsClassification of points in density-based clusteringCore, border and noise pointsDBSCAN: The AlgorithmTime and space complexity of DBSCANStrengths and weaknesses of DBSCANGeneric density-based clustering on a gridQuestionsClustering High-Dimensional DataThe Curse of DimensionalityWhy Subspace Clustering? (Parsons et al. SIGKDD Explorations 2004)CLIQUE (Clustering In QUEst)The CLIQUE algorithmCLIQUE: Monotonicity propertyStrengths and weakness of CLIQUESlide 32ClusteringCo-ClusteringMotivation: Sponsored SearchSlide 36Co-Clusters in Sponsored SearchCo-Clustering in Sponsored SearchCo-Clusters in Gene Expression DataClustering of the rowsClustering of the columnsCost of clusteringSlide 43Co-Clustering Objective FunctionSome BackgroundAlgorithmProperties of the algorithmAlgorithm--detailsClustering IVOutline•Impossibility theorem for clustering•Density-based clustering and subspace clustering•Bi-clustering or co-clusteringGeneral form of impossibility results•Define a set of simple axioms (properties) that a computational task should satisfy•Prove that there does not exist an algorithm that can simultaneously satisfy all the axioms  impossibilityComputational task: clustering•A clustering function operates on a set X of n points. X = {1,2,…,n}•Distance function d: X x X  R with d(i,j)≥0, d(i,j)=d(j,i), and d(i,j)=0 only if i=j•Clustering function f: f(X,d) = Γ, where Γ is a partition of XAxiom 1: Scale invariance•For a>0, distance function ad has values (ad)(i,j)=ad(i,j)•For any d and for any a>0 we have f(d) = f(ad)•The clustering function should not be sensitive to the changes in the units of distance measurement – should not have a built-in “length scale”Axiom 2: Richness•The range of f is equal to the set of partitions of X•For any X and any partition Γ of X, there is a distance function on X such that f(X,d) = Γ.Axiom 3: Consistency•Let Γ be a partition of X•d, d’ two distance functions on X•d’ is a Γ-transformation of d, if–For all i,jє X in the same cluster of Γ, we have d’(i,j)≤d(i,j)–For all i,jє X in different clusters of Γ, we have d’(i,j)≥d(i,j)•Consistency: if f(X,d)= Γ and d’ is a Γ-transformation of d, then f(X,d’)= Γ.Axiom 3: Consistency•Intuition: Shrinking distances between points inside a cluster and expanding distances between points in different clusters does not change the resultExamples•Single-link agglomerative clustering•Repeatedly merge clusters whose closest points are at minimum distance •Continue until a stopping criterion is met–k-cluster stopping criterion: continue until there are k clusters–distance-r stopping criterion: continue until all distances between clusters are larger than r–scale-a stopping criterion: let d* be the maximum pairwise distance; continue until all distances are larger than ad*Examples (cont.)•Single-link agglomerative clustering with k-cluster stopping criterion does not satisfy richness axiom•Single-link agglomerative clustering with distance-r stopping criterion does not satisfy scale-invariance property•Single-link agglomerative clustering with scale-a stopping criterion does not satisfy consistency propertyCentroid-based clustering and consistency•k-centroid clustering: –S subset of X for which ∑iєXminjєS{d(i,j)} is minimized–Partition of X is defined by assigning each element of X to the centroid that is the closest to it•Theorem: for every k≥2 and for n sufficiently large relative to k, the k-centroid clustering function does not satisfy the consistency propertyk-centroid clustering and the consistency axiom•Intuition of the proof•Let k=2 and X be partitioned into parts Y and Z•d(i,j) ≤ r for every i,j є Y•d(i,j) ≤ ε, with ε<r for every i,j є Z•d(i,j) > r for every i є Y and j є Z•Split part Y into subparts Y1 and Y2•Shrink distances in Y1 appropriately•What is the result of this shrinking?Impossibility theorem•For n≥2, there is no clustering function that satisfies all three axioms of scale-invariance, richness and consistencyImpossibility theorem (proof sketch)•A partition Γ’ is a refinement of partition Γ, if each cluster C’є Γ’ is included in some set Cє Γ•There is a partial order between partitions: Γ’≤ Γ•Antichain of partitions: a collection of partitions such that no one is a refinement of others•Theorem: If a clustering function f satisfies scale-invariance and consistency, then, the range of f is an anti-chainWhat does an impossibility result really mean•Suggests a technical underpinning for the difficulty in unifying the initial, informal concept of clustering•Highlights basic trade-offs that are inherent to the clustering problem•Distinguishes how clustering methods resolve these tradeoffs (by looking at the methods not only at an operational level)Outline•Impossibility theorem for clustering•Density-based clustering and subspace clustering •Bi-clustering or co-clusteringDensity-Based Clustering Methods•Clustering based on density (local cluster criterion), such as density-connected points•Major features:–Discover clusters of arbitrary shape–Handle noise–One scan–Need density parameters as termination condition•Several interesting studies:–DBSCAN: Ester, et al. (KDD’96)–OPTICS: Ankerst, et al (SIGMOD’99).–DENCLUE: Hinneburg & D. Keim (KDD’98)–CLIQUE: Agrawal, et al. (SIGMOD’98)Classification of points in density-based clustering•Core points: Interior points of a density-based cluster. A point p is a core point if for distance Eps :–|NEps(p)={q | dist(p,q) <=  }| ≥ MinPts•Border points: Not a core point but within the neighborhood of a core point (it can be in the neighborhoods of many core points)•Noise points: Not a core or a border pointCore, border and noise pointsEpsEps EpsDBSCAN: The Algorithm–Label all points as core, border, or noise points–Eliminate noise points –Put an edge between all core points that are within Eps of each other–Make each group of connected core


View Full Document

BU CS 565 - Clustering IV

Documents in this Course
Load more
Download Clustering IV
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Clustering IV and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Clustering IV 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?