
Multiple Non-Redundant Spectral Clustering Views


Donglin Niu [email protected]
ECE Department, Northeastern University, Boston, MA 02115

Jennifer G. Dy [email protected]
ECE Department, Northeastern University, Boston, MA 02115

Michael I. Jordan [email protected]
EECS and Statistics Departments, University of California, Berkeley, CA 94720

Appearing in Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 2010. Copyright 2010 by the author(s)/owner(s).

Abstract

Many clustering algorithms only find one clustering solution. However, data can often be grouped and interpreted in many different ways. This is particularly true in the high-dimensional setting, where different subspaces reveal different possible groupings of the data. Instead of committing to one clustering solution, here we introduce a novel method that can provide several non-redundant clustering solutions to the user. Our approach simultaneously learns non-redundant subspaces that provide multiple views and finds a clustering solution in each view. We achieve this by augmenting a spectral clustering objective function to incorporate dimensionality reduction and multiple views, and to penalize for redundancy between the views.

1. Introduction

Clustering is often a first step in the analysis of complex multivariate data, particularly when a data analyst wishes to engage in a preliminary exploration of the data. Most clustering algorithms find one partitioning of the data (Jain et al., 1999), but this is overly rigid. In the exploratory data analysis setting, there may be several views of the data that are of potential interest. For example, given patient information data, what is interesting to physicians will be different from what insurance companies find interesting. This multi-faceted nature of data is particularly prominent in the high-dimensional setting, where data such as text, images and genotypes may be grouped together in several different ways for different purposes.
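For readers unfamiliar with the base method being augmented, here is a minimal sketch of standard normalized spectral clustering (in the spirit of Ng, Jordan and Weiss). The toy data, the Gaussian-kernel bandwidth, and the simple label-assignment step are illustrative assumptions, not taken from this paper.

```python
# Minimal normalized spectral clustering sketch: build a Gaussian affinity,
# symmetrically normalize it, embed with the top eigenvectors, and assign labels.
import numpy as np

def spectral_bipartition(X, sigma=1.0):
    """Split the rows of X into two clusters via a normalized affinity matrix."""
    # Gaussian (RBF) similarity between all pairs of samples
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(K, 0.0)
    # Symmetrically normalized affinity D^{-1/2} K D^{-1/2}
    d = K.sum(axis=1)
    A = K / np.sqrt(np.outer(d, d))
    # Embed each sample with the top two eigenvectors, normalize the rows,
    # then assign labels by cosine similarity to the first sample's embedding
    _, vecs = np.linalg.eigh(A)            # eigenvalues returned in ascending order
    U = vecs[:, -2:]
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    return (U @ U[0] > 0.5).astype(int)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),   # blob near the origin
               rng.normal(5.0, 0.3, (20, 2))])  # blob near (5, 5)
labels = spectral_bipartition(X)
```

Because spectral methods cluster by graph connectivity rather than distance to a centroid, they can recover non-convex groups, which is the property the paper relies on in each view.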
For example, images of faces of people can be grouped based on their pose or identity. Web pages collected from universities can be clustered based on the type of the webpage's owner, {faculty, student, staff}, field, {physics, math, engineering, computer science}, or identity of the university. In some cases, a data analyst wishes to find a single clustering, but this may require an algorithm to consider multiple clusterings and discard those that are not of interest. In other cases, one may wish to summarize and organize the data according to multiple possible clustering views. In either case, it is important to find multiple clustering solutions which are non-redundant.

Although the literature on clustering is enormous, there has been relatively little attention paid to the problem of finding multiple non-redundant clusterings. Given a single clustering solution, Bae & Bailey (2006) impose cannot-link constraints on data points belonging to the same group in that clustering and then use agglomerative clustering in order to find an alternative clustering. Gondek & Hofmann (2004) use a conditional information bottleneck approach to find an alternative clustering to a particular clustering. Qi & Davidson (2009) propose an approach based on Gaussian mixture models in which they minimize the KL-divergence between the projection of the original empirical distribution of the data and the projection, subject to the constraint that the sum-of-squared error between samples in the projected space and the means of the clusters they do not belong to is smaller than a pre-specified threshold. All of these methods find a single alternative view given one clustering solution or a known grouping. In contrast, the approach that we present here can discover multiple (i.e., more than two) views.

Recently, Caruana et al. (2006), Cui et al. (2007) and Jain et al. (2008) also recognized the need to find multiple clustering solutions from data.
The meta-clustering method in Caruana et al. (2006) generates a diverse set of clustering solutions by either random initialization or random feature weighting. Then, to avoid presenting the user with too many clusterings, these solutions are combined using agglomerative clustering based on a Rand index for measuring similarity between pairwise clustering solutions. Our approach differs from meta-clustering in that we directly seek out multiple solutions by optimizing a multiple non-redundant clustering criterion, rather than relying on random initialization or random feature weighting. Cui et al. (2007) propose a sequential method that starts by finding a dominant clustering partition, and then finds alternative views by clustering in the subspace orthogonal to the clustering solutions found in previous iterations. Jain et al. (2008) propose a non-sequential method that learns two disparate clusterings simultaneously by minimizing a K-means sum-squared error objective for the two clustering solutions while at the same time minimizing the correlation between these two clusterings. Both of these methods are based on K-means and are thus limited to convex clusters. In contrast, the approach we introduce here can discover non-convex shaped clusters in each view; we view this capability as important in the exploratory data analysis setting. Moreover, the method in Jain et al.
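The orthogonal-projection step in the sequential approach of Cui et al. (2007) can be sketched roughly as follows. This is a simplified illustration of the idea (remove the subspace spanned by the previous solution's cluster centers, then cluster the residual), not the authors' implementation; the toy data and centers are assumed for the demo.

```python
# Project data onto the orthogonal complement of the subspace spanned by
# a previous clustering's centers, so the next clustering is sought there.
import numpy as np

def orthogonal_residual(X, centers):
    """Remove from each sample the subspace spanned by the given cluster centers."""
    Q, _ = np.linalg.qr(centers.T)   # orthonormal basis for the span of the centers
    return X - X @ Q @ Q.T           # projection onto the orthogonal complement

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
# Hypothetical centers from a first clustering, lying along the first two axes
centers = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0]])
residual = orthogonal_residual(X, centers)
```

After the projection, every residual sample is exactly orthogonal to both centers, so any structure the first clustering captured along those directions is invisible to the next round of clustering.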
(2008) uses all the features in all views. Our approach is based on the intuition that different views most likely exist in different subspaces, and thus we learn multiple subspaces in conjunction with learning the multiple alternative clustering solutions.

In summary, the work that we present here advances the field in the following ways: (1) we study an important multiple clustering discovery paradigm; (2) within this paradigm, we develop a novel approach that can find clusters with arbitrary shapes in each view; (3) within each view, our method can learn the subspace in which the clusters reside; and finally, (4) we simultaneously learn the multiple subspaces and the clusterings in each view by optimizing a single objective function.

2. Formulation

Our goal is to find multiple clustering views. Given n data samples, there are c^n possible c-way disjoint partitionings of the data (counting permutations of the same groupings). Only a small number of these groupings are likely to be meaningful. We would like the clusters in each view to be of good quality, and we also wish for the clustering solutions in the different views
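To make the count concrete: each of the n samples independently receives one of c labels (including assignments that leave some labels unused), giving c^n assignments. A small illustrative computation, not from the paper:

```python
# Number of ways to assign n samples to c cluster labels, counting
# permutations of the same grouping as distinct: c ** n.
n, c = 10, 3
num_assignments = c ** n
print(num_assignments)  # 59049
```

Even at this toy scale the space is far too large to enumerate, which is why only a criterion-guided search for a few meaningful views is practical.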

