MSU CSE 847 - Clustering
Course: CSE 847 (66 pages)

Overview of Clustering
Rong Jin

Outline
- K-means for clustering
- Expectation-Maximization algorithm for clustering
- Spectral clustering (if time permits)
(The later slides in the deck also cover other mixture models, graph partitioning, normalized cut (Shi & Malik, 1997), image segmentation, and non-negative matrix factorization.)

Clustering
Find the underlying structure for given data points.

Applications
- (I) Search result clustering
- (II) Navigation
- (III) Google News
- (III) Visualization: Islands of Music (Pampalk et al., KDD '03)
- (IV) Image compression: http://www.ece.neu.edu/groups/rpl/kmeans/

How to Find a Good Clustering?
Minimize the sum of squared distances within clusters:

  \{C_j, m_{i,j}\} = \arg\min_{C, m} \sum_{j=1}^k \sum_{i=1}^n m_{i,j} \|\vec{x}_i - \vec{C}_j\|^2

where the membership indicators satisfy

  m_{i,j} = 1 if \vec{x}_i belongs to the j-th cluster, 0 otherwise
  \sum_{j=1}^k m_{i,j} = 1  (any \vec{x}_i belongs to a single cluster)

How to Efficiently Cluster Data?
The memberships \{m_{i,j}\} and centers \{C_j\} are correlated, so optimize them alternately:
- Given memberships m_{i,j}:  \vec{C}_j = \sum_{i=1}^n m_{i,j} \vec{x}_i / \sum_{i=1}^n m_{i,j}
- Given centers \{C_j\}:  m_{i,j} = 1 if j = \arg\min_{j'} \|\vec{x}_i - \vec{C}_{j'}\|^2, and 0 otherwise

K-means for Clustering
- Start with a random guess of the cluster centers.
- Determine the membership of each data point.
- Adjust the cluster centers. Repeat until convergence.

K-means, step by step
1. Ask the user how many clusters they'd like (e.g., k = 5).
2. Randomly guess k cluster center locations.
3. Each data point finds out which center it is closest to (thus each center "owns" a set of data points).
4. Each center moves to the centroid of the points it owns. Repeat from step 3.

Any computational problem? Each iteration costs O(kN), where N is the number of points, so many iterations over a large data set become expensive.

Improve K-means
Group points by region using a spatial index (a KD-tree or SR-tree). The key difference: find the closest center for each rectangle of the tree, and assign all the points within a rectangle to one cluster at once.

A Gaussian Mixture Model for Clustering
Assume that the data are generated from a mixture of Gaussian distributions.
- Each Gaussian distribution has a center \mu_j and a variance \sigma_j (ignored here).
- For each data point, determine the membership z_{ij}: z_{ij} = 1 if x_i belongs to the j-th cluster.

Learning a Gaussian Mixture (with known covariance)
Probability of a data point:

  p(x_i) = \sum_j p(\mu_j) p(x_i | \mu_j)
         = \sum_j p(\mu_j) \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\left(-\frac{\|x_i - \mu_j\|^2}{2\sigma^2}\right)

Log-likelihood of the data; apply MLE to find the optimal parameters \theta = \{p(\mu_j), \mu_j\}:

  \log p(X) = \sum_i \log \left[ \sum_j p(\mu_j) \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\left(-\frac{\|x_i - \mu_j\|^2}{2\sigma^2}\right) \right]

E-step: compute the expected memberships

  E[z_{ij}] = p(\mu_j | x_i) = \frac{p(x_i | \mu_j) p(\mu_j)}{\sum_{n=1}^k p(x_i | \mu_n) p(\mu_n)}
            = \frac{e^{-\|x_i - \mu_j\|^2 / 2\sigma^2} p(\mu_j)}{\sum_{n=1}^k e^{-\|x_i - \mu_n\|^2 / 2\sigma^2} p(\mu_n)}

M-step: re-estimate the parameters

  \mu_j = \frac{\sum_{i=1}^m E[z_{ij}] x_i}{\sum_{i=1}^m E[z_{ij}]},
  p(\mu_j) = \frac{1}{m} \sum_{i=1}^m E[z_{ij}]

Gaussian Mixture Example
The slides trace an example run from the start through the first, 2nd, 3rd, 4th, 5th, 6th, and 20th iterations, showing the mixture gradually fitting the data.

Mixture Model for Document Clustering
A set of language models \Theta = \{\theta_1, \theta_2, ..., \theta_K\}, where each
\theta_i = \{p(w_1 | \theta_i), p(w_2 | \theta_i), ..., p(w_V | \theta_i)\}.

Probability of a document:

  p(d_i) = \sum_j p(\theta_j) p(d_i | \theta_j)
         = \sum_j p(\theta_j) \prod_{k=1}^V p(w_k | \theta_j)^{tf(w_k, d_i)}
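The alternating k-means updates described above (memberships given centers, centroids given memberships) can be sketched in a few lines. This is a minimal illustration, not the code from the slides; the function name, the 2-D tuple representation, and the `init` parameter are my own choices.

```python
import random

def kmeans(points, k, iters=100, init=None, seed=0):
    """Plain k-means: alternate between (3) assigning each point to its
    closest center and (4) moving each center to the centroid of the
    points it owns. Points are (x, y) tuples."""
    centers = list(init) if init else random.Random(seed).sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Step 3: membership m_ij -- each point finds its closest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            clusters[j].append(p)
        # Step 4: C_j = sum_i m_ij x_i / sum_i m_ij (centroid of owned points).
        new_centers = [(sum(p[0] for p in cl) / len(cl),
                        sum(p[1] for p in cl) / len(cl)) if cl else centers[j]
                       for j, cl in enumerate(clusters)]
        if new_centers == centers:   # memberships stable -> converged
            break
        centers = new_centers
    return centers, clusters
```

On two well-separated blobs with one initial center placed near each blob, this converges in a single update to the blob centroids; with unlucky random initialization it can settle in a poor local minimum, which is exactly the sensitivity the slides' repeated random-guess step hints at.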

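The E-step and M-step for the known-covariance Gaussian mixture can be sketched for 1-D data with a shared variance. This is an illustrative sketch under that 1-D assumption; the function name and the toy data are invented, not from the slides.

```python
import math

def em_gmm(xs, mus, priors, sigma=1.0, iters=50):
    """EM for a 1-D Gaussian mixture with known, shared variance.
    xs: data points; mus: initial means; priors: initial p(mu_j)."""
    k = len(mus)
    m = len(xs)
    for _ in range(iters):
        # E-step: E[z_ij] = p(mu_j | x_i) by Bayes' rule; the shared
        # normalizing constant of the Gaussians cancels in the ratio.
        resp = []
        for x in xs:
            w = [priors[j] * math.exp(-(x - mus[j]) ** 2 / (2 * sigma ** 2))
                 for j in range(k)]
            s = sum(w)
            resp.append([wj / s for wj in w])
        # M-step: mu_j = sum_i E[z_ij] x_i / sum_i E[z_ij],
        #         p(mu_j) = (1/m) sum_i E[z_ij]
        mus = [sum(resp[i][j] * xs[i] for i in range(m))
               / sum(resp[i][j] for i in range(m)) for j in range(k)]
        priors = [sum(resp[i][j] for i in range(m)) / m for j in range(k)]
    return mus, priors
```

On six points forming two tight groups near 0 and 5, starting from means 0.5 and 4.5, the means converge to roughly 0 and 5 with priors near one half each, mirroring the iteration-by-iteration fitting shown in the slides' example.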

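The document probability under the mixture of language models can be evaluated directly from the formula above. A minimal sketch follows; the two toy language models and the word counts in the usage example are made up for illustration, and the log-sum-exp trick is an added numerical-stability detail, not something the slides discuss.

```python
import math

def doc_log_likelihood(tf, thetas, priors):
    """log p(d) under the multinomial mixture:
    p(d) = sum_j p(theta_j) * prod_w p(w | theta_j)^tf(w, d).
    tf: dict word -> count; thetas: list of dicts word -> p(w | theta_j)."""
    per_cluster = []
    for prior, theta in zip(priors, thetas):
        # Work in log space: log p(theta_j) + sum_w tf(w, d) * log p(w | theta_j)
        log_p = math.log(prior) + sum(c * math.log(theta[w])
                                      for w, c in tf.items())
        per_cluster.append(log_p)
    # log-sum-exp over clusters for numerical stability
    top = max(per_cluster)
    return top + math.log(sum(math.exp(lp - top) for lp in per_cluster))
```

For example, with two language models assigning probability 0.9/0.1 and 0.1/0.9 to the words "sports" and "stock", equal priors, and a document containing "sports" three times, p(d) = 0.5 * 0.9^3 + 0.5 * 0.1^3 = 0.365.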