Clustering and the k-means Algorithm
David M. Blei
COS424
Princeton University
September 5, 2007

D. Blei Clustering 01 1 / 32

Clustering

• Goal: Automatically segment data into groups of similar points
• Question: When and why would we want to do this?
• Useful for:
  • Automatically organizing data
  • Understanding hidden structure in some data
  • Representing high-dimensional data in a low-dimensional space
• Examples:
  • Customers according to purchase histories
  • Genes according to expression profile
  • Search results according to topic
  • MySpace users according to interests
  • A museum catalog according to image similarity
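The goal above, automatically segmenting data into groups of similar points, can be made concrete with k-means, the algorithm named in this lecture's title. What follows is a minimal sketch under stated assumptions, not the lecture's own implementation: the function name `kmeans`, the helper `sq_dist`, the use of squared Euclidean distance, initialization by sampling k distinct data points, the iteration cap, and the toy points are all illustrative choices. After clustering, each point is summarized by a single integer label, which is one sense in which clustering gives a compact, low-dimensional representation of the data.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples (assumed metric)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Hypothetical minimal k-means (Lloyd's algorithm) sketch.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Assumption: initialize centroids by sampling k distinct data points.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, labels


# Toy data (made up for illustration): two well-separated groups in 2-D.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, labels = kmeans(points, k=2)
print(labels)  # one integer label per point
```

Which group receives label 0 depends on the random initialization, so only the grouping itself, not the label values, is meaningful here.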