DOC PREVIEW
UIC CS 583 - Chapter 4: Unsupervised Learning

This preview shows page 1-2-3-4-5-6-44-45-46-47-48-49-50-90-91-92-93-94-95 out of 95 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 95 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Chapter 4: Unsupervised LearningRoad mapSupervised learning vs. unsupervised learningClusteringAn illustrationWhat is clustering for?What is clustering for? (cont…)Aspects of clusteringSlide 9K-means clusteringK-means algorithmK-means algorithm – (cont …)Stopping/convergence criterionAn exampleAn example (cont …)An example distance functionA disk version of k-meansA disk version of k-means (cont …)Strengths of k-meansWeaknesses of k-meansWeaknesses of k-means: Problems with outliersWeaknesses of k-means: To deal with outliersWeaknesses of k-means (cont …)Slide 24Slide 25K-means summarySlide 27Common ways to represent clustersUsing classification modelUse frequent values to represent clusterClusters of arbitrary shapesSlide 32Hierarchical ClusteringTypes of hierarchical clusteringAgglomerative clusteringAgglomerative clustering algorithmAn example: working of the algorithmMeasuring the distance of two clustersSingle link methodComplete link methodAverage link and centroid methodsThe complexitySlide 43Distance functionsDistance functions for numeric attributesEuclidean distance and Manhattan distanceSquared distance and Chebychev distanceDistance functions for binary and nominal attributesConfusion matrixSymmetric binary attributesSymmetric binary attributes: exampleAsymmetric binary attributesNominal attributesDistance function for text documentsSlide 55Data standardizationInterval-scaled attributesInterval-scaled attributes (cont …)Ratio-scaled attributesSlide 60Nominal attributes: an exampleOrdinal attributesSlide 63Mixed attributesConvert to a single typeConvert to a single type (cont …)Combining individual distancesSlide 68How to choose a clustering algorithmChoose a clustering algorithm (cont …)Slide 71Cluster Evaluation: hard problemCluster evaluation: ground truthEvaluation measures: EntropyEvaluation measures: puritySlide 76A remark about ground truth evaluationEvaluation based on internal informationIndirect evaluationSlide 80Holes in data spaceHoles are useful tooData regions and empty regionsSupervised learning for unsupervised learningSlide 85Can it done without adding N points?Characteristics of the approachBuilding the TreeSlide 89How many N points to add?Slide 91How many N points to add? (cont…)Building the decision treeSlide 94SummaryChapter 4: Unsupervised LearningCS583, Bing Liu, UIC2Road mapBasic conceptsK-means algorithmRepresentation of clustersHierarchical clusteringDistance functionsData standardizationHandling mixed attributesWhich clustering algorithm to use?Cluster evaluationDiscovering holes and data regionsSummaryCS583, Bing Liu, UIC3Supervised learning vs. unsupervised learningSupervised learning: discover patterns in the data that relate data attributes with a target (class) attribute. These patterns are then utilized to predict the values of the target attribute in future data instances. Unsupervised learning: The data have no target attribute. We want to explore the data to find some intrinsic structures in them.CS583, Bing Liu, UIC4ClusteringClustering is a technique for finding similarity groups in data, called clusters. I.e., it groups data instances that are similar to (near) each other in one cluster and data instances that are very different (far away) from each other into different clusters. Clustering is often called an unsupervised learning task as no class values denoting an a priori grouping of the data instances are given, which is the case in supervised learning. Due to historical reasons, clustering is often considered synonymous with unsupervised learning.In fact, association rule mining is also unsupervisedThis chapter focuses on clustering.CS583, Bing Liu, UIC5An illustrationThe data set has three natural groups of data points, i.e., 3 natural clusters.CS583, Bing Liu, UIC6What is clustering for? Let us see some real-life examplesExample 1: groups people of similar sizes together to make “small”, “medium” and “large” T-Shirts.Tailor-made for each person: too expensiveOne-size-fits-all: does not fit all. Example 2: In marketing, segment customers according to their similaritiesTo do targeted marketing.CS583, Bing Liu, UIC7What is clustering for? (cont…)Example 3: Given a collection of text documents, we want to organize them according to their content similarities,To produce a topic hierarchyIn fact, clustering is one of the most utilized data mining techniques. It has a long history, and used in almost every field, e.g., medicine, psychology, botany, sociology, biology, archeology, marketing, insurance, libraries, etc. In recent years, due to the rapid increase of online documents, text clustering becomes important.CS583, Bing Liu, UIC8Aspects of clusteringA clustering algorithmPartitional clusteringHierarchical clustering…A distance (similarity, or dissimilarity) functionClustering qualityInter-clusters distance  maximizedIntra-clusters distance  minimizedThe quality of a clustering result depends on the algorithm, the distance function, and the application.CS583, Bing Liu, UIC9Road mapBasic conceptsK-means algorithmRepresentation of clustersHierarchical clusteringDistance functionsData standardizationHandling mixed attributesWhich clustering algorithm to use?Cluster evaluationDiscovering holes and data regionsSummaryCS583, Bing Liu, UIC10K-means clusteringK-means is a partitional clustering algorithmLet the set of data points (or instances) D be {x1, x2, …, xn}, where xi = (xi1, xi2, …, xir) is a vector in a real-valued space X  Rr, and r is the number of attributes (dimensions) in the data. The k-means algorithm partitions the given data into k clusters. Each cluster has a cluster center, called centroid.k is specified by the userCS583, Bing Liu, UIC11K-means algorithmGiven k, the k-means algorithm works as follows: 1) Randomly choose k data points (seeds) to be the initial centroids, cluster centers2) Assign each data point to the closest centroid3) Re-compute the centroids using the current cluster memberships.4) If a convergence criterion is not met, go to 2).CS583, Bing Liu, UIC12K-means algorithm – (cont …)CS583, Bing Liu, UIC13Stopping/convergence criterion 1. no (or minimum) re-assignments of data points to different clusters, 2. no (or minimum) change of centroids, or 3. minimum decrease in the


View Full Document

UIC CS 583 - Chapter 4: Unsupervised Learning

Download Chapter 4: Unsupervised Learning
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Chapter 4: Unsupervised Learning and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Chapter 4: Unsupervised Learning 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?