Clustering

• Clustering is an unsupervised learning approach: there is no target value (class label) to be predicted; the goal is finding common patterns or grouping similar documents.
• Motivation
  – Grouping search results
  – Creating topic hierarchies
  – Focusing similarity search
• Models/algorithms for clustering
  – Conceptual (model-based) vs. partitioning
  – Exclusive vs. overlapping
  – Deterministic vs. probabilistic
  – Hierarchical vs. flat
  – Incremental vs. batch learning
• Evaluating clustering quality: subjective approaches, objective functions.
• Major approaches
  – Hierarchical Agglomerative Clustering: partitioning, deterministic
  – k-means: flat, deterministic, partitioning or conceptual
  – Expectation Maximization (EM): flat, partitioning, probabilistic
  – Collaborative Filtering: clustering users using terms (preferences)

1 Hierarchical Agglomerative Clustering

• At each step merge the two closest (most similar) clusters.
• Distance/similarity function between instances (e.g. cosine similarity, Euclidean distance).
• Distance/similarity function between clusters (e.g. distance between centers, minimal distance, average distance).
• Criteria for stopping merging:
  – desired number of clusters;
  – distance between the closest clusters is above a threshold.
• Algorithms:
  – Nearest neighbor (single-linkage) agglomerative clustering: cluster distance = minimal distance between elements. Merging stops when distance > threshold. In fact, this is an algorithm for generating a minimal spanning tree.
  – Farthest neighbor (complete-linkage) agglomerative clustering: cluster distance = maximal distance between elements. Merging stops when distance > threshold. The algorithm computes the complete subgraph for every cluster.
• Visualization: dendrogram
• Problems: greedy algorithm (may end in a local minimum); once created, a subtree cannot be restructured.
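As a rough illustration of the nearest-neighbor (single-linkage) variant described above, the sketch below repeatedly merges the two closest clusters until the smallest inter-cluster distance exceeds a threshold. The function name, the plain NumPy implementation and the Euclidean metric are choices made for this example, not part of the original notes.

```python
import numpy as np

def single_linkage(points, threshold):
    """Naive single-linkage agglomerative clustering.

    points    : (n, d) array of instances
    threshold : stop merging when the closest pair of clusters
                is farther apart than this value
    Returns a list of clusters, each a list of instance indices.
    """
    # pairwise Euclidean distances between instances
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(points))]   # start with singletons

    def cluster_distance(a, b):
        # single linkage: minimal distance between elements of the two clusters
        return min(dist[i, j] for i in a for j in b)

    while len(clusters) > 1:
        # find the two closest clusters
        d, i, j = min(
            (cluster_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:                 # stopping criterion
            break
        clusters[i].extend(clusters[j])   # merge the closest pair
        del clusters[j]
    return clusters

# Example: two well-separated groups of points
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0]])
print(single_linkage(X, threshold=1.0))   # -> [[0, 1, 2], [3, 4]]
```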
2 k-means

• Iterative distance-based clustering.
• Used by statisticians for decades.
• Similarly to Cluster/2, it uses k seeds (predefined k), but is based on a distance measure:
  1. Select k instances (cluster centers) from the sample (usually at random).
  2. Assign instances to clusters according to their distance to the cluster centers.
  3. Find new cluster centers and go to step 2 until the process converges (i.e. the same instances are assigned to each cluster in two consecutive passes).
• The clustering depends greatly on the initial choice of cluster centers – the algorithm may fall into a local minimum.
• Example of a bad choice of cluster centers: four instances at the vertices of a rectangle, with the two initial cluster centers at the midpoints of its long sides. This is a stable configuration, but not a good clustering (see the sketch below).
• Solution to the local minimum problem: restart the algorithm with another set of cluster centers.
• Hierarchical k-means: apply k = 2 recursively to the resulting clusters.
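To make the three steps above concrete, here is a minimal k-means sketch run on the rectangle example: four instances at the corners of a 4 × 1 rectangle, with the two initial centers placed at the midpoints of the long sides (the bad choice) and, for comparison, at the midpoints of the short sides. The implementation and the coordinates are assumptions made for this illustration.

```python
import numpy as np

def k_means(points, centers, n_iter=100):
    """Plain k-means: assign instances to the nearest center,
    recompute centers, repeat until the assignment stops changing."""
    assignment = None
    for _ in range(n_iter):
        # step 2: assign each instance to its closest cluster center
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
        new_assignment = dist.argmin(axis=1)
        if np.array_equal(new_assignment, assignment):
            break                      # converged: same clusters in two passes
        assignment = new_assignment
        # step 3: new centers are the means of the assigned instances
        centers = np.array([points[assignment == k].mean(axis=0)
                            for k in range(len(centers))])
    return assignment, centers

# Rectangle example: vertices of a 4 x 1 rectangle
points = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 1.0], [0.0, 1.0]])

# Bad initial centers: midpoints of the two long sides
bad = np.array([[2.0, 0.0], [2.0, 1.0]])
print(k_means(points, bad))    # stable configuration, but groups far-apart corners

# Better initial centers: midpoints of the two short sides
good = np.array([[0.0, 0.5], [4.0, 0.5]])
print(k_means(points, good))   # groups the two nearby corners on each side
```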
3 Probability-based clustering

Why probabilities?
• Restricted amount of evidence implies probabilistic reasoning.
• From a probabilistic perspective, we want to find the most likely clusters given the data.
• An instance has only a certain probability of belonging to a particular cluster.

4 Probability-based clustering – mixture models

• For a single attribute: three parameters – mean, standard deviation and sampling probability.
• Each cluster A is defined by a mean (µ_A) and a standard deviation (σ_A).
• Samples are taken from each cluster A with a specified probability of sampling P(A).
• Finite mixture problem: given a dataset, find the mean, standard deviation and the probability of sampling for each cluster.
• If we know the classification of each instance, then:
  – mean (average): $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$;
  – standard deviation: $\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \mu)^2$;
  – probability of sampling for class A: P(A) = proportion of instances in it.
• If we know the three parameters, the probability that an instance x belongs to cluster A is
  $P(A|x) = \frac{P(x|A)\,P(A)}{P(x)}$,
  where P(x|A) is the density function for A, $f(x; \mu_A, \sigma_A) = \frac{1}{\sqrt{2\pi}\,\sigma_A}\, e^{-\frac{(x-\mu_A)^2}{2\sigma_A^2}}$.
  P(x) is not needed, as we calculate the numerators for all clusters and normalize them by dividing by their sum.
  ⇒ In fact, this is exactly the Naive Bayes approach.
• For more attributes: naive Bayes assumption – independence between attributes. The joint probability of an instance is calculated as a product of the probabilities of all its attributes.

5 EM (expectation maximization)

• Similarly to k-means, first select the cluster parameters (µ_A, σ_A and P(A)) or guess the classes of the instances, then iterate.
• Adjustment needed: we know cluster probabilities, not the actual cluster of each instance, so we use these probabilities as weights.
• For cluster A:
  $\mu_A = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$, where w_i is the probability that x_i belongs to cluster A;
  $\sigma_A^2 = \frac{\sum_{i=1}^{n} w_i (x_i - \mu_A)^2}{\sum_{i=1}^{n} w_i}$.
• When to stop iterating: maximize the overall likelihood that the data come from the dataset with the given parameters (the "goodness" of the clustering):
  Log-likelihood = $\sum_i \log\Big(\sum_A P(A)\,P(x_i|A)\Big)$
  Stop when the difference between two successive iterations becomes negligible (i.e. there is no further improvement in clustering quality). (A worked sketch of these updates appears at the end of these notes.)

6 Evaluating quality of clustering

• Distance (similarity) based functions
  – Sum of squared error: $J = \sum_A \sum_{x \in A} \|x - \mu_A\|^2$
  – Optimal clustering minimizes J: minimal variance clustering.
• Probability (entropy) based functions
  – Probability of an instance: $P(x_i) = \sum_A P(A)\,P(x_i|A)$
  – Probability of the sample $x_1, \ldots, x_n$: $\prod_{i=1}^{n} \Big(\sum_A P(A)\,P(x_i|A)\Big)$
  – Log-likelihood: $\sum_{i=1}^{n} \log\Big(\sum_A P(A)\,P(x_i|A)\Big)$
• Evaluate clusters with respect to classes, using preclassified instances (known classes):
  – Error: proportion of instances with different class and cluster labels.
  – Precision and Recall (n_i instances in class i, n_j instances in cluster j, n_ij members of class i in cluster j):
    $R(i,j) = \frac{n_{ij}}{n_i}$,  $P(i,j) = \frac{n_{ij}}{n_j}$
  – F-measure:
    $F(i,j) = \frac{2\,R(i,j)\,P(i,j)}{R(i,j) + P(i,j)}$,  $F = \sum_i \frac{n_i}{n} \max_j F(i,j)$
    (a sketch of these class-based measures appears at the end of these notes)

7 Collaborative Filtering (Recommender Systems)

Matrix representation
• Persons (rows)
• Items (columns)
• M(i, j) = 1 if person i likes item j; 0 otherwise.

Task: predicting missing values in rows.

Clustering approach (see the sketch below)
• Cluster persons using items as features (e.g. k-means).
• Use the values for the items in each cluster (e.g. centroids).

EM-like approach (symmetric w.r.t. persons and items)
1. Assign random cluster labels to persons and items.
2. Take a person and an item at random:
  • Compute the probability that the person belongs to the person clusters
  • Compute
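A minimal sketch of the clustering approach to collaborative filtering outlined above: cluster the persons with k-means, using the items as features, and predict each missing entry from the centroid of the person's cluster. The tiny 0/1 matrix, the use of NaN for unknown entries and scikit-learn's KMeans are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Person x item matrix: 1 = likes, 0 = does not like, NaN = unknown
M = np.array([
    [1, 1, 0, np.nan],
    [1, 1, 0, 0],
    [np.nan, 1, 0, 0],
    [0, 0, 1, 1],
    [0, np.nan, 1, 1],
])

# Cluster persons using items as features; unknown entries are
# filled with 0 here just so k-means can run (a modelling shortcut).
filled = np.nan_to_num(M, nan=0.0)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(filled)

# Predict each missing value from the centroid of the person's cluster
predictions = M.copy()
for person, item in zip(*np.where(np.isnan(M))):
    predictions[person, item] = km.cluster_centers_[km.labels_[person], item]

print(np.round(predictions, 2))
```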

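To make the mixture-model formulas of Section 4 and the EM updates of Section 5 concrete, here is a minimal sketch for a mixture of two one-dimensional Gaussians. The synthetic data, the initialization and the variable names are assumptions; the E-step normalizes the numerators P(x|A)P(A), the M-step applies the weighted mean and variance formulas, and iteration stops when the log-likelihood no longer improves noticeably.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    # density function f(x; mu, sigma), used as P(x|A)
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def em_two_gaussians(x, n_iter=100, tol=1e-6):
    """EM for a two-cluster 1-D Gaussian mixture (clusters A and B)."""
    # initial guesses for the parameters (mu, sigma, P(cluster))
    mu = np.array([x.min(), x.max()])
    sigma = np.array([x.std(), x.std()])
    prior = np.array([0.5, 0.5])
    old_ll = -np.inf

    for _ in range(n_iter):
        # E-step: w[k, i] = P(cluster k | x_i), obtained by normalizing
        # the numerators P(x_i | k) * P(k) over the clusters
        num = np.stack([gaussian_pdf(x, mu[k], sigma[k]) * prior[k] for k in range(2)])
        w = num / num.sum(axis=0)

        # M-step: weighted mean, weighted variance and cluster prior
        for k in range(2):
            wk = w[k]
            mu[k] = (wk * x).sum() / wk.sum()
            sigma[k] = np.sqrt((wk * (x - mu[k]) ** 2).sum() / wk.sum())
            prior[k] = wk.mean()

        # stop when the log-likelihood no longer improves noticeably
        ll = np.log(num.sum(axis=0)).sum()
        if ll - old_ll < tol:
            break
        old_ll = ll
    return mu, sigma, prior

# Example: data sampled from two different Gaussians
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])
print(em_two_gaussians(data))
```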

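Finally, a small sketch of the class-based evaluation measures from Section 6 (recall, precision and the overall F-measure), computed from known class labels and cluster assignments. The two label arrays are made-up examples.

```python
import numpy as np

def f_measure(classes, clusters):
    """Overall F-measure of a clustering against known classes:
    F = sum_i (n_i / n) * max_j F(i, j)."""
    n = len(classes)
    total = 0.0
    for i in np.unique(classes):
        in_class = classes == i
        n_i = in_class.sum()
        best = 0.0
        for j in np.unique(clusters):
            in_cluster = clusters == j
            n_ij = (in_class & in_cluster).sum()
            if n_ij == 0:
                continue
            recall = n_ij / n_i                  # R(i, j) = n_ij / n_i
            precision = n_ij / in_cluster.sum()  # P(i, j) = n_ij / n_j
            best = max(best, 2 * recall * precision / (recall + precision))
        total += (n_i / n) * best
    return total

# Made-up example: 6 instances, 2 known classes, clustered into 2 clusters
classes  = np.array([0, 0, 0, 1, 1, 1])
clusters = np.array([0, 0, 1, 1, 1, 1])
print(f_measure(classes, clusters))   # 1.0 would mean a perfect match
```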