Introduction to Machine Learning
Xiaojin Zhu
[email protected]

Chapter 1 of this book: Xiaojin Zhu and Andrew B. Goldberg. Introduction to Semi-Supervised Learning. Morgan & Claypool Publishers, 2009. http://www.morganclaypool.com/doi/abs/10.2200/S00196ED1V01Y200906AIM006 (download from UW computers)

Outline
• Representing "things"
  – Feature vector
  – Training sample
• Unsupervised learning
  – Clustering
• Supervised learning
  – Classification
  – Regression

Little green men
• The weight and height of 100 little green men
• What can you learn from this data?

A less alien example
• From Iain Murray http://homepages.inf.ed.ac.uk/imurray2/

Representing "things" in machine learning
• An instance x represents a specific object ("thing")
• x is often represented by a D-dimensional feature vector x = (x1, . . . , xD) ∈ R^D
• Each dimension is called a feature; features can be continuous or discrete
• x is a point in the D-dimensional feature space
• The feature vector is an abstraction of the object: it ignores all other aspects (two men with the same weight and height would be identical)

Feature representation example
• Text document
  – Vocabulary of size D (~100,000): "aardvark ... zulu"
• "Bag of words": counts of each vocabulary entry
  – To marry my true love → (3531:1 13788:1 19676:1)
  – I wish that I find my soulmate this year → (3819:1 13448:1 19450:1 20514:1)
• Stopwords (the, of, at, in, ...) are often removed
• A special "out-of-vocabulary" (OOV) entry catches all unknown words
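To make the sparse "index:count" notation above concrete, here is a minimal bag-of-words sketch in Python. The tiny vocabulary, stopword list, and index values are invented for illustration; the slides assume a real vocabulary of roughly 100,000 entries.

# Minimal bag-of-words sketch: map a document to sparse (feature index, count) pairs.
from collections import Counter

vocab = {"marry": 0, "true": 1, "love": 2, "wish": 3, "find": 4,
         "soulmate": 5, "year": 6, "<OOV>": 7}   # last entry catches unknown words
stopwords = {"to", "my", "i", "that", "this"}

def bag_of_words(text):
    # Lowercase, drop stopwords, then count how often each vocabulary entry occurs.
    words = [w for w in text.lower().split() if w not in stopwords]
    counts = Counter(vocab.get(w, vocab["<OOV>"]) for w in words)
    return sorted(counts.items())

print(bag_of_words("To marry my true love"))
# -> [(0, 1), (1, 1), (2, 1)], i.e. three features with count 1 each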
More feature representations
• Image
  – Color histogram
• Software
  – Execution profile: the number of times each line is executed
• Bank account
  – Credit rating, balance, #deposits in last day, week, month, year, #withdrawals, ...
• You and me
  – Medical test1, test2, test3, ...

Training sample
• A training sample is a collection of instances x1, . . . , xn, which is the input to the learning process
• xi = (xi1, . . . , xiD)
• Assume these instances are sampled independently from an unknown (population) distribution P(x)
• We denote this by xi ∼ P(x) i.i.d., where i.i.d. stands for independent and identically distributed

Training sample
• A training sample is the "experience" given to a learning algorithm
• What the algorithm can learn from it varies
• We introduce two basic learning paradigms:
  – unsupervised learning
  – supervised learning

UNSUPERVISED LEARNING
No teacher.

Unsupervised learning
• Training sample x1, . . . , xn, that's it
• No teacher providing supervision as to how individual instances should be handled
• Common tasks:
  – clustering: separate the n instances into groups
  – novelty detection: find instances that are very different from the rest
  – dimensionality reduction: represent each instance with a lower-dimensional feature vector while maintaining key characteristics of the training sample

Clustering
• Group the training sample into k clusters
• How many clusters do you see?
• Many clustering algorithms
  – HAC
  – k-means
  – ...

Example 1: music island
• Organizing and visualizing a music collection
• CoMIRVA http://www.cp.jku.at/comirva/

Example 2: Google News

Example 3: your digital photo collection
• You probably have >1000 digital photos, 'neatly' stored in various folders...
• After this class you'll be able to organize them better
  – Simplest idea: cluster them using image creation time (EXIF tag)
  – More complicated: extract image features

Two most frequently used methods
• There are many clustering algorithms. We'll look at the two most frequently used ones:
  – Hierarchical clustering, where we build a binary tree over the dataset
  – K-means clustering, where we specify the desired number of clusters and use an iterative algorithm to find them

Hierarchical clustering
• Very popular clustering algorithm
• Input:
  – A dataset x1, ..., xn, where each point is a numerical feature vector
  – Does NOT need the number of clusters

Hierarchical Agglomerative Clustering (HAC)
• Uses the Euclidean (L2) distance between points

Hierarchical clustering
• Initially, every point is in its own cluster
• Find the pair of clusters that are the closest
• Merge the two into a single cluster
• Repeat... until the whole dataset is one giant cluster
• You get a binary tree (not shown here)

Hierarchical clustering
• How do you measure the closeness between two clusters? At least three ways:
  – Single-linkage: the shortest distance from any member of one cluster to any member of the other cluster. Formula? d(A, B) = min {‖a − b‖ : a ∈ A, b ∈ B}
  – Complete-linkage: the greatest distance from any member of one cluster to any member of the other cluster
  – Average-linkage: you guessed it! (the average distance over all pairs of members, one from each cluster)

Hierarchical clustering
• The binary tree you get is often called a dendrogram, a taxonomy, or a hierarchy of data points
• The tree can be cut at various levels to produce different numbers of clusters: if you want k clusters, just cut the (k−1) longest links
• Sometimes the hierarchy itself is more interesting than the clusters
• However, there is not much theoretical justification for it...

Advanced topics
• Constrained clustering: what if an expert looks at the data and tells you
  – "I think x1 and x2 must be in the same cluster" (must-links)
  – "I think x3 and x4 cannot be in the same cluster" (cannot-links)

Advanced topics
• This is clustering with supervised information (must-links and cannot-links). We can:
  – change the clustering algorithm to fit the constraints, or
  – learn a better distance measure
• See the book Constrained Clustering: Advances in Algorithms, Theory, and Applications. Editors: Sugato Basu, Ian Davidson, and Kiri Wagstaff. http://www.wkiri.com/conscluster/

K-means clustering
• Very popular clustering method
• Don't confuse it with the k-NN classifier
• Input:
  – A dataset x1, ..., xn, where each point is a numerical feature vector
  – Assume the number of clusters, k, is given

K-means clustering
• The dataset; input k=5
• Randomly pick 5 positions as the initial cluster centers (not necessarily data points)
• Each point finds the cluster center it is closest to (very much like 1-NN); the point belongs to that cluster
• Each cluster center is then moved to the mean (centroid) of the points assigned to it, and the assign/update steps repeat
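A minimal sketch of the k-means loop just described, in plain Python: assign each point to its nearest center, move each center to the mean of its assigned points, and repeat until the assignments stop changing. The small 2-D toy data, the fixed random seed, and the bounding-box initialization are illustrative assumptions, not part of the lecture notes.

# Minimal k-means sketch: assignment step + update step, repeated to convergence.
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def kmeans(points, k, n_iters=100, seed=0):
    rng = random.Random(seed)
    # Initial centers: random positions inside the data's bounding box
    # (they need not be data points, as the slides note).
    dims = list(zip(*points))
    centers = [tuple(rng.uniform(min(d), max(d)) for d in dims) for _ in range(k)]
    assignments = None
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its closest center (like 1-NN).
        new_assignments = [min(range(k), key=lambda c: euclidean(p, centers[c]))
                           for p in points]
        if new_assignments == assignments:
            break                      # assignments stopped changing: converged
        assignments = new_assignments
        # Update step: move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:                # keep the old center if a cluster went empty
                centers[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return centers, assignments

# Toy usage: two well-separated blobs of 2-D points, k = 2.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (8.2, 7.9), (7.8, 8.0)]
centers, labels = kmeans(data, k=2)
print(labels)   # e.g. [0, 0, 0, 1, 1, 1] (cluster ids may come out permuted)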
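For comparison, here is an equally small sketch of the hierarchical agglomerative clustering procedure from the earlier slides (start with every point in its own cluster, repeatedly merge the closest pair), with the three linkage rules written out. The naive all-pairs merge loop and the toy data are illustrative only; practical implementations are considerably more efficient.

# Naive hierarchical agglomerative clustering (HAC) sketch.
# The recorded merges form the binary tree (dendrogram) mentioned in the slides.
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def linkage_distance(A, B, linkage="single"):
    pair_dists = [euclidean(a, b) for a in A for b in B]
    if linkage == "single":     # shortest distance between any two members
        return min(pair_dists)
    if linkage == "complete":   # greatest distance between any two members
        return max(pair_dists)
    return sum(pair_dists) / len(pair_dists)   # "average": mean over all pairs

def hac(points, linkage="single"):
    clusters = [[p] for p in points]        # initially every point is its own cluster
    merges = []                             # the sequence of merges, i.e. the tree
    while len(clusters) > 1:
        # Find the pair of clusters that are closest under the chosen linkage.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], linkage))
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]  # merge the two into a single cluster
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges

data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
for step, (a, b) in enumerate(hac(data, linkage="single"), 1):
    print(f"merge {step}: {a} + {b}")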

