# ILLINOIS CS 446 - 083117.1_revised (5 pages)



- Pages: 5
- School: University of Illinois - Urbana
- Course: CS 446 - Machine Learning


CS446 Machine Learning, Fall 2017 — Lecture 2: Generalization, Cross-validation

Lecturer: Sanmi Koyejo · Scribe: Hsiao-Ching Chang · Aug 31st, 2017

## Recap from Last Lecture

### Notation

We defined an instance space $X$ and an output label space $Y$. $X$ and $Y$ are the spaces that examples $(x_i, y_i)$ live in; that is, $x_i \in X$ and $y_i \in Y$. Output label spaces can differ from task to task. Below is a list of common tasks and their label spaces:

- Binary classification: $Y = \{-1, 1\}$, or sometimes $Y = \{0, 1\}$
- Multi-class classification: $Y = \{1, \dots, p\}$, where $p$ denotes the number of classes
- Regression: $Y = \mathbb{R}$
- Probability regression: $Y = [0, 1]$
- Multi-output regression: $Y = \mathbb{R}^p$, where we try to predict a $p$-dimensional vector

The dataset is defined as $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$. We also defined a learner, or hypothesis, $h: X \to Y$, where $h$ takes inputs from the input space $X$ and makes predictions in the output space $Y$.

### Nearest Neighbor Classifier

For the nearest neighbor algorithm we need to define two more things:

- A distance function, which takes pairs of inputs and maps them to a distance. We can denote this by $\rho: X \times X \to \mathbb{R}$. For example, for two instances $x_1, x_2$ of $n$ dimensions we can use the Minkowski distance, defined as
$$\rho(x_1, x_2) = \left( \sum_{i=1}^{n} |x_1^i - x_2^i|^p \right)^{1/p},$$
where $p$ denotes the order of the metric and $x_1^i$ denotes the $i$-th element of instance $x_1$. The metric is called the Manhattan distance when $p$ equals 1, and when $p$ equals 2 it is called the Euclidean distance.
- The number of nearest neighbors, $k$.

With this setting, the nearest neighbor classifier $h$ can be defined as
$$h(x) = \operatorname{sign}\left( \frac{1}{k} \sum_{i \in N_k(x, D)} y_i \right),$$
where $N_k(x, D)$ denotes the $k$ nearest neighbors of $x$ in dataset $D$.

The main inductive bias we see in nearest neighbor classifiers is called spatial smoothness: nearby points have similar labels, so when we see an unknown instance we guess that it belongs to the majority class of its neighbors.

We also discussed the pros and cons of nearest neighbor classifiers.

Pros:

- Learning is free, since there is no learning or training step.
- It works well in practice.

Cons:

- Testing cost is high. Consider the nearest neighbor classifier $h$ we defined earlier, along with the Euclidean distance and $X = \mathbb{R}^d$. Computational complexity will be $O(nkd)$ and memory consumption will be $O(nd)$. Note that we are estimating these complexities for a naive implementation; there exist tricks that make this classifier more efficient, such as using k-d trees or hashing.
- It behaves badly in high dimensions. Suppose we have instance space $X = \mathbb{R}^d$. When $d$ reaches 10,000 or even larger, problems arise. Such high-dimensional problems are actually common in applications: for instance, if we use bag-of-words vectors for spam classification, $d$ would be the vocabulary size. Other cases include brain imaging data and computer vision applications.
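The Minkowski distance above can be sketched in a few lines of Python. This is an illustrative implementation, not code from the lecture; the name `minkowski` is our own.

```python
def minkowski(x1, x2, p=2):
    """Minkowski distance of order p between two equal-length vectors.

    p=1 gives the Manhattan distance, p=2 the Euclidean distance.
    """
    return sum(abs(a - b) ** p for a, b in zip(x1, x2)) ** (1 / p)

# The same pair of points under the two most common orders:
# minkowski([0, 0], [3, 4], p=1) -> 7.0  (Manhattan: 3 + 4)
# minkowski([0, 0], [3, 4], p=2) -> 5.0  (Euclidean: sqrt(9 + 16))
```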
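The classifier $h(x)$ defined above can also be sketched directly: sort the dataset by distance to the query, keep the $k$ closest points, and take the sign of their mean label. A minimal sketch, assuming labels in $\{-1, +1\}$ and breaking the $\operatorname{sign}(0)$ tie toward $+1$ (an arbitrary choice the lecture does not specify):

```python
import math

def euclidean(a, b):
    # Minkowski distance with p = 2.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(x, data, k, dist):
    """Predict sign(mean label) over the k points in `data` closest to x.

    `data` is a list of (x_i, y_i) pairs with y_i in {-1, +1};
    `dist` is any distance function on pairs of inputs.
    """
    neighbors = sorted(data, key=lambda pair: dist(x, pair[0]))[:k]
    avg = sum(y for _, y in neighbors) / k
    return 1 if avg >= 0 else -1

# Two clusters: negatives near the origin, positives near (5, 5).
data = [((0, 0), -1), ((0, 1), -1), ((5, 5), 1), ((5, 6), 1), ((6, 5), 1)]
# knn_predict((5, 5.5), data, 3, euclidean) -> 1
# knn_predict((0, 0.5), data, 3, euclidean) -> -1
```

Sorting the whole dataset for every query is what makes the naive test cost high, as noted in the cons above.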
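The "behaves badly in high dimensions" point can be made concrete with a small experiment: as $d$ grows, the distances from a query point to random data points concentrate, so the nearest neighbor is barely closer than the farthest one. This is our own illustration, not from the lecture; the function name and the spread statistic are assumptions.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def distance_spread(d, n=200, seed=0):
    """Draw n random points in [0, 1]^d plus one query point, and return
    (farthest - nearest) / nearest over distances from the query.

    A large value means the nearest neighbor is meaningfully closer than
    the rest; a value near 0 means all points look equally far away.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(d)]
    points = [[rng.random() for _ in range(d)] for _ in range(n)]
    dists = [euclidean(query, p) for p in points]
    return (max(dists) - min(dists)) / min(dists)

# distance_spread(2) is large; distance_spread(1000) is close to 0,
# so "nearest" carries much less information in high dimensions.
```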
