# ILLINOIS CS 446 - 082917.2 (7 pages)


- Pages: 7
- School: University of Illinois Urbana-Champaign
- Course: CS 446 - Machine Learning


CS446 Machine Learning, Fall 2017. Lecture 1: Intro to ML; Nearest Neighbor Classifiers. Lecturer: Sanmi Koyejo. Scribe: Davis Freimanis. Aug 29th, 2017.

## Machine Learning Introduction: Basics and Key Ideas

Machine learning is everywhere. At the beginning of the lecture, a few machine learning algorithms were suggested by the audience and written down, as presented in Figure 1. (Gradient descent was also mentioned, but it is not a learning algorithm by itself; it is a way to optimize algorithms, and it will be covered later in this class.)

*Figure 1: Algorithms.*

Sanmi organized the given algorithms by putting unsupervised/generative algorithms on the bottom, supervised algorithms on the top, probabilistic algorithms on the left, and non-probabilistic algorithms on the right. The given algorithms don't cover all of machine learning, but they cover quite a bit. This is only a broad categorization, and there are algorithms in between these categories. Probabilistic algorithms are built around learning some kind of probability distribution as a way to do machine learning, while non-probabilistic algorithms don't involve learning a distribution directly. We will cover both probabilistic and non-probabilistic algorithms in this class quite a bit.

In supervised algorithms, data is presented as $D = \{(x_i, y_i)\}$, where $x_i$ is the input and $y_i$ is its label. In unsupervised algorithms, the data doesn't include labels and is presented as $D = \{x_i\}$. In supervised learning the goal is to learn a mapping from inputs $x$ to $y$ given the data set; in unsupervised learning the goal is to find some kind of structure in the $x$'s. There are also semi-supervised algorithms, where there is a mix of labeled and unlabeled data. Reinforcement learning was also mentioned but will not be covered in this class unless we have time at the end; there are other classes covering it, e.g. the AI course CS440.

## Learning

In this section we will cover the following learning functions: the constant function, memorization, and k-nearest neighbors. We want to build a learner for supervised learning and are given the data $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where each item is an input-output pair. The goal is to learn some function $h$ such that $h(x_{n+1})$ gives an accurate prediction $\hat{y}_{n+1}$.

### Constant function

One of the simplest learners for this problem is a constant function $h(x) = c$, where $c$ is a constant and $y_i$ is binary, taking the values $\pm 1$. The constant could be chosen as the average or the most popular value in the data set. One good property of this function is that it has low variance: trained on new data sets, it will give a fairly constant answer. For the same reason, however, it does not adapt to whatever problem we are trying to solve.

### Memorization

A smarter function is memorization, which can be represented as

$$h(x) = \begin{cases} y(x) & \text{if you have seen } x \text{ before} \\ ? & \text{if } x \text{ is new.} \end{cases} \tag{1}$$

Given an input point, the algorithm looks in a database and checks whether or not it has seen the point before; the assumption is that if it has seen the point before, it makes sense for it to have the same label.

There are a few pros and cons to memorization. The main pro is that it requires no training, but the big con is that it cannot predict a label for new data: there is no generalization. Testing is expensive and gets slower as the data set grows, and high storage is required to store and look up the data. Is it enough to be 100% confident that the prediction is accurate? We will talk about this in more detail in the next couple of lectures, but the short answer is that it is not enough, because you can have the same $x$ with different labels.

### Nearest Neighbor Classifier

For the nearest neighbor classifier (NN), a demo was presented using scikit-learn (Pedregosa et al., 2011) and a Jupyter notebook. The demo was a modification of the "Nearest Neighbors Classification" example from scikit-learn. The data set used in the example was the Iris data set (Wikipedia, 2017), which consists of 3 different species of Iris with 4 features (the length of the sepal and petal), but for the sake of simplicity and visualization we looked at only 2 of the features.

In the first example, shown in Figure 2, we have a 3-class classification with $k = 1$, where $k$ is the number of nearest neighbors checked. For every prediction point, we give it the label of the nearest point in the training set and make a prediction based on that single point. The contours in Figure 2 come from making a prediction at every point across the 2D grid.

*Figure 2: 3 classes, $k = 1$.*

For the second example, 2 classes were merged to create a 2-class (binary) classification with $k = 1$, $k = 10$, and $k = 100$. This is why the algorithm is sometimes called k-nearest neighbors (kNN). In Figure 3a we can see the classification boundary. When $k$ increases to 100, as seen in Figure 3b, we get an interesting result, because $k$ is larger than the number of samples in the training set. In this case there are more red points than blue points, and therefore a majority of the nearest neighbors are red across the whole 2D grid.

*Figure 3: 2 separated classes. (a) 2 classes, $k = 10$; (b) 2 classes, $k = 100$; (c) accuracy for $k = 1$ to $99$.*

In Figure 3c we see what happens when we go from 1 to 99 nearest neighbors in the previous example. The accuracy drops as $k$ grows because we end up predicting the majority class every time, which results in bad predictions. If we instead merge the points, as shown in Figure 4a, we get the graph shown in Figure 4b: in the beginning the labeling is very noisy and we make a lot of mistakes, but as we increase the number of neighbors we make better predictions. This shows that clear separation of points is not required for kNN classification, although kNN is biased toward separated data sets.

*Figure 4: 2 scrambled classes. (a) 2 classes, $k = 10$; (b) accuracy for $k = 1$ to $99$.*

The performance of kNN depends on parameters that we set instead of learn; these are called hyperparameters. For every algorithm we look at in this class, we will see that performance depends on its hyperparameters. For kNN, one of the hyperparameters is the distance function, and performance depends strongly on which one we choose.

More formally, we can define kNN as

$$h(x) = \operatorname{sign}\left(\frac{1}{k} \sum_{z \in N_k(x, D)} y_z\right)$$

where $N_k(x, D)$ is the set of the $k$ nearest points and $h(x)$ …
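The kNN definition above can be sketched in plain Python (a minimal sketch, not the lecture's code: it assumes binary labels in {-1, +1} and picks Euclidean distance as the distance hyperparameter; the function name is illustrative):

```python
import math

def knn_predict(x, data, k):
    """Predict the sign of the average label of the k nearest neighbors.

    data is a list of (point, label) pairs with labels in {-1, +1};
    the distance is Euclidean, one common choice for this hyperparameter.
    """
    # Sort the training points by distance to the query point x
    # and keep the k closest ones.
    neighbors = sorted(data, key=lambda pair: math.dist(x, pair[0]))[:k]
    # Average the neighbors' labels and take the sign.
    avg = sum(label for _, label in neighbors) / k
    return 1 if avg >= 0 else -1

# Tiny example: two clusters labeled -1 and +1.
train = [((0.0, 0.0), -1), ((0.1, 0.2), -1),
         ((3.0, 3.0), 1), ((3.2, 2.9), 1)]
print(knn_predict((0.2, 0.1), train, k=3))  # -1: the nearest points are mostly -1
```

Note that with `k` larger than the data set, every query sees all labels and the majority class wins everywhere, which is exactly the effect described for $k = 100$ above.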

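The scikit-learn demo described in the notes can be approximated as follows. This is a sketch under stated assumptions, not the lecture's actual notebook: it keeps the first two Iris features and uses $k = 10$, mirroring the setup described above.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris data set and keep only 2 of its 4 features,
# as in the lecture demo (for easier 2D visualization).
iris = load_iris()
X, y = iris.data[:, :2], iris.target

# Fit a k-nearest-neighbor classifier with k = 10.
clf = KNeighborsClassifier(n_neighbors=10)
clf.fit(X, y)

# Training accuracy; evaluating clf.predict on a dense grid of
# (feature 1, feature 2) points would reproduce the decision-boundary
# contours shown in the figures.
print(f"training accuracy: {clf.score(X, y):.2f}")
```

Changing `n_neighbors` here and re-scoring is the simplest way to reproduce the accuracy-versus-$k$ curves from Figures 3c and 4b.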
View Full Document
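For completeness, the memorization learner from the notes can be sketched as a dictionary lookup (a minimal sketch; the class and method names are illustrative, not from the lecture):

```python
class Memorizer:
    """Memorization learner: store every (x, y) pair seen in training.

    Predicts the stored label for inputs seen before; returns None
    for new inputs, reflecting that memorization has no generalization.
    """

    def __init__(self):
        self.table = {}

    def fit(self, data):
        # data is a list of (x, y) pairs; a later duplicate x overwrites
        # an earlier one, which hides the "same x, different labels" issue
        # raised in the notes.
        for x, y in data:
            self.table[x] = y
        return self

    def predict(self, x):
        return self.table.get(x)  # None if x is new

m = Memorizer().fit([((0, 0), -1), ((1, 1), 1)])
print(m.predict((1, 1)))  # 1: seen before
print(m.predict((2, 2)))  # None: new point, cannot predict
```

The dictionary makes the storage cost explicit: the table grows linearly with the training set, matching the storage con discussed in the notes.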