
Nearest Neighbor Methods
Nicholas Ruozzi, University of Texas at Dallas
Based on the slides of Vibhav Gogate and David Sontag

Learning: store all of the training examples.

Classifying a new point x: find the training example x^(i) that is closest to x, for some notion of "close", and classify x with its label y^(i).

k-nearest neighbor (k-NN) methods look at the k closest points in the training set and take a majority vote; k should be chosen to be odd so that the vote cannot tie.

[Figure: 1-NN example, by Kevin Zakka]
[Figure: 20-NN example, by Kevin Zakka]

k-NN applies to data sets with points in R^n. It is best for large data sets with only a few (around 20) attributes.

Advantages: learning is easy (there is nothing to fit), and the method can learn complicated decision boundaries.

Disadvantages: classification is slow, the entire training set must be kept around, and the method is easily fooled by irrelevant attributes.

Practical Challenges
- How to choose the right measure of closeness? Euclidean distance is popular, but there are many other possibilities.
- How to pick k? Too small and the estimates are noisy; too large and the accuracy suffers.
- What if the nearest neighbor is really far away?

Choosing the Distance
Euclidean distance makes sense when each of the features is roughly on the same scale. If the features are very different, e.g., height and age, then Euclidean distance makes less sense: height would contribute less than age simply because age has a larger range of possible values. To correct for this, feature vectors are often recentered around their means and scaled by the standard deviation over the training set.

Normalization
Given M training points x^(1), ..., x^(M), the j-th feature is normalized using the sample mean and sample variance:

  mu_j = (1/M) * sum_{m=1}^{M} x_j^(m)
  sigma_j^2 = (1/(M-1)) * sum_{m=1}^{M} (x_j^(m) - mu_j)^2
  x_j -> (x_j - mu_j) / sigma_j

Irrelevant Attributes
Consider the nearest neighbor problem in one dimension. Now add a new attribute that is just random noise: the nearest neighbors can change completely, even though the informative feature is unchanged.
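The classification rule and the normalization step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's reference code; the function names (`zscore_fit`, `knn_predict`) and the toy data are invented for the example.

```python
import numpy as np

def zscore_fit(X):
    """Per-feature sample mean and sample standard deviation (1/(M-1))."""
    return X.mean(axis=0), X.std(axis=0, ddof=1)

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points,
    using Euclidean distance (a minimal sketch, no tie-breaking logic)."""
    d = np.linalg.norm(X_train - x, axis=1)   # distances to every example
    idx = np.argsort(d)[:k]                   # indices of the k closest
    labels, counts = np.unique(y_train[idx], return_counts=True)
    return labels[np.argmax(counts)]          # majority label

# Toy data: two small clusters with labels 0 and 1 (hypothetical values).
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = np.array([0, 0, 1, 1, 1])

# Normalize with statistics from the training set, then classify a query.
mu, sigma = zscore_fit(X)
Xn = (X - mu) / sigma
xq = (np.array([0.95, 1.0]) - mu) / sigma
print(knn_predict(Xn, y, xq, k=3))  # prints 1
```

Note that the query point is normalized with the training-set mean and standard deviation, not its own; at classification time those are the only statistics available.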
k-Dimensional Trees
To classify a point, we can compute the distances between all points in the training set and the point we are trying to classify. With M data points in n-dimensional space, this takes O(Mn) time for Euclidean distance. It is possible to do better if we do some preprocessing on the training data.

k-d trees provide a data structure that can help simplify the classification task by constructing a tree that partitions the search space:
- Starting with the entire training set, choose some dimension i.
- Select an element of the training data whose i-th dimension has the median value among all elements of the training set.
- Divide the training set into two pieces, depending on whether each element's i-th attribute is smaller or larger than the median.
- Repeat this partitioning process on each of the two new pieces separately.

[Images from slides by Mehryar Mohri]

To find the nearest neighbor of a query point x:
- Start at the top of the k-d tree and traverse it to a leaf based on where x should fall.
- Once a leaf node is reached, it is selected to be the current closest point to x.
- Follow the path in the opposite direction, from the leaf to the root. If the current node along the path is closer to x than the selected closest point, it becomes the new closest point.
- Before moving up the tree, the algorithm checks whether there could be any points in the opposite partition that are closer to x than the current closest point. If so, the closest point in that subtree is computed recursively. Otherwise, the parent of the current node along the path becomes the new current node.

By design, the constructed k-d tree is bushy. The idea is that, if new points to classify are evenly distributed throughout the space, then the expected amortized cost of classification is approximately O(log M) operations for M data points in n-dimensional space.

Summary
- k-NN is fast and easy to implement.
- No training is required.
- It can be good in practice where applicable.
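The k-d tree construction and search procedure described above can be sketched as follows. This is a minimal illustration under simplifying assumptions (dimensions are cycled by depth rather than chosen adaptively, the tree is stored as plain dictionaries, and the sample points are invented for the example):

```python
import numpy as np

def build_kdtree(points, depth=0):
    """Split on the median along one dimension (cycled by depth),
    then recurse on the two halves."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    points = points[points[:, axis].argsort()]   # sort along this dimension
    mid = len(points) // 2                       # median element
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, x, best=None):
    """Descend toward x; on the way back up, search the opposite
    partition only if it could contain a closer point."""
    if node is None:
        return best
    if best is None or np.linalg.norm(x - node["point"]) < np.linalg.norm(x - best):
        best = node["point"]
    axis = node["axis"]
    near, far = ("left", "right") if x[axis] < node["point"][axis] else ("right", "left")
    best = nearest(node[near], x, best)
    # The splitting hyperplane is closer than the current best point,
    # so the far side might hold something closer: search it too.
    if abs(x[axis] - node["point"][axis]) < np.linalg.norm(x - best):
        best = nearest(node[far], x, best)
    return best

pts = np.array([[2., 3.], [5., 4.], [9., 6.], [4., 7.], [8., 1.], [7., 2.]])
tree = build_kdtree(pts)
print(nearest(tree, np.array([9., 2.])))  # prints [8. 1.]
```

The pruning test in `nearest` is the "could the opposite partition be closer" check from the text: if the distance from x to the splitting hyperplane already exceeds the distance to the current best point, nothing on the far side can win and that subtree is skipped.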
