**Unformatted text preview:**

Nearest Neighbor Methods Nearest Neighbor MethodsNearest Neighbor MethodsNearest Neighbor MethodsNearest Neighbor MethodsNearest Neighbor MethodsNearest Neighbor MethodsNearest Neighbor Methods1-NN Example20-NN ExampleNearest Neighbor MethodsPractical ChallengesChoosing the DistanceNormalizationIrrelevant AttributesIrrelevant AttributesK-Dimensional TreesK-Dimensional TreesK-Dimensional TreesK-Dimensional TreesK-Dimensional TreesK-Dimensional TreesK-Dimensional TreesK-Dimensional TreesK-Dimensional TreesK-Dimensional TreesK-Dimensional TreesK-Dimensional TreesK-Dimensional TreesNearest Neighbor MethodsNicholas RuozziUniversity of Texas at DallasBased on the slides of Vibhav Gogate and David SontagNearest Neighbor Methods• Learning• Store all training examples• Classifying a new point 𝑥𝑥𝑥• Find the training example (𝑥𝑥𝑖𝑖, 𝑦𝑦(𝑖𝑖)) such that 𝑥𝑥(𝑖𝑖)is closest (for some notion of close) to 𝑥𝑥𝑥• Classify 𝑥𝑥𝑥with the label 𝑦𝑦(𝑖𝑖)2Nearest Neighbor Methods3+++++++−−−−−−−++Nearest Neighbor Methods4+++++++−−−−−−−++Nearest Neighbor Methods5+++++++−−−−−−−++Nearest Neighbor Methods6+++++++−−−−−−−++Nearest Neighbor Methods7+++++++−−−−−−−++𝑘𝑘-nearest neighbor methods look at the 𝑘𝑘 closest points in the training set and take a majority vote (should choose 𝑘𝑘 to be odd)Nearest Neighbor Methods8+++++++−−−−−−−++𝑘𝑘-nearest neighbor methods look at the 𝑘𝑘 closest points in the training set and take a majority vote (should choose 𝑘𝑘 to be odd)1-NN Example9[Kevin Zakka]20-NN Example10[Kevin Zakka]Nearest Neighbor Methods• Applies to data sets with points in ℝ𝑑𝑑• Best for large data sets with only a few (< 20) attributes• Advantages• Learning is easy• Can learn complicated decision boundaries• Disadvantages• Classification is slow (need to keep the entire training set around)• Easily fooled by irrelevant attributes11Practical Challenges• How to choose the right measure of closeness?• Euclidean distance is popular, but many other possibilities• How to pick 𝑘𝑘?• Too small and the estimates are noisy, too large and the accuracy suffers• What if the nearest neighbor is really far away?12Choosing the Distance• Euclidean distance makes sense when each of the features is roughly on the same scale• If the features are very different (e.g., height and age), then Euclidean distance makes less sense as height would be less significant than age simply because age has a larger range of possible values• To correct for this, feature vectors are often recenteredaround their means and scaled by the standard deviation over the training set13Normalization• Sample mean𝑥𝑥 =1𝑛𝑛�𝑖𝑖=1𝑛𝑛𝑥𝑥𝑖𝑖• Sample variance�𝜎𝜎𝑘𝑘2=1𝑛𝑛−1�𝑖𝑖=1𝑛𝑛𝑥𝑥𝑘𝑘𝑖𝑖− 𝑥𝑥𝑘𝑘214Irrelevant AttributesConsider the nearest neighbor problem in one dimension15Irrelevant AttributesNow, add a new attribute that is just random noise...16K-Dimensional Trees• In order to do classification, we can compute the distances between all points in the training set and the point we are trying to classify• With 𝑚𝑚data points in n-dimensional space, this takes 𝑂𝑂(𝑚𝑚𝑛𝑛)time for Euclidean distance• It is possible to do better if we do some preprocessing on the training data17K-Dimensional Trees• k-d trees provide a data structure that can help simplify the classification task by constructing a tree that partitions the search space• Starting with the entire training set, choose some dimension, 𝑖𝑖• Select an element of the training data whose 𝑖𝑖𝑡𝑡𝑡dimension has the median value among all elements of the training set• Divide the training set into two pieces: depending on whether their 𝑖𝑖𝑡𝑡𝑡attribute is smaller or larger than the median• Repeat this partitioning process on each of the two new pieces separately18K-Dimensional Trees19[Images from slides by Mehyrar Mohri]K-Dimensional Trees• Start at the top of the k-d tree and traverse it to a leaf of the tree based on where the point to classify should fall• Once a leaf node is reached, it is selected to be the current closest point to 𝑥𝑥𝑥• Follow the path, in the opposite direction, from the leaf to the root• If the current node along the path is closer to 𝑥𝑥𝑥than the selected closest point it becomes the new closest point• Before moving up the tree, the algorithm checks if there could be any points in the opposite partition that are closer to 𝑥𝑥𝑥than the current closest point• If so, then closest point in that subtree is computed recursively• Otherwise, the parent of the current node along the path becomes the new current node20K-Dimensional Trees21K-Dimensional Trees22K-Dimensional Trees23K-Dimensional Trees24K-Dimensional Trees25K-Dimensional Trees26K-Dimensional Trees27K-Dimensional Trees28K-Dimensional Trees• By design, the constructed k-d tree is “bushy”• The idea is that if new points to classify are evenly distributed throughout the space, then the expected (amortized) cost of classification is approximately 𝑂𝑂(𝑑𝑑log 𝑀𝑀) operations for 𝑀𝑀data points in 𝑑𝑑 dimensional space• Summary• k-NN is fast and easy to implement• No training required• Can be good in practice (where

View Full Document