The Nearest-Neighbor and the k-Nearest-Neighbor algorithms

In their simplest form, these algorithms do not perform any computation during training. The computation is performed only when a test example is presented. Therefore, they are described with input that contains the training examples and one test example. The expensive step in these algorithms is the computation of nearest neighbors. Efficient implementations rely on sophisticated data structures for this computation; these are not discussed here.

Input: m training examples, given as the pairs (x_i, y_i), where x_i is an n-dimensional feature vector and y_i is its label. A test example x.
Output: y, the computed label of x.

The Nearest-Neighbor algorithm

a. Determine x_i, the training example nearest to x, i.e., the one minimizing the distance to x according to a pre-defined norm:

      distance(x_i, x) = |x_i - x|                (1)

b. Return y = y_i.

The most commonly used norm in (1) is the Euclidean norm:

      |x_i - x|^2 = \sum_{j=1}^{n} (x_i(j) - x(j))^2

There are multiple approaches for handling the case in which there is more than one training example nearest to x.

The k-Nearest-Neighbor algorithm

a. Determine x_{i_1}, ..., x_{i_k}, the k training examples nearest to x according to a pre-defined norm.
b. Let y_{i_1}, ..., y_{i_k} be the labels of these k nearest neighbors. Choose y as the label that appears most frequently among y_{i_1}, ..., y_{i_k}.

There are multiple approaches for handling the case in which no label has a clear majority in step b.

The value of k

In many practical problems k-NN with k > 1 performs better than the simple 1-NN. The most effective method of estimating a useful value of k is the technique of cross-validation.

Example

Training data:

      i   x_i     y_i
      1   (1,1)   A
      2   (1,2)   A
      3   (2,2)   B
      4   (2,3)   B

To classify the test example (3,2) according to k-NN we need to compute its distances to the 4 training examples. The squared Euclidean distances are shown in the following table:

      i   x_i     y_i   |x_i - (3,2)|^2
      1   (1,1)   A     5
      2   (1,2)   A     4
      3   (2,2)   B     1
      4   (2,3)   B     2

Using 1-NN, the nearest example has index i = 3, and the label is y = y_3 = B.
Using 3-NN, the 3 nearest examples have indexes i = 3, 4, 2. Two have label B and one has label A, so the computed label is y = B.

Bayesian justification of k-NN

The set of training examples determines a probability distribution. We denote probabilities according to this distribution by Pr. Given a test example x we attempt to determine its label, knowing that it is one of v_1, v_2, ..., v_r. The optimal classification of x is the v_j maximizing:

      Pr(v_j is the classification of x) ≡ Pr(v_j | x) = Pr(x ∧ v_j) / Pr(x)
        = (number of training examples with attribute vector x and label v_j)
          / (number of training examples with attribute vector x)

The problem with this formulation is that both numerator and denominator are almost always zero. We can relax it by assuming that the label of x does not change over a neighborhood around x. The smallest neighborhood that gives a nonzero denominator is the one that includes the nearest neighbor of x, which yields the 1-NN rule. Using larger neighborhoods results in the k-NN algorithm.
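To make the two algorithms concrete, here is a minimal Python sketch (not part of the original notes; the names euclidean_sq and knn_classify are my own). It reproduces the worked example above: classifying the test example (3,2) with 1-NN and 3-NN. Ties in distance or in the vote are broken arbitrarily here; as noted above, multiple tie-handling approaches exist.

```python
from collections import Counter

def euclidean_sq(a, b):
    # Squared Euclidean distance: sum_j (a(j) - b(j))^2.
    # The square root is omitted since it does not change the ranking.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def knn_classify(examples, x, k=1):
    # examples: list of (x_i, y_i) pairs; x: test feature vector.
    # Step a: find the k training examples nearest to x.
    neighbors = sorted(examples, key=lambda pair: euclidean_sq(pair[0], x))[:k]
    # Step b: return the label appearing most frequently among them.
    labels = [y for _, y in neighbors]
    return Counter(labels).most_common(1)[0][0]

# The training data of the worked example above.
training = [((1, 1), 'A'), ((1, 2), 'A'), ((2, 2), 'B'), ((2, 3), 'B')]

print(knn_classify(training, (3, 2), k=1))  # B (nearest example is i = 3)
print(knn_classify(training, (3, 2), k=3))  # B (neighbors i = 3, 4, 2: two B, one A)
```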
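The notes recommend cross-validation for estimating a useful k. A minimal leave-one-out sketch, continuing the code above (it reuses knn_classify and training); the name loo_accuracy and the list of candidate k values are my own choices, not from the notes.

```python
def loo_accuracy(examples, k):
    # Leave-one-out cross-validation: classify each training example
    # using the remaining m - 1 examples, and count correct predictions.
    correct = 0
    for i, (x, y) in enumerate(examples):
        rest = examples[:i] + examples[i + 1:]
        if knn_classify(rest, x, k) == y:
            correct += 1
    return correct / len(examples)

# Pick the k (among odd candidates, which reduces voting ties) with the
# best leave-one-out accuracy. The candidate list is an arbitrary choice.
best_k = max([1, 3, 5], key=lambda k: loo_accuracy(training, k))
print(best_k)  # 1 on this tiny data set; larger data sets may favor larger k
```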