The Nearest-Neighbor and the k-Nearest-Neighbor algorithms

In their simplest form, these algorithms do not perform any computation during training. The computation is performed only when a test example is presented. Therefore, they are described with input that contains the training examples and one test example. The expensive step in these algorithms is the computation of nearest neighbors. Efficient implementations rely on sophisticated data structures for this computation; these are not discussed here.

Input: m training examples, given as the pairs (x_i, y_i), where x_i is an n-dimensional feature vector and y_i is its label. A test example x.
Output: y, the computed label of x.

The Nearest-Neighbor algorithm

a. Determine x_i, the training example nearest to x, i.e., the one minimizing the distance to x according to a pre-defined norm:

      distance(x_i, x) = |x_i - x|                (1)

b. Return y = y_i.

The most commonly used norm in (1) is the Euclidean norm:

      |x_i - x|^2 = \sum_{j=1}^{n} (x_i(j) - x(j))^2

There are multiple approaches for handling the case in which there is more than one training example nearest to x.

The k-Nearest-Neighbor algorithm

a. Determine x_{i_1}, ..., x_{i_k}, the k training examples nearest to x according to a pre-defined norm.
b. Let y_{i_1}, ..., y_{i_k} be the labels of these k nearest neighbors. Choose y as the label that appears most frequently among y_{i_1}, ..., y_{i_k}.

There are multiple approaches for handling the case in which no label has a clear majority in step b.

The value of k

In many practical problems k-NN with k > 1 performs better than the simple 1-NN. The most effective method of estimating a useful value of k is the technique of cross-validation.

Example

Training data:

      i   x_i     y_i
      1   (1,1)   A
      2   (1,2)   A
      3   (2,2)   B
      4   (2,3)   B

To classify the test example (3,2) according to k-NN we need to compute its distances to the 4 training examples. The squared Euclidean distances are shown in the following table:

      i   x_i     y_i   |x_i - (3,2)|^2
      1   (1,1)   A     5
      2   (1,2)   A     4
      3   (2,2)   B     1
      4   (2,3)   B     2

Using 1-NN, the nearest example has index i = 3, and the label is y = y_3 = B.
Using 3-NN, the 3 nearest examples have indexes i = 3, 4, 2. Two have label B and one has label A, so the computed label is y = B.

Bayesian justification of k-NN

The set of training examples determines a probability distribution. We denote probabilities according to this distribution by Pr. Given a test example x we attempt to determine its label, knowing that it is one of v_1, v_2, ..., v_r. The optimal classification of x is the v_j maximizing:

      Pr(v_j is the classification of x) ≡ Pr(v_j | x) = Pr(x ∧ v_j) / Pr(x)
        = (number of training examples with attribute vector x and label v_j)
          / (number of training examples with attribute vector x)

The problem with this formulation is that both numerator and denominator are almost always zero. We can relax it by assuming that the label of x does not change over a neighborhood around x. The smallest neighborhood that gives a nonzero denominator is the one that includes the nearest neighbor of x, which yields the 1-NN rule. Using larger neighborhoods results in the k-NN algorithm.
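To make the two algorithms concrete, here is a minimal Python sketch (not part of the original notes; the names euclidean_sq and knn_classify are my own). It reproduces the worked example above: classifying the test example (3,2) with 1-NN and 3-NN. Ties in distance or in the vote are broken arbitrarily here; as noted above, multiple tie-handling approaches exist.

```python
from collections import Counter

def euclidean_sq(a, b):
    # Squared Euclidean distance: sum_j (a(j) - b(j))^2.
    # The square root is omitted since it does not change the ranking.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def knn_classify(examples, x, k=1):
    # examples: list of (x_i, y_i) pairs; x: test feature vector.
    # Step a: find the k training examples nearest to x.
    neighbors = sorted(examples, key=lambda pair: euclidean_sq(pair[0], x))[:k]
    # Step b: return the label appearing most frequently among them.
    labels = [y for _, y in neighbors]
    return Counter(labels).most_common(1)[0][0]

# The training data of the worked example above.
training = [((1, 1), 'A'), ((1, 2), 'A'), ((2, 2), 'B'), ((2, 3), 'B')]

print(knn_classify(training, (3, 2), k=1))  # B (nearest example is i = 3)
print(knn_classify(training, (3, 2), k=3))  # B (neighbors i = 3, 4, 2: two B, one A)
```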
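The notes recommend cross-validation for estimating a useful k. A minimal leave-one-out sketch, continuing the code above (it reuses knn_classify and training); the name loo_accuracy and the list of candidate k values are my own choices, not from the notes.

```python
def loo_accuracy(examples, k):
    # Leave-one-out cross-validation: classify each training example
    # using the remaining m - 1 examples, and count correct predictions.
    correct = 0
    for i, (x, y) in enumerate(examples):
        rest = examples[:i] + examples[i + 1:]
        if knn_classify(rest, x, k) == y:
            correct += 1
    return correct / len(examples)

# Pick the k (among odd candidates, which reduces voting ties) with the
# best leave-one-out accuracy. The candidate list is an arbitrary choice.
best_k = max([1, 3, 5], key=lambda k: loo_accuracy(training, k))
print(best_k)  # 1 on this tiny data set; larger data sets may favor larger k
```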