BGSU STAT 402 - K-Nearest-Neighbor

K-Nearest-Neighbor
Dr. İbrahim Çapar, Assistant Professor
DATA MINING

Characteristics
- Data-driven, not model-driven
- Makes no assumptions about the data

Basic Idea
- For a given record to be classified, identify nearby records
- "Near" means records with similar predictor values X1, X2, ..., Xp
- Classify the record as the predominant class among those nearby records (the "neighbors")

How to measure "nearby"?
- The most popular distance measure is Euclidean distance. For a new record (x1, ..., xp) and an existing record (u1, ..., up), it is
  $\sqrt{(x_1 - u_1)^2 + (x_2 - u_2)^2 + \cdots + (x_p - u_p)^2}$
- The data must be standardized first, so that no single predictor dominates the distance

Choosing k
- k is the number of nearby neighbors used to classify the new record
- k = 1 means use the single nearest record; k = 5 means use the 5 nearest records
- Typically choose the value of k with the lowest error rate on the validation data (see the classification sketch below)

Low k vs. High k
- Low values of k (1, 3, ...) capture local structure in the data, but also noise
- High values of k provide more smoothing and less noise, but may miss local structure
- The extreme case is k = n: every record counts as a "neighbor," so the prediction is simply the overall majority class

Using k-NN for Classification
- The majority vote among the k neighbors determines the class
- In case of a tie, either pick one of the tied classes at random or use the class of the single closest neighbor

Using k-NN for Prediction
- Instead of "majority vote determines class," use the average of the neighbors' response values
- This may be a weighted average, with weights decreasing with distance (see the prediction sketch below)

Advantages
- Simple
- No assumptions required (normal distribution, etc.)
- Effective at capturing complex interactions among variables without having to define a statistical model

Shortcomings
- The required size of the training set increases exponentially with the number of predictors, p
- This is because the expected distance to the nearest neighbor increases with p (with a large vector of predictors, all records end up "far away" from one another)
- In a large training set, it takes a long time to compute distances to all records and then identify the nearest one(s)
- Together these constitute the "curse of dimensionality"

Dealing with the Curse
- Reduce the dimension of the predictors (e.g., with PCA; see the pipeline sketch below)
- Use computational shortcuts that settle for "almost nearest" neighbors

Example: Riding Mowers
- Data: 24 households classified as owning or not owning riding mowers
- Predictors: Income (in $1000s), Lot Size (in 1000s of sq. ft.)

Decision Boundaries When k = 1
Decision Boundaries When k = 15
Which k is the ...
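Classification sketch. A minimal illustration of the procedure the slides describe: standardize the predictors, compute Euclidean distances, take a majority vote among the k nearest records, and pick k by validation error rate. The data here are made-up placeholders (not the riding-mower data), and the function names are arbitrary; this is a sketch of the idea, not the course's own code.

import numpy as np
from collections import Counter

def standardize(X_train, X_new):
    """Standardize predictors using the training set's mean and std,
    as the slides require before computing Euclidean distances."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    return (X_train - mu) / sigma, (X_new - mu) / sigma

def knn_classify(X_train, y_train, x_new, k):
    """Classify one record by majority vote among its k nearest neighbors."""
    # Euclidean distance from the new record to every training record
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]          # indices of the k closest records
    votes = Counter(y_train[nearest]).most_common()
    # Tie-breaking: fall back to the single closest neighbor's class
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        return y_train[nearest[0]]
    return votes[0][0]

# Choosing k by validation error rate (toy, made-up data)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))                 # two predictors, e.g. income and lot size
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # made-up owner / non-owner labels
X_tr, y_tr, X_va, y_va = X[:40], y[:40], X[40:], y[40:]
X_tr_s, X_va_s = standardize(X_tr, X_va)

for k in (1, 3, 5, 7):
    preds = np.array([knn_classify(X_tr_s, y_tr, x, k) for x in X_va_s])
    print(f"k={k}: validation error rate {(preds != y_va).mean():.2f}")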
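Prediction sketch. For a numeric response, the slides replace the majority vote with an average of the neighbors' response values, optionally weighted so that closer neighbors count more. A small sketch of that idea, assuming the predictors have already been standardized as above:

import numpy as np

def knn_predict(X_train, y_train, x_new, k, weighted=True):
    """Predict a numeric response as the (optionally distance-weighted)
    average of the k nearest neighbors' response values."""
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    if not weighted:
        return y_train[nearest].mean()
    # Weight decreases with distance; the small constant guards against
    # division by zero when a training record coincides with x_new.
    w = 1.0 / (dists[nearest] + 1e-8)
    return np.average(y_train[nearest], weights=w)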
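Pipeline sketch. One common way to apply the "reduce dimension with PCA" idea before k-NN is a scikit-learn pipeline. The choices n_components=2 and n_neighbors=5 are placeholder values, not numbers from the slides; in practice both would be tuned on validation data.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Standardize, project onto a few principal components, then run k-NN.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    KNeighborsClassifier(n_neighbors=5),
)
# Usual scikit-learn usage: model.fit(X_train, y_train); model.predict(X_new)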

