K-Nearest Neighbors (k-NN)
Dr. İbrahim Çapar, Assistant Professor
DATA MINING

Characteristics
- Data-driven, not model-driven
- Makes no assumptions about the data

Basic Idea
- For a given record to be classified, identify nearby records
- "Near" means records with similar predictor values X1, X2, ..., Xp
- Classify the record as the predominant class among those nearby records (the "neighbors")

How to Measure "Nearby"?
- The most popular distance measure is Euclidean distance. For two records (x1, ..., xp) and (u1, ..., up):

    d = sqrt((x1 - u1)^2 + (x2 - u2)^2 + ... + (xp - up)^2)

- The data must be standardized first, so that predictors measured on larger scales do not dominate the distance

Choosing k
- k is the number of nearby neighbors used to classify the new record
- k = 1 means use the single nearest record
- k = 5 means use the 5 nearest records
- Typically, choose the value of k that has the lowest error rate on the validation data

Low k vs. High k
- Low values of k (1, 3, ...) capture local structure in the data (but also noise)
- High values of k provide more smoothing and less noise, but may miss local structure
- The extreme case k = n assigns every new record the overall majority class, ignoring local structure entirely

Using k-NN for Classification
- The majority vote among the k neighbors determines the class
- In case of a tie, either choose a class at random or use the single closest record to break the tie

Using k-NN for Prediction
- Instead of "majority vote determines class," use the average of the neighbors' response values
- This may be a weighted average, with weights decreasing with distance

Advantages
- Simple
- No assumptions required about normal distributions, etc.
- Effective at capturing complex interactions among variables without having to define a statistical model

Shortcomings
- The required size of the training set increases exponentially with the number of predictors, p
- This is because the expected distance to the nearest neighbor increases with p (with a large vector of predictors, all records end up "far away" from each other)
- In a large training set, it takes a long time to compute distances to all the records and then identify the nearest one(s)
- Together, these constitute the "curse of dimensionality"

Dealing with the Curse
- Reduce the dimension of the predictors (e.g., with PCA)
- Use computational shortcuts that settle for "almost nearest" neighbors

Example: Riding Mowers
- Data: 24 households classified as owning or not owning riding mowers
- Predictors: Income (in $1000s), Lot Size (in 1000s of sq. ft.)
- Decision boundaries when k = 1
- Decision boundaries when k = 15
- Which k is best?
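The classification procedure described in the slides (standardize, compute Euclidean distances, take a majority vote, break ties with the single closest record) can be sketched in a few lines of Python. The six-household income/lot-size sample below is made up for illustration; it is not the deck's 24-household riding-mower data.

```python
import math
from collections import Counter

def standardize(rows):
    """Scale each column to zero mean and unit variance (population sd)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def euclidean(a, b):
    return math.sqrt(sum((x - u) ** 2 for x, u in zip(a, b)))

def knn_classify(X_train, y_train, x_new, k):
    """Majority vote among the k nearest training records.
    Ties are broken by the class of the single closest record."""
    ranked = sorted(zip((euclidean(x, x_new) for x in X_train), y_train))
    nearest = ranked[:k]
    votes = Counter(label for _, label in nearest).most_common()
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        return nearest[0][1]  # tie: fall back to the single closest record
    return votes[0][0]

# Made-up income ($1000s) / lot-size (1000s sq ft) sample -- NOT the deck's data.
X = [[60.0, 18.4], [85.5, 16.8], [64.8, 21.6],   # owners
     [47.4, 16.4], [33.0, 18.8], [51.0, 14.0]]   # nonowners
y = ["owner", "owner", "owner", "nonowner", "nonowner", "nonowner"]

# Standardize train + new record together (a sketch; in practice, standardize
# the new record using the training set's means and sds only).
scaled = standardize(X + [[60.0, 20.0]])
pred = knn_classify(scaled[:-1], y, scaled[-1], k=3)
```

With k = 3, the new household's two closest neighbors are owners and one is a nonowner, so the vote classifies it as an owner.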
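The prediction variant ("average of response values," possibly weighted) can be sketched the same way. Inverse-distance weighting below is one common choice of "weight decreasing with distance," not necessarily the one the slides intend; the one-predictor data is a toy example.

```python
import math

def knn_predict(X_train, y_train, x_new, k, weighted=True):
    """k-NN for a numeric response: plain or inverse-distance-weighted
    average of the k nearest neighbors' response values."""
    ranked = sorted(
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_new))), y)
        for x, y in zip(X_train, y_train)
    )[:k]
    if not weighted:
        return sum(y for _, y in ranked) / k
    # Weight each neighbor by 1/distance so closer records count more;
    # the small eps guards against division by zero on exact matches.
    eps = 1e-9
    weights = [1.0 / (d + eps) for d, _ in ranked]
    return sum(w * y for w, (_, y) in zip(weights, ranked)) / sum(weights)

# Toy one-predictor data: response grows linearly with the predictor.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 10.0, 20.0, 30.0]

plain = knn_predict(X, y, [0.9], k=2, weighted=False)  # (10 + 0) / 2 = 5.0
wavg = knn_predict(X, y, [0.9], k=2)                   # pulled toward y = 10
```

The weighted estimate sits closer to the nearest neighbor's response than the plain average does, which is the intended effect of down-weighting distant records.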