WSU CSE 6363 - Instance Based Learning

Instance-Based Learning

Topics:
• k-Nearest Neighbor
• Locally weighted regression
• Radial basis functions
• Case-based reasoning
• Lazy and eager learning

Instance-Based Learning

Key idea: simply store all training examples ⟨x_i, f(x_i)⟩.

Nearest neighbor: given a query instance x_q, first locate the nearest training example x_n, then estimate

  f̂(x_q) ← f(x_n)

k-Nearest neighbor: given x_q,
• take a vote among its k nearest neighbors (if the target function is discrete-valued), or
• take the mean of the f values of its k nearest neighbors (if real-valued):

  f̂(x_q) ← ( Σ_{i=1}^{k} f(x_i) ) / k

When To Consider Nearest Neighbor
• Instances map to points in ℝⁿ
• Fewer than 20 attributes per instance
• Lots of training data

Advantages:
• Training is very fast
• Can learn complex target functions
• Doesn't lose information

Disadvantages:
• Slow at query time
• Easily fooled by irrelevant attributes

Voronoi Diagram

[Figure: the decision surface induced by 1-nearest neighbor over a set of + and − training points, with a query point x_q; the surface is the Voronoi diagram of the training set.]

Behavior in the Limit

Let p(x) be the probability that instance x is labeled 1 (positive) rather than 0 (negative).

Nearest neighbor: as the number of training examples → ∞, it approaches the Gibbs algorithm.
• Gibbs: with probability p(x) predict 1, else 0.

k-Nearest neighbor: as the number of training examples → ∞ and k grows large, it approaches Bayes optimal.
• Bayes optimal: if p(x) > 0.5 then predict 1, else 0.

Note: Gibbs has at most twice the expected error of Bayes optimal.

Distance-Weighted kNN

We might want to weight nearer neighbors more heavily:

  f̂(x_q) ← ( Σ_{i=1}^{k} w_i f(x_i) ) / ( Σ_{i=1}^{k} w_i ),  where  w_i ≡ 1 / d(x_q, x_i)²

and d(x_q, x_i) is the distance between x_q and x_i.

Note that it now makes sense to use all training examples instead of just k → Shepard's method.

Curse of Dimensionality

Imagine instances described by 20 attributes, of which only 2 are relevant to the target function. Curse of dimensionality: nearest neighbor is easily misled when X is high-dimensional.

One approach:
• Stretch the jth axis by weight z_j, where z_1, ..., z_n are chosen to minimize prediction error
• Use cross-validation to automatically choose the weights z_1, ..., z_n
• Note that setting z_j to zero eliminates that dimension altogether

See [Moore and Lee, 1994].
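The distance-weighted rule above can be sketched in a few lines. This is a minimal sketch, not part of the slides: `weighted_knn_predict` is a hypothetical helper name, and Euclidean distance with a real-valued target are illustrative assumptions.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_q, k=3):
    """Distance-weighted k-NN estimate f^(x_q) for a real-valued target.

    Each of the k nearest neighbors is weighted by w_i = 1 / d(x_q, x_i)^2.
    """
    d = np.linalg.norm(X_train - x_q, axis=1)   # d(x_q, x_i) for every training point
    nearest = np.argsort(d)[:k]                 # indices of the k nearest neighbors
    if d[nearest[0]] == 0.0:                    # exact match: f^(x_q) <- f(x_n)
        return float(y_train[nearest[0]])
    w = 1.0 / d[nearest] ** 2                   # w_i = 1 / d(x_q, x_i)^2
    return float(np.sum(w * y_train[nearest]) / np.sum(w))
```

With w_i = 1/d², an exact match would divide by zero, so the sketch returns the stored value directly in that case, which is what the nearest-neighbor rule implies when d = 0.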
Locally Weighted Regression

Note that kNN forms a local approximation to f for each query point x_q. Why not form an explicit approximation f̂(x) for the region surrounding x_q?
• Fit a linear function to the k nearest neighbors
• Fit a quadratic, ...
• Produces a "piecewise approximation" to f

Several choices of error to minimize:
• Squared error over the k nearest neighbors:

  E1(x_q) ≡ (1/2) Σ_{x ∈ k nearest nbrs of x_q} (f(x) − f̂(x))²

• Distance-weighted squared error over all neighbors:

  E2(x_q) ≡ (1/2) Σ_{x ∈ D} (f(x) − f̂(x))² K(d(x_q, x))

• ...

Radial Basis Function Networks
• Global approximation to the target function, in terms of a linear combination of local approximations
• Used, e.g., for image classification
• A different kind of neural network
• Closely related to distance-weighted regression, but "eager" instead of "lazy"

Radial Basis Function Networks

[Figure: an RBF network with inputs a_1(x), ..., a_n(x), one hidden unit per kernel, and output weights w_0, w_1, ..., w_k feeding f(x).]

Here the a_i(x) are the attributes describing instance x, and

  f(x) = w_0 + Σ_{u=1}^{k} w_u K_u(d(x_u, x))

One common choice for K_u(d(x_u, x)) is the Gaussian

  K_u(d(x_u, x)) = e^( −d²(x_u, x) / (2σ_u²) )

Training Radial Basis Function Networks

Q1: What x_u to use for each kernel function K_u(d(x_u, x))?
• Scatter them uniformly throughout instance space
• Use one for each cluster of instances (use prototypes)
• Or use the training instances themselves (reflects the instance distribution)

Q2: How to train the weights (assume here Gaussian K_u)?
• First choose the variance (and perhaps the mean) for each K_u
  – e.g., use EM
• Then hold the K_u fixed, and train the linear output layer
  – efficient methods exist to fit a linear function

Case-Based Reasoning

Instance-based learning can be applied even when X ≠ ℝⁿ → we then need a different "distance" metric. Case-based reasoning is instance-based learning applied to instances with symbolic logic descriptions:

((user-complaint error53-on-shutdown)
 (cpu-model PowerPC)
 (operating-system Windows)
 (network-connection PCIA)
 (memory 48meg)
 (installed-applications Excel Netscape VirusScan)
 (disk 1gig)
 (likely-cause ???))

Case-Based Reasoning in CADET

CADET: 75 stored examples of mechanical devices.
• Each training example: ⟨qualitative function, mechanical structure⟩
• New query: desired function
• Target value: mechanical structure for this function
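CADET's qualitative-function matching is too involved for a short sketch, but the flavor of case retrieval over symbolic descriptions, as in the help-desk record above, can be shown with a simple attribute-overlap distance. Everything here (function names, the toy metric, the case data) is an illustrative assumption, not CADET's actual method.

```python
def case_distance(case_a, case_b):
    """Toy symbolic distance: fraction of shared attributes whose values disagree."""
    shared = set(case_a) & set(case_b)
    if not shared:
        return 1.0  # no attributes in common: treat as maximally distant
    return sum(case_a[k] != case_b[k] for k in shared) / len(shared)

def retrieve_nearest_case(library, query):
    """Return the stored case minimizing the toy distance to the query."""
    return min(library, key=lambda case: case_distance(case, query))
```

A query with a partial symptom description then retrieves the stored case whose symbolic attributes overlap it best; real CBR systems combine several retrieved cases rather than returning one.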
Distance metric: match qualitative function descriptions.

Case-Based Reasoning in CADET

[Figure: a stored case, a T-junction pipe, described by a qualitative function graph over waterflow Q and temperature T together with its structure; and a problem specification, a water faucet, giving the desired function over control signals C_t and C_f, cold and hot inputs (Q_c, T_c) and (Q_h, T_h), and mixed output (Q_m, T_m), with structure: ?]

Case-Based Reasoning in CADET
• Instances are represented by rich structural descriptions
• Multiple cases are retrieved (and combined) to form a solution to the new problem
• Tight coupling between case retrieval and problem solving

Bottom line:
• Simple matching of cases is useful for tasks such as answering help-desk queries
• This is an area of ongoing research

Lazy and Eager Learning

Lazy: wait for the query before generalizing
• k-Nearest Neighbor, case-based reasoning

Eager: generalize before seeing the query
• Radial basis function networks, ID3, C4.5, Backpropagation, Naive Bayes, ...

Does it matter?
• An eager learner creates one global approximation
• A lazy learner can create many local approximations
• If they use the same H, lazy can represent more complex functions (e.g., consider H = linear functions)
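The lazy/eager contrast can be made concrete with locally weighted regression, the lazy learner that fits a fresh local linear function per query. A minimal sketch, assuming a Gaussian kernel K of width tau for the distance weighting and a hypothetical function name; nothing is fit until x_q arrives, and the fit is used only at x_q.

```python
import numpy as np

def lwr_predict(X, y, x_q, tau=1.0):
    """Locally weighted linear regression evaluated at a single query point.

    Minimizes a distance-weighted squared error over all training points,
    with K(d(x_q, x)) = exp(-d^2 / (2 tau^2)), via weighted least squares.
    """
    d2 = ((X - x_q) ** 2).sum(axis=1)
    k = np.exp(-d2 / (2.0 * tau ** 2))          # kernel weights K(d(x_q, x))
    A = np.hstack([np.ones((len(X), 1)), X])    # design matrix with intercept
    W = np.sqrt(k)[:, None]                     # sqrt-weights for least squares
    beta, *_ = np.linalg.lstsq(W * A, W[:, 0] * y, rcond=None)
    return float(beta[0] + np.dot(beta[1:], x_q))
```

Calling this once per query is exactly the "many local approximations" behavior above; an eager learner with the same H = linear functions would instead fit one global line before any query is seen.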

