Chapter 8: Instance Based Learning
CS 536: Machine Learning
Littman (Wu, TA)

Instance Based Learning
[Read Ch. 8]
• k-Nearest Neighbor
• Locally weighted regression
• Radial basis functions
• Case-based reasoning
• Lazy and eager learning

Instance-Based Learning
Key idea: just store all training examples <xi, f(xi)>
Nearest neighbor:
• Given query instance xq, first locate the nearest training example xn, then estimate f̂(xq) ← f(xn)
Problem of noise?

Adding Robustness
k-Nearest neighbor method:
• Given xq, take a vote among its k nearest neighbors (if discrete-valued target function)
• Take the mean of the f values of the k nearest neighbors (if real-valued):
  f̂(xq) ← Σi=1..k f(xi) / k

When To Consider NN
• Instances map to points in ℝⁿ
• Less than 20 attributes per instance
• Lots of training data
Advantages:
• Training is very fast
• Learn complex target functions
• Don't lose information
Disadvantages:
• Slow at query time
• Easily fooled by irrelevant attributes

Voronoi Diagram
Partition of space by nearness to instances.

Decision Rules
Say p(x) defines the probability that instance x will be labeled 1 (positive) versus 0 (negative).
Gibbs Algorithm:
• with probability p(x) predict 1, else 0
Bayes optimal decision rule:
• if p(x) > .5 then predict 1, else 0
Note Gibbs has at most twice the expected error of Bayes optimal.
(Look familiar?)

Behavior in the Limit
Nearest neighbor:
• As the number of training examples grows, approaches the Gibbs Algorithm
k-Nearest neighbor:
• As the number of training examples grows and k gets large, approaches Bayes optimal

Distance-Weighted kNN
Might want to weight nearer neighbors more heavily...
  f̂(xq) ← Σi=1..k wi f(xi) / Σi=1..k wi
where wi ≡ 1 / d(xq, xi)² and d(xq, xi) is the distance between xq and xi.
Note that it now makes sense to use all training examples instead of just k:
• Shepard's method

Curse of Dimensionality
Imagine instances described by 20 attributes, but only 2 are relevant to the target function.
Curse of dimensionality: NN is easily misled in high-dimensional space.
How do data requirements grow with dimensionality?

Attribute Weighting
One approach:
• Stretch the jth axis by weight zj, where z1, …, zn are chosen to minimize prediction error
• Use cross-validation to automatically choose the weights z1, …, zn
• Note that setting zj to zero eliminates this dimension altogether
see [Moore and Lee, 1994]

Locally Weighted Regression
Note kNN forms a local approximation to f for each query point xq.
Why not form an explicit approximation f̂(x) for the region surrounding xq?
• Fit a linear function to the k nearest neighbors
• Fit a quadratic, ...
• Produces a "piecewise approximation" to f

What to Minimize
Several choices of error to minimize:
• Squared error over the k nearest neighbors:
  E1(xq) ≡ 1/2 Σx ∈ kNN(xq) (f(x) − f̂(x))²
• Distance-weighted squared error over all neighbors:
  E2(xq) ≡ 1/2 Σx ∈ D (f(x) − f̂(x))² K(d(xq, x))

Radial Basis Function Nets
• Global approximation to the target function, in terms of a linear combination of local approximations
• Used, e.g., for image classification
• A different kind of neural network
• Closely related to distance-weighted regression, but "eager" instead of "lazy"

Radial Basis Function Nets
  f̂(x) = w0 + Σu=1..k wu Ku(d(xu, x))
where the ai(x) are the attributes describing instance x and each xu is a kernel center. One common choice is the Gaussian:
  Ku(d(xu, x)) = e^(−d(xu, x)² / 2σu²)

Training RBF Networks
Q1: What xu to use for each kernel function Ku(d(xu, x))?
• Scatter uniformly throughout instance space
• Or use training instances (reflects the instance distribution)
Q2: How to train the weights (assume here Gaussian Ku)?
• First choose the variance (and perhaps the mean) for each Ku
  – e.g., use EM
• Then hold Ku fixed, and train the linear output layer
  – efficient methods to fit the linear function (a sketch follows below)
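As a concrete illustration of the two-stage recipe above, here is a minimal NumPy sketch of an RBF network: Gaussian kernels centered on a sample of training instances (one of the Q1 options), then the kernels held fixed while the linear output layer is fit by least squares (Q2). The shared sigma, the function names (fit_rbf, predict_rbf), and the toy sine data are illustrative assumptions, not part of the lecture.

import numpy as np

def fit_rbf(X, y, centers, sigma):
    """Hold the Gaussian kernels fixed; fit w0, w1..wk by least squares."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # squared d(xu, x)
    Phi = np.exp(-d2 / (2 * sigma ** 2))                           # Gaussian Ku(d(xu, x))
    Phi = np.hstack([np.ones((len(X), 1)), Phi])                   # bias column for w0
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def predict_rbf(X, centers, sigma, w):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.hstack([np.ones((len(X), 1)), np.exp(-d2 / (2 * sigma ** 2))])
    return Phi @ w  # f(x) = w0 + sum_u wu Ku(d(xu, x))

# Q1 answered with "use training instances": pick k centers from the data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)                 # toy data (assumption)
centers = X[rng.choice(len(X), size=10, replace=False)]
w = fit_rbf(X, y, centers, sigma=0.7)                              # sigma hand-picked here
print(predict_rbf(np.array([[0.5]]), centers, 0.7, w))

Least squares is one of the "efficient methods" a linear output layer admits; per-kernel variances chosen by EM, as the slide suggests, would replace the fixed sigma used in this sketch.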
Case-Based Reasoning
Can apply instance-based learning even when X ≠ ℝⁿ
• need a different "distance" metric
Case-Based Reasoning is instance-based learning applied to instances with symbolic logic descriptions.

CBR Example
  ((user-complaint error53-on-shutdown)
   (cpu-model PowerPC)
   (operating-system Windows)
   (network-connection PCIA)
   (memory 48meg)
   (installed-applications Excel Netscape VirusScan)
   (disk 1gig)
   (likely-cause ???))

CBR in CADET
CADET: 75 stored examples of mechanical devices
• each training example: <qualitative function, mechanical structure>
• new query: desired function
• target value: mechanical structure for this function
Distance metric: match qualitative function descriptions
[figure slide: CBR in CADET]

CBR in CADET
• Instances represented by rich structural descriptions
• Multiple cases retrieved (and combined) to form a solution to the new problem
• Tight coupling between case retrieval and problem solving
Bottom line:
• Simple matching of cases is useful for tasks such as answering help-desk queries
• Area of ongoing research

Lazy and Eager Learning
Lazy: wait for the query before generalizing
• k-Nearest Neighbor, case-based reasoning
Eager: generalize before seeing the query
• Radial basis function networks, ID3, Backpropagation, Naive Bayes, …

Which is Better?
Does it matter?
• Eager learner must create a global approximation
• Lazy learner can create many local approximations
• If they use the same H, lazy can represent more complex functions (e.g., consider H = linear functions); a lazy-learner sketch follows
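To make the lazy side of this contrast concrete, here is a minimal sketch of distance-weighted kNN regression as defined earlier (wi = 1/d(xq, xi)²): "training" merely stores the examples <xi, f(xi)>, and every local approximation is built at query time. Setting k to the full training-set size turns this into Shepard's method. The function name knn_predict and the toy data are illustrative assumptions.

import numpy as np

def knn_predict(X_train, y_train, xq, k):
    d = np.linalg.norm(X_train - xq, axis=1)     # d(xq, xi) for every stored xi
    idx = np.argsort(d)[:k]                      # indices of the k nearest neighbors
    if d[idx[0]] == 0.0:                         # query coincides with a stored example
        return y_train[idx[0]]
    w = 1.0 / d[idx] ** 2                        # wi = 1 / d(xq, xi)^2
    return np.sum(w * y_train[idx]) / np.sum(w)  # weighted mean of neighbor f values

# "Training" is just storing the examples; the query does all the work.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0])
print(knn_predict(X_train, y_train, np.array([1.5]), k=3))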