CS 5751 Machine Learning
Chapter 8: Instance-Based Learning

Slide 1: Instance-Based Learning
• k-Nearest Neighbor
• Locally weighted regression
• Radial basis functions
• Case-based reasoning
• Lazy and eager learning

Slide 2: Instance-Based Learning
Key idea: just store all training examples \langle x_i, f(x_i) \rangle.

Nearest neighbor (1-Nearest Neighbor):
• Given query instance x_q, locate the nearest training example x_n, then estimate \hat{f}(x_q) \leftarrow f(x_n).

k-Nearest Neighbor:
• Given x_q, take a vote among its k nearest neighbors (if the target function is discrete-valued).
• Take the mean of the f values of its k nearest neighbors (if the target function is real-valued):

    \hat{f}(x_q) \leftarrow \frac{1}{k} \sum_{i=1}^{k} f(x_i)

Slide 3: When to Consider Nearest Neighbor
• Instances map to points in R^n
• Fewer than 20 attributes per instance
• Lots of training data
Advantages:
• Training is very fast
• Can learn complex target functions
• Does not lose information
Disadvantages:
• Slow at query time
• Easily fooled by irrelevant attributes

Slide 4: k-NN Classification
[Figure: a query point x_q classified by its 5 nearest neighbors, and the decision surface induced by 1-NN.]

Slide 5: Behavior in the Limit
Define p(x) as the probability that instance x will be labeled 1 (positive) versus 0 (negative).
Nearest neighbor:
• As the number of training examples approaches infinity, 1-NN approaches the Gibbs algorithm.
• Gibbs: with probability p(x) predict 1, else 0.
k-Nearest Neighbor:
• As the number of training examples approaches infinity and k gets large, k-NN approaches Bayes optimal.
• Bayes optimal: if p(x) > 0.5 then predict 1, else 0.
• Note that Gibbs has at most twice the expected error of Bayes optimal.

Slide 6: Distance-Weighted k-NN
Might want to weight nearer neighbors more heavily:

    \hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}
    where w_i \equiv \frac{1}{d(x_q, x_i)^2}

and d(x_q, x_i) is the distance between x_q and x_i.
Note: it now makes sense to use all training examples instead of just k (→ Shepard's method).
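To make the rule concrete, here is a minimal NumPy sketch of distance-weighted k-NN for a real-valued target; the function name, the Euclidean metric, and the eps guard are illustrative choices, not part of the slides.

```python
import numpy as np

def distance_weighted_knn(X_train, y_train, x_q, k=5, eps=1e-12):
    """Predict f(x_q) as the 1/d^2-weighted mean of the k nearest
    neighbors' target values, per the formula on slide 6."""
    dists = np.linalg.norm(X_train - x_q, axis=1)  # Euclidean d(x_q, x_i)
    nn = np.argsort(dists)[:k]                     # indices of the k nearest
    w = 1.0 / (dists[nn] ** 2 + eps)               # w_i = 1/d^2; eps guards d = 0
    return np.dot(w, y_train[nn]) / w.sum()        # weighted mean of f values
```

Setting k to the size of the training set uses every example, i.e., Shepard's method; for a discrete-valued target the weighted mean would be replaced by a weighted vote.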
Slide 7: Curse of Dimensionality
Imagine instances described by 20 attributes, of which only 2 are relevant to the target function.
Curse of dimensionality: nearest neighbor is easily misled when X is high-dimensional.
One approach:
• Stretch the jth axis by a weight z_j, where z_1, z_2, ..., z_n are chosen to minimize prediction error.
• Use cross-validation to automatically choose the weights z_1, z_2, ..., z_n.
• Note that setting z_j to zero eliminates dimension j altogether.
See (Moore and Lee, 1994).

Slide 8: Locally Weighted Regression
k-NN forms a local approximation to f for each query point x_q. Why not form an explicit approximation \hat{f}(x) for the region around x_q?
• Fit a linear function to the k nearest neighbors (or a quadratic, etc.).
• This produces a "piecewise approximation" to f.
Several choices of error to minimize:
• Squared error over the k nearest neighbors:

    E_1(x_q) \equiv \frac{1}{2} \sum_{x \in k\text{ nearest nbrs of } x_q} (f(x) - \hat{f}(x))^2

• Distance-weighted squared error over all neighbors:

    E_2(x_q) \equiv \frac{1}{2} \sum_{x \in D} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))
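A sketch of locally weighted linear regression under the E_2 criterion; the Gaussian kernel and the bandwidth tau are assumptions, since the slide leaves K unspecified.

```python
import numpy as np

def locally_weighted_linear(X, y, x_q, tau=0.5):
    """Fit a linear model around x_q by minimizing the distance-weighted
    squared error E_2, using K(d) = exp(-d^2 / (2 tau^2))."""
    Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend a bias column
    d2 = ((X - x_q) ** 2).sum(axis=1)          # squared distances d^2(x_q, x)
    K = np.exp(-d2 / (2 * tau ** 2))           # kernel weight per example
    A = Xb.T * K                               # same as Xb.T @ diag(K)
    beta = np.linalg.solve(A @ Xb, A @ y)      # weighted normal equations
    return np.r_[1.0, x_q] @ beta              # evaluate fhat at x_q only
```

Note that the fit is redone from scratch for every query, which is exactly what makes the method lazy.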
Slide 9: Radial Basis Function Networks
• Global approximation to the target function, in terms of a linear combination of local approximations
• Used, for example, in image classification
• A different kind of neural network
• Closely related to distance-weighted regression, but "eager" instead of "lazy"

Slide 10: Radial Basis Function Networks
[Figure: a network whose inputs are the attributes a_1(x), ..., a_n(x) describing instance x, feeding k kernel units that are combined with weights w_0, w_1, ..., w_k into the output f(x).]

    f(x) = w_0 + \sum_{u=1}^{k} w_u K_u(d(x_u, x))

One common choice for K_u(d(x_u, x)) is the Gaussian

    K_u(d(x_u, x)) = e^{-\frac{d^2(x_u, x)}{2\sigma_u^2}}

Slide 11: Training RBF Networks
Q1: What x_u to use for each kernel function K_u(d(x_u, x))?
• Scatter them uniformly throughout instance space
• Or use the training instances (reflects the instance distribution)
Q2: How to train the weights (assume here Gaussian K_u)?
• First choose the variance (and perhaps the mean) for each K_u
  - e.g., use EM
• Then hold the K_u fixed, and train the linear output layer
  - efficient methods exist to fit linear functions
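A minimal sketch of the two-step procedure, with simplifying assumptions: the centers are taken as given, and a single shared width sigma stands in for the per-unit variances the slide suggests fitting with EM.

```python
import numpy as np

def train_rbf(X, y, centers, sigma=1.0):
    """Hold the Gaussian kernels fixed, then fit the linear output
    layer w_0, ..., w_k by least squares."""
    def design(Xq):
        d2 = ((Xq[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        Phi = np.exp(-d2 / (2 * sigma ** 2))            # K_u(d(x_u, x))
        return np.hstack([np.ones((len(Xq), 1)), Phi])  # column of 1s for w_0
    w, *_ = np.linalg.lstsq(design(X), y, rcond=None)   # linear output layer
    return lambda Xq: design(Xq) @ w   # f(x) = w_0 + sum_u w_u K_u(d(x_u, x))
```

For example, centers = X[np.random.choice(len(X), 20, replace=False)] corresponds to the "use training instances" answer to Q1.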
Slide 12: Case-Based Reasoning
Can apply instance-based learning even when X ≠ R^n → need a different "distance" metric.
Case-based reasoning is instance-based learning applied to instances with symbolic logic descriptions:

    ((user-complaint error53-on-shutdown)
     (cpu-model PowerPC)
     (operating-system Windows)
     (network-connection PCIA)
     (memory 48meg)
     (installed-applications Excel Netscape VirusScan)
     (disk 1Gig)
     (likely-cause ???))

Slide 13: Case-Based Reasoning in CADET
CADET: 75 stored examples of mechanical devices
• each training example: <qualitative function, mechanical structure>
• new query: desired function
• target value: mechanical structure for this function
Distance metric: match qualitative function descriptions.

Slide 14: Case-Based Reasoning in CADET
[Figure: a problem specification, a water faucet, with structure unknown ("?") and a desired qualitative function relating C_c, C_h, Q_c, Q_h, T_c, T_h to the mixed output Q_m, T_m; and a stored case, a T-junction pipe, whose structure joins inputs Q_1,T_1 and Q_2,T_2 into output Q_3,T_3. T = temperature, Q = water flow.]

Slide 15: Case-Based Reasoning in CADET
• Instances represented by rich structural descriptions
• Multiple cases retrieved (and combined) to form a solution to a new problem
• Tight coupling between case retrieval and problem solving
Bottom line:
• Simple matching of cases is useful for tasks such as answering help-desk queries
• This is an area of ongoing research

Slide 16: Lazy and Eager Learning
Lazy: wait for the query before generalizing
• k-Nearest Neighbor, case-based reasoning
Eager: generalize before seeing the query
• radial basis function networks, ID3, Backpropagation, etc.
Does it matter?
• An eager learner must create a single global approximation.
• A lazy learner can create many local approximations.
• If they use the same H, the lazy learner can represent more complex functions (e.g., consider H = linear functions).

Slide 17: kd-trees (Moore)
• An eager version of k-Nearest Neighbor
• Idea: decrease the time needed to find neighbors
  - train by constructing a lookup (kd) tree
  - recursively subdivide the space
    • ignore the class of the points
    • lots of possible mechanisms: grid, maximum ...
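A minimal sketch of the tree-construction step described above, assuming one simple splitting mechanism (median split, cycling through the coordinate axes); a nearest-neighbor query would then descend this tree and prune branches by distance.

```python
import numpy as np

def build_kd(points, depth=0):
    """Recursively subdivide the space, ignoring the class of the
    points: sort along one axis, split at the median, and recurse,
    cycling through the coordinate axes."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]              # cycle through dimensions
    points = points[points[:, axis].argsort()]  # order along the split axis
    mid = len(points) // 2                      # median point becomes the node
    return {"point": points[mid], "axis": axis,
            "left":  build_kd(points[:mid], depth + 1),
            "right": build_kd(points[mid + 1:], depth + 1)}
```

Because the tree is built once at training time, the work of finding neighbors shifts from query time to training time, which is what makes this an eager variant of k-NN.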