Support Vector Machines and Kernels
Adapted from slides by Tim Oates
Cognition, Robotics, and Learning (CORAL) Lab
University of Maryland Baltimore County

Doing Really Well with Linear Decision Surfaces

Outline
- Prediction
  - Why might predictions be wrong?
- Support vector machines
  - Doing really well with linear models
- Kernels
  - Making the non-linear linear

Supervised ML = Prediction
- Given training instances (x, y)
- Learn a model f such that f(x) = y
- Use f to predict y for new x
- Many variations on this basic theme

Why might predictions be wrong?
- True non-determinism
  - Flip a biased coin: p(heads) = θ
  - Estimate θ; if θ > 0.5 predict heads, else tails
- Lots of ML research on problems like this
  - Learn a model
  - Do the best you can in expectation (see the sketch below)
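A minimal sketch of "estimate θ, then predict the majority outcome" (plain NumPy; the numbers and variable names are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.7                       # true (unknown) bias of the coin
flips = rng.random(1000) < theta  # observed training flips: True means heads

theta_hat = flips.mean()          # estimate theta from the data
prediction = "heads" if theta_hat > 0.5 else "tails"

# Always predicting the majority outcome is the best you can do in
# expectation: it is correct with probability max(theta, 1 - theta).
print(f"theta_hat = {theta_hat:.3f}; always predict {prediction}")
```

Even a perfect model cannot beat max(θ, 1 − θ) here; the remaining error is irreducible noise, not a learning failure.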
Why might predictions be wrong? (cont.)
- Partial observability
  - Something needed to predict y is missing from observation x
  - N-bit parity problem:
    - x contains N-1 bits (hard PO)
    - x contains N bits but the learner ignores some of them (soft PO)

Why might predictions be wrong? (summary)
- True non-determinism
- Partial observability: hard, soft
- Representational bias
- Algorithmic bias
- Bounded resources

Representational Bias
- Having the right features (x) is crucial
[Figure: the same X/O points plotted under two feature choices; with the right features they become linearly separable]

Support Vector Machines
Doing Really Well with Linear Decision Surfaces

Strengths of SVMs
- Good generalization in theory
- Good generalization in practice
- Work well with few training instances
- Find globally best model
- Efficient algorithms
- Amenable to the kernel trick

Linear Separators
- Training instances: x ∈ ℝⁿ, y ∈ {-1, 1}
- Parameters: w ∈ ℝⁿ, b ∈ ℝ
- Hyperplane: <w, x> + b = 0, i.e. w₁x₁ + w₂x₂ + … + wₙxₙ + b = 0
- Decision function: f(x) = sign(<w, x> + b)

Math Review
- Inner (dot) product: <a, b> = a · b = Σᵢ aᵢbᵢ = a₁b₁ + a₂b₂ + … + aₙbₙ

Intuitions
[Figures, four slides: the same X/O training set with different candidate linear separators drawn through it]

A "Good" Separator
[Figure: a separator that leaves generous room between itself and both classes]

Noise in the Observations
[Figure: the X/O points jittered by observation noise]

Ruling Out Some Separators
[Figure: separators that pass too close to the data are eliminated by the noise]

Lots of Noise
[Figure: under heavy noise, only separators far from both classes remain viable]

Maximizing the Margin
[Figure: the separator that maximizes the distance to the nearest points of either class]

"Fat" Separators
[Figure: the separator drawn as a thick slab; the slab's width is the margin]

Why Maximize Margin?
- Increasing margin reduces capacity
- Must restrict capacity to generalize
  - m training instances
  - 2^m ways to label them
  - What if the function class can separate them all?
  - Then it shatters the training instances
- VC dimension: the largest m such that the function class can shatter some set of m points

VC Dimension Example
[Figure: the 2³ = 8 labelings of three points, each separable by a line, so linear separators in the plane shatter three points]

Bounding Generalization Error
- R[f] = risk (test error)
- R_emp[f] = empirical risk (training error)
- h = VC dimension
- m = number of training instances
- δ = probability that the bound does not hold
- With probability 1 − δ:
  R[f] ≤ R_emp[f] + sqrt( (h(ln(2m/h) + 1) + ln(4/δ)) / m )
  (evaluated numerically in the sketch below)
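To make the bound concrete, here is a small sketch (a helper of our own, not from the slides) that evaluates the capacity term for given h, m, and δ:

```python
import math

def vc_bound(emp_risk: float, h: int, m: int, delta: float) -> float:
    """Upper bound on the true risk R[f], holding with probability 1 - delta."""
    capacity = math.sqrt((h * (math.log(2 * m / h) + 1) + math.log(4 / delta)) / m)
    return emp_risk + capacity

# Larger m (more data) or smaller h (lower capacity) tightens the bound:
print(vc_bound(emp_risk=0.05, h=10, m=1_000, delta=0.05))    # ~0.31
print(vc_bound(emp_risk=0.05, h=10, m=100_000, delta=0.05))  # ~0.08
```

This is why maximizing the margin matters: restricting the margin lowers the effective capacity h, which shrinks the second term without touching the training error.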
Support Vectors
[Figure: the maximum-margin separator; the X's and O's lying on the margin boundary are the support vectors]

The Math
- Training instances: x ∈ ℝⁿ, y ∈ {-1, 1}
- Decision function: f(x) = sign(<w, x> + b), with w ∈ ℝⁿ, b ∈ ℝ
- Find w and b that
  - perfectly classify the training instances (assuming linear separability)
  - maximize the margin

The Math (cont.)
- For perfect classification, we want yᵢ(<w, xᵢ> + b) ≥ 0 for all i
  (see the training sketch below)
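A minimal end-to-end sketch of this optimization (assuming scikit-learn is available; kernel="linear" fits f(x) = sign(<w, x> + b), and a large C approximates the hard-margin, perfect-classification setting on the slide):

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: class -1 in one cluster, class +1 in another.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# A large C penalizes any margin violation, approximating the hard margin.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]             # hyperplane <w, x> + b = 0
print("w =", w, ", b =", b)
print("support vectors:\n", clf.support_vectors_)  # points on the margin boundary
print("margin width =", 2 / np.linalg.norm(w))     # the quantity being maximized
```

Note that only the support vectors determine w and b: moving any other training point, as long as it stays outside the margin, leaves the separator unchanged.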