Support Vector Machines, Kernel Methods

Why SVMs
Question: at what serial number did the new $5 bill enter circulation?
[Figure: observed bills along the Serial No. axis; four labeled Old, followed by three labeled New. The valid region is the gap between the last Old and the first New.]
If we assume approximately uniformly distributed observations, then the likelihood will be approximately uniform over the valid region $[\max(\text{Old}), \min(\text{New})]$.
The estimate with minimum expected squared error is the midpoint of that region, which is the max-margin choice.

Why SVMs
Sometimes the boundaries of the classes are more informative than the overall distribution of the classes.
SVMs are faster.

Hard margin SVMs
Enforce that all points are out of the margin:
$$(w^\top x_j + b)\, y_j \ge a \quad \text{for all } j$$
Then maximize the margin:
$$\max_{w,b} \frac{a}{\sqrt{w \cdot w}}$$
Here $a$ is the margin after points are projected onto $w$. The solution is the same for any $a$.

Hard margin SVMs
Set $a = 1$ and rewrite:
$$\min_{w,b} \; w \cdot w \quad \text{subject to} \quad (w^\top x_j + b)\, y_j \ge 1 \quad \text{for all } j$$

Hard margin SVMs
Dual form: $w$ is a linear combination of the training examples,
$$w = \sum_{l=1}^{M} \alpha_l y_l x_l$$
We can optimize the $\alpha$s directly. The $\alpha$s will be 0 except for support vectors.

Soft margin SVMs
Slack variables $\xi_j$ represent how wrong our prediction is:
$$\min_{w,b,\xi} \; w \cdot w + C \sum_j \xi_j \quad \text{subject to} \quad (w^\top x_j + b)\, y_j \ge 1 - \xi_j, \quad \xi_j \ge 0$$

Support Vector Machines, Kernel Methods

Why Kernels
The HOG features of a patch (roughly, edge detection) (Dalal & Triggs, 2005).

Why Kernels
Given this dog as input: this window is very close, and both of these windows are somewhat close.
This window is very far. Distances mean nothing past a certain point.
We want a classifier that gives more weight to nearby examples.

Why Kernels
Sometimes it is easier to define similarity between examples than it is to embed them in a feature space.
Similarity of two patches $a$ and $b$, for example:
$$\exp\!\left(-\frac{\lVert \mathrm{HOG}(a) - \mathrm{HOG}(b) \rVert_2^2}{2\sigma^2}\right)$$
The kernel lets us not worry about the underlying HOG space.

Classification learning with Kernels
Simplest idea: k nearest neighbors. Find the nearest points using the kernel.

Linear methods with Kernels
We want to maintain the properties of linear methods, such as linear regression and especially support vector machines.
One approach: find a (possibly infinite-dimensional) space where the dot product between two points in the space equals the kernel evaluated on the two points.

Linear methods with Kernels
[Figure: the largest 16 bases corresponding to the Gaussian kernel in one dimension over a bounded interval (Hastie, Tibshirani & Friedman, 2009).]
How do you compute these bases? You don't. You get lucky with math instead.

This is the kernel trick: for many important problems, the final regression function has the form
$$f(x) = \sum_{x_l \in \text{trainingSet}} \alpha_l\, \kappa(x_l, x)$$
where $\kappa$ is the kernel and the $\alpha$s are a function of only the training data.
Plug in testing examples as $x$ and get a prediction in time linear in the size of the training set.

How to compute the $\alpha$s
For linear regression in a known space, use this formula to compute the $\alpha$s:
$$\alpha = (X X^\top + \lambda I_m)^{-1} y, \qquad f(x) = \sum_{x_l \in \text{trainingSet}} \alpha_l\, (x_l \cdot x)$$
Here the rows of $X$ are the training examples, so $X X^\top$ holds all pairwise dot products.
See Tom's slides for the derivation.

How to compute the $\alpha$s
For linear regression in the expanded space, use this formula to compute the $\alpha$s:
$$\alpha = (K + \lambda I_m)^{-1} y, \qquad f(x) = \sum_{x_l \in \text{trainingSet}} \alpha_l\, \kappa(x_l, x)$$
where $K_{ij} = \kappa(x_i, x_j)$.
See Tom's slides for the derivation.

This works for SVMs too
For any valid kernel, the final SVM classifier will have the form
$$f(x) = \sum_{x_l \in \text{trainingSet}} \alpha_l\, \kappa(x_l, x)$$
Compute $\alpha$s via …
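As a rough illustration of the kernelized regression described above, here is a minimal NumPy sketch: it solves for the training-set weights from the kernel matrix K, then predicts with a weighted sum of kernel evaluations against the training points. The function names (`gaussian_kernel`, `linear_kernel`, `fit_alphas`, `predict`), the regularization strength `lam`, the bandwidth `sigma`, and the toy sine data are all assumptions made for this example and do not come from the slides.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    """Pairwise exp(-||a - b||^2 / (2 sigma^2)) between rows of A and rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def linear_kernel(A, B):
    """Plain dot products: the 'known space' case, where K = X X^T."""
    return A @ B.T


def fit_alphas(X, y, kernel, lam=1e-3):
    """Solve (K + lam * I) alpha = y, with K[i, j] = kernel(x_i, x_j)."""
    K = kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def predict(X_train, alphas, kernel, X_test):
    """f(x) = sum_l alpha_l * kernel(x_l, x), for each row x of X_test."""
    return kernel(X_test, X_train) @ alphas


# Toy 1-D regression problem (assumed for illustration): noiseless sine samples.
X = np.linspace(-3.0, 3.0, 40).reshape(-1, 1)
y = np.sin(X[:, 0])

alphas = fit_alphas(X, y, gaussian_kernel, lam=1e-3)
X_test = np.array([[0.5], [1.5]])
print(predict(X, alphas, gaussian_kernel, X_test))
```

Each prediction costs one kernel evaluation per training point, which is the linear-time behavior the slides mention. Passing `linear_kernel` instead of `gaussian_kernel` gives the same predictions as ordinary ridge regression in the original feature space.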