CMU CS 10701 - Lecture


Solving the optimal margin classifier

- Recall our optimization problem:
  $\max_{w,b} \frac{1}{\|w\|}$  s.t.  $y_i(w^T x_i + b) \ge 1,\ \forall i$
- This is equivalent to:
  $\min_{w,b} \frac{1}{2} w^T w$  s.t.  $1 - y_i(w^T x_i + b) \le 0,\ \forall i$
- Write the Lagrangian:
  $L(w, b, \alpha) = \frac{1}{2} w^T w - \sum_{i=1}^m \alpha_i \left[ y_i(w^T x_i + b) - 1 \right]$
- Recall that the problem above can be reformulated as $\min_{w,b} \max_{\alpha_i \ge 0} L(w, b, \alpha)$. Now we solve its dual problem: $\max_{\alpha_i \ge 0} \min_{w,b} L(w, b, \alpha)$.

The Dual Problem

- $\max_{\alpha_i \ge 0} \min_{w,b} L(w, b, \alpha)$: we minimize $L$ with respect to $w$ and $b$ first:
  $\nabla_w L(w, b, \alpha) = w - \sum_{i=1}^m \alpha_i y_i x_i = 0$
  $\frac{\partial}{\partial b} L(w, b, \alpha) = \sum_{i=1}^m \alpha_i y_i = 0$
- Note that the first condition implies $w = \sum_{i=1}^m \alpha_i y_i x_i$.
- Plugging back into $L$ and using these conditions, we have:
  $L(w, b, \alpha) = \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i,j=1}^m \alpha_i \alpha_j y_i y_j x_i^T x_j$

The Dual Problem (cont.)

- Now we have the following dual optimization problem:
  $\max_\alpha J(\alpha) = \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i,j=1}^m \alpha_i \alpha_j y_i y_j x_i^T x_j$
  s.t.  $\alpha_i \ge 0,\ i = 1, \ldots, m$  and  $\sum_{i=1}^m \alpha_i y_i = 0$
- This is again a quadratic programming problem; a global maximum over the $\alpha_i$ can always be found.
- But what's the big deal? Note two things:
  1. $w$ can be recovered from $w = \sum_{i=1}^m \alpha_i y_i x_i$ (see next slide).
  2. The data enter only through the inner products $x_i^T x_j$ -- the kernel (more later).

Support vectors

- Note the KKT condition: only a few $\alpha_i$'s can be nonzero, since $\alpha_i g_i(w) = 0,\ i = 1, \ldots, m$.
- Call the training data points whose $\alpha_i$'s are nonzero the support vectors (SV).
- [Figure: training points from Class 1 and Class 2; most points have $\alpha_i = 0$, and only the few points on the margin have nonzero $\alpha_i$ -- these are the support vectors.]

Support vector machines

- Once we have the Lagrange multipliers $\{\alpha_i\}$, we can reconstruct the parameter vector $w$ as a weighted combination of the training examples:
  $w = \sum_{i \in SV} \alpha_i y_i x_i$
- For testing with a new data point $z$: compute
  $w^T z + b = \sum_{i \in SV} \alpha_i y_i \, x_i^T z + b$
  and classify $z$ as class 1 if the sum is positive, and class 2 otherwise.
- Note: $w$ need not be formed explicitly.
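To make the "w need not be formed explicitly" point concrete, here is a minimal NumPy sketch (not part of the original slides) that scores a new point both ways: by forming $w$ from the nonzero multipliers, and by summing inner products against the support vectors only. The particular values of alpha, X, y, and b are made-up, illustrative assumptions rather than the output of an actual QP solve.

```python
import numpy as np

# Toy training data and a hypothetical dual solution -- illustrative values only.
X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])  # training inputs
y = np.array([1.0, 1.0, -1.0, -1.0])                               # labels in {+1, -1}
alpha = np.array([0.5, 0.0, 0.5, 0.0])                             # Lagrange multipliers (sum alpha_i y_i = 0)
b = 0.0                                                            # offset

# Support vectors are the points with nonzero alpha_i (KKT condition).
sv = alpha > 1e-8

# Option 1: recover w = sum_i alpha_i * y_i * x_i explicitly.
w = (alpha[sv] * y[sv]) @ X[sv]

# Option 2: never form w -- score z against the support vectors directly.
def decision(z):
    return np.sum(alpha[sv] * y[sv] * (X[sv] @ z)) + b

z = np.array([1.5, 1.0])
print(w @ z + b, decision(z))                     # the two scores agree
print("class 1" if decision(z) > 0 else "class 2")
```

The second option is the one that generalizes: once the inner product $x_i^T z$ is replaced by a kernel evaluation, the same sum works even when $w$ lives in a feature space we never construct.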
Interpretation of support vector machines

- The optimal $w$ is a linear combination of a small number of data points. This "sparse" representation can be viewed as data compression, as in the construction of the kNN classifier.
- To compute the weights $\{\alpha_i\}$, and to use support vector machines, we need to specify only the inner products (or kernel) between the examples: $x_i^T x_j$.
- We make decisions by comparing each new example $z$ with only the support vectors:
  $y^* = \mathrm{sign}\!\left( \sum_{i \in SV} \alpha_i y_i \, x_i^T z + b \right)$

Non-linearly separable problems

- [Figure: Class 1 and Class 2 points that cannot be separated by a hyperplane without error.]
- We allow "errors" $\xi_i$ in classification; each is based on the output of the discriminant function $w^T x + b$.
- $\xi_i$ approximates the number of misclassified samples.

Soft margin hyperplane

- Now we have a slightly different optimization problem:
  $\min_{w,b} \frac{1}{2} w^T w + C \sum_{i=1}^m \xi_i$
  s.t.  $y_i(w^T x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\ \forall i$
- The $\xi_i$ are "slack variables" in the optimization.
- Note that $\xi_i = 0$ if there is no error for $x_i$; $\sum_i \xi_i$ is an upper bound on the number of errors.
- $C$: tradeoff parameter between error and margin.

The optimization problem

- The dual of this new constrained optimization problem is:
  $\max_\alpha J(\alpha) = \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i,j=1}^m \alpha_i \alpha_j y_i y_j x_i^T x_j$
  s.t.  $0 \le \alpha_i \le C,\ i = 1, \ldots, m$  and  $\sum_{i=1}^m \alpha_i y_i = 0$
- This is very similar to the optimization problem in the linearly separable case, except that there is now an upper bound $C$ on each $\alpha_i$.
- Once again, a QP solver can be used to find the $\alpha_i$.

Extension to non-linear decision boundary

- So far we have only considered a large-margin classifier with a linear decision boundary.
- How do we generalize it to become nonlinear?
- Key idea: transform $x_i$ to a higher-dimensional space to "make life easier".
  - Input space: the space where the points $x_i$ are located.
  - Feature space: the space of $\phi(x_i)$ after transformation.
- Why transform?
  - A linear operation in the feature space is equivalent to a non-linear operation in the input space.
  - Classification can become easier with a proper transformation.

Transforming the data

- [Figure: input space mapped by $\phi$ into a feature space where the classes become linearly separable.]
- Note: in practice, the feature space is of higher dimension than the input space.
- Computation in the feature space can be costly because it is high-dimensional; the feature space is typically infinite-dimensional!
- The kernel trick comes to the rescue.

The kernel trick

- Recall the SVM optimization problem:
  $\max_\alpha J(\alpha) = \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i,j=1}^m \alpha_i \alpha_j y_i y_j x_i^T x_j$
  s.t.  $0 \le \alpha_i \le C,\ i = 1, \ldots, m$  and  $\sum_{i=1}^m \alpha_i y_i = 0$
- The data points only appear as inner products.
- As long as we can calculate the inner product in the feature space, we do not need the mapping explicitly.
- Many common geometric operations (angles, distances) can be expressed by inner products.
- Define the kernel function $K$ by $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$.

An example of feature mapping and kernels

- Consider an input $x = [x_1, x_2]$.
- Suppose $\phi(\cdot)$ is given as follows:
  $\phi\!\left([x_1, x_2]\right) = \left[1,\ \sqrt{2}\,x_1,\ \sqrt{2}\,x_2,\ x_1^2,\ x_2^2,\ \sqrt{2}\,x_1 x_2\right]$
- An inner product in the feature space is then:
  $\phi\!\left([x_1, x_2]\right)^T \phi\!\left([x_1', x_2']\right) = \left(1 + x^T x'\right)^2$
- So, if we define the kernel function as $K(x, x') = (1 + x^T x')^2$, there is no need to carry out $\phi(\cdot)$ explicitly.

More examples of kernel functions

- Linear kernel (we've seen it): $K(x, x') = x^T x'$
- Polynomial kernel (we just saw an example): $K(x, x') = (1 + x^T x')^p$ where $p = 2, 3, \ldots$ To get the feature vectors, we concatenate all $p$th-order polynomial terms of the components of $x$ (weighted appropriately).
- Radial basis kernel: $K(x, x') = \exp\!\left(-\frac{1}{2}\|x - x'\|^2\right)$. In this case the feature space consists of functions, and the result is a non-parametric classifier.

Kernelized SVM

- Training:
  $\max_\alpha J(\alpha) = \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i,j=1}^m \alpha_i \alpha_j y_i y_j K(x_i, x_j)$
  s.t.  $0 \le \alpha_i \le C,\ i = 1, \ldots, m$  and  $\sum_{i=1}^m \alpha_i y_i = 0$
- Using:
  $y = \mathrm{sign}\!\left( \sum_{i \in SV} \alpha_i y_i K(x_i, z) + b \right)$
  (a small code sketch of this kernelized decision rule follows at the end of this preview)

SVM examples

- [Figures: examples of non-linear SVM decision boundaries with a Gaussian kernel.]

Cross-validation error

- The leave-one-out cross-validation error does not …
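As a companion to the kernel-trick and kernelized-SVM slides above, here is a small NumPy sketch, not from the original slides, that (a) checks that the explicit degree-2 feature map $\phi$ matches the polynomial kernel $(1 + x^T x')^2$, and (b) evaluates the kernelized decision rule $\mathrm{sign}\!\left(\sum_{i \in SV} \alpha_i y_i K(x_i, z) + b\right)$ with an RBF kernel. The support vectors, multipliers, and offset are made-up, illustrative assumptions, not the result of actually training an SVM.

```python
import numpy as np

# Explicit degree-2 feature map from the slides: phi(x) in R^6 for x in R^2.
def phi(x):
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

def poly_kernel(x, xp):   # K(x, x') = (1 + x^T x')^2
    return (1.0 + x @ xp) ** 2

def rbf_kernel(x, xp):    # K(x, x') = exp(-0.5 * ||x - x'||^2)
    return np.exp(-0.5 * np.sum((x - xp) ** 2))

# (a) The kernel computes the feature-space inner product without forming phi.
x, xp = np.array([1.0, 2.0]), np.array([0.5, -1.0])
assert np.isclose(phi(x) @ phi(xp), poly_kernel(x, xp))

# (b) Kernelized decision rule with made-up support vectors and multipliers.
X_sv = np.array([[1.0, 1.0], [-1.0, -0.5]])   # support vectors (illustrative)
y_sv = np.array([1.0, -1.0])                  # their labels
alpha_sv = np.array([0.7, 0.7])               # their nonzero multipliers
b = 0.1

def predict(z, kernel=rbf_kernel):
    score = sum(a * yi * kernel(xi, z) for a, yi, xi in zip(alpha_sv, y_sv, X_sv)) + b
    return 1 if score > 0 else -1

print(predict(np.array([0.8, 0.9])))          # +1: the point lies near the +1 support vector
```

The prediction loop touches only the support vectors and kernel evaluations, which is exactly why the feature space (possibly infinite-dimensional, as with the RBF kernel) never has to be constructed.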

