Machine Learning CS6375, Spring 2015: Support Vector Machines II

The Dual Problem
Maximize $L = \sum_{i=1}^{R} \alpha_i - \frac{1}{2} \sum_{i=1}^{R} \sum_{j=1}^{R} \alpha_i \alpha_j G_{ij}$, where $G_{ij} = y_i y_j \,(x_i \cdot x_j)$,
subject to $\alpha_i \ge 0$ and $\sum_{i=1}^{R} \alpha_i y_i = 0$.
The weight vector is then $w = \sum_i \alpha_i y_i x_i$, with $\alpha_i > 0$ exactly for the support vectors.

But what if the data is not linearly separable?
Idea 1: find the minimum $w \cdot w$ while also minimizing the number of training-set errors. Problematic: two things to minimize makes for an ill-defined optimization.
Idea 1.1: minimize $w \cdot w + C \,(\text{number of training errors})$. There is a practical problem with this approach: it does not distinguish between disastrous errors and near misses.
Idea 1.2: minimize $w \cdot w + C \,(\text{total distance of error points to their correct place})$.

Learning the Maximum Margin Classifier
Given a guess of $w$ and $b$ we can compute the sum of distances of points to their correct zones, and compute the margin width. Assume R data points, each $(x_k, y_k)$ with $y_k \in \{-1, +1\}$. What should our quadratic optimization criterion be? How many constraints will we have, and what should they be?

Learning the Soft Margin Classifier
Quadratic optimization criterion: minimize $\frac{1}{2}\, w \cdot w + C \sum_{k=1}^{R} \varepsilon_k$.
There are 2R constraints:
$w \cdot x_k + b \ge 1 - \varepsilon_k$ if $y_k = +1$,
$w \cdot x_k + b \le -1 + \varepsilon_k$ if $y_k = -1$,
$\varepsilon_k \ge 0$ for all $k$.
Our original noiseless-data QP had $m + 1$ variables: $w_1, \ldots, w_m$ and $b$. The new noisy-data QP has $m + 1 + R$ variables: $w_1, \ldots, w_m$, $b$, and $\varepsilon_1, \ldots, \varepsilon_R$.

Controlling Soft Margin Separation
C controls the trade-off between margin width and training error.
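To make the variable and constraint counts concrete, here is a minimal sketch that sets up exactly this soft-margin QP for a tiny toy dataset and hands it to a general-purpose solver. The dataset, the choice C = 1.0, and the use of SciPy's SLSQP routine are illustrative assumptions, not part of the lecture; in practice a dedicated QP or SVM solver would be used.

```python
# A small sketch of the soft-margin primal QP above, solved with SciPy's
# general-purpose SLSQP routine.  Toy data and C = 1.0 are illustrative.
import numpy as np
from scipy.optimize import minimize

# Six 2-D points; the last one is deliberately on the wrong side, so the data
# are not linearly separable and some slack eps_k must be paid.
X = np.array([[1.0, 1.0], [2.0, 2.0], [2.0, 0.5],
              [-1.0, -1.0], [-2.0, -1.5], [1.5, 1.5]])
y = np.array([1, 1, 1, -1, -1, -1])
R, m = X.shape
C = 1.0

# Decision variables: w (m values), b, and one slack eps_k per point,
# i.e. the m + 1 + R variables counted on the slide.
def unpack(z):
    return z[:m], z[m], z[m + 1:]

def objective(z):
    w, b, eps = unpack(z)
    return 0.5 * np.dot(w, w) + C * np.sum(eps)

# The 2R constraints: y_k (w . x_k + b) >= 1 - eps_k   and   eps_k >= 0.
def margin_con(k):
    return lambda z: y[k] * (np.dot(unpack(z)[0], X[k]) + unpack(z)[1]) - 1 + unpack(z)[2][k]

constraints = [{'type': 'ineq', 'fun': margin_con(k)} for k in range(R)]
constraints += [{'type': 'ineq', 'fun': (lambda z, k=k: unpack(z)[2][k])} for k in range(R)]

res = minimize(objective, np.zeros(m + 1 + R), method='SLSQP', constraints=constraints)
w, b, eps = unpack(res.x)
print("w =", np.round(w, 3), " b =", round(b, 3))
print("slacks =", np.round(eps, 3))   # only the mislabeled point should need slack
```

Raising C makes slack expensive and pushes the solution toward the hard-margin classifier; lowering C widens the margin at the cost of more (or larger) violations, which is the trade-off described above.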
The Dual QP
Data points with $\alpha_k > 0$ will be the support vectors, so the sum defining $w$ only needs to run over the support vectors.

Soft Margin Dual Problem
The dual has one factor $\alpha_i$ for each training example, now bounded by $0 \le \alpha_i \le C$:
$\alpha_i = C$: support vector with $\varepsilon_i > 0$ (a margin violation),
$0 < \alpha_i < C$: support vector with $\varepsilon_i = 0$ (on the margin),
$\alpha_i = 0$: otherwise.

SVM Training and Testing
Remember: instances enter only through dot products, in both training and testing.
Training: maximize $L = \sum_{i=1}^{R} \alpha_i - \frac{1}{2} \sum_{i=1}^{R} \sum_{j=1}^{R} \alpha_i \alpha_j G_{ij}$ with $G_{ij} = y_i y_j \,(x_i \cdot x_j)$.
Testing: $y = \mathrm{sign}(w \cdot x + b) = \mathrm{sign}\big(\sum_{i \in \mathrm{SV}} \alpha_i y_i \,(x_i \cdot x) + b\big)$.

Example: suppose we are in 1 dimension
What would SVMs do with this data? Not a big surprise: the maximum-margin threshold sits midway between the closest positive and negative points.

Harder 1-dimensional dataset
What can be done about this? Idea 1: use a soft margin and allow some errors.
Alternatively, project the data points into a 2-D space using non-linear basis functions; now the data is linearly separable.

Learning in the Feature Space
Map the data into a feature space where they are linearly separable: $x \rightarrow \phi(x)$.

Non-Linear SVMs: Feature Spaces
General idea: the original feature space can be mapped to another, likely higher-dimensional, feature space where the training set is separable.

QP with Basis Functions
$w = \sum_{k:\,\alpha_k > 0} \alpha_k y_k \,\Phi(x_k)$, and $b = y_K (1 - \varepsilon_K) - w \cdot \Phi(x_K)$ for a support vector $x_K$.
Classify with $f(x; w, b) = \mathrm{sign}(w \cdot \Phi(x) + b)$.
This is sensible. Is that the end of the story? No, there's one more trick.

Quadratic Basis Functions
The number of terms, assuming $m$ input dimensions, is roughly $m^2/2$. What are those $\sqrt{2}$'s doing? You should be happy: they do no harm. You'll find out why they're there soon.

QP with Basis Functions: the cost
We must do $R^2/2$ dot products to get this matrix ready, and each dot product requires about $m^2/2$ additions and multiplications. The whole thing costs about $R^2 m^2 / 4$, which is computationally expensive.

Quadratic Dot Products
Write out $\Phi(a) \cdot \Phi(b)$ term by term, then look at another function of $a$ and $b$: $(a \cdot b + 1)^2$. Expanded, they're the same. What's the complexity of each? $m$ versus $m^2$.

QP with Basis Functions: the kernel trick
We still do $R^2/2$ dot products to get this matrix ready, but each dot product now requires only $m$ additions and multiplications. The matrix of values $\Phi(x_k) \cdot \Phi(x_l) = K(x_k, x_l)$ is called the kernel (Gram) matrix.

Higher Order Polynomials
The same trick extends to higher-degree polynomial basis functions.

QP with Basis Functions: what about testing?
Testing is expensive in the new feature space: $f(x; w, b) = \mathrm{sign}(w \cdot \Phi(x) + b)$. Example: a degree-5 polynomial.

Overfitting?
Curse of dimensionality: we map to a high-dimensional space, $x \rightarrow \Phi(x)$. Are we overfitting the training data? The use of maximum margin "magically" makes this not a problem: no matter what the basis function is, there are only up to R parameters $\alpha_1, \alpha_2, \ldots, \alpha_R$, and usually most of them are set to zero by the maximum-margin solution.

SVM Kernel Functions
$K(a, b) = (a \cdot b + 1)^d$ is an example of an SVM kernel function. Beyond polynomials, there are other very high-dimensional basis functions that can be made practical by finding the right kernel function:
- radial-basis-style kernel functions,
- neural-net-style kernel functions,
- string kernels, tree kernels.

Kernels
A kernel is a function that returns the value of the dot product between the images of its two arguments. Given a function K, it is possible to verify that it is a kernel, and kernels can be constructed from simple ones.

SVM Performance
Anecdotally they work very, very well indeed, and there is a lot of empirical and theoretical work studying them. Ensemble ideas can be used to do multi-class classification.

What you should know
- Linear SVMs.
- The definition of a maximum margin classifier.
- What QP can do for you (for this class you don't need to know how it does it).
- How maximum margin can be turned into a QP problem.
- How we deal with noisy (non-separable) data.
- How we permit non-linear boundaries.
- How SVM kernel functions permit us to pretend we're working with ultra-high-dimensional basis-function terms.
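The quadratic dot-product identity above, and the reason the $\sqrt{2}$ factors do no harm, can be checked numerically. This is a minimal sketch assuming NumPy; the ordering of features in phi_quadratic is one arbitrary but consistent choice, not something fixed by the slides.

```python
# Check that phi(a) . phi(b) = (a . b + 1)^2 for the quadratic basis expansion
# with sqrt(2) factors on the linear and cross terms.
import numpy as np
from itertools import combinations

def phi_quadratic(x):
    """Explicit quadratic expansion: 1, sqrt(2)*x_i, x_i^2, sqrt(2)*x_i*x_j (i < j)."""
    feats = [1.0]
    feats += [np.sqrt(2) * xi for xi in x]                 # linear terms
    feats += [xi ** 2 for xi in x]                          # pure squares
    feats += [np.sqrt(2) * x[i] * x[j]                      # cross terms
              for i, j in combinations(range(len(x)), 2)]
    return np.array(feats)

def quadratic_kernel(a, b):
    """Kernel form: O(m) work instead of the roughly m^2/2-term explicit expansion."""
    return (np.dot(a, b) + 1.0) ** 2

rng = np.random.default_rng(0)
a, b = rng.normal(size=5), rng.normal(size=5)

print(np.dot(phi_quadratic(a), phi_quadratic(b)))   # explicit route
print(quadratic_kernel(a, b))                        # kernel-trick route, same value
```

The two printed numbers agree up to floating-point rounding, which is exactly why the QP only ever needs $K(x_k, x_l)$ and never the expanded feature vectors.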


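The testing formula $\mathrm{sign}\big(\sum_{i \in \mathrm{SV}} \alpha_i y_i K(x_i, x) + b\big)$ can also be exercised end to end. This is a minimal sketch assuming scikit-learn is available; the synthetic dataset is an illustrative assumption, and the point is only that prediction touches the support vectors through kernel evaluations, never the explicit high-dimensional features.

```python
# Reconstruct an SVM decision value by hand from the fitted support vectors,
# dual coefficients (alpha_i * y_i) and intercept, using the quadratic kernel
# K(a, b) = (a . b + 1)^2.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)      # not linearly separable in 2-D

# kernel='poly' with degree=2, gamma=1, coef0=1 is exactly (x . x' + 1)^2.
clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=1.0).fit(X, y)

x_new = np.array([0.2, 0.1])
K = (clf.support_vectors_ @ x_new + 1.0) ** 2                 # kernel against the SVs only
f = clf.dual_coef_[0] @ K + clf.intercept_[0]                 # sum_i alpha_i y_i K(x_i, x) + b
print(int(np.sign(f)), clf.predict(x_new.reshape(1, -1))[0])  # the two agree
print("support vectors:", clf.support_vectors_.shape[0], "out of", X.shape[0])
```

Typically only a fraction of the $\alpha_i$ are non-zero, so prediction involves only the support vectors: the "up to R parameters, usually most set to zero" point from the overfitting slide.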