CS 559: Machine Learning Fundamentals and Applications
10th Set of Notes

Instructor: Philippos Mordohai
Webpage: www.cs.stevens.edu/~mordohai
E-mail: [email protected]
Office: Lieb 215

Pop-Up Quiz 7

Outline
• Support Vector Machines (SVM)
  – Introduction
  – Linear Discriminant
    • Linearly Separable Case
    • Linearly Non-Separable Case
  – Kernel Trick
    • Non-Linear Discriminant
  – Practical notes on SVMs
    • Pre-processing
    • Overfitting
    • Multi-class SVMs
• DHS Chapter 5
• Slides based on Olga Veksler's

SVM Resources
• Burges tutorial
  – http://research.microsoft.com/en-us/um/people/cburges/papers/SVMTutorial.pdf
• LibSVM (new version released April 1, 2009)
  – http://www.csie.ntu.edu.tw/~cjlin/libsvm/
• SVMlight
  – http://svmlight.joachims.org/

SVMs
• One of the most important developments in pattern recognition in the last 10 years
• Elegant theory
  – Has good generalization properties
• Have been applied to diverse problems very successfully

Linear Discriminant Functions
• A discriminant function is linear if it can be written as g(x) = w^T x + w_0
• Which separating hyperplane should we choose?

Linear Discriminant Functions
• Training data is just a subset of all possible data
  – Suppose the hyperplane is close to sample x_i
  – If we see a new sample close to x_i, it may be on the wrong side of the hyperplane
• Poor generalization (performance on unseen data)

Linear Discriminant Functions
• Choose the hyperplane as far as possible from any sample
• New samples close to the old samples will then be classified correctly
• Good generalization

SVM
• Idea: maximize the distance to the closest example
• For the optimal hyperplane:
  distance to the closest negative example = distance to the closest positive example

SVM: Linearly Separable Case
• SVM: maximize the margin
• The margin is twice the absolute value of the distance b from the closest example to the separating hyperplane
• Better generalization (performance on test data)
  – in practice
  – and in theory

SVM: Linearly Separable Case
• Support vectors are the samples closest to the separating hyperplane
  – They are the most difficult patterns to classify
• The optimal hyperplane is completely defined by the support vectors
  – Of course, we do not know which samples are support vectors without finding the optimal hyperplane

SVM: Formula for the Margin
• Absolute distance between x and the boundary g(x) = 0: |g(x)| / ||w|| = |w^T x + w_0| / ||w||
• The distance is unchanged if the hyperplane is rescaled to (αw, αw_0)
• Let x_i be an example closest to the boundary. Set |w^T x_i + w_0| = 1
• Now the largest-margin hyperplane is unique

SVM: Formula for the Margin
• For uniqueness, set |w^T x_i + w_0| = 1 for any sample x_i closest to the boundary
• The distance from the closest sample x_i to g(x) = 0 is then |w^T x_i + w_0| / ||w|| = 1 / ||w||
• Thus the margin is m = 2 / ||w||
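The margin formula is easy to check numerically. Below is a minimal NumPy sketch (not from the slides; the weight vector, bias, and sample are made-up values) that evaluates g(x), the point-to-hyperplane distance |g(x)| / ||w||, and the margin 2 / ||w|| for a hyperplane normalized so that |g(x_i)| = 1 at the closest sample.

```python
import numpy as np

# Hypothetical hyperplane parameters, normalized so that
# |w^T x + w_0| = 1 for the samples closest to the boundary.
w = np.array([2.0, 1.0])   # weight vector (assumed values)
w0 = -1.0                  # bias term (assumed value)

def g(x):
    """Linear discriminant g(x) = w^T x + w_0."""
    return w @ x + w0

def distance_to_boundary(x):
    """Absolute distance from x to the hyperplane g(x) = 0."""
    return abs(g(x)) / np.linalg.norm(w)

# A made-up sample that sits exactly on the margin: g(x) = 1.
x_closest = np.array([0.4, 1.2])
print(g(x_closest))                     # 1.0, a support-vector candidate
print(distance_to_boundary(x_closest))  # 1 / ||w||
print(2 / np.linalg.norm(w))            # margin m = 2 / ||w||
```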
SVM: Optimal Hyperplane
• Maximize the margin m = 2 / ||w||
• Subject to the constraints:
  w^T x_i + w_0 ≥ 1 if x_i is in class 1
  w^T x_i + w_0 ≤ -1 if x_i is in class 2
• Let z_i = +1 for class 1 and z_i = -1 for class 2; the constraints become z_i (w^T x_i + w_0) ≥ 1 for all i
• Can convert our problem to: minimize J(w) = (1/2) ||w||^2 subject to z_i (w^T x_i + w_0) ≥ 1
• J(w) is a quadratic function, thus there is a single global minimum

SVM: Optimal Hyperplane
• Use the Kuhn-Tucker theorem to convert our problem to the dual: maximize
  L_D(a) = Σ_i a_i - (1/2) Σ_i Σ_j a_i a_j z_i z_j x_j^T x_i
  subject to a_i ≥ 0 and Σ_i a_i z_i = 0
• a = {a_1, …, a_n} are new variables, one for each sample
• Optimized by quadratic programming

SVM: Optimal Hyperplane
• After finding the optimal a = {a_1, …, a_n}, the weight vector is w = Σ_i a_i z_i x_i
• Final discriminant function:
  g(x) = Σ_{x_i ∈ S} a_i z_i x_i^T x + w_0
  where S is the set of support vectors (the samples with a_i > 0)
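A minimal sketch of this pipeline, assuming scikit-learn is available (the slides themselves point to LibSVM and SVMlight; sklearn.svm.SVC wraps LibSVM). The toy data is made up, and the hard margin is approximated with a very large C. For a fitted model, dual_coef_ holds the products a_i z_i of the support vectors, so w = Σ_i a_i z_i x_i can be checked against coef_.

```python
import numpy as np
from sklearn.svm import SVC  # wraps LibSVM, one of the tools listed above

# Tiny, linearly separable toy set (made-up values); labels z_i in {-1, +1}.
X = np.array([[1.0, 1.0], [2.0, 2.5], [0.5, 2.0],
              [3.0, 0.5], [4.0, 1.0], [3.5, -0.5]])
z = np.array([1, 1, 1, -1, -1, -1])

# Hard-margin behaviour is approximated with a very large C.
clf = SVC(kernel='linear', C=1e6).fit(X, z)

# Support vectors: the samples closest to the separating hyperplane.
print(clf.support_vectors_)

# w = sum_i a_i z_i x_i, using only the support vectors.
# dual_coef_ holds a_i z_i; coef_ holds w directly for a linear kernel.
w_from_dual = clf.dual_coef_ @ clf.support_vectors_
print(w_from_dual, clf.coef_)   # these should match
print(clf.intercept_)           # w_0
```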
SVM: Optimal Hyperplane
• L_D(a) depends on the number of samples, not on the dimension: samples appear only through the dot products x_j^T x_i
• This will become important when looking for a nonlinear discriminant function, as we will see soon

SVM: Non-Separable Case
• Data is most likely not linearly separable, but a linear classifier may still be appropriate
• Can apply the SVM in the non-linearly-separable case
• Data should be "almost" linearly separable for good performance

SVM: Non-Separable Case
• Use slack variables ξ_1, …, ξ_n (one for each sample)
• Change the constraints from z_i (w^T x_i + w_0) ≥ 1 to z_i (w^T x_i + w_0) ≥ 1 - ξ_i
• ξ_i is a measure of deviation from the ideal for x_i
  – ξ_i > 1: x_i is on the wrong side of the separating hyperplane
  – 0 < ξ_i < 1: x_i is on the right side of the separating hyperplane but within the region of maximum margin
  – ξ_i < 0: the ideal case for x_i
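In the standard soft-margin objective the total slack is added to J(w) weighted by a constant C, i.e. (1/2)||w||^2 + C Σ_i ξ_i; C is not introduced in this excerpt, but it is the usual way the slack enters, and it is the C parameter exposed by LibSVM-style tools. A short sketch, again assuming scikit-learn, with made-up overlapping data and illustrative C values: small C tolerates large slacks, large C approaches the hard margin.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two overlapping Gaussian blobs: "almost" linearly separable (made-up data).
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(50, 2)),
               rng.normal([2.0, 2.0], 1.0, size=(50, 2))])
z = np.array([1] * 50 + [-1] * 50)

# Small C tolerates larger slacks xi_i; large C penalizes them heavily.
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, z)
    # Slack xi_i = max(0, 1 - z_i * g(x_i)); nonzero inside or beyond the margin.
    slack = np.maximum(0, 1 - z * clf.decision_function(X))
    print(f"C={C:>6}: support vectors={len(clf.support_):3d}, "
          f"total slack={slack.sum():.2f}")
```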
Nonlinear Mapping
• Cover's theorem: "a pattern-classification problem cast in a high-dimensional space non-linearly is more likely to be linearly separable than in a low-dimensional space"
• Example: a one-dimensional space in which the classes are not linearly separable
• Lift to a two-dimensional space with φ(x) = (x, x^2)
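The slides' one-dimensional example can be reproduced directly: if one class sits between two groups of the other class, no single threshold on x separates them, but after the lift φ(x) = (x, x^2) a line does. A minimal sketch with made-up sample values:

```python
import numpy as np
from sklearn.svm import SVC

# 1-D data: class -1 sits between the two groups of class +1,
# so no single threshold on x can separate the classes.
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
z = np.array([1, 1, -1, -1, -1, 1, 1])

# Lift to 2-D with phi(x) = (x, x^2); the classes become linearly separable
# because class +1 now has large x^2 and class -1 small x^2.
Phi = np.column_stack([x, x ** 2])

clf = SVC(kernel='linear', C=1e6).fit(Phi, z)
print(clf.score(Phi, z))          # 1.0: perfectly separated in the lifted space
print(clf.coef_, clf.intercept_)  # w and w_0 of g(x) = w^T phi(x) + w_0
```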
Nonlinear Mapping
• To solve a nonlinear classification problem with a linear classifier:
  1. Project the data x to a high dimension using a function φ(x)
  2. Find a linear discriminant function for the transformed data φ(x)
  3. The final nonlinear discriminant function is g(x) = w^T φ(x) + w_0
• In 2D, the discriminant …
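An earlier slide noted that the samples enter L_D(a) only through the dot products x_j^T x_i. The kernel trick listed in the outline (covered later in the full notes) exploits exactly this: the dot product in the lifted space can often be computed directly from the original inputs, without ever forming φ(x). A small check for the mapping φ(x) = (x, x^2) used above; the kernel function k below is my own illustration, not taken from the slides.

```python
import numpy as np

def phi(x):
    """Explicit lift phi(x) = (x, x^2) from the nonlinear-mapping slide."""
    return np.array([x, x ** 2])

def k(x, y):
    """Kernel computing phi(x)^T phi(y) without building phi explicitly."""
    return x * y + (x ** 2) * (y ** 2)

x, y = 1.5, -2.0
print(phi(x) @ phi(y))  # dot product in the lifted 2-D space
print(k(x, y))          # same value, computed directly from the 1-D inputs
```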