Stevens CS 559: 11th Set of Notes

CS 559: Machine Learning Fundamentals and Applications
11th Set of Notes

Instructor: Philippos Mordohai
Webpage: www.cs.stevens.edu/~mordohai
E-mail: [email protected]
Office: Lieb 215

Project Logistics
• Present project in class on December 9 (10% of total grade)
• Submit brief report (6-8 pages) by 23:59 December 14 (15% of total grade)
• Present poster at the CS Department event (optional, 5% bonus)

Outline
• Support Vector Machines (SVM)
  – Introduction
  – Linear discriminant
    • Linearly separable case
    • Linearly non-separable case
  – Kernel trick
    • Nonlinear discriminant
  – Multi-class SVMs
  – Practical notes on SVMs
    • Pre-processing
    • Overfitting
• Ensemble methods
  – Bagging
(DHS Chapter 5; slides based on Olga Veksler's)

SVM Resources
• Burges tutorial: http://research.microsoft.com/en-us/um/people/cburges/papers/SVMTutorial.pdf
• Shawe-Taylor and Cristianini tutorial: http://www.support-vector.net/icml-tutorial.pdf
• LibSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
• LibLinear (linear classifier): http://www.csie.ntu.edu.tw/~cjlin/liblinear/
• SVM Light: http://svmlight.joachims.org/

SVMs
• One of the most important developments in pattern recognition in the last 15 years
• Elegant theory with good generalization properties
• Applied very successfully to diverse problems

Linear Discriminant Functions
• A discriminant function is linear if it can be written as g(x) = w^T x + w_0
• Which separating hyperplane should we choose?
• Training data is just a subset of all possible data
  – Suppose the hyperplane is close to sample x_i
  – If we see a new sample close to x_i, it may fall on the wrong side of the hyperplane
  – Poor generalization (performance on unseen data)
• Instead, place the hyperplane as far as possible from any sample
  – New samples close to the old samples will then be classified correctly
  – Good generalization

SVM
• Idea: maximize the distance to the closest example
• For the optimal hyperplane, the distance to the closest negative example equals the distance to the closest positive example

SVM: Linearly Separable Case
• SVM: maximize the margin
• The margin is twice the absolute value of the distance b from the closest example to the separating hyperplane
• Better generalization (performance on test data), both in practice and in theory
• Support vectors are the samples closest to the separating hyperplane
  – They are the most difficult patterns to classify (recall the perceptron update rule)
• The optimal hyperplane is completely defined by the support vectors
  – Of course, we do not know which samples are support vectors without finding the optimal hyperplane
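As a quick illustration of the ideas so far, here is a minimal sketch (not part of the original notes; it assumes NumPy and scikit-learn are installed, and the toy points are made up for the example). It fits a linear SVM on a small separable set and reads off the support vectors and the margin 2 / ||w||; a very large C approximates the hard-margin classifier described above.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D linearly separable data (hypothetical example values).
X = np.array([[1.0, 1.0], [2.0, 2.5], [0.5, 2.0],    # class +1
              [4.0, 4.5], [5.0, 4.0], [4.5, 6.0]])   # class -1
z = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin (linearly separable) SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, z)

w, w0 = clf.coef_[0], clf.intercept_[0]
print("support vectors:\n", clf.support_vectors_)      # samples closest to g(x) = 0
print("margin = 2/||w|| =", 2.0 / np.linalg.norm(w))
print("g(x) on support vectors (about +/-1):", X[clf.support_] @ w + w0)
```

Only the support vectors determine w and w_0; deleting any non-support sample and refitting leaves the hyperplane unchanged, matching the claim above.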
SVM: Formula for the Margin
• The absolute distance between x and the boundary g(x) = 0 is |g(x)| / ||w|| = |w^T x + w_0| / ||w||
• This distance is unchanged if w and w_0 are rescaled by a common factor
• Let x_i be an example closest to the boundary (on the positive side) and set |w^T x_i + w_0| = 1
• With this normalization, the largest-margin hyperplane is unique
• For uniqueness, |w^T x_i + w_0| = 1 holds for any sample x_i closest to the boundary
• The distance from the closest sample x_i to g(x) = 0 is then |w^T x_i + w_0| / ||w|| = 1 / ||w||
• Thus the margin is m = 2 / ||w||

SVM: Optimal Hyperplane
• Maximize the margin m = 2 / ||w||
• Subject to the constraints w^T x_i + w_0 >= 1 for positive samples and w^T x_i + w_0 <= -1 for negative samples
• Let z_i = +1 for positive samples and z_i = -1 for negative samples; the constraints become z_i (w^T x_i + w_0) >= 1 for all i
• Can convert our problem to: minimize J(w) = (1/2) ||w||^2 subject to z_i (w^T x_i + w_0) >= 1
• J(w) is a quadratic function, thus there is a single global minimum
• Use the Kuhn-Tucker theorem to convert the problem to its dual: maximize
    L_D(a) = \sum_i a_i - (1/2) \sum_i \sum_j a_i a_j z_i z_j x_j^T x_i
  subject to a_i >= 0 and \sum_i a_i z_i = 0
• a = {a_1, ..., a_n} are new variables, one for each sample
• Optimized by quadratic programming (a QP sketch is given below)
• After finding the optimal a = {a_1, ..., a_n}, the weight vector is w = \sum_{i in S} a_i z_i x_i
• Final discriminant function: g(x) = \sum_{i in S} a_i z_i x_i^T x + w_0, where S is the set of support vectors (the samples with a_i > 0)
• L_D(a) depends on the number of samples, not on their dimension: samples appear only through the dot products x_j^T x_i
• This will become important when looking for a nonlinear discriminant function, as we will see soon

SVM: Non-Separable Case
• Data is most likely not linearly separable, but a linear classifier may still be appropriate
• SVMs can be applied in the non-linearly-separable case
• Data should be "almost" linearly separable for good performance
• Use slack variables ξ_1, ..., ξ_n (one for each sample)
• Change the constraints from z_i (w^T x_i + w_0) >= 1 to z_i (w^T x_i + w_0) >= 1 - ξ_i
• ξ_i measures the deviation of x_i from the ideal:
  – ξ_i > 1: x_i is on the wrong side of the separating hyperplane
  – 0 < ξ_i <= 1: x_i is on the right side of the separating hyperplane, but within the region of maximum margin
  – ξ_i <= 0: the ideal case for x_i (right side, outside the margin)
• We would like to minimize
    J(w, ξ_1, ..., ξ_n) = (1/2) ||w||^2 + β \sum_i ξ_i
  constrained to z_i (w^T x_i + w_0) >= 1 - ξ_i and ξ_i >= 0
• β is a constant that measures the relative weight of the margin term against the slack penalty
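To make the dual optimization above concrete, here is a minimal sketch that solves L_D(a) with a generic quadratic-programming package (cvxopt is assumed to be installed; the function name svm_dual_qp and the 1e-6 support-vector threshold are choices made for this example, not part of the notes). It handles the hard-margin, linearly separable case and recovers w and w_0 from the support vectors.

```python
import numpy as np
from cvxopt import matrix, solvers

def svm_dual_qp(X, z):
    """Hard-margin SVM dual: maximize L_D(a) s.t. a_i >= 0 and sum_i a_i z_i = 0."""
    n = X.shape[0]
    K = X @ X.T                                        # dot products x_j^T x_i
    P = matrix(np.outer(z, z) * K)                     # quadratic term a_i a_j z_i z_j x_j^T x_i
    q = matrix(-np.ones(n))                            # maximize sum(a) == minimize -sum(a)
    G, h = matrix(-np.eye(n)), matrix(np.zeros(n))     # -a_i <= 0, i.e. a_i >= 0
    A, b = matrix(z.reshape(1, -1).astype(float)), matrix(0.0)  # sum_i a_i z_i = 0
    a = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])
    S = a > 1e-6                                       # support vectors: a_i > 0
    w = (a[S] * z[S]) @ X[S]                           # w = sum_{i in S} a_i z_i x_i
    w0 = np.mean(z[S] - X[S] @ w)                      # from z_i (w^T x_i + w0) = 1 on the SVs
    return w, w0, a

# Example use on the toy separable set from the earlier sketch:
# w, w0, a = svm_dual_qp(X, z)
```

On separable data the nonzero a_i pick out exactly the samples on the margin; every other sample gets a_i = 0, which is why the optimal hyperplane is completely defined by the support vectors.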

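For the non-separable case, the sketch below (again assuming scikit-learn; the overlapping Gaussian blobs are synthetic) shows the role of the slack variables. SVC's parameter C plays the role of the constant β above, weighting the slack penalty \sum_i ξ_i against the margin term, and the slacks can be recovered as ξ_i = max(0, 1 - z_i g(x_i)).

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping Gaussian classes: not linearly separable.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.2, (50, 2)), rng.normal(2.5, 1.2, (50, 2))])
z = np.array([-1] * 50 + [1] * 50)

for C in (0.1, 100.0):                                 # C acts like beta above
    clf = SVC(kernel="linear", C=C).fit(X, z)
    w, w0 = clf.coef_[0], clf.intercept_[0]
    xi = np.maximum(0.0, 1.0 - z * (X @ w + w0))       # slack xi_i for each sample
    print(f"C={C}: margin={2 / np.linalg.norm(w):.3f}, "
          f"total slack={xi.sum():.2f}, wrong side={(xi > 1).sum()}")
```

Small C tolerates more margin violations in exchange for a wider margin; large C approaches hard-margin behavior, matching the three ξ_i regimes listed above.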

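Finally, to preview why the dot-product form of L_D(a) matters (the "kernel trick" item in the outline): the QP sketch above touches the data only through K = X @ X.T, so a nonlinear discriminant comes from swapping in a kernel matrix. A hypothetical RBF substitution, under the same assumptions as the QP sketch:

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """K_ij = exp(-gamma * ||x_i - x_j||^2), a drop-in replacement for X @ X.T."""
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

# In svm_dual_qp, replacing K = X @ X.T with K = rbf_gram(X) gives the dual for a
# nonlinear discriminant g(x) = sum_{i in S} a_i z_i k(x_i, x) + w_0; note that w
# is no longer formed explicitly.
```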