Support Vector Machines
Joseph Gonzalez

From a linear classifier to ...
*One of the most famous slides you will see, ever!

The Big Idea
[Figure: positive (X) and negative (O) training examples separated by a hyperplane.]
Maximum margin: the maximum possible separation between positive and negative training examples.
*One of the most famous slides you will see, ever!

Geometric Intuition
[Figure: the separating hyperplane with its margin; the X and O examples lying on the margin are the SUPPORT VECTORS.]

Primal Version
$$\min_{w,b,\xi}\ \|w\|^2 + C \sum_i \xi_i \qquad \text{s.t. } y_i(w \cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0$$

DUAL Version
Where did this come from? Remember Lagrange multipliers:
• Let us "incorporate" the constraints into the objective
• Then solve the problem in the "dual" space of Lagrange multipliers
$$\max_\alpha\ \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \qquad \text{s.t. } \sum_i \alpha_i y_i = 0,\quad 0 \le \alpha_i \le C$$

Primal vs Dual
• Number of parameters? Primal: one weight per feature (plus one slack per example). Dual: one α_i per example.
• Large # features? The DUAL is preferred: its size does not grow with the number of features.
• Large # examples? Many α_i can go to zero!

DUAL: the "Support Vector" version
• How do we find α? Quadratic programming.
• How do we find C? Cross-validation!
• How do we find w? $w = \sum_i \alpha_i y_i x_i$
• How do we find b? The margin is tight at a support vector, so $b = y_i - w \cdot x_i$ there.
• Wait... how do we predict y for a new point x? $y = \mathrm{sign}(w \cdot x + b) = \mathrm{sign}\big(\sum_i \alpha_i y_i\,(x_i \cdot x) + b\big)$

"Support Vector"s?
[Figure: one O example at (0, 1) with multiplier α₁ and one X example at (2, 2) with multiplier α₂.]
Take x₁ = (0, 1) with y₁ = +1 and x₂ = (2, 2) with y₂ = −1. Then x₁·x₁ = 1, x₁·x₂ = 2, and x₂·x₂ = 8, so the dual becomes
$$\max_\alpha\ \alpha_1 + \alpha_2 + 2\alpha_1\alpha_2 - \frac{\alpha_1^2}{2} - 4\alpha_2^2 \qquad \text{s.t. } \alpha_1 - \alpha_2 = 0,\quad 0 \le \alpha_i \le C$$
The constraint forces α₁ = α₂ = α, so the objective is $2\alpha - \frac{5}{2}\alpha^2 = \frac{5}{2}\,\alpha\big(\frac{4}{5} - \alpha\big)$: a downward parabola with roots at 0 and 4/5, maximized at α₁ = α₂ = 2/5. Then
$$w = \sum_i \alpha_i y_i x_i = 0.4\,\big([0\ \ 1] - [2\ \ 2]\big) = 0.4\,[-2\ \ -1]$$
and, since y = w·x + b holds with equality at a support vector, plugging in x₁ gives
$$b = y_1 - w \cdot x_1 = 1 - 0.4\,[-2\ \ -1] \cdot [0\ \ 1] = 1 + 0.4 = 1.4$$
(A numeric check of this arithmetic appears in the sketch after the slides.)

"Support Vector"s?
[Figure: the same two points plus a third O example with multiplier α₃.]
What is α₃? Try this at home.

Playing With SVMs
• http://www.csie.ntu.edu.tw/~cjlin/libsvm/

More on Kernels
• Kernels represent inner products
– K(a,b) = a·b
– K(a,b) = φ(a)·φ(b)
• The kernel trick allows an extremely complex φ(·) while keeping K(a,b) simple
• Goal: avoid having to directly construct φ(·) at any point in the algorithm

Kernels
The complexity of the optimization problem depends only on the dimensionality of the input space, not of the feature space!

Can we use Kernels to Measure Distances?
• Can we measure the distance between φ(a) and φ(b) using K(a,b)?

Continued:
• Yes: expand the squared distance and every term is an inner product:
$$\|\varphi(a) - \varphi(b)\|^2 = \varphi(a)\cdot\varphi(a) - 2\,\varphi(a)\cdot\varphi(b) + \varphi(b)\cdot\varphi(b) = K(a,a) - 2K(a,b) + K(b,b)$$

Popular Kernel Methods
• Gaussian Processes
• Kernel Regression (Smoothing)
– Nadaraya-Watson Kernel
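As a check on the two-point worked example, here is a minimal NumPy sketch of the closed-form solution; the variable names are ours, not from the slides.

```python
# Check the slides' two-point example: x1 = (0, 1) labeled +1,
# x2 = (2, 2) labeled -1; the slides claim w = 0.4 * [-2, -1] and b = 1.4.
import numpy as np

X = np.array([[0.0, 1.0], [2.0, 2.0]])
y = np.array([1.0, -1.0])

# The equality constraint forces alpha1 = alpha2 = a, the objective
# reduces to 2a - (5/2)a^2, and the maximizer is a = 2/5.
a = 2.0 / 5.0
alpha = np.array([a, a])

# w = sum_i alpha_i y_i x_i
w = (alpha * y) @ X
# b from a support vector, where the margin is tight: b = y_1 - w . x_1
b = y[0] - w @ X[0]

print(w)  # [-0.8 -0.4]  (= 0.4 * [-2, -1])
print(b)  # 1.4

# The decision rule classifies both training points correctly:
print(np.sign(X @ w + b))  # [ 1. -1.]
```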
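The "Playing With SVMs" slide points at libsvm; scikit-learn's SVC class wraps that library, so here is a sketch of fitting and predicting with it. The toy data, kernel choice, and parameter values are our assumptions for illustration, not from the slides.

```python
# Soft-margin SVM via scikit-learn's SVC (a wrapper around libsvm).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(20, 2)),
               rng.normal(2.0, 1.0, size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# alpha is found by quadratic programming inside libsvm; in practice C
# would be chosen by cross-validation (e.g., sklearn's GridSearchCV).
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only the support vectors keep nonzero alpha_i; dual_coef_ stores alpha_i * y_i.
print(clf.support_vectors_.shape)
print(clf.dual_coef_)

# Dual prediction rule from the slides: y = sign(sum_i alpha_i y_i (x_i . x) + b)
x_new = np.array([[0.5, 0.5]])
score = clf.dual_coef_ @ (clf.support_vectors_ @ x_new.T) + clf.intercept_
print(np.sign(score), clf.predict(x_new))  # the two agree
```

Note how the manual score uses only the support vectors, not all 40 training points: the other examples have α_i = 0 and drop out of the sum, which is the sparsity the "Primal vs Dual" slide highlights.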
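Finally, a minimal sketch of the kernel-distance identity from the "Can we use Kernels to Measure Distances?" slides. The RBF kernel and the gamma value here are our choices for illustration; any valid kernel works the same way.

```python
# Distance between phi(a) and phi(b) computed purely through K, without
# ever constructing phi: ||phi(a) - phi(b)||^2 = K(a,a) - 2 K(a,b) + K(b,b).
import numpy as np

def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.0])

dist_sq = rbf(a, a) - 2 * rbf(a, b) + rbf(b, b)
print(np.sqrt(dist_sq))  # feature-space distance, computed only via K
```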