UT Dallas CS 6375 - svm-1
Slide 1: Machine Learning, CS6375 --- Spring 2015
Support Vector Machines

Slide 2: Linear Classifiers
How would you classify this data?

Slide 3: Linear Classifiers
Any of these would be fine... but which is the best?

Slide 4: Linear Classifiers
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a data point.

Slides 5-6: Linear Classifiers
The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called a linear SVM).

Slide 7: Why Maximum Margin?
1. Intuitively this feels safest.
2. If we have made a small error in the location of the boundary, this gives us the least chance of causing a misclassification.
3. LOOCV is easy, since the model is immune to removal of any non-support-vector data points.
4. There is related theory (using VC dimension) showing that this is a good thing.
5. Empirically it works very, very well.

Slide 8: Specifying a Line and a Margin
How do we represent this mathematically... in m input dimensions?

Slide 9: Specifying a Line and a Margin
Plus-plane = { x : w . x + b = +1 }
Minus-plane = { x : w . x + b = -1 }
Classify as +1 if w . x + b >= 1, and as -1 if w . x + b <= -1.

Slide 10: Computing the Margin Width
Plus-plane = { x : w . x + b = +1 }
Minus-plane = { x : w . x + b = -1 }
Claim: the vector w is perpendicular to the plus-plane. Why?

Slides 11-12: Computing the Margin Width
The vector w is perpendicular to the plus-plane.
• Let x- be any point on the minus-plane.
• Let x+ be the closest plus-plane point to x-.
Claim: x+ = x- + λw for some value of λ. Why?

Slide 13: Computing the Margin Width
What we know:
• w . x+ + b = +1
• w . x- + b = -1
• x+ = x- + λw
• |x+ - x-| = M
It is now easy to get M in terms of w and b:
  w . (x- + λw) + b = 1
  => w . x- + b + λ (w . w) = 1
  => -1 + λ (w . w) = 1
  => λ = 2 / (w . w)

Slide 14: Computing the Margin Width
Therefore:
  M = |x+ - x-| = |λw| = λ sqrt(w . w) = 2 / sqrt(w . w)

Slide 15: Computing the Margin Width
Given a guess of w and b we can:
• compute whether all data points are in the correct half-planes, and
• compute the width of the margin.
So now we just need to write a program to search the space of w's and b's to find the widest margin (or the smallest w . w) that matches all the data points. How?

Slide 16: Learning the Maximum Margin Classifier
What should our quadratic optimization criterion be? Minimize w . w.
How many constraints will we have? R (one per data point). What should they be?
  w . xk + b >= +1 if yk = +1
  w . xk + b <= -1 if yk = -1

Slide 17: Learning via Quadratic Programming
QP is a well-studied class of optimization algorithms for maximizing a quadratic function of some real-valued variables subject to linear constraints.

Slides 18-19: Quadratic Programming
There exist algorithms for finding such constrained quadratic optima much more efficiently and reliably than gradient ascent. (But they are very fiddly... you probably don't want to write one yourself.)
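You don't have to write one yourself: off-the-shelf solvers handle it. As a minimal sketch (mine, not part of the slides), the code below hands exactly the program of slides 15-16, minimize w . w subject to yk (w . xk + b) >= 1, to SciPy's general-purpose SLSQP method on a made-up linearly separable 2-D dataset; the data and variable names are illustrative assumptions, and a dedicated QP solver would be preferable at scale.

```python
# A minimal hard-margin linear SVM via generic constrained optimization.
# Toy data and variable names (X, y, w, b) are assumptions for illustration.
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],     # positive class
              [0.0, 0.0], [0.5, -1.0], [-1.0, 0.5]])  # negative class
y = np.array([1, 1, 1, -1, -1, -1])

# Decision variables packed as theta = (w1, w2, b).
def objective(theta):
    w = theta[:2]
    return w @ w                      # the slides' criterion: minimize w . w

# R constraints, one per data point: y_k (w . x_k + b) - 1 >= 0.
constraints = [{"type": "ineq",
                "fun": lambda th, k=k: y[k] * (X[k] @ th[:2] + th[2]) - 1.0}
               for k in range(len(y))]

res = minimize(objective, x0=np.zeros(3), method="SLSQP",
               constraints=constraints)
w, b = res.x[:2], res.x[2]
margin = 2.0 / np.linalg.norm(w)      # M = 2 / sqrt(w . w), from slide 14
print("w =", w, " b =", b, " margin =", margin)
```

The constraints that hold with equality at the optimum correspond to the support vectors, and the printed margin is the M = 2 / sqrt(w . w) derived above.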
Slide 20: Quadratic Optimization
We'll review:
• unconstrained optimization,
• optimization with linear equality constraints, and
• optimization with linear equality & inequality constraints.

Slide 21: Unconstrained Optimization
Goal: compute the minimum of a quadratic function f(x). Without loss of generality, let f(x) = x'Ax + b'x + c.
f has a minimum if and only if A is positive semi-definite; f has a unique minimum if and only if A is positive definite. In both cases the minimum is attained at the values of x that zero out the gradient, that is, the solutions of:
  Ax = -b/2

Slide 22: Example 1
Find the minimum of f(x) = 2 x1^2 + 2 x1 x2 + x2^2 + 24 x1 + 10 x2.
With x = (x1, x2), this is x'Ax + b'x with
  A = | 2  1 |      b = | 24 |
      | 1  1 |          | 10 |
Therefore, if there is a minimum, it is the solution of Ax = -b/2:
  2 x1 + x2 = -12
    x1 + x2 = -5
The solution is (-7, 2), so if a minimum exists it is at x1 = -7 and x2 = 2. To argue that it is indeed a minimum we still need to show that the matrix A is positive semi-definite. One way of showing this is to show that there is a matrix G such that A = G'G. In our case we can take
  G = | 1  1 |
      | 1  0 |

Slide 23: Optimization with Linear Equality Constraints
Goal: compute the minimum of f(x) where x is also subject to linear equality constraints.
A linear equality constraint can be written as cx = d, where c is a vector and d is a scalar. When there are k constraints we write them as ci x = di, for i = 1, ..., k.
Trick: reduce this to the case of unconstrained optimization using the method of Lagrange multipliers.

Slide 24: Optimization with Linear Equality Constraints
To minimize f(x) under the above equality constraints, we introduce a Lagrange multiplier αi for the i-th constraint and form the Lagrangian function
  L(x, α1, ..., αk) = f(x) + Σi αi (ci x - di)
We can now treat L as a function of n + k variables and use the method of unconstrained minimization to find the unconstrained minimum of L.

Slide 25: Example 2
Minimize x1^2 + x2^2 + x3^2 under the two constraints x2 - x1 = 1 and x3 = 1.
Method 1: By direct substitution (x2 = x1 + 1, x3 = 1), the objective becomes x1^2 + (x1 + 1)^2 + 1. Taking the derivative and equating it to 0, we see that the minimum is at x1 = -1/2.
Method 2: Using the Lagrange technique, we have
  L = x1^2 + x2^2 + x3^2 + α1 (x2 - x1 - 1) + α2 (x3 - 1)
The derivative wrt x1 gives: 2 x1 - α1 = 0
The derivative wrt x2 gives: 2 x2 + α1 = 0
The derivative wrt x3 gives: 2 x3 + α2 = 0
The derivative wrt α1 gives: x2 - x1 - 1 = 0
The derivative wrt α2 gives: x3 - 1 = 0
Treating this as a system of 5 equations in 5 unknowns, we get x1 = -1/2, x2 = 1/2, x3 = 1, α1 = -1, α2 = -2.

Slide 26: The Dual Problem
There is another way of using the Lagrangian to solve the constrained optimization problem. Treating the Lagrange multipliers as constants, we:
(1) optimize the Lagrangian with respect to x, by taking derivatives only with respect to x;
(2) solve the resulting system of linear equations for x in terms of the Lagrange multipliers;
(3) substitute back into the Lagrangian, transforming it into a function of the Lagrange multipliers alone.
This dual problem can be solved using unconstrained techniques.

Slide 27: Example 3
To convert Example 2 into dual form we only need to take the derivatives of the Lagrangian with respect to x1, x2, x3. As above:
The derivative wrt x1 gives: 2 x1 - α1 = 0
The derivative wrt ...
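Example 1 is easy to double-check numerically. The sketch below (my addition, not part of the slides) solves the stationarity system Ax = -b/2 with NumPy and confirms positive semi-definiteness two ways; note that Example 1's objective did not survive the preview intact, so the b used here is the one back-derived from A and the stated minimizer (-7, 2).

```python
# Numerical check of Example 1.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
b = np.array([24.0, 10.0])    # assumed: back-derived from A and the minimizer (-7, 2)

# Stationary point of f(x) = x'Ax + b'x + c: solve Ax = -b/2.
x_star = np.linalg.solve(A, -b / 2.0)
print(x_star)                              # -> [-7.  2.]

# Positive semi-definiteness via the factor G with A = G'G ...
G = np.array([[1.0, 1.0],
              [1.0, 0.0]])
print(np.allclose(G.T @ G, A))             # -> True
# ... or directly: all eigenvalues nonnegative (here strictly positive,
# so A is in fact positive definite and the minimum is unique).
print(np.linalg.eigvalsh(A))               # -> [0.381..., 2.618...]
```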

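To carry Example 3 through to the end, here is a symbolic sketch (again mine, using SymPy) that solves Example 2 both ways: the full 5-equation primal system, and the dual route of slide 26, in which x is first eliminated and the resulting function of (α1, α2) is optimized unconstrained.

```python
# Example 2 and its dual, checked symbolically with SymPy. The objective
# x1^2 + x2^2 + x3^2 and constraints x2 - x1 = 1, x3 = 1 are read off from
# the slide's five stationarity equations.
import sympy as sp

x1, x2, x3, a1, a2 = sp.symbols("x1 x2 x3 alpha1 alpha2")
L = (x1**2 + x2**2 + x3**2
     + a1 * (x2 - x1 - 1)      # constraint x2 - x1 - 1 = 0
     + a2 * (x3 - 1))          # constraint x3 - 1 = 0

# Primal route: set all five partial derivatives to zero and solve.
eqs = [sp.diff(L, v) for v in (x1, x2, x3, a1, a2)]
print(sp.solve(eqs, [x1, x2, x3, a1, a2]))
# -> {x1: -1/2, x2: 1/2, x3: 1, alpha1: -1, alpha2: -2}

# Dual route: solve only the x-equations for x in terms of the multipliers,
# substitute back, and optimize the resulting function of (alpha1, alpha2).
x_of_a = sp.solve(eqs[:3], [x1, x2, x3])   # {x1: a1/2, x2: -a1/2, x3: -a2/2}
dual = sp.simplify(L.subs(x_of_a))
print(dual)                                 # -> -alpha1**2/2 - alpha1 - alpha2**2/4 - alpha2
sol_a = sp.solve([sp.diff(dual, a1), sp.diff(dual, a2)], [a1, a2])
print(sol_a)                                # -> {alpha1: -1, alpha2: -2}
print({v: e.subs(sol_a) for v, e in x_of_a.items()})   # recovers the same x
```

The dual objective is concave here, so its unconstrained stationary point is its maximum; it recovers α1 = -1, α2 = -2, and substituting back gives the same x1 = -1/2, x2 = 1/2, x3 = 1 as the primal route.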
