CMU CS 10701 - SVMs, Duality and the Kernel Trick

SVMs, Duality and the Kernel Trick (cont.)
Machine Learning – 10701/15781
Carlos Guestrin, Carnegie Mellon University
March 1st, 2006

Two SVM tutorials are linked on the class website (please read both):
- A high-level presentation with applications (Hearst 1998)
- A detailed tutorial (Burges 1998)

SVMs reminder
[Figure: recap of the SVM setup]

Today's lecture
- Learn one of the most interesting and exciting recent advances in machine learning: the "kernel trick"
- High-dimensional feature spaces at no extra cost!
- But first, a detour: constrained optimization!

Dual SVM interpretation
[Figure: the separating hyperplane w.x + b = 0, with margin and support vectors]

Dual SVM formulation – the linearly separable case

Reminder from last time: what if the data is not linearly separable?
- Use features of features of features of features...
- The feature space can get really large really quickly!

Higher-order polynomials
[Plot: number of monomial terms vs. number of input dimensions m, for polynomial degrees d = 2, 3, 4]
- With m input features, the number of monomial terms of degree d is C(m + d - 1, d) – it grows fast!
- For d = 6 and m = 100: about 1.6 billion terms

The dual formulation only depends on dot products, not on w!

Finally: the "kernel trick"!
- Never represent features explicitly – compute dot products in closed form
- Constant-time high-dimensional dot products for many classes of features
- Very interesting theory – Reproducing Kernel Hilbert Spaces
- Not covered in detail in 10701/15781; more in 10702

Common kernels
- Polynomials of degree d: K(x, z) = (x.z)^d
- Polynomials of degree up to d: K(x, z) = (x.z + 1)^d
- Gaussian kernels: K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
- Sigmoid: K(x, z) = tanh(a x.z + c)

Overfitting?
- Huge feature space with kernels – what about overfitting???
- Maximizing the margin leads to a sparse set of support vectors
- Some interesting theory says that SVMs search for a simple hypothesis with a large margin
- Often robust to overfitting

What about at classification time?
- For a new input x, if we need to represent Phi(x), we are in trouble!
- Recall the classifier: sign(w.Phi(x) + b)
- Using kernels, we are cool!

SVMs with kernels
- Choose a set of features and a kernel function
- Solve the dual problem to obtain the support vectors and their weights alpha_i
- At classification time, compute f(x) = sum_i alpha_i y_i K(x_i, x) + b and classify x as sign(f(x)) (see the sketch below)

Remember kernel regression???
1. w_i = exp(-D(x_i, query)^2 / Kw^2)
2. How to fit with the local points? Predict the weighted average of the outputs:
   predict = sum_i w_i y_i / sum_i w_i

SVMs v. Kernel Regression
- SVMs: predict sign(sum_i alpha_i y_i K(x_i, x) + b)
- Kernel regression: predict sum_i K(x_i, x) y_i / sum_i K(x_i, x)
Differences:
- SVMs: learn the weights alpha_i (and the bandwidth); often a sparse solution
- KR: fixed "weights", learn only the bandwidth; the solution may not be sparse; much simpler to implement

What's the difference between SVMs and Logistic Regression?

                                           SVMs         Logistic Regression
  Loss function                            Hinge loss   Log-loss
  High-dimensional features with kernels   Yes!         No

Kernels in logistic regression
- Define the weights in terms of the support vectors: w = sum_i alpha_i Phi(x_i), so w.Phi(x) = sum_i alpha_i K(x_i, x)
- Derive a simple gradient descent rule on the alpha_i
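The slides above promise two things: that a polynomial kernel computes the high-dimensional dot product in closed form, and that classification needs only kernel evaluations against the support vectors. Below is a minimal numpy sketch of both claims; the toy vectors, multipliers alpha_i, and bias b are hypothetical values chosen for illustration, not from the slides.

```python
import numpy as np

def phi2(x):
    """Explicit degree-2 monomial features for 2-D input: (x1^2, sqrt(2) x1 x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

def poly_kernel(x, z, d=2):
    """Polynomial kernel K(x, z) = (x . z)^d: the high-dimensional dot product
    in closed form. (Use (x . z + 1)^d for all monomials up to degree d.)"""
    return float(np.dot(x, z)) ** d

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
# Same number, but the kernel never builds the expanded feature vectors.
assert np.isclose(np.dot(phi2(x), phi2(z)), poly_kernel(x, z))

def svm_predict(x_new, svs, alphas, ys, b, kernel=poly_kernel):
    """Dual SVM classifier: sign(sum_i alpha_i y_i K(x_i, x_new) + b)."""
    score = sum(a * y * kernel(sv, x_new) for a, y, sv in zip(alphas, ys, svs)) + b
    return 1 if score >= 0 else -1

# Hypothetical support set, just to exercise the prediction rule.
svs    = [np.array([1.0, 1.0]), np.array([-1.0, 2.0])]
alphas = [0.7, 0.4]
ys     = [+1, -1]
print(svm_predict(np.array([2.0, 0.5]), svs, alphas, ys, b=-0.2))  # prints 1
```

This is why the classifier stays cheap: prediction costs one kernel evaluation per support vector, regardless of how large the implicit feature space is.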
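For the kernel regression comparison, here is a sketch of the weighted-average predictor the slides define, with w_i = exp(-D(x_i, query)^2 / Kw^2); the toy data and bandwidth Kw are illustrative. Note the contrast the slides draw with SVMs: every training point gets a nonzero weight (nothing is sparse), and only the bandwidth is tuned.

```python
import numpy as np

def kernel_regression(query, X, y, Kw):
    """Prediction rule from the slides: w_i = exp(-D(x_i, query)^2 / Kw^2),
    predict = sum_i w_i y_i / sum_i w_i (a weighted average of the outputs)."""
    d2 = np.sum((X - query) ** 2, axis=1)  # squared Euclidean distances to the query
    w = np.exp(-d2 / Kw ** 2)              # fixed-form weights; Kw is the bandwidth
    return np.dot(w, y) / np.sum(w)

# Illustrative 1-D data: the prediction near x = 1.5 blends y = 1 and y = 4.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 4.0])
print(kernel_regression(np.array([1.5]), X, y, Kw=0.5))  # ~2.5
```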
What's the difference between SVMs and Logistic Regression? (Revisited)

                                           SVMs         Logistic Regression
  Loss function                            Hinge loss   Log-loss
  High-dimensional features with kernels   Yes!         Yes!
  Solution sparse                          Often yes!   Almost always no!
  Semantics of output                      "margin"     Real probabilities

What you need to know
- The dual SVM formulation and how it is derived
- The kernel trick
- How to derive the polynomial kernel
- Common kernels
- Kernelized logistic regression
- The differences between SVMs and logistic regression

Acknowledgment
- SVM applet: http://www.site.uottawa.ca/~gcaron/applets.htm

PAC-learning, VC Dimension and Margin-based Bounds
Machine Learning – 10701/15781
Carlos Guestrin, Carnegie Mellon University
March 1st, 2005

More details:
- General: http://www.learning-with-kernels.org/
- An example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz

What now...
- We have explored many ways of learning from data
- But... how good is our classifier, really?
- How much data do I need to make it "good enough"?

A simple setting...
- Classification with m data points
- A finite number of possible hypotheses (e.g., decision trees of depth d)
- A learner finds a hypothesis h that is consistent with the training data – it gets zero training error, error_train(h) = 0
- What is the probability that h has more than eps true error, error_true(h) >= eps?

How likely is a bad hypothesis to get m data points right?
- A hypothesis h that is consistent with the training data got m i.i.d. points right
- Prob. that an h with error_true(h) >= eps gets one data point right: at most 1 - eps
- Prob. that an h with error_true(h) >= eps gets all m data points right: at most (1 - eps)^m

But there are many possible hypotheses that are consistent with the training data.

Union bound
- P(A or B or C or ...) <= P(A) + P(B) + P(C) + ...

How likely is the learner to pick a bad hypothesis?
- Prob. that one h with error_true(h) >= eps gets m data points right: at most (1 - eps)^m
- There are k hypotheses consistent with the data
- By the union bound, the prob. that the learner picks a bad one is at most k (1 - eps)^m <= |H| (1 - eps)^m

Review: generalization error in finite hypothesis spaces [Haussler '88]
- Theorem: if the hypothesis space H is finite and dataset D has m i.i.d. samples, then for 0 < eps < 1 and any learned hypothesis h that is consistent on the training data:
  P(error_true(h) > eps) <= |H| (1 - eps)^m <= |H| e^(-m eps)
  (using 1 - eps <= e^(-eps))
- Even if h makes zero errors on the training data, it may make errors at test time

Using a PAC bound – typically 2 use cases (see the numeric sketch below):
- 1: Pick eps and delta, and the bound gives you m
- 2: Pick m and delta, and the bound gives you eps

Limitations of the Haussler '88 bound
- It requires a consistent classifier (zero training error)
- It depends on the size of the hypothesis space |H|

What if our classifier does not have zero error on the training data?
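To make the two use cases of the bound concrete, here is a small Python sketch that sets |H| e^(-m eps) <= delta and solves it each way; the example numbers (|H| = 2^100 hypotheses, delta = 0.05) are illustrative assumptions, not from the slides.

```python
import math

def sample_complexity(h_size, eps, delta):
    """Use case 1: pick eps and delta, get m.
    Setting |H| e^(-m eps) <= delta gives m >= (ln|H| + ln(1/delta)) / eps."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

def error_bound(h_size, m, delta):
    """Use case 2: pick m and delta, get eps >= (ln|H| + ln(1/delta)) / m."""
    return (math.log(h_size) + math.log(1.0 / delta)) / m

# Illustrative: |H| = 2**100 hypotheses, 95% confidence (delta = 0.05).
print(sample_complexity(2 ** 100, eps=0.1, delta=0.05))  # 724 samples suffice
print(error_bound(2 ** 100, m=1000, delta=0.05))         # eps ~= 0.072
```

Because the bound depends only on ln|H|, even an astronomically large hypothesis space needs a modest number of samples; this logarithmic dependence is what makes the bound useful.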

