CMU CS 10701 - SVMs, Duality and the Kernel Trick (cont.)


©2005-2007 Carlos Guestrin

Slide 1: SVMs, Duality and the Kernel Trick (cont.)
Machine Learning – 10701/15781, Carlos Guestrin, Carnegie Mellon University, February 28th, 2007

Slide 2: SVMs reminder

Slide 3: Dual SVM formulation – the non-separable case

Slide 4: Reminder from last time: What if the data is not linearly separable?
- Use features of features of features of features…
- Feature space can get really large really quickly!

Slide 5: Dual formulation only depends on dot-products, not on w!

Slide 6: Dot-product of polynomials

Slide 7: Finally: the "kernel trick"! (see the Python sketch after slide 16)
- Never represent features explicitly
- Compute dot products in closed form
- Constant-time high-dimensional dot-products for many classes of features
- Very interesting theory – Reproducing Kernel Hilbert Spaces (not covered in detail in 10701/15781; more in 10702)

Slide 8: Common kernels
- Polynomials of degree d
- Polynomials of degree up to d
- Gaussian kernels
- Sigmoid

Slide 9: Overfitting?
- Huge feature space with kernels: what about overfitting?
- Maximizing the margin leads to a sparse set of support vectors
- Some interesting theory says that SVMs search for a simple hypothesis with a large margin
- Often robust to overfitting

Slide 10: What about at classification time?
- For a new input x, if we need to represent Φ(x), we are in trouble!
- Recall the classifier: sign(w·Φ(x) + b)
- Using kernels we are cool!

Slide 11: SVMs with kernels
- Choose a set of features and a kernel function
- Solve the dual problem to obtain the support-vector weights αi
- At classification time, compute f(x) = Σi αi yi K(xi, x) + b and classify as sign(f(x))

Slide 12: Remember kernel regression?
1. wi = exp(−D(xi, query)² / Kw²)
2. How to fit with the local points? Predict the weighted average of the outputs: predict = Σi wi yi / Σi wi

Slide 13: SVMs vs. kernel regression
- SVMs predict sign(Σi αi yi K(xi, x) + b)
- Kernel regression predicts Σi K(xi, x) yi / Σi K(xi, x) (or its sign, for classification)

Slide 14: SVMs vs. kernel regression: the differences
- SVMs: learn the weights αi (and the bandwidth); often a sparse solution
- KR: fixed "weights", learn the bandwidth; solution may not be sparse; much simpler to implement

Slide 15: What's the difference between SVMs and logistic regression?

                                            SVMs         Logistic regression
   Loss function                            Hinge loss   Log-loss
   High-dimensional features with kernels   Yes!         No

Slide 16: Kernels in logistic regression
- Define the weights in terms of the support vectors
- Derive a simple gradient descent rule on the αi
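To make slides 6–7 concrete, here is a minimal sketch in Python (my own illustration, not from the slides; `phi` and `poly_kernel` are hypothetical names). It checks that the degree-2 polynomial kernel K(x, z) = (x·z)² returns exactly the dot product of the explicit degree-2 feature maps, without ever materializing those features:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for x = (x1, x2):
    # phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2)
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def poly_kernel(x, z, d=2):
    # (x . z)^d: constant time, independent of the feature-space dimension
    return np.dot(x, z) ** d

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
print(np.dot(phi(x), phi(z)))  # 16.0, via explicit features
print(poly_kernel(x, z))       # 16.0, via the kernel alone
```

Similarly, a sketch of the two predictors compared on slides 11–14, assuming the dual has already been solved for the support-vector weights αi and offset b (again, the function names are mine, not the course's):

```python
def svm_predict(x_new, X_sv, y_sv, alpha, b, kernel):
    # Kernelized SVM classifier (slide 11):
    # classify as sign( sum_i alpha_i * y_i * K(x_i, x_new) + b )
    score = sum(a * y * kernel(xi, x_new)
                for a, y, xi in zip(alpha, y_sv, X_sv)) + b
    return np.sign(score)

def kernel_regression_predict(x_new, X, y, kernel):
    # Kernel regression (slide 12): fixed weights wi = K(xi, x_new),
    # predict the weighted average of the outputs.
    w = np.array([kernel(xi, x_new) for xi in X])
    return np.dot(w, y) / np.sum(w)
```

Slide 14's contrast is visible in the signatures: the SVM carries learned, typically sparse weights `alpha`, while kernel regression learns no weights at all, only the kernel's bandwidth.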
Slide 17: What's the difference between SVMs and logistic regression? (Revisited)

                                            SVMs         Logistic regression
   Loss function                            Hinge loss   Log-loss
   High-dimensional features with kernels   Yes!         Yes!
   Semantics of output                      "Margin"     Real probabilities
   Solution sparse?                         Often yes!   Almost always no!

Slide 18: What you need to know
- Dual SVM formulation and how it's derived (intuition)
- The kernel trick
- How to derive the polynomial kernel
- Common kernels
- Kernelized logistic regression
- Differences between SVMs and logistic regression

Slide 19: Announcements
- Class projects out next week

Slide 20: PAC-learning, VC Dimension and Margin-based Bounds
Machine Learning – 10701/15781, Carlos Guestrin, Carnegie Mellon University, February 28th, 2007

Slide 21: What now…
- We have explored many ways of learning from data
- But… how good is our classifier, really? How much data do I need to make it "good enough"?

Slide 22: A simple setting…
- Classification with m data points
- Finite number of possible hypotheses (e.g., decision trees of depth d)
- A learner finds a hypothesis h that is consistent with the training data – it gets zero error in training: error_train(h) = 0
- What is the probability that h has true error more than ε, i.e., error_true(h) ≥ ε?

Slide 23: How likely is a bad hypothesis to get m data points right?
- A hypothesis h that is consistent with the training data got m i.i.d. points right
- Prob. that an h with error_true(h) ≥ ε gets one data point right: at most 1 − ε
- Prob. that an h with error_true(h) ≥ ε gets m data points right: at most (1 − ε)^m

Slide 24: But there are many possible hypotheses that are consistent with the training data

Slide 25: How likely is the learner to pick a bad hypothesis?
- Prob. that an h with error_true(h) ≥ ε gets m data points right: at most (1 − ε)^m
- There are k hypotheses consistent with the data
- How likely is the learner to pick a bad one?

Slide 26: Union bound
- P(A or B or C or D or …) ≤ P(A) + P(B) + P(C) + P(D) + …

Slide 27: How likely is the learner to pick a bad hypothesis? (cont.)
- By the union bound over the k consistent hypotheses, the probability of picking a bad one is at most k(1 − ε)^m

Slide 28: Review: Generalization error in finite hypothesis spaces [Haussler '88]
- Theorem: Let the hypothesis space H be finite and let the dataset D have m i.i.d. samples, 0 < ε < 1. Then for any learned hypothesis h that is consistent on the training data:
  P(error_true(h) > ε) ≤ |H| e^(−mε)

Slide 29: Using a PAC bound
- Typically, two use cases:
  1. Pick ε and δ; the bound gives you m
  2. Pick m and δ; the bound gives you ε
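To illustrate slide 29's two use cases: setting |H| e^(−mε) ≤ δ in the Haussler bound and solving gives m ≥ (1/ε)(ln|H| + ln(1/δ)) for use case 1, and ε ≥ (1/m)(ln|H| + ln(1/δ)) for use case 2. A minimal sketch (my own code, not from the slides):

```python
import math

def pac_sample_size(h_size, eps, delta):
    # Use case 1: pick epsilon and delta, get m.
    # From |H| * exp(-m * eps) <= delta:  m >= (ln|H| + ln(1/delta)) / eps
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

def pac_error(h_size, m, delta):
    # Use case 2: pick m and delta, get epsilon.
    return (math.log(h_size) + math.log(1.0 / delta)) / m

# Example (invented numbers): a hypothesis space of size 2**100,
# eps = 0.1, delta = 0.05 -> about 724 samples suffice.
print(pac_sample_size(2 ** 100, 0.1, 0.05))
```

Note how m grows only logarithmically in |H|, which is why the bound stays usable even for very large finite hypothesis spaces.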
Slide 30: Review: Generalization error in finite hypothesis spaces [Haussler '88] (cont.)
- The same theorem as slide 28, with the punchline: even if h makes zero errors on the training data, it may make errors at test time

Slide 31: Limitations of the Haussler '88 bound
- Assumes a consistent classifier
- Depends on the size of the hypothesis space

Slide 32: What if our classifier does not have zero error on the training data?
- A learner with zero training error may still make mistakes on the test set
- What about a learner with training error error_train(h) > 0?

Slide 33: Simpler question: What's the expected error of a hypothesis? (see the sketch after slide 36)
- The error of a hypothesis is like estimating the parameter of a coin!
- Chernoff bound: for m i.i.d. coin flips x1, …, xm with xi ∈ {0, 1} and 0 < ε < 1:
  P(θ − (1/m) Σi xi > ε) ≤ e^(−2mε²)

Slide 34: Using the Chernoff bound to estimate the error of a single hypothesis

Slide 35: But we are comparing many hypotheses: union bound
- Apply the Chernoff bound to each hypothesis hi, then take the union bound
- What if I am comparing two hypotheses, h1 and h2?

Slide 36: Generalization
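To make slides 33–34 concrete: under the one-sided Chernoff/Hoeffding form P(θ − (1/m) Σi xi > ε) ≤ e^(−2mε²) (I am assuming this is the formula on the slide, since the preview does not show it), the bound inverts the same two ways as the PAC bound. A minimal sketch (my own code):

```python
import math

def chernoff_sample_size(eps, delta):
    # From exp(-2 * m * eps**2) <= delta:  m >= ln(1/delta) / (2 * eps**2)
    return math.ceil(math.log(1.0 / delta) / (2.0 * eps ** 2))

def chernoff_error(m, delta):
    # Given m i.i.d. samples (coin flips / test mistakes), the empirical
    # mean overshoots the true mean by more than this eps with
    # probability at most delta.
    return math.sqrt(math.log(1.0 / delta) / (2.0 * m))

# Example: estimating one hypothesis' error to within eps = 0.05
# with failure probability delta = 0.05 needs about 600 samples.
print(chernoff_sample_size(0.05, 0.05))
```

For slide 35's point, comparing k hypotheses at once, the union bound replaces δ with δ/k inside the logarithm, so the cost of comparing many hypotheses is again only logarithmic in k.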

