CORNELL CS 472 - Study Notes
Foundations of Artificial Intelligence
Support Vector Machines and Kernels
CS472 – Fall 2007
Thorsten Joachims

Outline
• Transform a linear learner into a non-linear learner
• Kernels can make high-dimensional spaces tractable
• Kernels can make non-vectorial data tractable

Non-Linear Problems
Problem:
• Some tasks have non-linear structure.
• No hyperplane is sufficiently accurate.
How can SVMs learn non-linear classification rules?
=> Extend the hypothesis space.

Extending the Hypothesis Space
Idea: add more features => learn a linear rule in feature space.
Example: map each input x to a feature vector Phi(x) of monomials up to degree two. The separating hyperplane in feature space is then a degree-two polynomial in input space.

Example
• Input space: x = (x1, x2) (2 attributes)
• Feature space: Phi(x) = (x1^2, x2^2, sqrt(2) x1 x2, sqrt(2) x1, sqrt(2) x2, 1) (6 attributes)

Dual (Batch) Perceptron Algorithm
Instead of maintaining an explicit weight vector w, keep a count alpha_i of how often training example i caused an update. The implicit weight vector is w = sum_i alpha_i y_i x_i, so predictions depend on the data only through inner products x_i . x.

Dual SVM Optimization Problem
• Primal optimization problem:
  minimize (1/2) ||w||^2 + C sum_i xi_i
  subject to y_i (w . x_i + b) >= 1 - xi_i and xi_i >= 0 for all i
• Dual optimization problem:
  maximize sum_i alpha_i - (1/2) sum_i sum_j alpha_i alpha_j y_i y_j (x_i . x_j)
  subject to 0 <= alpha_i <= C and sum_i alpha_i y_i = 0
• Theorem: if w* is the solution of the primal and alpha* is the solution of the dual, then
  w* = sum_i alpha_i* y_i x_i

Kernels
Problem: very many parameters! Polynomials of degree p over N attributes in input space lead to O(N^p) attributes in feature space.
Solution [Boser et al.]: the dual OP depends on the data only through inner products => kernel functions.
Example: K(a, b) = (a . b + 1)^2 computes the inner product in the degree-two feature space above.
=> No need to represent the feature space explicitly.

SVM with Kernel
Training: solve the dual OP with every inner product x_i . x_j replaced by K(x_i, x_j).
Classification: h(x) = sign( sum_i alpha_i y_i K(x_i, x) + b )
New hypothesis spaces through new kernels:
• Linear: K(x, z) = x . z
• Polynomial: K(x, z) = (x . z + 1)^d
• Radial basis function: K(x, z) = exp(-gamma ||x - z||^2)
• Sigmoid: K(x, z) = tanh(gamma (x . z) + c)

Examples of Kernels
(Decision-boundary plots for polynomial and radial basis function kernels omitted from this preview.)

Kernels for Non-Vectorial Data
• Applications with non-vectorial input data => classify non-vectorial objects
  – Protein classification (x is a string of amino acids)
  – Drug activity prediction (x is a molecule structure)
  – Information extraction (x is a sentence of words)
  – etc.
• Applications with non-vectorial output data => predict non-vectorial objects
  – Natural language parsing (y is a parse tree)
  – Noun-phrase co-reference resolution (y is a clustering)
  – Search engines (y is a ranking)
=> Kernels can compute these inner products efficiently!

Properties of SVMs with Kernels
• Expressiveness
  – Can represent any boolean function (for appropriate choice of kernel)
  – Can represent any sufficiently "smooth" function to arbitrary accuracy (for appropriate choice of kernel)
• Computational
  – Objective function has no local optima (only one global optimum)
  – Training is independent of the dimensionality of the feature space
• Design decisions
  – Kernel type and parameters
  – Value of the regularization parameter C
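The kernel trick claim above can be checked directly: for two attributes, K(a, b) = (a . b + 1)^2 equals the inner product of the explicit 6-dimensional degree-two feature vectors. A minimal sketch in Python (the toy vectors are my own, chosen only for the check):

```python
import math

def phi(x):
    # Explicit degree-2 feature map for 2 attributes (6 features),
    # matching the 2-attribute example in the notes.
    x1, x2 = x
    return [x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2,
            math.sqrt(2) * x1, math.sqrt(2) * x2, 1.0]

def poly_kernel(a, b, d=2):
    # K(a, b) = (a . b + 1)^d computes the feature-space inner product
    # without ever materializing phi(a) or phi(b).
    return (sum(u * v for u, v in zip(a, b)) + 1) ** d

x, z = (1.0, 2.0), (3.0, -1.0)
explicit = sum(u * v for u, v in zip(phi(x), phi(z)))  # 6 multiplications in feature space
implicit = poly_kernel(x, z)                           # 2 multiplications in input space
# both equal (1*3 + 2*(-1) + 1)^2 = 4
```

For degree p over N attributes, the explicit map needs O(N^p) features while the kernel still costs only O(N) per evaluation, which is the whole point of the construction.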
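The kernelized classification rule h(x) = sign(sum_i alpha_i y_i K(x_i, x) + b) can be sketched as below. The support vectors, multipliers, and bias here are made-up illustrative values, not the output of an actual SVM solver:

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    # Radial basis function kernel: K(x, z) = exp(-gamma * ||x - z||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def svm_classify(x, support_vectors, alphas, labels, b, kernel):
    # h(x) = sign( sum_i alpha_i * y_i * K(x_i, x) + b );
    # only examples with alpha_i > 0 (the support vectors) contribute.
    s = sum(a * y * kernel(sv, x)
            for a, y, sv in zip(alphas, labels, support_vectors))
    return 1 if s + b > 0 else -1

# Hypothetical trained model: two support vectors with unit multipliers.
sv = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]
labels = [-1, 1]
b = 0.0
```

Note that classification cost scales with the number of support vectors and the cost of one kernel evaluation, independent of the feature-space dimensionality.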