CS446: Machine Learning, Fall 2017

Lecture 12: Kernel Ridge Regression; Support Vector Machine

Lecturer: Sanmi Koyejo    Scribe: Lan Wang, Oct 5th, 2017

Agenda:
- Recap: Kernels
- Kernel ridge regression
- Support vector machines

Recall: Kernel function

We define a kernel function to be a real-valued function of two arguments, used to measure the similarity between any pair of data inputs $x_i, x_j$:
$$\kappa : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}, \qquad (x_i, x_j) \mapsto \phi(x_i)^T \phi(x_j).$$
Here $\phi : \mathbb{R}^d \to \mathbb{R}^n$ is the feature mapping function, where $d$ is the dimension of each data input and $n$ is the dimension of the feature space. Usually we have $d \ll n$.

Mercer kernel

A Mercer kernel is a kernel function that satisfies the requirement that the Gram matrix
$$K = \begin{pmatrix} \kappa(x_1, x_1) & \cdots & \kappa(x_1, x_n) \\ \kappa(x_2, x_1) & \cdots & \kappa(x_2, x_n) \\ \vdots & \ddots & \vdots \\ \kappa(x_n, x_1) & \cdots & \kappa(x_n, x_n) \end{pmatrix}$$
is positive semi-definite for any set of data inputs $\{x_i\}_{i=1}^n$.
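To make the Mercer condition concrete, here is a minimal sketch in Python that builds a Gram matrix and inspects its eigenvalues (NumPy, the helper names, and the random data are my additions, not part of the lecture):

```python
import numpy as np

def linear_kernel(xi, xj):
    # kappa(x_i, x_j) = x_i^T x_j, i.e. phi is the identity feature map
    return xi @ xj

def gram_matrix(X, kernel):
    # K[i, j] = kappa(x_i, x_j) over all pairs of rows of X
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))      # n = 5 inputs, each of dimension d = 3
K = gram_matrix(X, linear_kernel)

# A Mercer kernel yields a symmetric PSD Gram matrix: every eigenvalue
# is nonnegative (up to floating-point error).
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # True
```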
Gaussian kernel

The Gaussian kernel, which is also called the RBF (radial basis function) kernel, is a popular kernel function defined as
$$\kappa(x_i, x_j) = e^{-\frac{\|x_i - x_j\|^2}{2\sigma^2}} = e^{-\frac{\gamma \|x_i - x_j\|^2}{2}},$$
where $\gamma = \frac{1}{\sigma^2}$.
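A minimal sketch of evaluating this kernel (the function name, the parameterization via gamma, and the test points are my choices):

```python
import numpy as np

def rbf_kernel(xi, xj, gamma=1.0):
    # kappa(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2 / 2), with gamma = 1 / sigma^2
    return np.exp(-gamma * np.sum((xi - xj) ** 2) / 2)

x = np.array([1.0, 2.0])
print(rbf_kernel(x, x))           # 1.0: a point is maximally similar to itself
print(rbf_kernel(x, x + 0.5))     # nearby points give a value close to 1
print(rbf_kernel(x, x + 10.0))    # distant points give a value close to 0
```

Larger $\gamma$ (i.e., smaller $\sigma$) makes the similarity decay faster with distance.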
Notice that by Taylor expansion we have
$$e^{\gamma x x'} = \sum_{k=0}^{\infty} \frac{\gamma^k x^k (x')^k}{k!},$$
so for scalar inputs $x_i, x_j$ we can rewrite $\kappa(x_i, x_j)$ as
$$e^{-\frac{\gamma (x_i - x_j)^2}{2}} = \sum_{k=0}^{\infty} \frac{\gamma^k x_i^k x_j^k}{k!} \, e^{-\frac{\gamma x_i^2}{2}} \, e^{-\frac{\gamma x_j^2}{2}}. \qquad (1)$$
This implies that we have infinite-dimensional feature maps
$$\phi_k(x_i) = \sqrt{\frac{\gamma^k}{k!}} \, x_i^k \, e^{-\frac{\gamma x_i^2}{2}},$$
where $\phi_k$ is a function of $x_i$ derived from (1), so that $\kappa(x_i, x_j) = \sum_{k=0}^{\infty} \phi_k(x_i) \phi_k(x_j)$.
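To see that this feature map really reproduces the kernel, here is a sketch that truncates the sum in (1) at $K$ terms and checks that $\sum_{k < K} \phi_k(x_i) \phi_k(x_j)$ matches $\kappa(x_i, x_j)$ for scalar inputs (the truncation level and test values are arbitrary choices of mine):

```python
import numpy as np
from math import factorial

def rbf_1d(xi, xj, gamma=1.0):
    # Scalar Gaussian kernel: exp(-gamma * (x_i - x_j)^2 / 2)
    return np.exp(-gamma * (xi - xj) ** 2 / 2)

def phi(x, gamma=1.0, K=30):
    # Truncated feature map from (1): phi_k(x) = sqrt(gamma^k / k!) * x^k * exp(-gamma x^2 / 2)
    return np.array([np.sqrt(gamma ** k / factorial(k)) * x ** k * np.exp(-gamma * x ** 2 / 2)
                     for k in range(K)])

xi, xj = 0.7, -1.2
approx = phi(xi) @ phi(xj)           # inner product of truncated feature vectors
exact = rbf_1d(xi, xj)
print(abs(approx - exact) < 1e-12)   # True: the series converges very quickly
```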
Ridge Regression

Ridge regression is a technique for analyzing multiple-regression data that suffer from multicollinearity, i.e., near-linear relationships among the independent variables. To understand ridge regression, let us first recall the regularized linear regression model that we discussed before:
$$\min_{\beta} \sum_{i=1}^{n} \left( y_i - x_i^T \beta \right)^2 + \lambda \|\beta\|_2^2.$$
Let $X_{n \times d} = (x_1, \ldots, x_n)^T$; then the optimized solution is
$$\hat{\beta} = (X^T X + \lambda I)^{-1} X^T y.$$
Using the fact that $(I + AB)^{-1} A = A (I + BA)^{-1}$, we can rewrite $\hat{\beta}$ as
$$\hat{\beta} = X^T (\lambda I + X X^T)^{-1} y.$$
Now we have two ways to get the optimized solution:
1. $\hat{\beta} = (X^T X + \lambda I)^{-1} X^T y$, computation cost $O(d^3)$;
2. $\hat{\beta} = X^T (X X^T + \lambda I)^{-1} y$, computation cost $O(n^3)$.

Remark: Clearly we can see from here that $\hat{\beta}$ is in the row space of $X$; a numerical check of both formulas is sketched below.

When $d < n$, using the first method ...
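As a numerical sanity check of the two formulas above (a sketch; NumPy, $\lambda = 0.5$, and the random data are my own choices), both should return the same $\hat{\beta}$, and the second form makes the row-space remark explicit, since $\hat{\beta} = X^T \alpha = \sum_i \alpha_i x_i$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 50, 10, 0.5
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Method 1: solve the d x d system (X^T X + lambda I) beta = X^T y  -- O(d^3)
beta_primal = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Method 2: solve the n x n system (X X^T + lambda I) alpha = y,
# then beta = X^T alpha  -- O(n^3)
alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
beta_dual = X.T @ alpha

print(np.allclose(beta_primal, beta_dual))   # True: the two solutions coincide
```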