# ILLINOIS CS 446 - 100517.2 (7 pages)


- Pages: 7
- School: University of Illinois at Urbana-Champaign
- Course: CS 446 - Machine Learning


## CS 446: Machine Learning, Fall 2017 — Lecture 12: Kernel Ridge Regression; Support Vector Machine

Lecturer: Sanmi Koyejo. Scribe: Lan Wang. Oct. 5th, 2017.

### Agenda

- Recap: kernels
- Kernel ridge regression
- Support vector machines

### Recall: Kernel function

We define a kernel function to be a real-valued function of two arguments used to measure similarity between any pair of data inputs: $\kappa(x_i, x_j) \in \mathbb{R}$,

$$\kappa(x_i, x_j) = \phi(x_i)^T \phi(x_j).$$

Here $\phi : \mathbb{R}^d \to \mathbb{R}^m$ is the feature mapping function, where $d$ is the dimension of each data input and $m$ is the dimension of the feature space. Usually we have $d \ll m$.

### Mercer kernel

A Mercer kernel is a kernel function that satisfies the requirement that the Gram matrix

$$K = \begin{pmatrix}
\kappa(x_1, x_1) & \cdots & \kappa(x_1, x_n) \\
\kappa(x_2, x_1) & \cdots & \kappa(x_2, x_n) \\
\vdots & & \vdots \\
\kappa(x_n, x_1) & \cdots & \kappa(x_n, x_n)
\end{pmatrix}$$

is positive semi-definite for any set of data inputs $\{x_i\}_{i=1}^n$.

### Gaussian kernel

The Gaussian kernel, also called the RBF (radial basis function) kernel, is a popular kernel function defined as

$$\kappa(x_i, x_j) = e^{-\frac{\|x_i - x_j\|^2}{2\sigma^2}} = e^{-\frac{\gamma}{2}\|x_i - x_j\|^2},$$

where $\gamma = 1/\sigma^2$. Notice that by Taylor expansion we have (for scalar inputs $x, x'$)

$$e^{\gamma x x'} = \sum_{k=0}^{\infty} \frac{\gamma^k \, x^k (x')^k}{k!},$$

so we can rewrite $\kappa(x_i, x_j)$ as

$$\kappa(x_i, x_j) = e^{-\frac{\gamma}{2}(x_i - x_j)^2} = \sum_{k=0}^{\infty} \frac{\gamma^k \, x_i^k \, x_j^k}{k!} \; e^{-\frac{\gamma}{2} x_i^2} \, e^{-\frac{\gamma}{2} x_j^2}. \tag{1}$$

This implies we have infinite-dimensional feature maps

$$\phi_k(x_i) = \sqrt{\frac{\gamma^k}{k!}} \; x_i^k \, e^{-\frac{\gamma}{2} x_i^2},$$

where each $\phi_k$ is a function of $x_i$ derived from (1).

### Ridge regression

Ridge regression is a technique for analyzing multiple-regression data that suffer from multicollinearity, i.e., near-linear relationships among the independent variables. To understand ridge regression, let's first recall the regularized linear regression model that we discussed before:

$$\min_{w} \sum_{i=1}^{n} (y_i - x_i^T w)^2 + \lambda \|w\|_2^2.$$

Let $X = \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix} \in \mathbb{R}^{n \times d}$; then the optimal solution is

$$\hat{w} = (X^T X + \lambda I)^{-1} X^T y.$$

Using the fact that $(\lambda I + AB)^{-1} A = A (\lambda I + BA)^{-1}$, we can rewrite $\hat{w}$ as

$$\hat{w} = X^T (\lambda I + X X^T)^{-1} y.$$

Now we have two ways to get the optimal solution:

1. $\hat{w} = (X^T X + \lambda I)^{-1} X^T y$ — computation cost $O(d^3)$;
2. $\hat{w} = X^T (X X^T + \lambda I)^{-1} y$ — computation cost $O(n^3)$.

**Remark.** Clearly we can see from here that $\hat{w}$ is in the row space of $X$. When $d \ll n$, using the first method is more efficient.

Based on the above analysis, for a new input $x_{n+1}$ we have

$$f(x_{n+1}) = x_{n+1}^T \hat{w} = x_{n+1}^T X^T (X X^T + \lambda I)^{-1} y = \sum_{i=1}^{n} (x_{n+1}^T x_i) \, \alpha_i,$$

where $\alpha = (X X^T + \lambda I)^{-1} y$.

Now let's consider ridge regression using feature mappings:

$$\min_{w} \sum_{i=1}^{n} (y_i - w^T \phi(x_i))^2 + \lambda \|w\|_2^2.$$

Let $\Phi = \begin{pmatrix} \phi(x_1)^T \\ \vdots \\ \phi(x_n)^T \end{pmatrix} \in \mathbb{R}^{n \times m}$; then similarly we have two ways to get the optimal solution.
