ILLINOIS CS 446 - 100517.1 (9 pages)

Pages: 9
School: University of Illinois at Urbana-Champaign
Course: CS 446 - Machine Learning
CS 446: Machine Learning, Fall 2017

Lecture 12: Gaussian Process Regression

Lecturer: Sanmi Koyejo        Scribe: Gohar Irfan Chaudhry, Oct 5th, 2017

Recap

Kernel Function

A kernel function measures the similarity between $x_i$ and $x_j$:

$$k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$$

We can derive a kernel by feature matching as follows:

$$k(x_i, x_j) = \phi(x_i)^T \phi(x_j)$$

where $\phi : \mathbb{R}^d \to \mathbb{R}^m$, usually with $m \gg d$. Anything that can be written in this way is called a Mercer kernel.

Given data $D_1 = \{x_1, \ldots, x_n\}$, the Gram matrix is

$$K = \begin{bmatrix} k(x_1, x_1) & k(x_1, x_2) & \cdots & k(x_1, x_n) \\ \vdots & & & \vdots \\ k(x_n, x_1) & k(x_n, x_2) & \cdots & k(x_n, x_n) \end{bmatrix}$$

We observe that $K$ is positive semi-definite, meaning that all of its eigenvalues are non-negative.

An example is the Gaussian kernel, also called the Radial Basis Function (RBF) kernel, which is one of the most popular kernels in practice:

$$k(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|_2^2}{2\sigma^2} \right) = \exp\left( -\gamma \|x_i - x_j\|_2^2 \right)$$

where $\gamma = \frac{1}{2\sigma^2}$ is a hyperparameter known as the bandwidth parameter, generally tuned using cross-validation. When the bandwidth $\sigma$ is small, the kernel decays quickly, so the model is much more sensitive to nearby points.

Expanding the exponential as a power series shows that the RBF kernel corresponds to an infinite-dimensional feature map:

$$\exp\left( -\tfrac{1}{2}\|x_i - x_j\|_2^2 \right) = \exp\left( -\tfrac{1}{2}\|x_i\|_2^2 \right) \exp\left( -\tfrac{1}{2}\|x_j\|_2^2 \right) \sum_{w=0}^{\infty} \frac{(x_i^T x_j)^w}{w!}$$

so that (for scalar inputs) the implicit features are

$$\phi_w(x_i) = \frac{1}{\sqrt{w!}} \exp\left( -\tfrac{1}{2}\|x_i\|_2^2 \right) x_i^w$$

Kernels are convenient when doing complex feature matching.

Ridge Regression

$$\min_w \sum_{i=1}^{n} (y_i - x_i^T w)^2 + \lambda \|w\|_2^2$$

Stack the inputs as rows of the $n \times d$ design matrix

$$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix}$$

The minimizer is

$$w^* = (X^T X + \lambda I)^{-1} X^T y$$

Fact to note: $(I + AB)^{-1} A = A (I + BA)^{-1}$. Applying this identity,

$$w^* = X^T (\lambda I + X X^T)^{-1} y = \sum_{i=1}^{n} \alpha_i x_i$$

where $\alpha = (\lambda I + X X^T)^{-1} y$. So $w^*$ lies in the row space of $X$, and we can write it as a weighted sum of the data points.

Two Solutions

First: $w^* = (X^T X + \lambda I)^{-1} X^T y$. The dominant computational cost is the inverse, which is of size $d \times d$, so $O(d^3)$.

Second: $w^* = X^T (X X^T + \lambda I)^{-1} y$. Here the inverse is of size $n \times n$, so the cost is $O(n^3)$.

We therefore use the first solution when $d \ll n$ (lots of samples, few dimensions), and the second solution when $d \gg n$ (lots of dimensions, few samples).

For a new point $x_{n+1}$, plugging in the second solution for $w$,

$$f(x_{n+1}) = w^T x_{n+1} = x_{n+1}^T w = x_{n+1}^T X^T \alpha = \sum_{i=1}^{n} x_{n+1}^T x_i \, \alpha_i$$

This gives us the prediction for some new $x$ in terms of inner products with the training points.

Now use feature mappings in the same objective:

$$\min_w \sum_{i=1}^{n} \left( y_i - w^T \phi(x_i) \right)^2 + \lambda \|w\|_2^2, \qquad \Phi = \begin{bmatrix} \phi(x_1)^T \\ \phi(x_2)^T \\ \vdots \\ \phi(x_n)^T \end{bmatrix}$$

The first solution, as before:

$$w^* = (\Phi^T \Phi + \lambda I)^{-1} \Phi^T y$$

The second solution, as before:

$$w^* = \Phi^T (\Phi \Phi^T + \lambda I)^{-1} y$$

The prediction at a new point is

$$f(x_{n+1}) = w^T \phi(x_{n+1}) = \sum_{i=1}^{n} \phi(x_{n+1})^T \phi(x_i) \, \alpha_i$$

and

$$\Phi \Phi^T = \begin{bmatrix} \phi(x_1)^T \phi(x_1) & \cdots & \phi(x_1)^T \phi(x_n) \\ \vdots & & \vdots \\ \phi(x_n)^T \phi(x_1) & \cdots & \phi(x_n)^T \phi(x_n) \end{bmatrix}$$

so every entry, and hence the solution and the predictions, depends on the features only through the kernel $k(x_i, x_j) = \phi(x_i)^T \phi(x_j)$.
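To make the recap concrete, here is a minimal NumPy sketch (not from the lecture; the names rbf_kernel and gram_matrix and the synthetic data are illustrative) that builds the Gram matrix $K$ for the RBF kernel and checks numerically that it is positive semi-definite:

    import numpy as np

    def rbf_kernel(xi, xj, sigma=1.0):
        """Gaussian (RBF) kernel: exp(-||xi - xj||^2 / (2 sigma^2))."""
        return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma**2))

    def gram_matrix(X, sigma=1.0):
        """Gram matrix K with K[i, j] = k(x_i, x_j) over the rows of X."""
        n = X.shape[0]
        K = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                K[i, j] = rbf_kernel(X[i], X[j], sigma)
        return K

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))   # 20 points in R^3
    K = gram_matrix(X, sigma=2.0)

    # Mercer kernel => K is positive semi-definite: eigenvalues are
    # non-negative, up to floating-point error.
    print(np.linalg.eigvalsh(K).min() >= -1e-10)   # True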
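The infinite-series expansion above can also be sanity-checked numerically by truncating the sum; this sketch assumes $\sigma = 1$ and arbitrary synthetic points:

    import numpy as np
    from math import factorial

    rng = np.random.default_rng(1)
    xi, xj = rng.normal(size=3), rng.normal(size=3)

    # Left side: RBF kernel with sigma = 1.
    lhs = np.exp(-0.5 * np.sum((xi - xj) ** 2))

    # Right side: the two norm factors times a truncation of the
    # power series sum_{w=0}^inf (xi . xj)^w / w!.
    series = sum((xi @ xj) ** w / factorial(w) for w in range(30))
    rhs = np.exp(-0.5 * xi @ xi) * np.exp(-0.5 * xj @ xj) * series

    print(np.isclose(lhs, rhs))   # True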
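The two ridge regression solutions should produce the same $w^*$; a small sketch with synthetic data ($n = 50$, $d = 5$, and $\lambda = 0.1$ chosen arbitrarily) solves both the $d \times d$ and the $n \times n$ systems and confirms they agree:

    import numpy as np

    rng = np.random.default_rng(2)
    n, d, lam = 50, 5, 0.1
    X = rng.normal(size=(n, d))
    y = rng.normal(size=n)

    # First solution: solve the d x d system (X^T X + lam I) w = X^T y.
    # Dominated by the d x d factorization, O(d^3) -- best when d << n.
    w_first = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    # Second solution: solve the n x n system (X X^T + lam I) alpha = y,
    # then w = X^T alpha. Cost O(n^3) -- best when d >> n.
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    w_second = X.T @ alpha   # w lies in the row space of X

    print(np.allclose(w_first, w_second))   # True: the two forms agree

Solving the linear system rather than forming the inverse explicitly computes the same quantity and is the standard, numerically safer choice.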
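Putting the pieces together, kernelized ridge regression fits $\alpha = (K + \lambda I)^{-1} y$ and predicts $f(x) = \sum_i \alpha_i k(x, x_i)$ without ever forming $\phi$ explicitly. A minimal sketch along those lines (the helper rbf and all constants are illustrative):

    import numpy as np

    def rbf(a, b, sigma):
        return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma**2))

    rng = np.random.default_rng(3)
    n, d, lam, sigma = 30, 4, 0.5, 1.5
    X = rng.normal(size=(n, d))
    y = rng.normal(size=n)

    # Gram matrix K = Phi Phi^T, built from the kernel alone; the
    # (possibly infinite-dimensional) feature map phi is never formed.
    K = np.array([[rbf(X[i], X[j], sigma) for j in range(n)]
                  for i in range(n)])

    # Dual coefficients alpha = (K + lam I)^{-1} y.
    alpha = np.linalg.solve(K + lam * np.eye(n), y)

    # Prediction at a new point: f(x) = sum_i alpha_i k(x, x_i).
    x_new = rng.normal(size=d)
    f_new = sum(alpha[i] * rbf(x_new, X[i], sigma) for i in range(n))
    print(f_new)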


