ILLINOIS CS 446 - 091417.3 (3 pages)

CS 446: Machine Learning, Fall 2017
Lecture 6: Penalized Maximum Likelihood, Sparsity, Forward-Backward Selection
Lecturer: Sanmi Koyejo    Scribe: Yuqi Zhang    Sep 21st, 2017

Recap

Naive Bayes. The basic idea is

    P(x | y) = \prod_{i=1}^{N} P(x_i | y).

Once we have the function above, we can compute P(y | x) \propto P(x | y) P(y).

Maximum likelihood. The idea is to find the parameter \theta that best explains the dataset D_n, i.e. that maximizes P(D_n | \theta). Working with the log likelihood

    l(\theta) = \log P(D_n | \theta),

one of the following two major methods can be used to compute \arg\min_\theta -l(\theta):

- First-order condition: solve dl/d\theta = 0.
- Gradient descent: initialize \theta^0, then recursively update

    \theta^{t+1} = \theta^t - \eta_{t+1} \frac{d(-l)}{d\theta}.

Linear Regression

Create the model P(y | x) = \mathcal{N}(m(x), \sigma^2) with m(x) = w^T x, where w \in \mathbb{R}^d and \sigma^2 \in \mathbb{R}. Therefore \theta = (w, \sigma^2), and the log likelihood function is

    l(\theta) = \sum_{i=1}^{n} \log \left[ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2\sigma^2} (y_i - w^T x_i)^2 \right) \right]
              = \sum_{i=1}^{n} \left[ -\frac{1}{2\sigma^2} (y_i - w^T x_i)^2 - \frac{1}{2} \log(2\pi\sigma^2) \right]
              = -\frac{1}{2\sigma^2} \mathrm{RSS}(w) - \frac{n}{2} \log(2\pi\sigma^2).

In the function above, RSS denotes the residual sum of squares, i.e. the sum of the squares of the residuals (deviations of predicted from actual empirical values of the data). It is a measure of the discrepancy between the data and the estimation model. Therefore

    \min_{w, \sigma^2} -l(\theta) \implies \min_w \mathrm{RSS}(w),

where

    \mathrm{RSS}(w) = \frac{1}{2} \sum_{i=1}^{n} (y_i - w^T x_i)^2, \quad y_i \in \mathbb{R}, \; x_i \in \mathbb{R}^d.

We rewrite the function above by stacking y = (y_1, \dots, y_n)^T \in \mathbb{R}^n and X \in \mathbb{R}^{n \times d} with rows x_1^T, \dots, x_n^T:

    \mathrm{RSS}(w) = \frac{1}{2} \| y - Xw \|_2^2
                    = \frac{1}{2} (y - Xw)^T (y - Xw)
                    = \frac{1}{2} w^T X^T X w - w^T X^T y + \mathrm{const}.

Setting the derivative to zero,

    \frac{d\,\mathrm{RSS}(w)}{dw} = X^T X w - X^T y = 0
    \implies X^T X w = X^T y
    \implies w = (X^T X)^{-1} X^T y.

Penalized Model

It is not always a good idea to simply take the data and fit the model.

Failure case of least squares. For w = (X^T X)^{-1} X^T y above, if X^T X is singular then the inverse will fail; and if n < d, then X^T X will always be singular.

Probabilistic view. The goal is to include a prior distribution on the model parameter. A common prior is Gaussian: p(w) = \mathcal{N}(0, \sigma_w^2 I).

Symbols: MAP = maximum a posteriori,

    \arg\max_\theta p(\theta | D_n) = \arg\max_\theta \frac{p(D_n | \theta)\, p(\theta)}{p(D_n)} = \arg\max_\theta p(D_n | \theta)\, p(\theta).

With the Gaussian likelihood and the Gaussian prior above,

    \log p(D_n | \theta) + \log p(w) = -\frac{1}{2\sigma^2} \mathrm{RSS}(w) - \frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma_w^2} \|w\|_2^2 - \frac{d}{2} \log(2\pi\sigma_w^2),

so maximizing p(D_n | \theta)\, p(\theta) over w is equivalent to

    \min_w \mathrm{RSS}(w) + \frac{\lambda}{2} \|w\|_2^2 = \min_w \frac{1}{2} \|y - Xw\|_2^2 + \frac{\lambda}{2} \|w\|_2^2, \quad \lambda = \frac{\sigma^2}{\sigma_w^2}.
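The failure case and its penalized fix can be illustrated numerically. Below is a minimal NumPy sketch (not from the notes; the synthetic data and the choice lambda = 1.0 are illustrative assumptions): with n < d the matrix X^T X is singular, so the plain least-squares formula fails, while the ridge/MAP solution w = (X^T X + lambda I)^{-1} X^T y remains well-defined.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined setting: n < d, so X^T X is at most rank n and singular.
n, d = 10, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Plain least squares w = (X^T X)^{-1} X^T y fails: the rank is deficient.
rank = np.linalg.matrix_rank(X.T @ X)
assert rank < d

# Ridge / MAP with Gaussian prior: w = (X^T X + lam * I)^{-1} X^T y.
lam = 1.0  # illustrative; plays the role of sigma^2 / sigma_w^2 in the MAP view
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# The penalized objective has a unique minimizer even though n < d,
# and it reduces the residual sum of squares relative to w = 0.
rss = np.sum((y - X @ w_ridge) ** 2)
print(rank, rss < np.sum(y ** 2))
```

Adding lam * np.eye(d) shifts every eigenvalue of X^T X up by lam, so the matrix being inverted is always positive definite.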

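The recap mentions gradient descent as the alternative to solving dl/d\theta = 0 in closed form. A minimal sketch for the linear-regression case (synthetic data, step size, and iteration count are illustrative assumptions, not from the notes): gradient descent on RSS(w) = 1/2 ||y - Xw||^2 uses the gradient X^T X w - X^T y and converges to the same solution as the closed form when X^T X is invertible.

```python
import numpy as np

rng = np.random.default_rng(1)

# Well-determined setting (n > d), so the closed form also applies
# and we can compare against it.
n, d = 100, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def grad_rss(w):
    # Gradient of RSS(w) = 1/2 ||y - Xw||^2 is X^T (Xw - y).
    return X.T @ (X @ w - y)

# Constant step size 1 / lambda_max(X^T X) guarantees convergence here.
eta = 1.0 / np.linalg.norm(X.T @ X, 2)
w = np.zeros(d)
for _ in range(2000):
    w = w - eta * grad_rss(w)  # w^{t+1} = w^t - eta * dRSS/dw

# Closed-form solution w = (X^T X)^{-1} X^T y for comparison.
w_closed = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(w, w_closed, atol=1e-6))
```

For this quadratic objective, any fixed step size below 1 / lambda_max(X^T X) suffices; in general the schedule eta_{t+1} from the update rule in the recap can decay over iterations.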