## 091417.3

- Pages: 3
- School: University of Illinois - Urbana
- Course: CS 446 - Machine Learning

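**CS446: Machine Learning, Fall 2017. Lecture 6: Penalized Maximum Likelihood, Sparsity, Forward/Backward Selection.**

Lecturer: Sanmi Koyejo. Scribe: Yuqi Zhang. Sep 21st, 2017.

### Recap

**Naive Bayes.** The basic idea is to assume that the features are conditionally independent given the label:

$$P(x \mid y) = \prod_{i=1}^{d} P(x_i \mid y).$$

Once we have this function, we can compute the posterior, since $P(y \mid x) \propto P(x \mid y)\,P(y)$.

**Maximum likelihood.** The idea is to choose the parameters $\theta$ under which the observed dataset $D_n$ is most probable, i.e. to maximize $P(D_n \mid \theta)$. Equivalently, we minimize the negative log-likelihood

$$\ell(\theta) = -\log P(D_n \mid \theta).$$

Two major methods can be used to compute $\hat\theta = \arg\min_\theta \ell(\theta)$:

- Solve $\frac{d\ell}{d\theta} = 0$ directly.
- Gradient descent: initialize $\theta_0$, then update recursively $\theta_{t+1} = \theta_t - \eta_t \left.\frac{d\ell}{d\theta}\right|_{\theta_t}$, where $\eta_t > 0$ is the step size.

### Linear Regression

We model $P(y \mid x) = \mathcal{N}(m(x), \sigma^2)$ with $m(x) = w^T x$, where $w \in \mathbb{R}^d$ and $\sigma^2 > 0$. The parameters are therefore $\theta = (w, \sigma^2)$, and the negative log-likelihood is

$$
\begin{aligned}
\ell(w, \sigma^2) &= -\sum_{i=1}^{n} \log\left[\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - w^T x_i)^2}{2\sigma^2}\right)\right] \\
&= \sum_{i=1}^{n} \left[\frac{(y_i - w^T x_i)^2}{2\sigma^2} + \frac{1}{2}\log(2\pi\sigma^2)\right] \\
&= \frac{1}{2\sigma^2}\,\mathrm{RSS}(w) + \frac{n}{2}\log(2\pi\sigma^2).
\end{aligned}
$$

Here RSS is the residual sum of squares, $\mathrm{RSS}(w) = \sum_{i=1}^{n} (y_i - w^T x_i)^2$: the sum of the squared residuals, i.e. the deviations of the predictions from the observed values. It measures the discrepancy between the data and the estimated model.

Because $\sigma^2$ only rescales RSS and adds a term that does not depend on $w$,

$$\min_{w, \sigma^2} \ell(w, \sigma^2) = \min_{\sigma^2}\left[\frac{1}{2\sigma^2}\min_w \mathrm{RSS}(w) + \frac{n}{2}\log(2\pi\sigma^2)\right],$$

so the maximum-likelihood estimate of $w$ is

$$\hat w = \arg\min_w \mathrm{RSS}(w) = \arg\min_w \sum_{i=1}^{n}(y_i - w^T x_i)^2, \qquad y_i \in \mathbb{R},\ x_i \in \mathbb{R}^d.$$

We rewrite this in matrix form by stacking the targets into a vector $y \in \mathbb{R}^n$ and the inputs into a matrix $X \in \mathbb{R}^{n \times d}$:

$$y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \qquad X = \begin{bmatrix} x_1^T \\ \vdots \\ x_n^T \end{bmatrix}.$$

Scaling by $\frac{1}{2}$, which does not change the minimizer,

$$\frac{1}{2}\mathrm{RSS}(w) = \frac{1}{2}\lVert y - Xw \rVert_2^2 = \frac{1}{2}(y - Xw)^T(y - Xw) = \frac{1}{2} w^T X^T X w - w^T X^T y + \text{const}.$$

Setting the gradient to zero gives the normal equations:

$$\frac{d}{dw}\left[\frac{1}{2}\mathrm{RSS}(w)\right] = X^T X w - X^T y = 0 \;\Rightarrow\; X^T X w = X^T y \;\Rightarrow\; \hat w = (X^T X)^{-1} X^T y.$$

### Penalized Model

It is not always a good idea to simply take the data and fit the model by maximum likelihood.

**Failure case of least squares.** The solution $\hat w = (X^T X)^{-1} X^T y$ requires $X^T X$ to be invertible. If $X^T X$ is singular, the inverse does not exist. In particular, if $n < d$ then $X^T X$ is always singular, because it is a $d \times d$ matrix of rank at most $n$.

**Probabilistic view.** The goal is to include a prior distribution over the model parameters. A common choice is a Gaussian prior, $p(w) = \mathcal{N}(0, \tau^2 I)$.

**MAP (maximum a posteriori).**

$$\hat\theta_{\text{MAP}} = \arg\max_\theta p(\theta \mid D_n) = \arg\max_\theta \frac{p(D_n \mid \theta)\,p(\theta)}{p(D_n)} = \arg\max_\theta p(D_n \mid \theta)\,p(\theta).$$

For linear regression with the Gaussian prior, the negative log-posterior is, up to additive constants (the $\log 2\pi\sigma^2$ and $\log 2\pi\tau^2$ terms),

$$-\log p(D_n \mid w) - \log p(w) = \frac{1}{2\sigma^2}\mathrm{RSS}(w) + \frac{1}{2\tau^2}\lVert w \rVert_2^2 + \text{const},$$

so, after multiplying by $2\sigma^2$, the MAP estimate minimizes

$$\lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2, \qquad \lambda = \frac{\sigma^2}{\tau^2}.$$

$\max \, p(D_n \mid \ldots) \ldots$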
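The two ways of computing the least-squares estimate for linear regression (solving the normal equations $X^T X w = X^T y$ directly, or running gradient descent on $\frac{1}{2}\mathrm{RSS}(w)$) should agree, and a quick numerical check makes that concrete. This is a minimal sketch assuming NumPy and synthetic data; `fit_least_squares` and `fit_gradient_descent` are hypothetical helper names, and the fixed step size $1/L$ (with $L$ the largest eigenvalue of $X^T X$) is a standard convergent choice for this quadratic objective, not one taken from the lecture.

```python
import numpy as np


def fit_least_squares(X, y):
    """Closed-form estimate: solve the normal equations X^T X w = X^T y."""
    # np.linalg.solve avoids forming (X^T X)^{-1} explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)


def fit_gradient_descent(X, y, iters=1000):
    """Minimize 0.5 * ||y - X w||_2^2 by gradient descent from w_0 = 0."""
    # Step size 1/L, where L = largest eigenvalue of X^T X (the squared
    # spectral norm of X); this guarantees convergence for this quadratic.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w) - X.T @ y   # gradient of 0.5 * RSS(w)
        w = w - step * grad              # w_{t+1} = w_t - eta * grad
    return w


# Synthetic data: y = X w_true + small Gaussian noise (illustrative values).
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

w_closed = fit_least_squares(X, y)
w_gd = fit_gradient_descent(X, y)
print("closed form:     ", w_closed)
print("gradient descent:", w_gd)
```

Solving the linear system rather than inverting $X^T X$ is the usual design choice: it is cheaper and numerically more stable, and the closed form only works when $X^T X$ is invertible, whereas gradient descent needs nothing beyond the gradient.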
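The least-squares failure case for $n < d$, and how a Gaussian prior repairs it, can also be sketched numerically. Assuming NumPy, a Gaussian likelihood with noise variance `sigma2`, and a prior $w \sim \mathcal{N}(0, \tau^2 I)$ with variance `tau2` (the values below are arbitrary), the sketch shows that the Gram matrix $X^T X$ has rank at most $n$, while $X^T X + \lambda I$ with $\lambda = \sigma^2/\tau^2$ is invertible, so the penalized objective $\lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2$ has a unique minimizer. `map_ridge` is a hypothetical helper name; this penalized least-squares estimate is commonly known as ridge regression.

```python
import numpy as np


def map_ridge(X, y, sigma2, tau2):
    """MAP estimate of w for y | x ~ N(w^T x, sigma2) with prior w ~ N(0, tau2 * I).

    Minimizes ||y - X w||_2^2 + lam * ||w||_2^2 with lam = sigma2 / tau2,
    i.e. solves (X^T X + lam * I) w = X^T y, which is invertible for lam > 0.
    """
    lam = sigma2 / tau2
    d = X.shape[1]
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return w, lam


# Fewer examples than features (n < d): plain least squares breaks down.
rng = np.random.default_rng(1)
n, d = 5, 10
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

gram = X.T @ X                                   # d x d matrix of rank <= n
print("rank of X^T X:", np.linalg.matrix_rank(gram), "of", d)

w_map, lam = map_ridge(X, y, sigma2=1.0, tau2=0.5)   # illustrative variances
# First-order condition of the penalized objective (gradient halved) should be ~0.
grad = X.T @ (X @ w_map - y) + lam * w_map
print("lambda:", lam, " gradient norm:", np.linalg.norm(grad))
```

A smaller prior variance `tau2` means a larger $\lambda$ and stronger shrinkage of $w$ toward zero; as `tau2` grows, the prior becomes flat and the estimate approaches the (possibly ill-defined) least-squares solution.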
