CS446: Machine Learning, Fall 2017

Lecture 6: Penalized maximum likelihood: Sparsity: Forward / Backward selection

Lecturer: Sanmi Koyejo    Scribe: Yuqi Zhang, Sep. 21st, 2017

Recap

Naive Bayes

The basic idea is

$$P(x \mid y) = \prod_{i=1}^{N} P(x_i \mid y)$$

Once we have the function above, we can compute $P(y \mid x)$, which is proportional to $P(x \mid y)\,P(y)$.

Maximum likelihood

The idea is that we want to find the parameter $\theta$ that best explains the dataset $D_n$, i.e., that maximizes $P(D_n \mid \theta)$. Equivalently, we work with the log likelihood

$$l(\theta) = \log P(D_n \mid \theta)$$

One of the following two major methods can be used to compute $\operatorname{argmax}_{\theta} l(\theta)$:

- Set the derivative to zero and solve:
$$\frac{dl(\theta)}{d\theta} = 0$$

- Gradient descent: initialize $\theta_0$, then recursively update
$$\theta_{t+1} = \theta_t - \gamma_t \left.\frac{d(-l(\theta))}{d\theta}\right|_{\theta = \theta_t}$$

Linear Regression

We model $P(y \mid x) = \mathcal{N}(m(x), \sigma^2)$ with $m(x) = w^T x$, where $w \in \mathbb{R}^d$ and $\sigma^2 \in \mathbb{R}_+$. Therefore $\theta = \{w, \sigma^2\}$.

The log likelihood function is

$$\begin{aligned}
l(\theta) &= \sum_{i=1}^{n} \log\left[\left(\frac{1}{2\pi\sigma^2}\right)^{1/2} \exp\left(-\frac{1}{2\sigma^2}(y_i - w^T x_i)^2\right)\right] \\
&= \sum_{i=1}^{n} \left[-\frac{1}{2\sigma^2}(y_i - w^T x_i)^2\right] - \frac{n}{2}\log(2\pi\sigma^2) \\
&= -\frac{1}{2\sigma^2}\,\mathrm{RSS}(w) - \frac{n}{2}\log(2\pi\sigma^2)
\end{aligned}$$

Here RSS is the residual sum of squares, i.e., the sum of the squared residuals (deviations of the predicted values from the actual empirical values of the data). It is a measure of the discrepancy between the data and the estimation model.

Minimizing the negative log likelihood decouples:

$$\min_{w,\sigma^2} -l(\theta) = \min_{\sigma^2}\left[\frac{n}{2}\log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\min_{w}\mathrm{RSS}(w)\right]$$

$$\min_{w}\mathrm{RSS}(w) = \min_{w}\sum_{i=1}^{n}(y_i - w^T x_i)^2, \qquad y_i \in \mathbb{R},\quad w, x_i \in \mathbb{R}^d$$

We rewrite this by stacking the data into

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \in \mathbb{R}^{n}, \qquad X = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{pmatrix} \in \mathbb{R}^{n \times d}$$

so that

$$\begin{aligned}
\mathrm{RSS}(w) &= \frac{1}{2}\,\|y - Xw\|_2^2 = \frac{1}{2}(y - Xw)^T(y - Xw) \\
&= \frac{1}{2}\,w^T(X^T X)\,w - w^T(X^T y) + \mathrm{const}
\end{aligned}$$

Setting the gradient to zero:

$$\frac{d\,\mathrm{RSS}(w)}{dw} = X^T X w - X^T y = 0 \;\;\Rightarrow\;\; X^T X w = X^T y \;\;\Rightarrow\;\; w = (X^T X)^{-1} X^T y$$

Penalized Model

It is not always a good idea to take the data and fit the model directly.

Failure case of least squares: in

$$w = (X^T X)^{-1} X^T y$$

if $X^T X$ is singular, then the inverse fails; and if $n < d$, then $X^T X$ is always singular.

Probabilistic view

The goal is to include a prior distribution on the model parameters. A common prior is Gaussian:

$$p(w) = \mathcal{N}(0, \lambda^2 I)$$

MAP (maximum a posteriori): since

$$\max_{\theta} p(\theta \mid D_n) = \max_{\theta} \frac{p(D_n \mid \theta)\,p(\theta)}{p(D_n)}$$

and the evidence $p(D_n)$ does not depend on $\theta$, we have

$$\hat{\theta} = \operatorname{argmax}_{\theta}\; p(D_n \mid \theta)\,p(\theta)$$

Taking logs for the linear model with the Gaussian prior on $w$:

$$g(\theta) = -\frac{1}{2\sigma^2}\,\mathrm{RSS}(w) - \frac{n}{2}\log(2\pi\sigma^2) - \frac{\|w\|_2^2}{2\lambda^2} - \frac{1}{2}\log(2\pi\lambda^2)$$

Dropping the terms that do not depend on $w$ and rescaling by $2\sigma^2$, maximizing $g(\theta)$ over $w$ is equivalent to minimizing

$$\|y - Xw\|_2^2 + \frac{\sigma^2}{\lambda^2}\,\|w\|_2^2$$
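As a concrete check on the derivation above, here is a minimal NumPy sketch; the data, sizes, and hyperparameter values are made up for illustration, not from the lecture. Minimizing the penalized objective $\|y - Xw\|_2^2 + \frac{\sigma^2}{\lambda^2}\|w\|_2^2$ in closed form gives $w = (X^T X + \frac{\sigma^2}{\lambda^2} I)^{-1} X^T y$, and the added diagonal term makes the system solvable even in the $n < d$ failure case where plain least squares breaks.

```python
import numpy as np

# Toy data (made up for illustration): n = 5 samples, d = 8 features,
# so n < d and X^T X is guaranteed to be singular.
rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Ordinary least squares w = (X^T X)^{-1} X^T y fails here because
# X^T X (an 8x8 matrix of rank at most 5) is singular.
XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))  # prints 5, not 8 -> singular

# MAP / penalized estimate: w = (X^T X + (sigma^2 / lambda^2) I)^{-1} X^T y.
# sigma2 and lam2 are assumed values for the noise and prior variances.
sigma2, lam2 = 1.0, 10.0
alpha = sigma2 / lam2
w_map = np.linalg.solve(XtX + alpha * np.eye(d), X.T @ y)
print(w_map)
```

Since $X^T X$ is positive semidefinite, adding any positive multiple of $I$ makes it positive definite, so the solve always succeeds; this is exactly how the Gaussian prior repairs the singularity.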

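The same least-squares problem can also be solved iteratively with the gradient-descent update from the Maximum likelihood section, $\theta_{t+1} = \theta_t - \gamma_t\, d(-l(\theta))/d\theta$. Below is a sketch under stated assumptions: $\sigma^2$ is held fixed so only $w$ is updated, and the step size $\gamma_t$ is a constant (the lecture leaves the schedule unspecified); the data are synthetic.

```python
import numpy as np

# Gradient descent on the negative log likelihood of the linear model.
# With sigma^2 fixed, d(-l)/dw = (1 / sigma^2) X^T (X w - y).
rng = np.random.default_rng(1)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])       # made-up ground truth
y = X @ w_true + 0.1 * rng.normal(size=n)

sigma2 = 1.0
gamma = 0.001          # constant step size (assumption)
w = np.zeros(d)        # theta_0

for t in range(2000):
    grad = X.T @ (X @ w - y) / sigma2     # gradient of -l at w
    w = w - gamma * grad                  # descent update

print(w)                                  # close to w_true
print(np.linalg.solve(X.T @ X, X.T @ y))  # closed-form w for comparison
```

Both routes reach the same minimizer; the iterative version matters when $d$ is large enough that forming and inverting $X^T X$ is expensive.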

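Finally, returning to the Naive Bayes recap: because $P(y \mid x) \propto P(x \mid y)\,P(y)$, classification only needs the factorized likelihood and the prior; the evidence $P(x)$ is recovered by normalizing. A minimal sketch assuming binary features with Bernoulli class-conditionals (a modeling choice, with probability tables invented for illustration):

```python
import numpy as np

# Naive Bayes scoring: P(y | x) is proportional to P(x | y) P(y), where
# P(x | y) factorizes over features. The tables below are made up.
prior = np.array([0.6, 0.4])            # P(y) for classes 0 and 1
theta = np.array([[0.1, 0.7, 0.5],      # P(x_i = 1 | y = 0), per feature
                  [0.8, 0.2, 0.5]])     # P(x_i = 1 | y = 1), per feature

x = np.array([1, 0, 1])                 # one binary observation

# P(x | y) = prod_i P(x_i | y) under the naive independence assumption
likelihood = np.prod(theta**x * (1 - theta)**(1 - x), axis=1)

# Normalize the joint to obtain P(y | x)
joint = likelihood * prior
posterior = joint / joint.sum()
print(posterior)
```

In practice these products are computed as sums of logs to avoid numerical underflow when the number of features $N$ is large.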