UCI ICS 273A - HOMEWORK - ICS 273A
ICS 273A: HW2
Due: Feb 4

Problem I: Weighted linear regression. Let the cost function for weighted linear regression be

    J(\theta) = \frac{1}{2} \sum_{i=1}^{m} w^{(i)} (y^{(i)} - \theta^T x^{(i)})^2.

A. Write the cost function using matrix notation, where \theta is the linear model, X is the design matrix, \bar{y} is an m-length vector of y^{(i)} values, and W is a diagonal matrix. State what the diagonal entries of W are.

B. Derive a closed-form solution to the above cost function using a weighted form of the normal equations (recall the original equation was X^T(\bar{y} - X\theta) = 0, with a solution of \theta = (X^T X)^{-1} X^T \bar{y}).

C. Suppose we have a training set of m training pairs \{(x^{(i)}, y^{(i)}) : i = 1 \ldots m\}, but each label y^{(i)} was observed with a fixed, known variance \sigma^{(i)}. We can write

    p(y^{(i)} | x^{(i)}, \theta) = \frac{1}{\sqrt{2\pi}\,\sigma^{(i)}} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2 (\sigma^{(i)})^2} \right)

Show that the above cost function finds the \theta that maximizes the log likelihood of this probabilistic model. Describe how \sigma^{(i)} relates to w^{(i)}.

Problem II: Exponential family & generalized linear models. A probability distribution in the exponential family takes on the following form:

    p(x|\eta) = h(x) \exp\{\eta^T T(x) - A(\eta)\} = h(x) \exp\left\{\sum_i \eta_i T_i(x) - A(\eta)\right\}

where h is called the reference function, T is the sufficient statistic, \eta is the natural parameter, and A(\eta) is a normalization constant.

A. Show that the following distributions are in the exponential family, defining the T, A, and h functions in each case. Make sure to write A(\eta) as a function of the natural parameters \eta.

1. A unit-variance Gaussian random variable parameterized by \mu. Note: you can do this using a one-dimensional parameter vector \eta.

    p(x|\mu) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2} \right)

2. A Poisson random variable parameterized by \lambda:

    p(x|\lambda) = \frac{\lambda^x e^{-\lambda}}{x!}

3. A multinomial random variable, which can be thought of as the outcome of a biased k-sided die, where \theta_i is the probability of rolling an i. Write the roll x as a k-length vector of all zeros with a single one at the rolled index i.

    p(x|\theta) = \theta_1^{x_1} \theta_2^{x_2} \cdots \theta_k^{x_k}

Use the fact that \theta_k = 1 - \sum_{i=1}^{k-1} \theta_i, and similarly x_k = 1 - \sum_{i=1}^{k-1} x_i. Express the distribution as a (k-1)-dimensional parameter vector \eta.

B. The function A(\eta) can be used to calculate moments of the random variable x. Show that the following equations hold:

    A(\eta) = \log \int h(x) \exp\{\eta^T T(x)\}\, dx                                  (1)
    \frac{\partial A}{\partial \eta_i} = E[T_i(x)]                                    (2)
    \frac{\partial^2 A}{\partial \eta_i \partial \eta_j} = \mathrm{cov}(T_i(x), T_j(x))   (3)

Hints: For (1), use the fact that \int p(x|\eta)\, dx = 1. For (2), assume the derivative operator can pass inside the integral from (1) and use the fact that \exp A(\eta) = \int h(x) \exp\{\eta^T T(x)\}\, dx. For (3), recall \mathrm{cov}(A, B) = E[AB] - E[A]E[B].

Sanity check for Part A: You can apply Eq. (2) to the three distributions from Part A to calculate the means. If you have correctly defined A(\eta), you should recover the original parameters \mu, \lambda, and \theta.

C. Generalized linear models (GLMs). Assume we want to predict y given x using a probability distribution from the exponential family. For simplicity, assume both x and the canonical parameter \eta are one-dimensional, and the sufficient statistic T(y) = y. We can write p(y|\eta) = h(y) \exp(\eta y - A(\eta)). We can write the log likelihood for a set of training points as

    l(\theta) = \sum_i \log p(y^{(i)} | \eta^{(i)})

where \eta = \theta x for a GLM. Show that

    \frac{\partial^2 l}{\partial \theta^2} \le 0

Hint: Use both results from Part B, and the fact that the variance of a random variable must always be nonnegative. This proves that, in the one-dimensional case, GLMs (including linear and logistic regression) are concave (or, alternatively, that the negative log likelihood is convex). This also holds in the vector-valued case.

Problem III: K-way linear discriminant analysis. Assume we are given a training set of examples where x^{(i)} \in R^n and y^{(i)} \in \{1 \ldots k\}. We will model y as a multinomial random variable, and will model x conditioned on y as a multivariate Gaussian with a single covariance matrix \Sigma for all k classes:

    p(y|\theta) = \theta_1^{y_1} \theta_2^{y_2} \cdots \theta_k^{y_k}                                          (4)
    p(x | y_i = 1) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2}(x - \mu_i)^T \Sigma^{-1} (x - \mu_i) \right)   (5)

A. Show that the posterior for y can be written as the softmax function below.
You may have to redefine x to include a constant term x_0 = 1.

    p(y = i | x, \theta, \Sigma, \{\mu_j\}) = \frac{\exp(w_i^T x)}{\sum_j \exp(w_j^T x)}

B. Show that for k = 2, the posterior takes the form of a logistic function:

    p(y = 1 | x, \theta, \Sigma, \mu_1, \mu_2) = \frac{1}{1 + \exp(-w^T x)}

Problem IV [MATLAB]: Logistic regression. Download hw2train.dat from the website. Each datapoint consists of a two-dimensional input feature and an associated binary label.

A. Plot the data, using Os and Xs for the two classes. The plots in the following parts should be drawn on top of this one; you can use the command hold on to do this.

B. Fit a generative model to the data, using Gaussian class-conditional densities with equal covariance matrices. Calculate the posterior probability of class 1, and plot the line where this probability is equal to 0.5.

C. Write a program to fit a logistic regression model using the IRLS algorithm (remember to include the intercept term). Recall that this method uses the Newton-Raphson method to optimize the log likelihood of the training data. Plot the line where the logistic function is equal to 0.5.

D. Write a program to fit a logistic regression model using stochastic gradient ascent. Plot the line where the logistic function is equal to 0.5.

E. Fit a linear regression to the problem, treating the class labels as real values 0 and 1. Use the normal equations to solve for the linear predictor. Plot the line where the predictor function is equal to 0.5.

F. The data set hw2test.dat is a separate data set generated from the same source. Test your fits from parts (b), (c), (d), and (e) by computing the fraction of misclassified test points.
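For Part C of Problem IV, the IRLS/Newton-Raphson loop can be sketched as follows. The assignment asks for MATLAB; this is an illustrative NumPy translation running on synthetic two-class data (hw2train.dat is not reproduced here), with a small ridge term added to the Hessian for numerical safety:

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=15, ridge=1e-8):
    """Fit logistic regression by IRLS (Newton-Raphson on the log likelihood).
    X: (m, d) inputs without the intercept column; y: (m,) labels in {0, 1}."""
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])           # prepend the intercept term x0 = 1
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ theta))      # current predicted probabilities
        W = p * (1.0 - p)                          # diagonal of the IRLS weight matrix
        H = Xb.T @ (W[:, None] * Xb) + ridge * np.eye(Xb.shape[1])
        theta += np.linalg.solve(H, Xb.T @ (y - p))  # Newton step: H^{-1} * gradient
    return theta

# Synthetic two-class data standing in for hw2train.dat.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
theta = fit_logistic_irls(X, y)
Xb = np.hstack([np.ones((100, 1)), X])
accuracy = np.mean(((Xb @ theta) > 0) == y)        # p = 0.5 exactly when theta^T x = 0
```

The 0.5 contour asked for in parts (c)-(e) is then the line theta_0 + theta_1 x_1 + theta_2 x_2 = 0.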

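Back in Problem I B, setting the gradient of the matrix-form cost to zero yields the weighted normal equations X^T W (\bar{y} - X\theta) = 0, i.e. \theta = (X^T W X)^{-1} X^T W \bar{y}. A minimal NumPy sanity check of this closed form on synthetic data (with all w^{(i)} = 1 it must reduce to ordinary least squares):

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Solve the weighted normal equations X^T W (y - X theta) = 0,
    i.e. theta = (X^T W X)^{-1} X^T W y, where W = diag(w)."""
    XtW = X.T * w                   # same as X.T @ np.diag(w), without forming W
    return np.linalg.solve(XtW @ X, XtW @ y)

rng = np.random.default_rng(1)
X = np.hstack([np.ones((30, 1)), rng.normal(size=(30, 2))])   # design matrix with intercept
y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * rng.normal(size=30)

# Uniform weights: WLS must agree with the unweighted normal equations.
theta_wls = weighted_least_squares(X, y, np.ones(30))
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)
```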

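Problem II's sanity check can be made concrete for the Poisson case: writing p(x|\lambda) = (1/x!) \exp(x \log\lambda - \lambda) gives \eta = \log\lambda, T(x) = x, h(x) = 1/x!, and A(\eta) = e^\eta. Then A'(\eta) = e^\eta = \lambda recovers the mean via Eq. (2), and A''(\eta) = \lambda matches the Poisson variance via Eq. (3). A finite-difference spot check (step size and tolerance are arbitrary choices):

```python
import math

def A(eta):
    """Log normalizer of the Poisson in exponential-family form: A(eta) = exp(eta)."""
    return math.exp(eta)

lam = 3.0
eta = math.log(lam)                   # natural parameter eta = log(lambda)
h = 1e-5
mean_est = (A(eta + h) - A(eta - h)) / (2 * h)               # Eq. (2): ~lambda
var_est = (A(eta + h) - 2 * A(eta) + A(eta - h)) / h ** 2    # Eq. (3): ~lambda
```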