MIT 6.867 - Midterm exam


6.867 Machine learning

Mid-term exam, October 13, 2004

(2 points) Your name and MIT ID:

Problem 1

[Figure: three plots, labeled A, B, and C, each showing noise as a function of x.]

(6 points) Each plot above claims to represent prediction errors as a function of x for a trained regression model based on some dataset. Some of these plots could potentially be prediction errors for linear or quadratic regression models, while others couldn't. The regression models are trained with the least squares estimation criterion. Please indicate compatible models and plots.

Models: linear regression, quadratic regression
Plots: A, B, C

Cite as: Tommi Jaakkola, course materials for 6.867 Machine Learning, Fall 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].

Problem 2

Here we explore a regression model where the noise variance is a function of the input (variance increases as a function of input). Specifically,

$y = wx + \epsilon$

where the noise $\epsilon$ is normally distributed with mean 0 and standard deviation $\sigma x$. The value of $\sigma$ is assumed known and the input x is restricted to the interval [1, 4]. We can write the model more compactly as $y \sim N(wx, \sigma^2 x^2)$.

If we let x vary within [1, 4] and sample outputs y from this model with some w, the regression plot might look like

[Figure: example regression plot of y against x for x in [1, 4].]

1. (2 points) How is the ratio y/x distributed for a fixed (constant) x?

2. Suppose we now have n training points and targets $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where each $x_i$ is chosen at random from [1, 4] and the corresponding $y_i$ is subsequently sampled from $y_i \sim N(w^* x_i, \sigma^2 x_i^2)$ with some true underlying parameter value $w^*$; the value of $\sigma^2$ is the same as in our model.

(a) (3 points) What is the maximum likelihood estimate of w as a function of the training data?

(b) (3 points) What is the variance of this estimator due to the noise in the target outputs as a function of n and $\sigma^2$ for fixed inputs $x_1, \ldots, x_n$? For later utility (if you omit this answer) you can denote the answer as $V(n, \sigma^2)$.

Some potentially useful relations: if $z \sim N(\mu, \sigma^2)$, then $az \sim N(a\mu, \sigma^2 a^2)$ for a fixed a. If $z_1 \sim N(\mu_1, \sigma_1^2)$ and $z_2 \sim N(\mu_2, \sigma_2^2)$ and they are independent, then $\mathrm{Var}(z_1 + z_2) = \sigma_1^2 + \sigma_2^2$.

3. In sequential active learning we are free to choose the next training input $x_{n+1}$, here within [1, 4], for which we will then receive the corresponding noisy target $y_{n+1}$, sampled from the underlying model. Suppose we already have $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ and are trying to figure out which $x_{n+1}$ to select. The goal is to choose $x_{n+1}$ so as to help minimize the variance of the predictions $f(x; \hat{w}_n) = \hat{w}_n x$, where $\hat{w}_n$ is the maximum likelihood estimate of the parameter w based on the first n training examples.

(a) (2 points) What is the variance of $f(x; \hat{w}_n)$ due to the noise in the training outputs as a function of x, n, and $\sigma^2$, given fixed (already chosen) inputs $x_1, \ldots, x_n$?

(b) (2 points) Which $x_{n+1}$ would we choose (within [1, 4]) if we were to next select x with the maximum variance of $f(x; \hat{w}_n)$?

(c) (T/F, 2 points) Since the variance of $f(x; \hat{w}_n)$ only depends on x, n, and $\sigma^2$, we could equally well select the next point at random from [1, 4] and obtain the same reduction in the maximum variance.

Problem 3

[Figure 1 plot: two curves, labeled (1) and (2), each showing $P(y = 1 | x, w)$ as a function of x, together with three labeled points (y = 0, y = 1, y = 0).]

Figure 1: Two possible logistic regression solutions for the three labeled points.

Consider a simple one-dimensional logistic regression model

$P(y = 1 | x, w) = g(w_0 + w_1 x)$

where $g(z) = (1 + \exp(-z))^{-1}$ is the logistic function.

1. Figure 1 shows two possible conditional distributions $P(y = 1 | x, w)$, viewed as a function of x, that we can get by changing the parameters w.

(a) (2 points) Please indicate the number of classification errors for each conditional given the labeled examples in the same figure.

Conditional (1) makes ____ classification errors
Conditional (2) makes ____ classification errors

(b) (3 points) One of the conditionals in Figure 1 corresponds to the maximum likelihood setting of the parameters w based on the labeled data in the figure. Which one is the ML solution (1 or 2)?

(c) (2 points) Would adding a regularization penalty $w_1^2/2$ to the log-likelihood estimation criterion affect your choice of solution (Y/N)?

[Figure 2 plot: expected log-likelihood of test labels (vertical axis) against number of training examples, 0 to 300 (horizontal axis).]

Figure 2: The expected log-likelihood of test labels as a function of the number of training examples.

2. (4 points) We can estimate the logistic regression parameters more accurately with more training data. Figure 2 shows the expected log-likelihood of test labels for a simple logistic regression model as a function of the number of training examples and labels. Mark in the figure the structural error (SE) and approximation error (AE), where "error" is measured in terms of log-likelihood.

3. (T/F, 2 points) In general, for small training sets, we are likely to reduce the approximation error by adding a regularization penalty $w_1^2/2$ to the log-likelihood criterion.

Problem 4

[Figure 3 plot: the four binary input configurations in the (x1, x2) plane, marked with "x" and "o".]

Figure 3: Equally likely input configurations in the training set.

Here we will look at methods for selecting input features for a logistic regression model

$P(y = 1 | x, w) = g(w_0 + w_1 x_1 + w_2 x_2)$

The available training examples are very simple, involving only binary valued inputs:

Number of copies   x1   x2   y
10                 1    1    1
10                 0    1    0
10                 1    0    0
10                 0    0    1

So for example …
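
The Problem 4 setup can be written out concretely. The following is a minimal sketch, not part of the exam: it builds the 40 training examples from the table (10 copies of each configuration) and evaluates the log-likelihood of the labels under the stated model P(y = 1 | x, w) = g(w0 + w1 x1 + w2 x2). The names `CONFIGS`, `DATA`, `g`, and `log_likelihood` are hypothetical, and the parameter values passed in at the end are arbitrary and chosen only to exercise the function.

```python
import math

# The four input configurations from the Problem 4 table, as (x1, x2, y).
# Each appears 10 times, per the "Number of copies" column.
CONFIGS = [(1, 1, 1), (0, 1, 0), (1, 0, 0), (0, 0, 1)]
COPIES = 10
DATA = [row for row in CONFIGS for _ in range(COPIES)]


def g(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(w0, w1, w2, data=DATA):
    """Sum of log P(y | x, w) under P(y=1 | x, w) = g(w0 + w1*x1 + w2*x2)."""
    total = 0.0
    for x1, x2, y in data:
        p = g(w0 + w1 * x1 + w2 * x2)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


if __name__ == "__main__":
    # Arbitrary illustrative parameters (an assumption, not from the exam).
    print(len(DATA))
    print(log_likelihood(0.0, 0.0, 0.0))
```

With all three parameters set to zero, every prediction is 0.5, so the log-likelihood equals 40 log(0.5). Other parameter settings can be compared by calling `log_likelihood` with different arguments.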

