6 867 Machine learning Mid term exam October 13 2006 2 points Your name and MIT ID Problem 1 Suppose we are trying to solve an active learning problem where the possible inputs you can select form a discrete set Speci cally we have a set of N unlabeled documents 1 N where each document is represented as a binary feature fector 1 m T and i 1 if word i appears in the document and zero otherwise Our goal is to quickly label these N documents with 0 1 labels We can request a label for any of the N documents preferably as few as possible We also have a small set of n already labeled documents to get us started We use a logistic regression model to solve the classi cation task P y 1 w g wT where g is the logistic function Note that we do not include the bias term 1 T F 2 points Any word that appears in all the N documents would e ectively provide a bias term for the logistic regression model 2 T F 2 points Any word that appears only in the available n la beled documents used for initially training the logistic regression model would serve equally well as a bias term 1 Cite as Tommi Jaakkola course materials for 6 867 Machine Learning Fall 2006 MIT OpenCourseWare http ocw mit edu Massachusetts Institute of Technology Downloaded on DD Month YYYY 3 Having trained the logistic regression model on the basis of the n labeled documents obtaining w n we d like to request additional labeled documents For this we will use the following measure of uncertainty in our predictions Ey pt y pt pt 1 pt 1 pt 0 pt 2pt 1 pt where pt P y 1 t w n our current prediction of the probability that y 1 for the tth unlabeled document t a 4 points We would request the label for the document query point t that has the smallest value of 2pt 1 pt the largest value of 2pt 1 pt an intermediate value of 2pt 1 pt Brie y explain the rationale behind the selection criterion that you chose b 2 points Sketch w n in Figure 1 1 Write down the equation expressed solely in terms of and w n that has to satisfy for it to lie exactly on the decision boundary 2 Cite as Tommi Jaakkola course materials for 6 867 Machine Learning Fall 2006 MIT OpenCourseWare http ocw mit edu Massachusetts Institute of Technology Downloaded on DD Month YYYY c 4 points In gure 1 2 circle the next point we would select according to the criterion Draw two decision boundaries that would result from incorporating the new point in the training set labeling the boundaries as y 1 and y 0 depending on the outcome of the query t t1 tm T o o Figure 1 1 Two labeled points unlabeled points and the decision boundary The point corresponds to y 1 Figure 1 2 Two labeled points unlabeled points and the decision boundary The point corresponds to y 1 4 T F 2 points The criterion we have used here for active learning guarantees that the measure of uncertainty about the labels of the unlabeled points will decrease monotonically for each point after each query Problem 2 Consider a regression problem where the two dimensional input points x x1 x2 T are constrained to lie within the unit square xi 1 1 i 1 2 The training and test input points x are sampled uniformly at random within the unit square The target outputs y are governed by the following model y N x31 x52 10x1 x2 7x21 5x2 3 1 In other words the outputs are normally distributed with mean given by x31 x52 10x1 x2 7x21 5x2 3 and variance 1 We learn to predict y given x using linear regression models with 1st through 10th order polynomial features The models are nested in the sense that the higher order models will include all the lower order features The estimation criterion is the mean squared error 3 Cite as Tommi Jaakkola course materials for 6 867 Machine Learning Fall 2006 MIT OpenCourseWare http ocw mit edu Massachusetts Institute of Technology Downloaded on DD Month YYYY We rst train a 1st 2nd 8th and 10th order model using n 20 training points and then test the predictions on a large number of independently sampled points 1 6 points Select all the appropriate model s for each column If you think the highest or lowest error would be shared among several models be sure to list all models Lowest test error Lowest training error Highest training error typically 1st order 2nd order 8th order 10th order Brie y explain your selection in the last column i e the model you would expect to have the lowest test error 2 6 points We now train the polynomial regression models using n 106 one million training points Again select the appropriate model s for each column If you think the highest or lowest error would be shared among several models be sure to list all models Lowest structural error Highest approx error Lowest test error 1st order 2nd order 8th order 10th order 3 T F 2 points The approximation error of a polynomial regression model depends on the number of training points 4 T F 2 points The structural error of a polynomial regression model depends on the number of training points 4 Cite as Tommi Jaakkola course materials for 6 867 Machine Learning Fall 2006 MIT OpenCourseWare http ocw mit edu Massachusetts Institute of Technology Downloaded on DD Month YYYY Problem 3 We consider here linear and non linear support vector machines SVM of the form min w12 2 subject to yi w1 xi w0 1 0 i 1 n or min wT w 2 subject to yi wT i w0 1 0 i 1 n where i is a feature vector constructed from the corresponding real valued input xi We wish to compare the simple linear SVM classi er w1 x w0 and the non linear classi er wT w0 where x x2 T 1 3 points Provide three input points x1 x2 and x3 and their associated 1 labels such that they cannot be separated with the simple linear classi er but are separable by the non linear classifer with x x2 T You may nd Figure 3 1 helpful in answering this question 2 3 points In the gure below Figure 3 1 mark your three points x1 x2 and x3 as points in the feature space with their associated labels Draw the decision boundary of the non linear SVM classi er with x x2 T that separates the points 16 14 12 2 x2 10 x x2 8 6 4 2 0 0 1 2 1 x 3 4 Figure 3 1 Feature space 5 Cite as Tommi Jaakkola course materials for 6 867 Machine Learning Fall 2006 MIT OpenCourseWare http ocw mit edu Massachusetts Institute of Technology Downloaded on DD Month YYYY 3 3 points Consider two labeled points x …
View Full Document
Unlocking...