10-701 Machine Learning, Spring 2011
Homework 3
Due Feb 17th at the beginning of class

Instructions: There are 2 questions on this assignment. The last question involves coding. Please submit your writeup as 2 separate sets of pages, one for each TA, with your name and userid on each set.

1 Bayes Net [Carl, 40 points]

Figure 1: Graphical model for question 1.1.

1. In this problem you will study the behavior of the simple graphical model shown in Figure 1, often called a "collider". Here the probability of Z depends on two variables, X and Y.

(a) Write the factored distribution represented by this graph. That is, write the distribution as a product of factors, each of which could be replaced by entries in the conditional probability tables (CPTs) in order to obtain the probability of an event. [4 points]

(b) Assume that X, Y, and Z are binary. Write the CPTs for a distribution such that P(X = 1) = P(X = 1 | Y = 1, Z = 1) ≠ P(X = 1 | Z = 1). Make sure to include in your solution the values of P(X = 1), P(X = 1 | Y = 1, Z = 1), and P(X = 1 | Z = 1). [8 points]

Figure 2: Graphical model for question 1.2.

2. Let P(A, B, C, D, E, F, G, H) factorize according to the graph given in Figure 2. Prove or disprove whether the following independence properties hold. [3 points each]

(a) A ⊥ H
(b) B ⊥ G
(c) B ⊥ G | E
(d) C ⊥ D | {G, A}
(e) C ⊥ D
(f) B ⊥ D | C

Figure 3: Graphical model for question 1.3.

3. Let X = (X1, ..., Xn) have a distribution P(X) that factorizes according to the graph given in Figure 3, and assume each Xi is binary. Design a polynomial-time algorithm which will compute the following quantities. Use pseudocode for your answer. In your pseudocode, assume that the variables px1 and px already exist, where px1 is the prior probability P(X1 = 1) and px is an (n - 1)-by-2 array where px[i, j] is P(X_{i+1} = 1 | X_i = j - 1), for j = 1, 2. For this problem you may assume that the numbers have arbitrary precision, so you do not need to worry about underflow.

(a) Compute P(Xn = 1 | X1 = x1). Assume that x1 has already been initialized with the value of X1 ∈ {0, 1}. [5 points]

(b) Compute P(X1 = 1 | Xn = xn). Assume that xn has already been initialized with the
value of Xn ∈ {0, 1}. [5 points]

2 Multi-class Logistic Regression with Applications to Handwritten Digit Recognition [Xi, 60 points]

Recall the multi-class logistic regression model on page 6 of the lecture on Jan 27th. Assume that we have K different classes and that each input x is a d-dimensional vector. The posterior probability is given by

    P(Y = k | X = x) = exp(w_k^T x) / (1 + Σ_{l=1}^{K-1} exp(w_l^T x))   for k = 1, ..., K - 1,

    P(Y = K | X = x) = 1 / (1 + Σ_{l=1}^{K-1} exp(w_l^T x)),

where w_k^T denotes the transpose of the vector w_k, and w_k^T x is the inner product of w_k and x, i.e. w_k^T x = Σ_i w_{ki} x_i. To simplify, we ignore the constant term w_{k0} in the logistic regression model.

For ease of notation, we can replace the two expressions above for P(Y | X) by a single one. We do this by introducing a fixed pseudo-parameter vector w_K = 0. Now, for any class label k, we simply write

    P(Y = k | X = x) = exp(w_k^T x) / (1 + Σ_{l=1}^{K-1} exp(w_l^T x)).

1. How many parameters do we need to estimate? What are these parameters? [5pt]

2. Given n training samples (x1, y1), (x2, y2), ..., (xn, yn), write down explicitly the log-likelihood function and simplify it as much as you can:

    L(w_1, ..., w_{K-1}) = Σ_{i=1}^{n} ln P(Y = y_i | X = x_i). [5pt]

3. Compute the gradient of L with respect to each w_k and simplify it. [5pt]

4. Now we add the regularization term -(λ/2) Σ_{l=1}^{K-1} ||w_l||_2^2 and define a new objective function

    f(w_1, ..., w_{K-1}) = L(w_1, ..., w_{K-1}) - (λ/2) Σ_{l=1}^{K-1} ||w_l||_2^2.

Compute the gradient of f with respect to each w_k. [5pt]

5. In this part you are going to play with the handwritten digit recognition problem on the USPS digit dataset. Each handwritten digit image is 16 by 16 pixels. If we treat the value of each pixel as a boolean feature (either 0 for black or 1 for white), then each example has 16 × 16 = 256 {0, 1}-valued features, and hence x has 256 dimensions. Each digit (i.e. 1, 2, ..., 9, 0) corresponds to a class label y (y ∈ {1, ..., K}, K = 10). For each digit we have 600 training samples and 500 testing samples. You can view these images at

    http://www.cs.nyu.edu/~roweis/data/usps_0.jpg
    http://www.cs.nyu.edu/~roweis/data/usps_9.jpg

Please download the data from the website.
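The posterior and gradient that parts 2-4 ask you to derive can be sanity-checked numerically before you trust them in an optimizer. Below is a minimal Python/NumPy sketch (Python rather than MATLAB, on made-up data); the gradient shown is the standard softmax log-likelihood gradient, which you should confirm against your own derivation. It fixes the pseudo-parameter w_K = 0 as above:

```python
import numpy as np

def posterior(W, x):
    """P(Y = k | X = x) for k = 1..K, appending the pseudo-parameter w_K = 0.
    W is a (K-1) x d matrix whose rows are w_1, ..., w_{K-1}; x has length d."""
    scores = np.concatenate([W @ x, [0.0]])   # [w_1^T x, ..., w_{K-1}^T x, 0]
    expd = np.exp(scores - scores.max())      # subtract max for numerical stability
    return expd / expd.sum()

def log_likelihood(W, X, y):
    """L(w_1, ..., w_{K-1}) = sum_i ln P(Y = y_i | X = x_i); labels are 1-based."""
    return sum(np.log(posterior(W, xi)[yi - 1]) for xi, yi in zip(X, y))

def gradient(W, X, y):
    """Standard softmax gradient: dL/dw_k = sum_i (1[y_i = k] - P(Y = k | x_i)) x_i."""
    G = np.zeros_like(W)
    for xi, yi in zip(X, y):
        p = posterior(W, xi)
        for k in range(W.shape[0]):
            G[k] += ((yi == k + 1) - p[k]) * xi
    return G

# Hypothetical toy data: K = 3 classes, d = 2 features, n = 6 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
y = [1, 2, 3, 1, 2, 3]
W = np.zeros((2, 2))          # K - 1 = 2 weight vectors
print(posterior(W, X[0]))     # with W = 0, all K classes get probability 1/K
```

A useful habit is to compare gradient() against a centered finite difference of log_likelihood() before running any optimizer; this catches most indexing mistakes.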
Load the usps_digital.mat file in usps_digital.zip into MATLAB. You will have four matrices:

    tr_X: training input matrix, of dimension 6000 × 256.
    tr_y: training labels, a vector of length 6000; each element is from 1 to 10.
    te_X: testing input matrix, of dimension 5000 × 256.
    te_y: testing labels, a vector of length 5000; each element is from 1 to 10.

For those who do NOT want to use MATLAB, we also provide text files for these four matrices in usps_digital.zip.

Note that if you want to view the image of a particular training/testing example in MATLAB, say the 1000th training example, you may use the MATLAB command

    imshow(reshape(tr_X(1000, :), 16, 16))

(a) [15pt] Use the gradient ascent algorithm to train a multi-class logistic regression classifier. Plot (1) the objective value (log-likelihood), (2) the training accuracy, and (3) the testing accuracy versus the number of iterations. Report your final testing accuracy, i.e. the fraction of test images that are correctly classified. Note that you must choose a suitable learning rate (i.e. stepsize) for the gradient ascent algorithm. A hint is that your learning rate cannot be too large; otherwise your objective will increase only for the first few iterations. In addition, you need to choose a suitable stopping criterion. You might use the number of iterations, the change in the objective value between iterations, or the maximum of the L2 norms of the gradients with respect to each w_k. Or you might watch the testing accuracy and stop the optimization when it is stable.

(b) [20pt] Now we add the regularization term -(λ/2) Σ_{l=1}^{K-1} ||w_l||_2^2. For λ = 1, 10, 100, 1000, report the final testing accuracies.

(c) [5pt] What can you conclude from the above experiment? (Hint: consider the relationship between the regularization weight λ and the prediction performance.)
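The shape of the gradient ascent loop in part (a), with part (b)'s regularizer folded in, can be sketched as follows. This is a minimal Python/NumPy sketch under assumed stand-in data (random blobs rather than tr_X/tr_y), with an illustrative fixed learning rate, iteration budget, and λ; it is not a tuned reference implementation:

```python
import numpy as np

def train(X, y, K, lam=0.0, lr=1e-3, iters=200):
    """Gradient ascent on f = sum_i ln P(y_i | x_i) - (lam/2) sum_l ||w_l||^2,
    with the pseudo-parameter w_K fixed at 0. Labels y are 1..K."""
    n, d = X.shape
    W = np.zeros((K - 1, d))
    Y = np.zeros((n, K))
    Y[np.arange(n), y - 1] = 1.0                          # one-hot labels
    for _ in range(iters):
        scores = np.hstack([X @ W.T, np.zeros((n, 1))])   # append w_K^T x = 0
        scores -= scores.max(axis=1, keepdims=True)       # stabilize exp
        P = np.exp(scores)
        P /= P.sum(axis=1, keepdims=True)                 # P[i, k] = P(Y = k+1 | x_i)
        G = (Y - P)[:, :K - 1].T @ X - lam * W            # gradient of f w.r.t. each w_k
        W += lr * G                                        # ascent step
    return W

def accuracy(W, X, y):
    n = X.shape[0]
    scores = np.hstack([X @ W.T, np.zeros((n, 1))])
    return np.mean(scores.argmax(axis=1) + 1 == y)

# Hypothetical stand-in for the USPS data: two well-separated blobs, K = 2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, (50, 4)), rng.normal(1, 0.3, (50, 4))])
y = np.array([1] * 50 + [2] * 50)
W = train(X, y, K=2, lam=1.0, lr=0.05, iters=100)
print(accuracy(W, X, y))   # accuracy on the toy training data
```

With the real USPS matrices you would record the objective, training accuracy, and testing accuracy at each iteration inside the loop to produce the requested plots, and repeat the run for each value of λ in part (b).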