10-701/15-781, Fall 2006, Final
Dec 15, 5:30pm - 8:30pm

- There are 9 questions in this exam (15 pages including this cover sheet).
- If you need more room to work out your answer to a question, use the back of the page and clearly mark on the front of the page if we are to look at what's on the back.
- This exam is open book and open notes. Computers, PDAs, and cell phones are not allowed.
- You have 3 hours.
- Best of luck!

Name:
Andrew ID:

Q   Topic                                Max. Score   Score
1   Short Questions                      20
2   Instance-Based Learning              7
3   Computational Learning Theory        9
4   Gaussian Mixture Models              10
5   Bayesian Networks                    10
6   Hidden Markov Models                 12
7   Dimensionality Reduction             8
8   Graph-Theoretic Clustering           8
9   MDPs and Reinforcement Learning      16
    Total                                100

1 Short Questions [20pts, 2pts each]

(a) True or False. The ID3 algorithm is guaranteed to find the optimal decision tree.

(b) True or False. Consider a continuous probability distribution with density f() that is nonzero everywhere. The probability of a value x is equal to f(x).

(c) True or False. In a Bayesian network, the inference results of the junction tree algorithm are the same as the inference results of variable elimination.

(d) True or False. If two random variables X and Y are conditionally independent given another random variable Z, then in the corresponding Bayesian network, the nodes for X and Y are d-separated given Z.

(e) True or False. Besides EM, gradient descent can be used to perform inference or learning on a Gaussian mixture model.

(f) In one sentence, characterize the differences between maximum likelihood and maximum a posteriori approaches.

(g) In one sentence, characterize the differences between classification and regression.

(h) Give one similarity and one difference between feature selection and PCA.

(i) Give one similarity and one difference between HMM and MDP.

(j) For each of the following datasets, is it appropriate to use HMM? Provide a brief reasoning for your answer.
- Gene sequence dataset.
- A database of movie reviews (e.g., the IMDB database).
- Stock market price dataset.
- Daily precipitation data from the Northwest of the US.

2 Instance-Based Learning [7pts]

1. Consider the following training set in the 2-dimensional Euclidean space:

x   y   Class
1   1
0   1
0   2
1   1
1   0
1   2
2   2
2   3

[Class labels were lost in extraction; coordinate signs may also be missing.]

Figure 1 shows a visualization of the data.

[Figure 1: Dataset for Problem 2 (axes x and y).]

(a) [1pt] What is the prediction of the 3-nearest-neighbor classifier at the point (1, 1)?

(b) [1pt] What is the prediction of the 5-nearest-neighbor classifier at the point (1, 1)?

(c) [1pt] What is the prediction of the 7-nearest-neighbor classifier at the point (1, 1)?

2. Consider the two-class classification problem. At a data point x, the true conditional probability of a class k, $k \in \{0, 1\}$, is $p_k(x) = P(C = k \mid X = x)$.

(a) [2pts] The Bayes error is the probability that an optimal Bayes classifier will misclassify a randomly drawn example. In terms of $p_k(x)$, what is the Bayes error $E$ at x?

(b) [2pts] In terms of $p_k(x)$ and $p_k(x')$, where $x'$ is the nearest neighbor of x, what is the 1-nearest-neighbor error $E_{1NN}$ at x?

Note that asymptotically, as the number of training examples grows, $E \le E_{1NN} \le 2E$.

3 Computational Learning Theory [9pts, 3pts each]

In class we discussed different formulas that bound the number of training examples sufficient for successful learning under different learning models.

$m \ge \frac{1}{\epsilon}\left(\ln\frac{1}{\delta} + \ln|H|\right)$   (1)

$m \ge \frac{1}{2\epsilon^2}\left(\ln\frac{1}{\delta} + \ln|H|\right)$   (2)

$m \ge \frac{1}{\epsilon}\left(4\log_2\frac{2}{\delta} + 8\,VC(H)\log_2\frac{13}{\epsilon}\right)$   (3)

Pick the appropriate one of the above formulas to estimate the number of training examples needed for the following machine learning tasks. Briefly explain your choice.

1. Consider instances X containing 5 Boolean variables, {X1, X2, X3, X4, X5}, and responses Y are X1 X4 X2 X3 [logical operators lost in extraction]. We try to learn the function f : X -> Y using a 2-layered neural network.

2. Consider instances X containing 5 Boolean variables, {X1, X2, X3, X4, X5}, and responses Y are X1 X4 X2 X3 [logical operators lost in extraction]. We try to learn the function f : X -> Y using depth-2 decision trees. A depth-2 decision tree is a tree with four leaves, all at distance 2 from the root.

3. Consider instances X containing 5 Boolean variables, {X1, X2, X3, X4, X5}, and responses Y are X1 X4 X1 X3 [logical operators lost in extraction]. We try to learn the function f : X -> Y using depth-2 decision trees. A depth-2 decision tree is a tree with four leaves, all at distance 2 from the root.

4 Gaussian Mixture Model [10pts]

Consider the labeled training points in Figure 2, where '+' and 'o' denote positive and negative labels, respectively. Tom asks three students (Yifen, Fan and Indra) to fit Gaussian Mixture Models on this dataset.

[Figure 2: Dataset for Gaussian Mixture Model (axes X1 and X2).]

1. [4pts] Yifen and Fan decide to use one Gaussian distribution for positive examples and one distribution for negative examples. The darker ellipse indicates the positive Gaussian distribution contour and the lighter ellipse indicates the negative Gaussian distribution contour.

[Two panels, axes X1 and X2: Yifen's model; Fan's model.]

Whose model would you prefer for this dataset? What causes the difference between these two models?

2. [6pts] Indra decides to use two Gaussian distributions for positive examples and two Gaussian distributions for negative examples. He uses the EM algorithm to iteratively update parameters and also tries different initializations. The left column of Figure 3 shows 3 different initializations and the right column shows 3 possible models after the first iteration. For each initialization on the left, draw an arrow to the model on the right that will result after the first EM iteration. Your answer should consist of 3 arrows, one from each initialization.

[Figure 3: Three different initializations and models after the first iteration. (a) Initialization. (b) After first iteration. Axes X1 and X2.]

5 Bayesian Networks [10pts]

The figure below shows a Bayesian network with 9 variables, all of which are binary.

1. [3pts] Which of the following statements are always true for this Bayes net?

(a) (b) (c) (d) P A B G P A G P B …
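
For reference, the three sample-size bounds listed in Question 3 can be evaluated numerically. The sketch below is only an illustration: it assumes the formulas are the standard bounds m >= (1/eps)(ln(1/delta) + ln|H|), m >= (1/(2 eps^2))(ln(1/delta) + ln|H|), and m >= (1/eps)(4 log2(2/delta) + 8 VC(H) log2(13/eps)). The function names `bound_1`, `bound_2`, `bound_3` and the values of eps, delta, and the VC dimension are hypothetical choices, not taken from the exam, and the code does not say which bound applies to which task.

```python
import math


def bound_1(eps: float, delta: float, h_size: int) -> int:
    """Formula (1): m >= (1/eps) * (ln(1/delta) + ln|H|)."""
    return math.ceil((math.log(1 / delta) + math.log(h_size)) / eps)


def bound_2(eps: float, delta: float, h_size: int) -> int:
    """Formula (2): m >= (1/(2*eps^2)) * (ln(1/delta) + ln|H|)."""
    return math.ceil((math.log(1 / delta) + math.log(h_size)) / (2 * eps ** 2))


def bound_3(eps: float, delta: float, vc_dim: int) -> int:
    """Formula (3): m >= (1/eps) * (4*log2(2/delta) + 8*VC(H)*log2(13/eps))."""
    return math.ceil(
        (4 * math.log2(2 / delta) + 8 * vc_dim * math.log2(13 / eps)) / eps
    )


if __name__ == "__main__":
    eps, delta = 0.1, 0.05      # illustrative values, not from the exam
    h_size = 2 ** (2 ** 5)      # number of Boolean functions of 5 Boolean inputs
    vc_dim = 6                  # hypothetical VC dimension, for illustration only
    print("formula (1):", bound_1(eps, delta, h_size))
    print("formula (2):", bound_2(eps, delta, h_size))
    print("formula (3):", bound_3(eps, delta, vc_dim))
```

The hypothesis-space size used here, 2^(2^5), is simply the count of all Boolean functions over 5 Boolean inputs; a restricted hypothesis class such as depth-2 trees would give a smaller |H| and therefore a smaller bound.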