CS 6375 Fall 2018 Final Exam
12/12/2018

Name (Print): ____________________

This exam contains 15 pages (including this cover page) and 6 problems. Check to see if any pages are missing. Enter all requested information on the top of this page, and put your initials on the top of every page in case the pages become separated.

You may NOT use books, notes, or any electronic devices on this exam. Examinees found to be using any materials other than a pen or pencil will receive a zero on the exam and face possible disciplinary action.

The following rules apply:

- Organize your work, in a reasonably neat and coherent way, in the space provided. Work scattered all over the page without a clear ordering will receive very little credit.
- To ensure maximum credit on short-answer/algorithmic questions, be sure to EXPLAIN your solution.
- Problems/subproblems are not ordered by difficulty.

Do not write in the table to the right.

  Problem   Points   Score
  1         24
  2         6
  3         19
  4         15
  5         16
  6         20
  Total     100

CS 6375 Final Exam                                        Initials: ____

1. True or False and Explain

For each of the following statements, indicate whether they are true or false, and explain your reasoning. Simply writing true or false without correct reasoning will receive no credit.

(a) (4 points) Suppose that you are fitting a Gaussian distribution to data. MAP estimation and maximum likelihood estimation result in the same parameters when the number of independently sampled data observations goes to infinity, independent of the choice of prior.

(b) (4 points) Consider two random variables X, Y ∈ {0, 1} representing independent coin flips of (possibly biased) coins. The only nonzero entries of the empirical covariance matrix of X and Y are the diagonal entries, which must be strictly larger than zero.

(c) (4 points) If a given hypothesis space has VC dimension d, then there exists a set of d points such that any active learner requires d label queries in the worst case in order to produce a perfect classifier for these d data points.

(d) (4 points) A function f : ℝ → ℝ is quasiconvex if f(λx + (1 − λ)y) ≤ max{f(x), f(y)} for all x, y ∈ ℝ and all λ ∈ [0, 1]. Every convex function g : ℝ → ℝ is also quasiconvex.

(e) (4 points) If actions are only selected greedily in Q-learning, the Q-learning strategy cannot converge to the optimal value function.

(f) (4 points) Consider a feedforward neural network structure with 10 binary inputs, one binary output, and a single layer of 20 hidden units. Further suppose that the activation functions on the hidden units are ReLUs and the activation on the output is a perceptron. Let the hypothesis space H consist of every neural network that can be obtained from this structure for each different choice of the weights and biases. The VC dimension of H is at most 21.

2. Gaussian Mixtures

Consider fitting a K-component Gaussian mixture to data points x^(1), ..., x^(M) ∈ ℝ^n.

(a) (3 points) Is there a unique setting of the mixture weights that maximizes the likelihood? Explain.

(b) (3 points) A random initialization for the EM algorithm for Gaussian mixtures might result in a poor local optimum. How might you initialize the EM algorithm in practice to help avoid this?

3. Laplace Maximum Likelihood

The Laplace distribution is a probability distribution over the real numbers defined by two parameters μ, b ∈ ℝ such that b > 0, with probability density

    p(x | μ, b) = (1 / (2b)) exp(−|x − μ| / b).

(a) (6 points) Given data samples x^(1), ..., x^(M) ∈ ℝ, compute the maximum likelihood estimators for μ and b. (Hint: use d|x|/dx = 1 for x > 0, 0 for x = 0, and −1 for x < 0.)

(b) (8 points) Consider adding a prior probability distribution p(μ) ∝ exp(−λ|μ|) for some integer λ.

  (i) What is the MAP estimate of μ under this prior for λ = 1?
  (ii) How does the choice of λ affect the MAP estimate of μ? How should you pick λ in practice?

(c) (5 points) Is the maximum likelihood estimator for μ unbiased? (Hint: the Laplace distribution is symmetric about μ.)

4. Tic-Tac-Toe

Suppose that you wanted to design a strategy for playing tic-tac-toe versus an adversary using reinforcement learning.

(a) (10 points) Explain how to formulate this as a Markov decision process (the transitions can be deterministic or stochastic). You must specify all parts of the MDP for full credit. For simplicity, you can assume that you always place X's and that your opponent always places O's.

(b) (5 points) Explain how to pick the discount factor so that an optimal policy for your MDP results in a strategy that cannot lose, i.e., at worst a game can end in a draw.

5. Uniform Estimation

Consider the uniform distribution, i.e., a constant probability density function, over the interval [0, θ] for some θ ∈ ℝ, θ > 0. Given M data points x^(1), ..., x^(M) ∈ ℝ, consider the following questions.

(a) (1 point) What is the probability density function for the uniform distribution as a function of θ?

(b) (2 points) What is the log-likelihood? Is it a concave function of θ?

(c) (3 points) What is the maximum likelihood estimator for θ?

(d) (5 points) Using the fact that, for any continuous random variable X with differentiable cumulative distribution function Pr(X ≤ x), the probability density function is p(x) = (d/dx) Pr(X ≤ x), show that the maximum likelihood estimator of θ is biased. How would you make it unbiased?

(e) (5 points) Let's call a parameterized distribution p_θ, for some θ, "MLE-PAC learnable" if there exists f(ε, δ), polynomial in 1/ε and 1/δ and independent of θ, such that for every ε, δ ∈ (0, 1), if M ≥ f(ε, δ) samples are drawn independently from p_θ, then |θ̂_MLE − θ| ≤ ε with probability at least 1 − δ. Is the uniform distribution described above MLE-PAC learnable? If so, provide the function f(ε, δ).
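As an illustrative aside (not part of the exam), the bias claim in Problem 5(d) can be checked by simulation: the MLE for Uniform(0, θ) is the sample maximum, which systematically underestimates θ, while rescaling by (M + 1)/M corrects it. This is a sketch under assumed values θ = 2 and M = 5; the helper names are hypothetical:

```python
import random

def mle_theta(xs):
    # MLE for Uniform(0, theta): the sample maximum (Problem 5(c)).
    return max(xs)

def unbiased_theta(xs):
    # E[max] = theta * M / (M + 1), so rescaling by (M + 1) / M removes the bias.
    m = len(xs)
    return (m + 1) / m * max(xs)

random.seed(0)
theta, m, trials = 2.0, 5, 50000
samples = [[random.uniform(0.0, theta) for _ in range(m)] for _ in range(trials)]
avg_mle = sum(mle_theta(xs) for xs in samples) / trials
avg_unbiased = sum(unbiased_theta(xs) for xs in samples) / trials
print(avg_mle)       # noticeably below theta = 2.0 (expected value is 2 * 5/6)
print(avg_unbiased)  # close to theta = 2.0
```

The gap avg_mle ≈ θ·M/(M + 1) shrinks as M grows, which is consistent with the distribution being MLE-PAC learnable in the sense of Problem 5(e).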
6. Short Answer

(a) (5 points) Consider a binary classification problem for points in ℝ with a hypothesis space consisting of intervals [a, b] for a, b ∈ ℝ, such that all points inside the interval are labeled plus and all points outside the interval are labeled minus. Even if the data can be …
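As an illustrative aside (not part of the exam), the interval hypothesis space in Problem 6(a) can be sketched in code. The helper `fit_interval` is a hypothetical example learner, not a prescribed algorithm: it returns the tightest interval containing the positively labeled points, which is one consistent hypothesis when the data are separable by an interval.

```python
def predict(a, b, x):
    # Hypothesis h_{a,b}: label +1 inside [a, b], -1 outside.
    return 1 if a <= x <= b else -1

def fit_interval(points, labels):
    # Tightest consistent hypothesis: the smallest interval
    # containing every positively labeled point.
    pos = [x for x, y in zip(points, labels) if y == 1]
    if not pos:
        return (0.0, -1.0)  # empty interval: everything labeled -1
    return (min(pos), max(pos))

points = [-2.0, -0.5, 0.3, 1.1, 4.0]
labels = [-1, 1, 1, 1, -1]
a, b = fit_interval(points, labels)
print((a, b))                              # (-0.5, 1.1)
print([predict(a, b, x) for x in points])  # [-1, 1, 1, 1, -1]
```

For this sample, any interval with a ∈ (−2.0, −0.5] and b ∈ [1.1, 4.0) is also consistent; the tightest one is just a convenient canonical choice.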