
CS 6375 Final Exam — Fall 2018 (12/12/2018)

Name (Print):

This exam contains 15 pages (including this cover page) and 6 problems. Check to see if any pages are missing. Enter all requested information on the top of this page, and put your initials on the top of every page, in case the pages become separated.

You may NOT use books, notes, or any electronic devices on this exam. Examinees found to be using any materials other than a pen or pencil will receive a zero on the exam and face possible disciplinary action.

The following rules apply:

• Organize your work, in a reasonably neat and coherent way, in the space provided. Work scattered all over the page without a clear ordering will receive very little credit.
• To ensure maximum credit on short answer / algorithmic questions, be sure to EXPLAIN your solution.
• Problems/subproblems are not ordered by difficulty.
• Do not write in the table to the right.

Problem   Points   Score
1         24
2         6
3         19
4         15
5         16
6         20
Total:    100

1. True or False and Explain: For each of the following statements, indicate whether it is true or false and explain your reasoning. Simply writing true or false without correct reasoning will receive no credit.

(a) (4 points) Suppose that you are fitting a Gaussian distribution to data. MAP estimation and maximum likelihood estimation result in the same parameters when the number of independently sampled data observations goes to infinity, independent of the choice of prior.

(b) (4 points) Consider two random variables X, Y ∈ {0, 1} representing independent coin flips of possibly biased coins.
The only nonzero entries of the empirical covariance matrix of X and Y are the diagonal entries, which must be strictly larger than zero.

(c) (4 points) If a given hypothesis space has VC dimension d, then there exists a set of d points such that any active learner requires d label queries, in the worst case, in order to produce a perfect classifier for these d data points.

(d) (4 points) A function f : R → R is quasiconvex if f(λx + (1 − λ)y) ≤ max{f(x), f(y)} for all x, y ∈ R and all λ ∈ (0, 1). Every convex function g : R → R is also quasiconvex.

(e) (4 points) If actions are only selected greedily in Q-learning, the Q-learning strategy cannot converge to the optimal value function.

(f) (4 points) Consider a feedforward neural network structure with 10 binary inputs, one binary output, and a single layer of 20 hidden units. Further, suppose that the activation function on the hidden units is a ReLU and the activation on the output is a perceptron. Let the hypothesis space H consist of every neural network that can be obtained from this structure for each different choice of the weights and biases. The VC dimension of H is at most 21.

2. Gaussian Mixtures: Consider fitting a K-component Gaussian mixture to data points x^(1), . . . , x^(M) ∈ R^n.

(a) (3 points) Is there a unique setting of the mixture weights that maximizes the likelihood? Explain.

(b) (3 points) A random initialization for the EM algorithm for Gaussian mixtures might result in a poor local optimum. How might you initialize the EM algorithm in practice to help avoid this?
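One practical family of answers to Problem 2(b) is to seed the component means with a k-means-style initialization rather than purely at random, so that the initial means are spread across the data. A minimal numpy sketch of k-means++-style seeding (the function name `kmeanspp_means`, the illustrative data, and K = 2 are assumptions for illustration, not part of the exam):

```python
import numpy as np

def kmeanspp_means(X, K, rng):
    """Pick K initial means via k-means++-style seeding: each new mean
    is sampled with probability proportional to its squared distance
    from the nearest mean chosen so far."""
    means = [X[rng.integers(len(X))]]
    for _ in range(K - 1):
        # Squared distance from each point to its nearest chosen mean.
        d2 = np.min([np.sum((X - m) ** 2, axis=1) for m in means], axis=0)
        means.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(means)

rng = np.random.default_rng(0)
# Illustrative data: two well-separated clusters in R^2.
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
               rng.normal(8.0, 1.0, (100, 2))])
means = kmeanspp_means(X, K=2, rng=rng)
print(means.shape)  # (2, 2)
```

These seeded means (and, e.g., the overall sample covariance for each component) can then be handed to EM as its starting point.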
3. Laplace Maximum Likelihood: The Laplace distribution is a probability distribution over the real numbers defined by two parameters µ, b ∈ R with b > 0 and probability density

p(x | µ, b) = (1/(2b)) exp(−|x − µ| / b).

(a) (6 points) Given data samples x^(1), . . . , x^(M) ∈ R, compute the maximum likelihood estimator for µ and b.

Hint: use d/dx |x| = −1 for x < 0, 0 for x = 0, and 1 for x > 0.

(b) (8 points) Consider adding a prior probability distribution p(µ) = λ exp(−λµ) for some integer λ.

i. What is the MAP estimate of µ under this prior for λ = 1?
ii. How does the choice of λ affect the MAP estimate of µ? How should you pick λ in practice?

(c) (5 points) Is the maximum likelihood estimator for µ unbiased? Hint: the Laplace distribution is symmetric about µ.

4. Tic-Tac-Toe: Suppose that you wanted to design a strategy for playing tic-tac-toe versus an adversary using reinforcement learning.

(a) (10 points) Explain how to formulate this as a Markov decision process (the transitions can be deterministic or stochastic). You must specify all parts of the MDP for full credit. For simplicity, you can assume that you always place X's and that your opponent always places O's.

(b) (5 points) Explain how to pick the discount factor γ so that an optimal policy for your MDP results in a strategy that cannot lose, i.e., at worst a game can end in a draw.

5. Uniform Estimation: Consider the uniform distribution, i.e., a constant probability density function, over the interval [0, θ] for some θ ∈ R≥0. Given M data points x^(1), . . . , x^(M) ∈ R, consider the following questions.

(a) (1 point) What is the probability density function for the uniform distribution as a function of θ?

(b) (2 points) What is the log-likelihood?
Is it a concave function of θ?

(c) (3 points) What is the maximum likelihood estimator for θ?

(d) (5 points) Using the fact that for any continuous random variable X with differentiable cumulative distribution function Pr(X ≤ x), the probability density function is p(x) = (d/dx) Pr(X ≤ x), show that the maximum likelihood estimator of θ is biased. How would you make it unbiased?

(e) (5 points) Let's call a parameterized distribution p(·|θ), for some θ ∈ Θ, MLE PAC-learnable if there exists f(ε, δ) polynomial in 1/ε and 1/δ and independent of θ such that for every ε, δ ∈ (0, 1) and θ ∈ Θ, if M ≥ f(ε, δ) samples are drawn independently from p(·|θ), then |θ − θ_MLE| ≤ ε with probability at least 1 − δ. Is the uniform distribution described above MLE PAC-learnable? If so, provide the function f.

6. Short Answer:

(a) (5 points) Consider a binary classification
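The bias asked about in Problem 5(d) is easy to observe empirically: the estimator max_i x^(i) systematically undershoots θ, and a simple rescaling removes the bias. A quick numpy simulation sketch (θ, M, the number of trials, and the (M + 1)/M correction shown are illustrative choices, not the required derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, M, trials = 1.0, 5, 100_000

# Each row is one sample of M points from Uniform[0, theta].
samples = rng.uniform(0.0, theta, size=(trials, M))
mle = samples.max(axis=1)            # candidate estimator: max of the sample
corrected = (M + 1) / M * mle        # rescaled estimator

print(mle.mean())        # ≈ M/(M+1) * theta = 0.8333..., below theta
print(corrected.mean())  # ≈ theta = 1.0
```

The Monte Carlo average of the maximum sits near M/(M + 1) · θ, matching what the CDF argument in (d) predicts.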

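A candidate answer to Problem 3(a) can also be sanity-checked numerically: for fixed b, maximizing the Laplace log-likelihood in µ means minimizing the sum of absolute deviations Σ_i |x^(i) − µ|, which the sample median minimizes. A numpy sketch comparing the log-likelihood at the sample median and the sample mean (the helper `laplace_loglik`, the data, and b = 1 are illustrative assumptions):

```python
import numpy as np

def laplace_loglik(x, mu, b):
    """Log-likelihood of i.i.d. Laplace(mu, b) data:
    sum over i of log(1/(2b)) - |x_i - mu| / b."""
    return np.sum(-np.log(2.0 * b) - np.abs(x - mu) / b)

rng = np.random.default_rng(0)
x = rng.laplace(loc=2.0, scale=1.0, size=1001)  # illustrative data

ll_median = laplace_loglik(x, np.median(x), b=1.0)
ll_mean = laplace_loglik(x, np.mean(x), b=1.0)
print(ll_median >= ll_mean)  # True: the median minimizes sum |x_i - mu|
```

Because the median minimizes the sum of absolute deviations exactly, the comparison holds for any dataset, not just this sample.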