CS 6375 Machine Learning, Spring 2015
Homework 2. Total points: 100
Due: 02/11/2015 2:30pm

1. Bayes rule. [20 pts]

Suppose you are given a bag containing n unbiased coins. You are told that n-1 of these coins are normal, with heads on one side and tails on the other, whereas one coin is a fake, with heads on both sides.

A. Suppose you reach into the bag, pick out a coin uniformly at random, flip it, and get a head. What is the (conditional) probability that the coin you chose is the fake coin?

B. Suppose you continue flipping the coin for a total of k times after picking it and see k heads. Now what is the conditional probability that you picked the fake coin?

2. Bayes classifier and Naïve Bayes classifier. [35 pts]

(A). The following data set is used to learn whether a person likes a movie or not.

major studio?   Genre    win award?   Like the movie?
no              Sci-fi   yes          yes
yes             Action   no           yes
no              Music    yes          no
yes             Action   yes          yes
no              Sci-fi   no           no
no              Action   no           no
yes             Sci-fi   no           no
yes             Music    yes          yes
no              Music    no           no
no              Action   yes          no

Assume you train a naïve Bayes classifier from this data set. How would it classify the following two instances?
(i) major_studio=yes ^ genre=action ^ win_award=yes
(ii) major_studio=yes ^ genre=action ^ win_award=no

(B). Suppose now you train a Bayes classifier on this data set. How would it classify the two instances above?

Please show your work. You only need to show the steps or calculations that are relevant for the classification of the given instances. You don't need to estimate all the parameters in the model.

(C). There are M attributes in a data set, all binary features. You use a naïve Bayes classifier to learn the target concept (binary classification). Exactly how many distinct probability terms must be estimated from the training data to learn a naïve Bayes classifier for this problem?
The naïve Bayes classifier makes conditional independence assumptions to reduce the complexity of estimating P(target|attr_1, attr_2, …, attr_M) from the training data.
If no such assumptions are made, that is, using a Bayes classifier, how many distinct probability terms must be estimated from the training data?

3. Maximum Likelihood Estimation. [45 pts]

(A) Suppose X is a binary random variable that takes value 0 with probability p and value 1 with probability 1-p. Let X1, …, Xn be IID samples of X.
(i) Compute an MLE estimate of p (denote it by $\hat{p}$).
(ii) What is the expectation of this estimate? If it is equal to p, it is called an unbiased estimate; otherwise it is biased. Is the MLE estimate unbiased?

(B) The Poisson pmf is defined as

$\mathrm{Poi}(x \mid \lambda) = \dfrac{\lambda^{x} e^{-\lambda}}{x!}, \quad x \in \{0, 1, 2, \ldots\}$

where λ > 0 is the rate parameter. Let x1, …, xn be IID samples of x. Derive the MLE for λ.

(C) Multinomial distribution and MLE. Assume that when you toss a die (it may not be a fair one), the probabilities of landing on each of its 6 sides are p1, p2, … p6 (note: their sum is 1). You toss the die 1000 times, and the number of times each side is observed is: 150, 200, 180, 150, 200, 120 (for side 1, 2, … 6 respectively). What is the MLE estimate for pi (i=1,
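An answer to Problem 1 can be checked numerically with a small simulation. The sketch below is not part of the assignment: the function name `estimate_fake_given_heads` and the parameters `trials` and `seed` are illustrative assumptions. It draws a coin uniformly from a bag of n coins (one of them double-headed), flips it k times, and reports the fraction of all-heads trials in which the drawn coin was the fake one.

```python
import random


def estimate_fake_given_heads(n, k, trials=200_000, seed=0):
    """Monte Carlo estimate of P(fake coin | k heads in k flips).

    n: number of coins in the bag (exactly one is double-headed)
    k: number of flips of the drawn coin, all observed to be heads
    """
    rng = random.Random(seed)
    all_heads = 0       # trials in which every flip came up heads
    fake_all_heads = 0  # ... and the drawn coin was the fake one
    for _ in range(trials):
        # Coin index 0 plays the role of the fake coin.
        is_fake = rng.randrange(n) == 0
        if is_fake:
            heads_every_time = True  # a double-headed coin always shows heads
        else:
            heads_every_time = all(rng.random() < 0.5 for _ in range(k))
        if heads_every_time:
            all_heads += 1
            if is_fake:
                fake_all_heads += 1
    # Conditional relative frequency among the all-heads trials.
    return fake_all_heads / all_heads


if __name__ == "__main__":
    print(estimate_fake_given_heads(n=10, k=3))
```

The estimate is only approximate and depends on `trials`; it is meant for comparing against a closed-form answer derived with Bayes rule, not for replacing the derivation.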