CS 188 Fall 2019 Section Handout 11 Solutions

1 Naive Bayes

In this question, we will train a Naive Bayes classifier to predict class labels Y as a function of input features A and B. Y, A, and B are all binary variables, with domains 0 and 1. We are given 10 training points from which we will estimate our distribution.

    A  1 1 1 1 0 1 0 1 1 1
    B  1 0 0 1 1 1 1 0 1 1
    Y  1 1 0 0 0 1 1 0 0 0

(a) What are the maximum likelihood estimates for the tables P(Y), P(A|Y), and P(B|Y)?

    Y  P(Y)
    0  3/5
    1  2/5

    A  Y  P(A|Y)
    0  0  1/6
    1  0  5/6
    0  1  1/4
    1  1  3/4

    B  Y  P(B|Y)
    0  0  1/3
    1  0  2/3
    0  1  1/4
    1  1  3/4

(b) Consider a new data point (A = 1, B = 1). What label would this classifier assign to this sample?

$$
\begin{aligned}
P(Y = 0, A = 1, B = 1) &= P(Y = 0)\,P(A = 1 \mid Y = 0)\,P(B = 1 \mid Y = 0) \\
&= (3/5)(5/6)(2/3) \\
&= 1/3 \\
P(Y = 1, A = 1, B = 1) &= P(Y = 1)\,P(A = 1 \mid Y = 1)\,P(B = 1 \mid Y = 1) \\
&= (2/5)(3/4)(3/4) \\
&= 9/40
\end{aligned}
$$

Since 1/3 > 9/40, our classifier will predict label 0.

(c) Let's use Laplace Smoothing to smooth out our distribution. Compute the new distribution for P(A|Y) given Laplace Smoothing with k = 2.

    A  Y  P(A|Y)
    0  0  3/10
    1  0  7/10
    0  1  3/8
    1  1  5/8

2 Perceptron

You want to predict if movies will be profitable based on their screenplays. You hire two critics A and B to read a script you have and rate it on a scale of 1 to 4. The critics are not perfect; here are five data points including the critics' scores and the performance of the movie:

    #  Movie Name    A  B  Profit?
    1  Pellet Power  1  1  -
    2  Ghosts!       3  2  +
    3  Pac is Bac    2  4  +
    4  Not a Pizza   3  4  +
    5  Endless Maze  2  3  -

[Figure: empty 2D plot with axes A and B]

(a) First, you would like to examine the linear separability of the data. Plot the data on the 2D plane above; label profitable movies with + and non-profitable movies with − and determine if the data are linearly separable.

The data are linearly separable.

(b) Now you decide to use a perceptron to classify your data. Suppose you directly use the scores given above as features, together with a bias feature. That is, f0 = 1, f1 = score given by A, and f2 = score given by B.

Run one pass through the data with the perceptron algorithm, filling out the table below. Go through the data points in order, e.g. using data point #1 at step 1.

    step  Weights     Score                          Correct?
    1     [-1, 0, 0]  −1 · 1 + 0 · 1 + 0 · 1 = −1    yes
    2     [-1, 0, 0]  −1 · 1 + 0 · 3 + 0 · 2 = −1    no
    3     [0, 3, 2]   0 · 1 + 3 · 2 + 2 · 4 = 14     yes
    4     [0, 3, 2]   0 · 1 + 3 · 3 + 2 · 4 = 17     yes
    5     [0, 3, 2]   0 · 1 + 3 · 2 + 2 · 3 = 12     no

Final weights: [-1, 1, -1]

(c) Have weights been learned that separate the data?

No. With the current weights, points will be classified as positive if −1 · 1 + 1 · A + −1 · B ≥ 0, or A − B ≥ 1. So we will have incorrect predictions for data point 3:

    −1 · 1 + 1 · 2 + −1 · 4 = −3 < 0

and data point 4:

    −1 · 1 + 1 · 3 + −1 · 4 = −2 < 0

Note that although point 2 has w · f = 0, it will be classified as positive (since we classify as positive if w · f ≥ 0).

(d) More generally, irrespective of the training data, you want to know if your features are powerful enough to allow you to handle a range of scenarios. Circle the scenarios for which a perceptron using the features above can indeed perfectly classify movies which are profitable according to the given rules:

    (a) Your reviewers are awesome: if the total of their scores is more than 8, then the movie will definitely be profitable, and otherwise it won't be. Can classify (consider weights [−8, 1, 1])

    (b) Your reviewers are art critics. Your movie will be profitable if and only if each reviewer gives either a score of 2 or a score of 3. Cannot classify

    (c) Your reviewers have weird but different tastes. Your movie will be profitable if and only if both reviewers agree. Cannot
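The single perceptron pass from part (b) of the Perceptron question, and the check in part (c), can be replayed with a short script. This is a minimal sketch, not the course's reference code: the names `DATA`, `predict`, and `one_pass` are hypothetical, labels are encoded as +1 (profitable) and -1 (not profitable), and the initial weights [-1, 0, 0] are taken from step 1 of the table.

```python
# Training data from the movie table: (score from A, score from B, label).
# Labels are encoded as +1 = profitable, -1 = not profitable (an assumption
# made for this sketch).
DATA = [
    (1, 1, -1),  # 1 Pellet Power
    (3, 2, +1),  # 2 Ghosts!
    (2, 4, +1),  # 3 Pac is Bac
    (3, 4, +1),  # 4 Not a Pizza
    (2, 3, -1),  # 5 Endless Maze
]


def predict(weights, features):
    """Return (label, score); ties at w . f = 0 go to the positive class."""
    score = sum(w * f for w, f in zip(weights, features))
    return (1 if score >= 0 else -1), score


def one_pass(weights, data):
    """Run one perceptron pass; return final weights and a per-step trace."""
    weights = list(weights)
    trace = []
    for a, b, label in data:
        features = [1, a, b]  # bias feature f0 = 1, then the two scores
        guess, score = predict(weights, features)
        correct = guess == label
        trace.append((list(weights), score, correct))
        if not correct:
            # Add the features on a missed positive, subtract on a missed negative.
            weights = [w + label * f for w, f in zip(weights, features)]
    return weights, trace


final_weights, trace = one_pass([-1, 0, 0], DATA)
for step, (w, score, correct) in enumerate(trace, start=1):
    print(step, w, score, "yes" if correct else "no")
print("Final weights:", final_weights)

# Part (c): which training points do the final weights still get wrong?
misclassified = [
    i for i, (a, b, label) in enumerate(DATA, start=1)
    if predict(final_weights, [1, a, b])[0] != label
]
print("Misclassified with final weights:", misclassified)
```

Running it reproduces the weight column of the table and reports data points 3 and 4 as misclassified, which matches the answer to part (c).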
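For the Naive Bayes question in Section 1, the maximum likelihood tables, the prediction for (A = 1, B = 1), and the Laplace-smoothed P(A|Y) can all be checked by counting. The sketch below uses exact fractions; the helper names `prior`, `conditional`, and `joint` are hypothetical, and the smoothing formula (count + k) / (class size + 2k) assumes each feature has exactly two values, as in the handout.

```python
from fractions import Fraction

# The 10 training points from the handout, one list per variable.
A = [1, 1, 1, 1, 0, 1, 0, 1, 1, 1]
B = [1, 0, 0, 1, 1, 1, 1, 0, 1, 1]
Y = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0]


def prior(y):
    """Maximum likelihood estimate of P(Y = y)."""
    return Fraction(Y.count(y), len(Y))


def conditional(feature, value, y, k=0):
    """P(feature = value | Y = y) with Laplace strength k (k = 0 gives the MLE)."""
    in_class = [f for f, label in zip(feature, Y) if label == y]
    # Binary feature: k pseudo-counts for each of the 2 values, so 2k in total.
    return Fraction(in_class.count(value) + k, len(in_class) + 2 * k)


def joint(a, b, y):
    """Naive Bayes joint P(Y = y, A = a, B = b) using the unsmoothed tables."""
    return prior(y) * conditional(A, a, y) * conditional(B, b, y)


# Part (b): compare the two joint probabilities for the point (A = 1, B = 1).
scores = {y: joint(1, 1, y) for y in (0, 1)}
prediction = max(scores, key=scores.get)
print("Joint probabilities:", scores)
print("Predicted label:", prediction)

# Part (c): Laplace smoothing with k = 2 for P(A | Y).
smoothed = {(a, y): conditional(A, a, y, k=2) for y in (0, 1) for a in (0, 1)}
for (a, y), p in smoothed.items():
    print(f"P(A={a} | Y={y}) = {p}")
```

Using `Fraction` rather than floats keeps the results in the same form as the handout's tables, so they can be compared entry by entry.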