CS 6375 Machine Learning — Midterm Practice

1. Short questions.

(i) True or false. You don’t need to explain your answer.

a. In an HMM, the current observation is independent of previous observations given the current state.
b. In a binary classification task where the class label C is 1 or 0 and the features have discrete values, for a naïve Bayes classifier, once P(Xi = k | C = 1) is estimated, P(Xi = k | C = 0) can be calculated as 1 − P(Xi = k | C = 1).
c. A single-layer linear perceptron cannot be used to implement the XOR function, since XOR is not linearly separable.
d. The naïve Bayes assumption is that the observations are independent of the class variable.
e. For a binary classification task with M boolean-valued attributes, a decision tree has at most 2^M possible leaves.

(ii) For a classification task, assume that no training instances have the same feature values but different class labels. What is the training error rate for each of the following classifiers? Circle the correct answer; you don’t need to explain why.

• decision tree: 0 / impossible to tell
• 1-nearest neighbor: 0 / impossible to tell
• single-layer perceptron: 0 / impossible to tell

(iii) For a classification task, when the number of training instances increases linearly, how does the testing complexity change for the following two classifiers? Check the right answer.

• 1-nearest neighbor: (a) no effect (b) complexity increases linearly
• naïve Bayes classifier: (a) no effect (b) complexity increases linearly

(iv) The following table shows the training data for a binary classification task: the first column is the index of the data point, the second gives the value of the (single) attribute, and the last gives the class label. We will use a nearest neighbor classifier for this problem.
sample   Attribute   Class label
  1        0.1           -
  2        0.7           -
  3        1.0           +
  4        1.6           +
  5        2.0           +
  6        2.3           -
  7        3.2           +
  8        3.5           -
  9        4.0           +
 10        4.9           +

What is the 5-fold CV error of 1-NN on this data set? Split the data as follows: I: instances 1, 2; II: 3, 4; III: 5, 6; IV: 7, 8; V: 9, 10. You don’t need to show your computation in detail.

(v) In a binary classification task, there are 100 positive and 100 negative samples in the test set. Your classifier labeled 90 samples as positive, and 70 of them are correct. What is your classifier’s classification accuracy? What are the precision and recall values? Where would you place your classifier in ROC space?

2. Bayes rule and MLE.

(i) Suppose we have two bags, each containing black and white balls. One bag contains three times as many white balls as black; the other contains three times as many black balls as white. We choose one of these bags at random and select 5 balls from it at random, replacing each ball after it has been selected. The result is 4 white balls and one black. What is the probability that we were using the bag with more white balls?

(ii) Suppose there is a bag with white and black balls. I randomly draw a ball from the bag, record its color, and put it back. If I draw n balls and k of them are white, what is the MLE estimate of the percentage of white balls in the bag? Show your work.

3. Decision tree and Bayes classifier.

[1] Whether your friend Mike plays basketball depends on two factors: the weather and whether he has an exam in the coming week. You want to predict if Mike will play based on these two factors. The following table shows the training data from his previous experience.

Weather   Exam   Play
Bad       Yes    No
Good      Yes    No
Good      Yes    No
Good      Yes    No
Bad       No     No
Bad       No     Yes
Bad       No     Yes
Good      No     Yes

(A) First you decide to build a decision tree classifier with the information gain criterion. What is the first attribute you should split on? Show your work.
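Part (A) can be sanity-checked numerically. A minimal sketch (not part of the exam) that computes the information gain of each attribute on the training table above, using entropy in bits:

```python
from math import log2
from collections import Counter

# Training data from the table above: (weather, exam, play)
data = [
    ("Bad", "Yes", "No"), ("Good", "Yes", "No"), ("Good", "Yes", "No"),
    ("Good", "Yes", "No"), ("Bad", "No", "No"), ("Bad", "No", "Yes"),
    ("Bad", "No", "Yes"), ("Good", "No", "Yes"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(attr_index):
    """Entropy of the class minus the weighted entropy after splitting
    on the attribute in column attr_index (0 = Weather, 1 = Exam)."""
    labels = [row[2] for row in data]
    gain = entropy(labels)
    for value in set(row[attr_index] for row in data):
        subset = [row[2] for row in data if row[attr_index] == value]
        gain -= len(subset) / len(data) * entropy(subset)
    return gain

print(f"Gain(Weather) = {info_gain(0):.3f}")  # ~0.049
print(f"Gain(Exam)    = {info_gain(1):.3f}")  # ~0.549
```

Splitting on Exam yields a pure child (every exam week is a "No"), which is why its gain is much larger.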
(B) If today’s weather is good and Mike has an exam coming, how would a naïve Bayes classifier predict whether he will play? Show your computation.

[2] You notice that Mike’s roommate goes running when the weather is good and stays at home when it is bad. So you are thinking of adding this as an extra feature (i.e., three features now: weather, exam, and the roommate’s behavior). What do you think of this idea? Will the new feature improve the performance of the naïve Bayes classifier? Explain your answer briefly.

4. HMM.

Suppose we have a hidden Markov model with three possible states and two observation symbols. The state at time t is represented by the variable xt, and the observation by yt. The transition probabilities and observation probabilities are shown below. The starting state is S1.

Transition probabilities (row = current state, column = next state):

       S1    S2    S3
S1    0.5   0.5   0
S2    0     0.2   0.8
S3    0     0     1

Observation probabilities:

       S1    S2    S3
y=0   0.5   0.5   0.1
y=1   0.5   0.5   0.9

A) List all the possible hidden state sequences over the three time points t = 1, 2, 3. Time t starts from 1.
B) What is the most likely hidden state sequence given the observed sequence Y: y1 = 0, y2 = 0, y3 = 0? Is the answer unique?
C) What is the posterior probability P(x3 = S2 | Y), where Y is the same observed sequence as above, i.e., y1 = 0, y2 = 0, y3 = 0?

5. Neural nets.

A) Find the weight update rule for wi for the following regression function. Assume the update is done after reading one data sample with attributes xi (i from 0 to D) and target value t.

y = ∑_{i=0}^{D} w_i (x_i^3 + 3x_i^2 + x_i)

B) Consider a single-layer perceptron that has two real-valued inputs and an output unit with a step function as its activation function. All the initial weights equal 0.1. What are the new weights after processing the following training example: <(3, -2), 0> (where 0 is the label)? Let η (the learning rate) be 0.2, and be sure to also adjust the output unit’s bias during training. Use a value of 1 for the bias input.
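Part (B) of question 5 can be checked with a few lines of code. This sketch assumes the standard perceptron rule w_i ← w_i + η(t − o)x_i and a step function that outputs 1 when the net input is ≥ 0 (the threshold convention only matters when the net input is exactly 0; here it is 0.2, so the output is 1 either way):

```python
def step(net):
    """Step activation: fire (1) when the net input is non-negative."""
    return 1 if net >= 0 else 0

def perceptron_update(weights, x, target, eta=0.2):
    """One perceptron learning step; x includes the bias input (1) first."""
    out = step(sum(w * xi for w, xi in zip(weights, x)))
    return [w + eta * (target - out) * xi for w, xi in zip(weights, x)]

w = [0.1, 0.1, 0.1]                      # [bias weight, w1, w2], all 0.1
w = perceptron_update(w, [1, 3, -2], target=0)
print([round(v, 3) for v in w])          # [-0.1, -0.5, 0.5]
```

Since the net input 0.1·1 + 0.1·3 + 0.1·(−2) = 0.2 gives output 1 but the target is 0, each weight moves by −0.2·x_i.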