CS 6375 Machine Learning, Spring 2015
Homework 4
Total points: 100

Part I. Written problems [65 points, due 03/13/2015]

1. Nearest neighbor classifier. [10 pts]
For the data set shown in the figure below, draw the decision boundary for k = 1 and for k = 3 (a rough sketch is okay). What can you say about overfitting?

[Figure: training data, with two panels labeled "1-NN" and "3-NN"]

2. Nearest neighbor classifier. [10 pts]
Consider the k-nearest-neighbor rule in a 2-class problem with equal prior probabilities. There is only one attribute (with continuous values). Assume that the n data points in your training set are generated independently in the following way: first a class is picked at random (again, the classes have equal prior probabilities), then a sample is drawn from that class based on the class-conditional probability. Assume further that the class-conditional probability densities P(X|C1) and P(X|C2) are uniform on [0, 1] and [10, 11], respectively. The distance used for kNN is just the absolute difference between two values. Show that for an odd value of k, the average probability of error is given by

  P_n(e) = (1/2)^n * sum_{j=0}^{(k-1)/2} C(n, j)

where C(n, j) denotes the binomial coefficient "n choose j".

3. Cross validation (CV). [10 pts]
The following table shows the training data for a binary classification task. The first column is the index of the data point, the second gives the value of the (single) attribute, and the last gives the class label. We will use a nearest neighbor classifier for this problem.

  sample   Attribute 1   Class label
  1        -0.1          -
  2         0.7          -
  3         1.0          +
  4         1.6          +
  5         2.0          +
  6         2.5          -
  7         3.2          +
  8         3.5          -
  9         4.1          +
  10        4.9          +

(a) What is the 5-fold CV error of 1-NN on this data set? Split the data as follows: I: instances 1, 2; II: 3, 4; III: 5, 6; IV: 7, 8; V: 9, 10.
(b) What is the 2-fold CV error of 3-NN on this data set? Use the first 5 data points as the first subset and the last 5 points as the other subset.

4. HMM application & paper reading. [15 pts]
Read a paper that uses an HMM for some application (e.g., in bioinformatics, speech recognition, or language processing).
Summarize the paper. Describe the task clearly; try to make it understandable to someone who is not working in that particular field but has some general machine learning background. Also explain what the hidden states and the transition probabilities are for the particular task, and how those parameters are estimated.

5. HMM. [20 pts]
Consider a 3-state (S1, S2, S3), 2-output (V1, V2) HMM specified by the following model parameters:

Initial state vector:
  pi = [ 1  0  0 ]

State transition matrix (rows = from-state S1..S3, columns = to-state S1..S3):
      0.5  0.4  0.1
  A = 0.0  0.6  0.4
      0.0  0.0  1.0

Observation probability matrix (rows = states S1..S3, columns = outputs V1, V2):
      0.7  0.3
  B = 0.6  0.4
      0.2  0.8

(a) Draw the state transition diagram for the above model, labeling each arc with the probability of the transition.
(b) Show the trellis of the allowable paths through the model for an observation sequence of length 4. Assume the final state must be S3. Note: in your graph, use the X-axis for time steps and the Y-axis for states, and use 1 as the starting time.
(c) For the observation sequence O = V1 V1 V1 V2, what is the probability P(O|model), considering all possible paths that end in S3 at time T = 4? Use the forward algorithm for this question.
(d) For the same observation sequence, what is P(O|model) for the most likely path, and what is that path? Use the Viterbi algorithm for this problem. Again, assume the final state must be S3.
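For checking a hand computation of parts (c) and (d) of problem 5, the forward and Viterbi recursions can be sketched in a few lines. The parameters below (pi, A, B) are as reconstructed from the garbled problem statement, and the final state is constrained to S3 as the problem requires; this is an illustrative sketch of the two recursions, not a substitute for the required trellis work.

```python
# Forward and Viterbi recursions for a discrete HMM, with the final state
# constrained to a given state (here S3). States and outputs are 0-indexed:
# state 0..2 = S1..S3, output 0..1 = V1..V2.

pi = [1.0, 0.0, 0.0]          # initial state distribution
A = [[0.5, 0.4, 0.1],         # A[i][j] = P(next state j | current state i)
     [0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0]]
B = [[0.7, 0.3],              # B[i][v] = P(output v | state i)
     [0.6, 0.4],
     [0.2, 0.8]]
O = [0, 0, 0, 1]              # observation sequence V1 V1 V1 V2

def forward_end_in(pi, A, B, O, end):
    """P(O and last state = end | model), summing over all paths."""
    n = len(pi)
    alpha = [pi[i] * B[i][O[0]] for i in range(n)]
    for t in range(1, len(O)):
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][O[t]]
                 for j in range(n)]
    return alpha[end]

def viterbi_end_in(pi, A, B, O, end):
    """Most likely state path ending in `end`, and its joint probability."""
    n = len(pi)
    delta = [pi[i] * B[i][O[0]] for i in range(n)]
    backptr = []
    for t in range(1, len(O)):
        psi, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            psi.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][O[t]])
        backptr.append(psi)
        delta = new_delta
    path = [end]
    for psi in reversed(backptr):   # trace back pointers from the final state
        path.append(psi[path[-1]])
    path.reverse()
    return path, delta[end]

print(forward_end_in(pi, A, B, O, end=2))   # part (c): total prob over paths ending in S3
print(viterbi_end_in(pi, A, B, O, end=2))   # part (d): best path and its probability
```

Because S3 is absorbing (A[2][2] = 1.0), the constrained forward probability is just the last forward variable at S3 at time T = 4.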
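The fold-wise evaluation in problem 3 can be organized as a short script, useful for checking a hand computation. The sketch below is a generic leave-fold-out loop with a k-NN predictor on the 10 points from the table; the fold splits follow parts (a) and (b), and the distance is the absolute difference, as the problem specifies.

```python
# k-fold cross-validation with a k-NN classifier on the problem 3 data.
# Each entry is (attribute value, class label), in the order of the table.
data = [(-0.1, '-'), (0.7, '-'), (1.0, '+'), (1.6, '+'), (2.0, '+'),
        (2.5, '-'), (3.2, '+'), (3.5, '-'), (4.1, '+'), (4.9, '+')]

def knn_predict(train, x, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def cv_error(data, folds, k):
    """Fraction of points misclassified when each fold is held out in turn."""
    errors = 0
    for fold in folds:
        train = [p for i, p in enumerate(data) if i not in fold]
        for i in fold:
            x, label = data[i]
            if knn_predict(train, x, k) != label:
                errors += 1
    return errors / len(data)

five_folds = [{0, 1}, {2, 3}, {4, 5}, {6, 7}, {8, 9}]   # split from part (a)
two_folds = [{0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}]          # split from part (b)
print(cv_error(data, five_folds, k=1))   # 5-fold CV error of 1-NN
print(cv_error(data, two_folds, k=3))    # 2-fold CV error of 3-NN
```

With k = 3 and two classes there are no majority ties, and no two distances coincide on this data, so the result does not depend on tie-breaking.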