Midterm: CS 6375, Spring 2015, Solutions

The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an answer, use an additional sheet (available from the instructor) and staple it to your exam.

* NAME
* UTD-ID if known

Question                            Points   Score
Decision Trees                          10
Linear Classifiers                      10
Neural Networks                         20
Support Vector Machines                 10
Point Estimation                        10
Naive Bayes                             10
Instance Based Learning                 10
Extra Credit: Linear Regression         20
Total:                                 100

Question 1: Decision Trees (10 points)

Consider the training dataset given below. A, B, and C are the attributes and Y is the class variable.

A  B  C  |  Y
0  1  0  |  Yes
1  0  1  |  Yes
0  0  0  |  No
1  0  1  |  No
0  1  1  |  No
1  1  0  |  Yes

(a) (2 points) Can you draw a decision tree having 100% accuracy on this training set? If your answer is yes, draw the decision tree in the space provided below. If your answer is no, explain why.

Solution: No, because examples #2 and #4 have the same feature vector but different classes.

(b) (3 points) Which attribute among A, B, and C has the highest information gain? Explain your answer.

Solution: Writing $H(p, q)$ for the entropy of a set with $p$ positive and $q$ negative examples, each attribute splits the six examples into two groups of three, each with a 2/1 or 1/2 class split:

$IG(A) = H(3,3) - \frac{1}{2}H(2,1) - \frac{1}{2}H(2,1)$
$IG(B) = H(3,3) - \frac{1}{2}H(2,1) - \frac{1}{2}H(2,1)$
$IG(C) = H(3,3) - \frac{1}{2}H(2,1) - \frac{1}{2}H(2,1)$

Therefore, all three attributes have the same information gain.

(c) (3 points) You are given a collection of datasets that are linearly separable. Is it always possible to construct a decision tree having 100% accuracy on such datasets? True or False. Explain your answer.

Solution: True. A decision tree can represent any Boolean function. Since a linear classifier can be expressed as a Boolean function, and any separating linear classifier has 100% accuracy on the training dataset, the decision tree representing the Boolean function associated with that classifier also has 100% accuracy on the training dataset.

(d) (2 points) You are given two decision trees that represent the same target concept. In other words, given the same input, the two decision trees will always yield the same output. Does this imply that the two decision trees have the same number of nodes? True or False. Explain your answer.

Solution: False. Consider two decision trees representing the target concept $(x_1 \vee x_2) \wedge (x_2 \vee x_3)$. Draw one along the variable ordering $x_2, x_1, x_3$ and the second along the ordering $x_1, x_2, x_3$. The first decision tree has 7 nodes, while the second one has 9 nodes. (Please verify.)

Question 2: Linear Classifiers (10 points)

Recall that a linear threshold function, or linear classifier, is given by: if $(w_0 + \sum_i w_i x_i) > 0$ then the class is positive, otherwise it is negative. Assume that 1 is true and 0 is false.

(a) (5 points) Consider a function over $n$ binary features, defined as follows: if at least $k$ variables are false, then the class is positive; otherwise the class is negative. Can you represent this function using a linear threshold function? If your answer is YES, give a precise numerical setting of the weights. Otherwise, clearly explain why this function cannot be represented using a linear threshold function.

Solution: YES. Consider the following setting of weights: $w_i = -1$ for all $i$ and $w_0 = n - k + 0.5$. If at least $k$ variables are false, then at most $n - k$ variables are true, so $\sum_i w_i x_i \geq -(n - k)$. Therefore:

$w_0 + \sum_i w_i x_i \geq (n - k + 0.5) - (n - k) = 0.5 > 0$

On the other hand, if fewer than $k$ variables are false, then at least $n - k + 1$ variables are true, so $\sum_i w_i x_i \leq -(n - k + 1)$. Therefore:

$w_0 + \sum_i w_i x_i \leq (n - k + 0.5) - (n - k + 1) = -0.5 < 0$
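This weight setting is easy to sanity-check by brute force. The sketch below is not part of the original exam; the function names and the tested ranges of $n$ and $k$ are our own choices. It enumerates every binary input and confirms that the proposed threshold function agrees with the "at least $k$ false" concept:

```python
from itertools import product

def at_least_k_false(x, k):
    # Target concept from part (a): positive iff at least k inputs are false (0).
    return sum(1 - v for v in x) >= k

def threshold_classifier(x, k):
    # The solution's weights: w_i = -1 for every feature, w_0 = n - k + 0.5.
    n = len(x)
    return (n - k + 0.5) - sum(x) > 0

# Compare the two functions on every binary input for small n and every valid k.
for n in range(1, 8):
    for k in range(1, n + 1):
        for x in product((0, 1), repeat=n):
            assert threshold_classifier(x, k) == at_least_k_false(x, k)
print("Weight setting matches the target concept on all tested inputs.")
```

The check works because $w_0 + \sum_i w_i x_i = (\text{number of false variables}) - k + 0.5$, which is positive exactly when at least $k$ variables are false.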
(b) (5 points) Consider a linear threshold function over $n$ real-valued features having $w_0 = 0$ (namely, the bias term is zero). Can you always represent such a function using a linear threshold function having only $n - 1$ features? Answer YES or NO and briefly explain your answer. Note that no credit will be given if your explanation is incorrect.

Solution: NO. We would have to find a weight vector $(v_0, v_1, \ldots, v_{n-1})$ such that

$\sum_{i=1}^{n} w_i x_i = v_0 + \sum_{i=1}^{n-1} v_i x_i$

This is not possible because the left-hand side depends on all $n$ variables while the right-hand side depends on only $n - 1$ of them.

Question 3: Neural Networks (20 points)

(a) (10 points) Draw a neural network that represents the function $f(x_1, x_2, x_3)$ defined below. You can only use two types of units: linear units and sign units. Recall that the linear unit takes as input weights and attribute values and outputs $w_0 + \sum_i w_i x_i$, while the sign unit outputs $+1$ if $(w_0 + \sum_i w_i x_i) > 0$ and $-1$ otherwise.

x1  x2  x3  |  f(x1, x2, x3)
 0   0   0  |   10
 0   0   1  |   -5
 0   1   0  |   -5
 0   1   1  |   10
 1   0   0  |   -5
 1   0   1  |   10
 1   1   0  |   10
 1   1   1  |   10

Note that to get full credit, you have to write down the precise numeric weights (e.g., -1, -0.5, +1, etc.) as well as the precise units used at each hidden and output node.

Solution: Several solutions are possible. The simplest has two hidden layers and one output node. The first hidden layer has three nodes, each a sign unit; the second hidden layer has one node, also a sign unit; and the output node is a linear unit.

The three nodes in the first hidden layer represent the following functions, respectively:

1. If $\neg x_1 \wedge \neg x_2 \wedge x_3$ evaluates to true, output $+1$; otherwise output $-1$. An example setting of weights is $w_1 = -1$, $w_2 = -1$, $w_3 = 1$, and $w_0 = -0.5$.
2. If $\neg x_1 \wedge x_2 \wedge \neg x_3$ evaluates to true, output $+1$; otherwise output $-1$. An example setting of weights is $w_1 = -1$, $w_2 = 1$, $w_3 = -1$, and $w_0 = -0.5$.
3. If $x_1 \wedge \neg x_2 \wedge \neg x_3$ evaluates to true, output $+1$; otherwise output $-1$. An example setting of weights is $w_1 = 1$, $w_2 = -1$, $w_3 = -1$, and $w_0 = -0.5$.

The node in the second hidden layer takes the outputs of all three first-layer nodes as its input. It represents the following function:

1. If at least one of its inputs is $+1$, output $+1$; otherwise output $-1$. It thus represents an OR node. Because its inputs are $\pm 1$ rather than 0/1, an example setting of weights is $w_0 = 2$ and $w_1 = w_2 = w_3 = 1$: when no first-layer node fires, the net input is $2 - 3 = -1 < 0$, and when exactly one fires it is $2 + 1 - 1 - 1 = 1 > 0$. (At most one of the three detectors can fire on any input.)

The output node takes the output of the second-hidden-layer node as its input. It represents the following function:

1. If the input is $+1$, output $-5$; otherwise output $10$. An example setting of weights is $w_0 = 5/2$ and $w_1 = -15/2$.

(b) (10 points) Derive a gradient descent training algorithm for the following "special" unit. The "special" unit takes as input a
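The statement of part (b) is cut off at this point, so the exact "special" unit is not recoverable from this copy. As a generic sketch of the derivation such a question expects: for a differentiable unit $o = g(w \cdot x)$ trained to minimize squared error $E(\vec{w}) = \frac{1}{2} \sum_d (t_d - o_d)^2$ over examples $(x_d, t_d)$, the chain rule gives

$\frac{\partial E}{\partial w_i} = -\sum_d (t_d - o_d) \, g'(w \cdot x_d) \, x_{d,i}$

and gradient descent repeatedly updates $w_i \leftarrow w_i - \eta \, \partial E / \partial w_i$. The code below illustrates this template; the choice of $g = \tanh$ is a placeholder, not the unit the exam defines.

```python
import numpy as np

def g(z):
    # Placeholder activation standing in for the exam's (truncated) special unit.
    return np.tanh(z)

def g_prime(z):
    # Derivative of tanh: 1 - tanh(z)^2.
    return 1.0 - np.tanh(z) ** 2

def gradient_descent(X, t, eta=0.1, epochs=500):
    """Batch gradient descent for a unit o = g(w . x) under squared error.

    X: (m, n) inputs whose first column is all 1s (the bias feature).
    t: (m,) vector of real-valued targets.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        z = X @ w                                # net input for every example
        o = g(z)                                 # unit outputs
        grad = -(X.T @ ((t - o) * g_prime(z)))   # dE/dw_i via the chain rule
        w -= eta * grad                          # step against the gradient
    return w
```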

