
**Problem Set 2**

CS 6375

Due: 3/6/2022 by 11:59pm

Note: all answers should be accompanied by explanations and relevant code for full credit. All code (Python or MATLAB only) should be turned in with your answers to the following questions. Late homeworks will not be accepted.

**Problem 1: Parkinson's Disease (40 pts)**

For this problem, you will use the cancer data set provided with this problem set. The data has been divided into three pieces: park_train.data, park_validation.data, and park_test.data. These data sets were generated using the UCI Parkinsons Data Set (follow the link for information about the format of the data). Note that the class label (health status of the subject) is the first column in the data set.

1. Primal SVMs
   (a) Using gradient descent or quadratic programming, apply the SVM with slack formulation to train a classifier for each choice of $c \in \{10^{-4}, 10^{-3}, \ldots, 10^{3}, 10^{4}\}$ without using any feature maps.
   (b) What is the accuracy of the learned classifier on the training set for each value of $c$?
   (c) Use the validation set to select the best value of $c$. What is the accuracy on the validation set for each value of $c$?
   (d) Report the accuracy on the test set for the selected classifier.
2. Dual SVMs with Gaussian Kernels
   (a) Using quadratic programming, apply the dual of the SVM with slack formulation to train a classifier for each choice of $c \in \{10^{-4}, 10^{-3}, \ldots, 10^{3}, 10^{4}\}$ using a Gaussian kernel with $\sigma^2 \in \{10^{-3}, \ldots, 10^{3}\}$.
   (b) What is the accuracy of the learned classifier on the training set for each pair of $c$ and $\sigma^2$?
   (c) Use the validation set to select the best value of $c$ and $\sigma^2$. What is the accuracy on the validation set for each pair of $c$ and $\sigma^2$?
   (d) Report the accuracy on the test set for the selected classifier.
3. Which of these approaches (if any) should be preferred for this classification task? Explain.

**Problem 2: Method of Lagrange Multipliers (15 pts)**

Suppose that we modified the objective function in the SVM with slack formulation to be a quadratic penalty instead of a linear penalty, that is, minimize

$$\frac{1}{2}\|w\|^2 + c \sum_i \xi_i^2$$

subject to the same constraints as the standard SVM with slack. What is the dual of this new quadratic penalized SVM with slack problem for a fixed $c$? Can the kernel trick still be applied?

**Problem 3: Poisonous Mushrooms (25 pts)**

For this problem, you will use the mushroom data set provided with this problem set. The data has been divided into two pieces: mush_train.data and mush_test.data. These data sets were generated using the UCI Mushroom data set (follow the link for information about the format of the data). Note that the class label is the first column in the data set.

1. Assuming you break ties using the attribute that occurs last (left to right) in the data, draw the resulting decision tree and report the maximum information gain for each node that you added to the tree.
2. What is the accuracy of this decision tree on the test data?
3. Now consider arbitrary input data. Suppose that you decide to limit yourself to decision trees of height one, i.e., only one split. Is the tree produced by the information gain heuristic optimal on the training data (that is, no other decision tree has higher accuracy)?

**Problem 4: Cross Validation (20 pts)**

Using a single tuning set for the hyperparameters can yield an unreliable predictor of the class label (i.e., maybe it was not a representative sample of the data), plus some data is wasted using this approach. An alternative approach that is particularly applicable for small data sets is k-fold cross validation.

1. Partition the non-test data into $k$ equally sized buckets.
2. For each possible set of hyperparameters, you will train the model using exactly $k-1$ of the partitions while the held-out partition is used as a validation data set.
3. As there are $k$ different ways to hold out one partition, all $k$ possibilities are tried, and the average validation set accuracy (as measured by the appropriate held-out data) of the $k$ different models learned for each of the hyperparameter settings is used to select the winning hyperparameters.
4. Finally, the model is retrained using all of the non-test data with the winning hyperparameters and then evaluated using the test data.

Apply 10-fold cross validation to fit an SVM with slack classifier (no feature maps) to the data set wdbc_train.data: each row corresponds to a single data observation, and the class label $\{-1, +1\}$ is the first entry in each row. Use the same hyperparameter ranges as Problem 1.1, and the partitions for cross validation should be selected as equally sized contiguous blocks of data starting from the first data element. Report the best setting of the hyperparameters and the accuracy on the test set, wdbc_test.data.
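The k-fold procedure described in Problem 4 can be sketched roughly as follows. This is only an illustration of the bookkeeping under stated assumptions, not a solution. The names `contiguous_folds`, `cross_validate`, `select_hyperparameter`, `make_stub_fit`, and `stub_predict` are hypothetical. The stub model is a regularized least-squares classifier used purely as a placeholder, so the SVM with slack would have to be supplied through `make_fit` and `predict`.

```python
import numpy as np


def contiguous_folds(n, k):
    """Split row indices 0..n-1 into k contiguous blocks, starting at row 0."""
    # np.array_split preserves order; block sizes differ by at most one
    # when k does not divide n evenly.
    return np.array_split(np.arange(n), k)


def cross_validate(X, y, k, fit, predict):
    """Average held-out accuracy over the k folds for one hyperparameter setting."""
    all_rows = np.arange(len(y))
    accuracies = []
    for held_out in contiguous_folds(len(y), k):
        train = np.setdiff1d(all_rows, held_out)  # the other k-1 blocks
        model = fit(X[train], y[train])
        accuracies.append(np.mean(predict(model, X[held_out]) == y[held_out]))
    return float(np.mean(accuracies))


def select_hyperparameter(X, y, k, candidates, make_fit, predict):
    """Return the candidate with the highest average validation accuracy."""
    scores = {c: cross_validate(X, y, k, make_fit(c), predict) for c in candidates}
    best = max(scores, key=scores.get)  # ties go to the earliest candidate
    return best, scores


# Placeholder model (NOT an SVM): regularized least squares, where a larger c
# means weaker regularization. It exists only so the sketch can run end to end.
def make_stub_fit(c):
    def fit(X, y):
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + np.eye(d) / c, X.T @ y)
    return fit


def stub_predict(w, X):
    """Predict labels in {-1, +1} from the sign of the linear score."""
    return np.where(X @ w >= 0, 1, -1)
```

After selection, step 4 amounts to calling `make_fit(best)(X, y)` on all of the non-test data and scoring the result on the test set. For the candidate list, something like `[10.0 ** p for p in range(-4, 5)]` could be used, assuming the hyperparameter range consists of consecutive powers of ten.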
