Problem Set 2
CS 6375
Due: 3/6/2022 by 11:59pm

Note: all answers should be accompanied by explanations and relevant code for full credit. All code (Python or MATLAB only) should be turned in with your answers to the following questions. Late homeworks will not be accepted.

Problem 1: Parkinson’s Disease (40 pts)

For this problem, you will use the Parkinson’s data set provided with this problem set. The data has been divided into three pieces: park_train.data, park_validation.data, and park_test.data. These data sets were generated using the UCI Parkinsons Data Set (follow the link for information about the format of the data). Note that the class label, the health status of the subject, is the first column in the data set. All code (Python or MATLAB only) should be turned in with your answers to the following questions.

1. Primal SVMs
   (a) Using gradient descent or quadratic programming, apply the SVM with slack formulation to train a classifier for each choice of $c \in \{10^{-4}, 10^{-3}, \ldots, 10^{3}, 10^{4}\}$ without using any feature maps.
   (b) What is the accuracy of the learned classifier on the training set for each value of $c$?
   (c) Use the validation set to select the best value of $c$. What is the accuracy on the validation set for each value of $c$?
   (d) Report the accuracy on the test set for the selected classifier.
2. Dual SVMs with Gaussian Kernels
   (a) Using quadratic programming, apply the dual of the SVM with slack formulation to train a classifier for each choice of $c \in \{10^{-4}, 10^{-3}, \ldots, 10^{3}, 10^{4}\}$ using a Gaussian kernel with $\sigma^2 \in \{10^{-3}, \ldots, 10^{3}\}$.
   (b) What is the accuracy of the learned classifier on the training set for each pair of $c$ and $\sigma^2$?
   (c) Use the validation set to select the best value of $c$ and $\sigma^2$. What is the accuracy on the validation set for each pair of $c$ and $\sigma^2$?
   (d) Report the accuracy on the test set for the selected classifier.
3. Which of these approaches (if any) should be preferred for this classification task?
   Explain.

Problem 2: Method of Lagrange Multipliers (15 pts)

Suppose that we modified the objective function in the SVM with slack formulation to be a quadratic penalty instead of a linear penalty, that is, minimize $\frac{1}{2}\|w\|^2 + c \sum_i \xi_i^2$ subject to the same constraints as the standard SVM with slack. What is the dual of this new quadratic penalized SVM with slack problem for a fixed $c$? Can the kernel trick still be applied?

Problem 3: Poisonous Mushrooms? (25 pts)

For this problem, you will use the mushroom data set provided with this problem set. The data has been divided into two pieces: mush_train.data and mush_test.data. These data sets were generated using the UCI Mushroom data set (follow the link for information about the format of the data). Note that the class label is the first column in the data set.

1. Assuming you break ties using the attribute that occurs last (left to right) in the data, draw the resulting decision tree and report the maximum information gain for each node that you added to the tree.
2. What is the accuracy of this decision tree on the test data?
3. Now consider arbitrary input data. Suppose that you decide to limit yourself to decision trees of height one, i.e., only one split. Is the tree produced by the information gain heuristic optimal on the training data (that is, no other decision tree has higher accuracy)?

Problem 4: Cross-Validation (20 pts)

Using a single tuning set for the hyperparameters can yield an unreliable predictor of the class label, i.e., maybe it was not a representative sample of the data, plus some data is “wasted” using this approach. An alternative approach that is particularly applicable for small data sets is k-fold cross-validation.

1. Partition the non-test data into k equally sized buckets.
2. For each possible set of hyperparameters you will train the model using exactly k − 1 of the partitions while the held out partition is used as a validation data set.
3.
   As there are k different ways to hold out one partition, all k possibilities are tried and the average validation set accuracy (as measured by the appropriate held-out data) of the k different models learned for each of the hyperparameter settings is used to select the winning hyperparameters.
4. Finally, the model is retrained using all of the non-test data with the winning hyperparameters and then evaluated using the test data.

Apply 10-fold cross validation to fit an SVM with slack classifier (no feature maps) to the data set wdbc_train.data (each row corresponds to a single data observation and the class label +1/-1 is the first entry in each row). Use the same hyperparameter ranges as Problem 1.1, and select the partitions for cross validation as equally sized contiguous blocks of data starting from the first data element. Report the best setting of the hyperparameters and the accuracy on the test set wdbc
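
The k-fold procedure described above can be sketched roughly as follows. This is only an illustrative outline, not a reference solution: the function names (contiguous_folds, train_linear_svm, accuracy, cross_validate) are hypothetical, the trainer is a plain subgradient descent on the SVM with slack objective with an arbitrarily chosen step size and iteration count, and synthetic data stands in for the real data files, whose exact format is not assumed here.

```python
import numpy as np

def contiguous_folds(n, k):
    """Split indices 0..n-1 into k contiguous blocks, starting from the first
    element. If k does not divide n, np.array_split makes the earlier blocks
    one element larger (an assumption; the text asks for equal sizes)."""
    return np.array_split(np.arange(n), k)

def train_linear_svm(X, y, c, steps=300, lr=1e-3):
    """Subgradient descent on 0.5*||w||^2 + c * sum_i max(0, 1 - y_i (w.x_i + b)).
    steps and lr are illustrative defaults, not tuned values."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        viol = y * (X @ w + b) < 1            # points with nonzero hinge loss
        grad_w = w - c * (X[viol].T @ y[viol])
        grad_b = -c * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def accuracy(w, b, X, y):
    """Fraction of points whose predicted sign matches the +1/-1 label."""
    pred = np.where(X @ w + b >= 0, 1.0, -1.0)
    return float(np.mean(pred == y))

def cross_validate(X, y, c_values, k=10):
    """Return the c with the highest mean held-out accuracy, plus all means."""
    folds = contiguous_folds(len(y), k)
    mean_acc = {}
    for c in c_values:
        accs = []
        for i, val_idx in enumerate(folds):
            # Train on the other k-1 blocks, validate on the held-out block.
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            w, b = train_linear_svm(X[train_idx], y[train_idx], c)
            accs.append(accuracy(w, b, X[val_idx], y[val_idx]))
        mean_acc[c] = float(np.mean(accs))
    # Ties go to the first (smallest) c in c_values.
    best_c = max(mean_acc, key=mean_acc.get)
    return best_c, mean_acc

# Synthetic stand-in for the non-test data (labels in {+1, -1}).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)

c_values = [10.0 ** p for p in range(-4, 5)]   # 10^-4, ..., 10^4
best_c, mean_acc = cross_validate(X, y, c_values, k=10)

# Step 4: retrain on all non-test data with the winning hyperparameter.
w_final, b_final = train_linear_svm(X, y, best_c)
```

The final model (w_final, b_final) would then be scored once on the held-out test data with accuracy(...). A quadratic programming solver could replace train_linear_svm without changing the cross-validation loop.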