**Unformatted text preview:**

Problem Set 3CS 6375Due: 4/6/2022 by 11:59pmNote: all answers should be accompanied by explanations for full credit. Late homeworks will notbe accepted.Problem 1: VC Dimension (25 pts)1. Consider a binary classification problem for data points in R3with a hypothesis space con-sisting of axis aligned 3-d boxes such that any point in the box is labeled with a + and anypoint outside the box is labeled with a −. What is the VC dimension of this hypothesisspace? Prove it. Can you generalize your argument to axis aligned boxes in Rd?Problem 2: Spherical Hypotheses (25 pts)Given training data of the form (x(1), y(1)), . . . , (x(M ), y(M )), where x(m)∈ Rnand y(m)∈ {−1, 1},consider the hypothesis space of n-dimensional spheres: each element of the hypothesis space isparameterized by a center c ∈ Rnand a radius r > 0 such that all points within distance r of thecenter c are classified as +1 and the remaining points are classified with a −1.1. Assuming that the training data can be correctly classified under the sphereical hypothesisspace, describe an optimization problem whose solution is a spherical hypothesis that is amax-margin perfect classifier.2. Using the method of Lagrange multipliers, construct the dual of your optimization problem.Problem 3: Medical Diagnostics (50 pts)For this problem, you will use the data set provided with this problem set. The data has beendivided into two pieces heart train.data and heart test.data. These data sets were generated usingthe UCI SPECT heart data set (follow the link for information about the format of the data). Notethat the class label is the first column in the data set.1. Suppose that the hypothesis space consists of all decision trees with exactly three attributesplits (repetition along the same path is allowed) for this data set.(a) Run the adaBoost algorithm for five rounds to train a classifier for this data set. Drawthe 5 selected trees in the order that they occur and report the ϵ and α, generated byadaBoost, for each.1(b) Run the adaBoost algorithm for 10 rounds of boosting. Plot the accuracy on the trainingand test sets versus iteration number.2. Now, suppose that the hypothesis space consists of only height 1 decision trees for this dataset.(a) Use coordinate descent to minimize the exponential loss function for this hypothesisspace over the training set. You can use any initialization and iteration order that youwould like other than the one selected by adaBoost. What is the optimal value of α thatyou arrived at? What is the corresponding value of the exponential loss on the trainingset?(b) What is the accuracy of the resulting classifier on the test data?(c) What is the accuracy of adaBoost after 20 rounds for this hypothesis space on the testdata? How does the α learned by adaBoost compare to the one learned by gradientdescent?(d) Use bagging, with 20 bootstrap samples, to produce an average classifier for this dataset. How does it compare to the previous classifiers in terms of accuracy on the test set?(e) Which of these 3 methods should be preferred for this data set and

View Full Document