
**Problem Set 1**
**CS 6375**
Due: 2/10/2022 by 11:59pm

Note: all answers should be accompanied by explanations for full credit. All code used as part of your solutions should be included for partial credit. Late homeworks will not be accepted.

**Warm-Up: Subgradients & More (15 pts)**

1. Recall that a function $f : \mathbb{R}^n \to \mathbb{R}$ is convex if for all $x, y \in \mathbb{R}^n$ and $\lambda \in [0, 1]$, $\lambda f(x) + (1 - \lambda) f(y) \geq f(\lambda x + (1 - \lambda) y)$. Using this definition, show that

   (a) $f(x) = w f_1(x)$ is a convex function for $x \in \mathbb{R}^n$ whenever $f_1 : \mathbb{R}^n \to \mathbb{R}$ is a convex function and $w \geq 0$;

   (b) $f(x) = f_1(x) + f_2(x)$ is a convex function for $x \in \mathbb{R}^n$ whenever $f_1 : \mathbb{R}^n \to \mathbb{R}$ and $f_2 : \mathbb{R}^n \to \mathbb{R}$ are convex functions;

   (c) $f(x) = \max\{f_1(x), f_2(x)\}$ is a convex function for $x \in \mathbb{R}^n$ whenever $f_1 : \mathbb{R}^n \to \mathbb{R}$ and $f_2 : \mathbb{R}^n \to \mathbb{R}$ are convex functions.

2. Compute a subgradient at the specified points for each of the following convex functions.

   (a) $f(x) = \max\{x^2 - 2x, |x|\}$ at $x = 0$, $x = -2$, and $x = 1$.

   (b) $g(x) = \max\{(x - 1)^2, (x - 2)^2\}$ at $x = 1.5$ and $x = 0$.

**Problem 1: Perceptron Learning (45 pts)**

Consider the data set (perceptron.data) attached to this homework. This data file consists of $M$ data elements of the form $(x_1^{(m)}, x_2^{(m)}, x_3^{(m)}, x_4^{(m)}, y^{(m)})$ where $x_1^{(m)}, \ldots, x_4^{(m)} \in \mathbb{R}$ define a data point in $\mathbb{R}^4$ and $y^{(m)} \in \{-1, 1\}$ is the corresponding class label.

1. In class, we saw how to use the perceptron algorithm to minimize the following loss function.
   $$\frac{1}{M} \sum_{m=1}^{M} \max\{0,\; -y^{(m)} \cdot (w^T x^{(m)} + b)\}$$
   What is the smallest, in terms of number of data points, two-dimensional data set containing both class labels on which the perceptron algorithm, with step size one, fails to converge? Use this example to explain why the method may fail to converge more generally.
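The classroom update for this loss — take a subgradient step on each misclassified point — can be sketched as follows. This is a generic illustration, not an answer to the question above; the function name and step-size argument are assumptions:

```python
import numpy as np

def perceptron_epoch(X, y, w, b, lr=1.0):
    """One pass over the data: subgradient step on each misclassified point.

    A point is misclassified when y_m * (w . x_m + b) <= 0; the perceptron
    loss's subgradient step then moves (w, b) along (y_m * x_m, y_m).
    """
    for xm, ym in zip(X, y):
        if ym * (xm @ w + b) <= 0:
            w = w + lr * ym * xm
            b = b + lr * ym
    return w, b
```

On linearly separable data, repeated passes eventually make every margin $y^{(m)}(w^T x^{(m)} + b)$ positive; the question above asks for a smallest data set on which that never happens.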
2. Consider the following alternative loss function.
   $$\frac{1}{M} \sum_{m=1}^{M} \max\{0,\; 1 - y^{(m)} \cdot (w^T x^{(m)} + b)\}^2$$
   (a) If the data is linearly separable, does this loss function have any local optima that are not global optima?

   (b) For each optimization strategy below, report the number of iterations that it takes to find a perfect classifier for the data, the values of $w$ and $b$ for the first three iterations, and the final weights and bias. Each descent procedure should start from the initial point
   $$w_0 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad b_0 = 0.$$
      i. Standard subgradient descent with the step size $\gamma_t = 1$ for each iteration.

      ii. Stochastic subgradient descent where exactly one component of the sum is chosen to approximate the gradient at each iteration. Instead of picking a random component at each iteration, you should iterate through the data set starting with the first element, then the second, and so on until the $M$-th element, at which point you should start back at the beginning again. Again, use the step size $\gamma_t = 1$.

      iii. How does the rate of convergence change as you change the step size? Provide some example step sizes to back up your statements.

   (c) Does your subgradient descent implementation with step size one always converge using this loss? Explain.

**Problem 2: Separability & Feature Vectors (15 pts)**

1. Consider the following data set.

   [Figure: scatter plot of the data points in the $(x_1, x_2)$ plane; both axes range from $-2$ to $2$.]

   Under which of the following feature vectors is the data linearly separable? For full credit, you must justify your answer by either providing a linear separator or explaining why such a separator does not exist.

   (a) $\phi(x_1, x_2) = \begin{bmatrix} x_1 + x_2 \\ x_1 - x_2 \end{bmatrix}$

   (b) $\phi(x_1, x_2) = \begin{bmatrix} x_1^2 \\ x_2^2 \\ x_1 x_2 \end{bmatrix}$

   (c) $\phi(x_1, x_2) = \begin{bmatrix} \exp(x_1) \\ \exp(x_2) \end{bmatrix}$

   (d) $\phi(x_1, x_2) = \begin{bmatrix} x_1 \cdot \sin(x_2) \\ x_1 \end{bmatrix}$

2. Suppose that you wanted to perform polynomial regression for 2-dimensional data points using gradient descent, i.e., you want to fit a polynomial of degree $k$ to your data. Explain how to do this using feature vectors.
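One standard construction — sketched below; the helper name is an assumption — maps each 2-dimensional point to the vector of all monomials $x_1^i x_2^j$ with $0 < i + j \le k$, after which fitting the polynomial reduces to linear regression on the features. Excluding the constant term (absorbed by the bias), this representation has $(k+1)(k+2)/2 - 1$ coordinates, so it grows as $O(k^2)$:

```python
import numpy as np

def poly_features(x1, x2, k):
    """All monomials x1^i * x2^j with 0 < i + j <= k, in a fixed order.

    The constant monomial (i = j = 0) is omitted; it is usually handled
    by a separate bias term in the regression.
    """
    return np.array([x1 ** i * x2 ** j
                     for i in range(k + 1)
                     for j in range(k + 1 - i)
                     if i + j > 0])
```

For degree $k = 2$ this yields the five features $x_2, x_2^2, x_1, x_1 x_2, x_1^2$.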
   What is the per-iteration complexity of gradient descent as a function of the size of your feature representation and the number of training data points?

**Problem 3: Support Vector Machines (25 pts)**

For this problem, consider the data set (mystery.data) attached to this homework that, like Problem 1, contains four numeric attributes per row; the fifth entry is the class variable (either + or −). Find a perfect classifier for this data set using support vector machines. Your solution should explain the optimization problem that you solved and provide the learned parameters, the optimal margin, and the support vectors. Note, for full credit, your solution should only make use of a quadratic programming solver, i.e., you should not rely on existing SVM implementations.
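The hard-margin formulation this problem points at — minimize $\frac{1}{2}\|w\|^2$ subject to $y^{(m)}(w^T x^{(m)} + b) \ge 1$ for all $m$ — can be checked on toy data with any constrained solver. The sketch below illustrates the setup only, using SciPy's SLSQP method as a stand-in for a dedicated QP solver; the function name and return values are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def hard_margin_svm(X, y):
    """Minimize 0.5*||w||^2 subject to y_m * (w . x_m + b) >= 1 for all m."""
    n = X.shape[1]
    objective = lambda z: 0.5 * np.dot(z[:n], z[:n])           # z = (w, b)
    constraints = [{"type": "ineq",                            # fun(z) >= 0
                    "fun": lambda z: y * (X @ z[:n] + z[n]) - 1.0}]
    res = minimize(objective, np.zeros(n + 1), constraints=constraints)
    w, b = res.x[:n], res.x[n]
    return w, b, 1.0 / np.linalg.norm(w)                       # margin = 1/||w||
```

The support vectors are the training points whose constraint is tight at the optimum, i.e., those with $y^{(m)}(w^T x^{(m)} + b) \approx 1$.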
