Quiz 3
CSCI 4202
Worth 14% of your final mark
April 27, 2006

All of the questions require short answers (at most a few sentences). All are of equal value, so answer the easy ones first.

THIS IS AN OPEN BOOK TAKE HOME QUIZ. You may not discuss your answers with anyone.

Hand in your completed quiz under my office door by 9:00 AM Monday, May 1.

1. What is the goal of Supervised Learning?

2. Assume that data is generated from the following function:

$$y = f(x) + \varepsilon$$

where $f(x)$ is a real-valued function, $x \in \mathbb{R}^d$ is a point independently and identically distributed from some stationary distribution $D(x)$, and $\varepsilon$ is a random variable with mean zero ($E[\varepsilon] = 0$) and finite variance ($V[\varepsilon] = c$, $c \ge 0$). The distribution that generates the random noise variable $\varepsilon$ is constant for all $x \in \mathbb{R}^d$.

If $x = (1, 2, 0.98)$ and $c = 0.001$, how many different values of $y$ can be observed? (Note that the notation $x = (1, 2, 0.98)$ means that $x$ is a three-dimensional vector, and hence $f(x)$ is a function of three variables.)

How many different values of $f(x)$ can be observed when $x = (1, 2, 0.98)$ and $c = 0.001$?

Similarly, if $x = (0.29, 0.11, 2.1)$ and $c = 0$, how many different values of $y$ can be observed?

How many different values of $f(x)$ can be observed when $x = (0.29, 0.11, 2.1)$ and $c = 0$?

3. Is the data given in the figure below consistent with the assumptions in the previous question? Briefly explain your answer.

[Figure: "Class of Regression Problems Addressed". $y$ plotted against $x$, for $x$ from 0 to 1.]

4. Assume you would like to learn a linear model of the form

$$y = \sum_{i=1}^{d} \beta_i x_i + \beta_0$$

Assume that the coefficients $(\beta_0, \ldots, \beta_d)$ are obtained from the learning data $(x_1, y_1), \ldots, (x_N, y_N)$ to minimize

$$\arg\min_{\beta} \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{d} \beta_j x_{ij} \right)^2$$

What is this type of learning algorithm called?

5. Assume the same linear model as in the previous question. However, now the coefficients $(\beta_0, \ldots, \beta_d)$ are obtained from the learning data $(x_1, y_1), \ldots, (x_N, y_N)$ to minimize

$$\arg\min_{\beta} \left\{ \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{d} \beta_j x_{ij} \right)^2 + \lambda \sum_{j=1}^{d} \beta_j^2 \right\}$$

where $\lambda \ge 0$. What is this type of learning algorithm called? What happens when you set $\lambda = 0$? What happens to the coefficients $(\beta_1, \ldots, \beta_d)$ as $\lambda$ is increased? When would you use this learning algorithm over the one in the previous question?

6. Assume the same linear model as in the previous two questions. However, now the coefficients $(\beta_0, \ldots, \beta_d)$ are obtained from the learning data $(x_1, y_1), \ldots, (x_N, y_N)$ to minimize

$$\arg\min_{\beta} \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{d} \beta_j x_{ij} \right)^2 \quad \text{subject to} \quad \sum_{j=1}^{d} |\beta_j| \le s, \quad s \ge 0$$

What is this type of learning algorithm called? When would you use this learning algorithm over the one in the previous question?

7. Now assume a model with the following structure:

$$y = \sum_{i=1}^{K} \beta_i \phi_i(x) + \beta_0$$

where the $\phi_i(x)$ are nonlinear basis functions of $x$ (e.g., $\phi_i(x) = x_1 x_2$ is one example). Is this a linear model in basis function space $(\phi_1(x), \ldots, \phi_K(x))$? Is this a linear model in $x \in \mathbb{R}^d$ space?

8. You would like to learn a linear model of the form

$$y = \sum_{i=1}^{d} \beta_i x_i + \beta_0$$

Assume that the coefficients $(\beta_0, \ldots, \beta_d)$ are obtained from the learning data $(x_1, y_1), \ldots, (x_N, y_N)$ using ridge regression, and that $x \in \mathbb{R}^d$, $d = 1000000$ and $N = 10$. Would it be more computationally efficient to use Kernel Ridge Regression or standard Linear Ridge Regression to estimate $(\beta_0, \ldots, \beta_d)$? Why?

9. If the model is $y = 2x_1^2 + 3x_2^9 + 5$, what is $y$ when $x_1 = 1$ and $x_2 = 1$?

10. Give an example (i.e., write out a description of the algorithm) of the K-Nearest Neighbor algorithm when $K = 1$.

11. Assume a loss function $L(a_1, \ldots, a_K)$ that depends on $K$ parameters $(a_1, \ldots, a_K)$, and assume that it is differentiable with respect to those parameters. Give an update formula for incrementally modifying the $(a_1, \ldots, a_K)$ such that the loss function value will decrease.

12. Under what conditions will the algorithm you defined in the previous question converge to a globally optimal solution?

13. Circle the Support Vectors in the following 2-D data. What quantity is maximized to obtain these support vectors? What is the effect of maximizing this quantity? Do support vector machines find a global maximum for this quantity?

14. Below are plots of two data sets, each having two classes. Which data set is linearly separable (i.e., circle the linearly separable set)?

[Figure: two scatter plots, (a) and (b), each with axes $x_1$ and $x_2$.]

15. For Nonlinear Support Vector Machines, input data is projected into what nonlinear space? Is it the same space for Classification and Regression Support Vector Machines?

16. Assume that I have the following support vector classification model:

$$y = f(x) = \operatorname{sgn}\big( 5.4 + 2\,K((1, 3), x) + 2.1\,K((3, 2), x) \big)$$

(Note that $(1, 3)$ and $(3, 2)$ are 2-dimensional vectors.) From this model, can you identify the number of training examples used to construct the model? Can you give the $(x_i, y_i)$ values for any of the training examples used to construct this model?

17. Define a kernel matrix and give pseudo code for evaluating it.

18. What is the difference between classification and regression data?

19. Can I use a regression algorithm to solve a classification problem? If your answer is yes, describe how it can be done. If it is no, describe why not.

20. Can I use a classification algorithm to solve a regression problem? If your answer is yes, describe how it can be done. If it is no, describe why not.

21. Define N-Fold cross validation.

22. If I use N-Fold cross validation to select learning parameters, is the error rate that is returned by the N-Fold cross validation procedure a good indication of how well the model built by my algorithm will do on future data? If not, what is? In other words, how can I compare how well two different learning algorithms will do on a specific data set?

23. What is an unstable learning algorithm? Give two examples of unstable predictors. Give one example of a stable predictor.

24. How is Bagging different from Deterministic Boosting?

25. If you are using single tree stumps as the base classifier in Deterministic Boosting, could you build a classifier that separates the following data? Assume that the stumps are constructed one at a time using a single variable, and that the split variable chosen is based on minimizing the entropy after the split.

x1 x2 x2 y 0 0 0 1 0 0 …