Assignment 1: Linear Regression

Generating Synthetic Data

This assignment shows how we can extend ordinary least squares regression, which uses the hypothesis class of linear regression functions, to non-linear regression functions modeled using polynomial basis functions and radial basis functions. The function we want to fit is $f_\mathrm{true}(x) = 6 \sin(x + 2) + \sin(2x + 4)$. This is a univariate function, as it has only one input variable. First, we generate synthetic input data $X$ by sampling $n = 750$ points from a uniform distribution on the interval $[-7.5, 7.5]$.

In [ ]:
# The true function
def f_true(x):
    y = 6.0 * np.sin(x + 2) + np.sin(2 * x + 4)
    return y

We can generate a synthetic data set, with Gaussian noise.

In [ ]:
import numpy as np                    # For all our math needs
n = 750                               # Number of data points
X = np.random.uniform(-7.5, 7.5, n)   # Training examples, in one dimension
e = np.random.normal(0.0, 5.0, n)     # Random Gaussian noise
y = f_true(X) + e                     # True labels with noise

Now, we plot the raw data as well as the true function (without noise).

In [ ]:
import matplotlib.pyplot as plt       # For all our plotting needs
plt.figure()

# Plot the data
plt.scatter(X, y, 12, marker='o')

# Plot the true function, which is really "unknown"
x_true = np.arange(-7.5, 7.5, 0.05)
y_true = f_true(x_true)
plt.plot(x_true, y_true, marker='None', color='r')

Recall that we want to build a model that generalizes well on future data, and in order to generalize well on future data, we need to pick a model that trades off well between fit and complexity (that is, between bias and variance). We randomly split the overall data set ($\mathcal{D}$) into three subsets:

- Training set $\mathcal{D}_\mathrm{trn}$ consists of the actual training examples that will be used to train the model.
- Validation set $\mathcal{D}_\mathrm{val}$ consists of validation examples that will be used to tune model hyperparameters (such as $\lambda \ge 0$ in ridge regression) in order to find the best trade-off between fit and complexity, that is, the value of $\lambda$ that produces the best model.
- Test set $\mathcal{D}_\mathrm{tst}$ consists of test examples used to estimate how the model will perform on future data.

For this example, let us randomly partition the data into three non-intersecting sets: $\mathcal{D}_\mathrm{trn}$ = 60% of $\mathcal{D}$, $\mathcal{D}_\mathrm{val}$ = 10% of $\mathcal{D}$, and $\mathcal{D}_\mathrm{tst}$ = 30% of $\mathcal{D}$.

In [ ]:
# scikit-learn has many tools and utilities for model selection
from sklearn.model_selection import train_test_split
tst_frac = 0.3  # Fraction of examples to sample for the test set
val_frac = 0.1  # Fraction of examples to sample for the validation set

# First, we use train_test_split to partition (X, y) into training and test sets
X_trn, X_tst, y_trn, y_tst = train_test_split(X, y, test_size=tst_frac, random_state=42)

# Next, we use train_test_split to further partition (X_trn, y_trn) into training and validation sets
X_trn, X_val, y_trn, y_val = train_test_split(X_trn, y_trn, test_size=val_frac, random_state=42)

# Plot the three subsets
plt.figure()
plt.scatter(X_trn, y_trn, 12, marker='o', color='orange')
plt.scatter(X_val, y_val, 12, marker='o', color='green')
plt.scatter(X_tst, y_tst, 12, marker='o', color='blue')
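As a quick sanity check, the three subsets should together account for all $n = 750$ examples. The following minimal snippet (not part of the graded assignment, just a sketch using the split arrays created above) prints the size of each subset.

# Sizes of the three subsets; they should sum to n = 750
print('Training set size:  ', X_trn.shape[0])
print('Validation set size:', X_val.shape[0])
print('Test set size:      ', X_tst.shape[0])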
1. Regression with Polynomial Basis Functions (30 points)

This problem extends ordinary least squares regression, which uses the hypothesis class of linear regression functions, to non-linear regression functions modeled using polynomial basis functions. In order to learn nonlinear models using linear regression, we have to explicitly transform the data into a higher-dimensional space. The nonlinear hypothesis class we will consider is the set of $d$-degree polynomials of the form $f(x) = w_0 + w_1 x + w_2 x^2 + \ldots + w_d x^d$, or a linear combination of polynomial basis functions:

$$f(x) \,=\, \begin{bmatrix} w_0 & w_1 & w_2 & \ldots & w_d \end{bmatrix} \begin{bmatrix} 1 \\ x \\ x^2 \\ \vdots \\ x^d \end{bmatrix}.$$

The monomials $\{1, x, x^2, \ldots, x^d\}$ are called basis functions, and each basis function $x^k$ has a corresponding weight $w_k$ associated with it, for all $k = 1, \ldots, d$. We transform each univariate data point $x_i$ into a multivariate ($d$-dimensional) data point via $\phi(x_i) \rightarrow [1, x_i, x_i^2, \ldots, x_i^d]$. When this transformation is applied to every data point, it produces the Vandermonde matrix:

$$\Phi \,=\, \begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^d \\
1 & x_2 & x_2^2 & \cdots & x_2^d \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^d
\end{bmatrix}.$$

a. (10 points) Complete the Python function below that takes univariate data as input and computes a Vandermonde matrix of dimension $d$. This transforms one-dimensional data into $d$-dimensional data in terms of the polynomial basis and allows us to model regression using a $d$-degree polynomial.

In [ ]:
# X float(n, ): univariate data
# d int: degree of polynomial
def polynomial_transform(X, d):
    #
    # *** Insert your code here ***
    #

b. (10 points) Complete the Python function below that takes a Vandermonde matrix $\Phi$ and the labels $y$ as input and learns weights via ordinary least squares regression. Specifically, given a Vandermonde matrix $\Phi$, implement the computation of $w = (\Phi^T \Phi)^{-1} \Phi^T y$. Remember that in Python, @ performs matrix multiplication, while * performs element-wise multiplication. Alternately, numpy.dot (https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.dot.html) also performs matrix multiplication. (A small worked illustration of this formula on toy data appears at the end of this section.)

In [ ]:
# Phi float(n, d): transformed data
# y float(n, ): labels
def train_model(Phi, y):
    #
    # *** Insert your code here ***
    #

c. (5 points) Complete the Python function below that takes a Vandermonde matrix $\Phi$, the corresponding labels $y$, and a linear regression model $w$ as input, and evaluates the model using mean squared error. That is,

$$\mathrm{MSE} \,=\, \frac{1}{n} \sum_{i=1}^{n} \left( y_i - w^T \phi_i \right)^2,$$

where $\phi_i$ is the $i$-th row of $\Phi$.

In [ ]:
# Phi float(n, d): transformed data
# y float(n, ): labels
# w float(d, ): linear regression model
def evaluate_model(Phi, y, w):
    #
    # *** Insert your code here ***
    #

d. (5 points, Discussion) We can explore the effect of complexity by varying $d = 3, 6, 9, \ldots, 24$ to steadily increase the non-linearity of the models. For each model, we train using the transformed training data ($\Phi$, whose dimension increases), evaluate its performance on the transformed validation data, and estimate what our future accuracy will be using the test data. From the plot of $d$ vs. validation error below, which choice of $d$ do you expect will generalize best?

In [ ]:
w = {}               # Dictionary to store all the trained models
validationErr = {}   # Validation error of the models
testErr = {}         # Test error of all the models

for d in range(3, 25, 3):  # Iterate over polynomial degree
    Phi_trn = polynomial_transform(X_trn, d)                 # Transform training data into d dimensions
    w[d] = train_model(Phi_trn, y_trn)                       # Learn model on training data

    Phi_val = polynomial_transform(X_val, d)                 # Transform validation data into d dimensions
    validationErr[d] = evaluate_model(Phi_val, y_val, w[d])  # Evaluate model on validation data

    Phi_tst = polynomial_transform(X_tst, d)                 # Transform test data into d dimensions
    testErr[d] = evaluate_model(Phi_tst, y_tst, w[d])        # Evaluate model on test data

# Plot all the models
plt.figure()
plt.plot(validationErr.keys(), validationErr.values(), marker='o', linewidth=3, markersize=12)
plt.plot(testErr.keys(), …
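To make the ordinary least squares formula $w = (\Phi^T \Phi)^{-1} \Phi^T y$ from part (b) and the mean squared error from part (c) concrete, here is a small self-contained illustration on toy data. It is only a sketch of the underlying linear algebra, not the required implementation of the functions above; the toy inputs, the variable names, and the use of np.vander are illustrative choices.

import numpy as np

# Toy univariate inputs and labels (illustrative values only)
x_toy = np.array([-1.0, 0.0, 1.0, 2.0])
y_toy = np.array([ 2.0, 1.0, 2.0, 7.0])

# Degree-2 polynomial basis: columns are [1, x, x^2], matching the Vandermonde matrix above
Phi_toy = np.vander(x_toy, 3, increasing=True)

# Ordinary least squares via the normal equation w = (Phi^T Phi)^{-1} Phi^T y;
# np.linalg.solve is used here instead of forming the explicit inverse
w_toy = np.linalg.solve(Phi_toy.T @ Phi_toy, Phi_toy.T @ y_toy)

# Mean squared error of the fit, as defined in part (c)
mse_toy = np.mean((y_toy - Phi_toy @ w_toy) ** 2)

print('weights:', w_toy)
print('MSE:    ', mse_toy)

Solving the linear system $(\Phi^T \Phi)\, w = \Phi^T y$ with np.linalg.solve follows the same formula as the explicit inverse but is numerically more stable.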