15-381: Artificial Intelligence
Regression and cross validation

Where we are
• Inputs → Classifier → Predict category (done)
• Inputs → Density Estimator → Probability (done)
• Inputs → Regressor → Predict real no. (today)

Linear regression
• Given an input x we would like to compute an output y
• For example:
  - Predict height from age
  - Predict Google's price from Yahoo's price
  - Predict distance from wall from sensors
• In linear regression we assume that y and x are related by the equation

    y = w x + \varepsilon

  where w is a parameter and ε represents measurement or other noise
[Figure: scatter of observed values (points) around the line w x we are trying to predict]

• Our goal is to estimate w from training data of <x_i, y_i> pairs
• This can be done using a least squares approach:

    \hat{w} = \arg\min_w \sum_i (y_i - w x_i)^2

• Why least squares?
  - minimizes the squared distance between the measurements and the predicted line
  - has a nice probabilistic interpretation: if the noise is Gaussian with mean 0, then least squares is also the maximum likelihood estimate of w
  - is easy to compute

Solving linear regression
• You should be familiar with this by now…
• We just take the derivative with respect to w and set it to 0:

    \frac{\partial}{\partial w} \sum_i (y_i - w x_i)^2 = -2 \sum_i x_i (y_i - w x_i) = 0
    \;\Rightarrow\; \sum_i x_i y_i = w \sum_i x_i^2
    \;\Rightarrow\; w = \frac{\sum_i x_i y_i}{\sum_i x_i^2}
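As a sanity check, the closed-form estimate w = Σ_i x_i y_i / Σ_i x_i^2 can be computed in a few lines. This is a minimal sketch assuming NumPy; the data generator (true w = 2, Gaussian noise) and all variable names are illustrative, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Generate synthetic data from y = w*x + eps with true w = 2
n, w_true, noise_std = 200, 2.0, 1.0
x = rng.uniform(0.0, 10.0, size=n)
y = w_true * x + rng.normal(0.0, noise_std, size=n)

# Least-squares estimate for a line through the origin:
# w_hat = sum(x_i * y_i) / sum(x_i^2)
w_hat = np.sum(x * y) / np.sum(x * x)
print(f"recovered w = {w_hat:.2f}")  # should be close to 2.0
```

Rerunning with a larger `noise_std` shows the recovered w drifting slightly, as in the regression examples on the slides.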
Regression example
• Generated with w = 2 and Gaussian noise of increasing spread:
  - Noise std = 1 → Recovered: w = 2.03
  - Noise std = 2 → Recovered: w = 2.05
  - Noise std = 4 → Recovered: w = 2.08

Affine regression
• So far we assumed that the line passes through the origin
• What if the line does not? No problem, simply change the model to

    y = w_0 + w_1 x + \varepsilon

• Can use least squares to determine w_0 and w_1:

    w_0 = \frac{\sum_i (y_i - w_1 x_i)}{n}, \qquad
    w_1 = \frac{\sum_i x_i (y_i - w_0)}{\sum_i x_i^2}

• (Just a second, we will soon give a simpler solution)

Multivariate regression
• What if we have several inputs?
  - Stock prices for Yahoo, Microsoft and Ebay for the Google prediction task
• This becomes a multivariate regression problem
• Again, it is easy to model:

    y = w_0 + w_1 x_1 + \ldots + w_k x_k + \varepsilon

Notation:
- Lower case: variable or parameter (w0)
- Lower case bold: vector (w)
- Upper case bold: matrix (X)

Multivariate regression: Least squares
• We are now interested in a vector w^T = [w_0, w_1, …, w_k]
• It is useful to represent this in matrix notation:
    X = \begin{bmatrix} \mathbf{x}_1^T \\ \vdots \\ \mathbf{x}_n^T \end{bmatrix}
      = \begin{bmatrix}
          1 & x_{11} & \cdots & x_{1k} \\
          1 & x_{21} & \cdots & x_{2k} \\
          \vdots & \vdots &        & \vdots \\
          1 & x_{n1} & \cdots & x_{nk}
        \end{bmatrix},
    \qquad
    \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}

• We can thus re-write our model as y = Xw + ε
• The solution turns out to be

    \mathbf{w} = (X^T X)^{-1} X^T \mathbf{y}

• This is an instance of a larger set of computational solutions which are usually referred to as 'generalized least squares'
• X^T X is a (k+1) by (k+1) matrix and X^T y is a vector with k+1 entries
• Why is (X^T X)^{-1} X^T y the right solution? Hint: multiply both sides of the original equation by (X^T X)^{-1} X^T

Beyond linear regression
• We can also generalize these classes of functions to non-linear functions of the inputs x that are still linear in the parameters w:

    f(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \ldots + w_m x^m

[Figure: polynomial regression examples for several polynomial orders]

Overfitting
• With too few training examples our polynomial regression model may achieve zero training error but nevertheless have a large generalization error:

    \frac{1}{n} \sum_{i=1}^{n} \left( y_i - f(x_i; w_0, \ldots, w_m) \right)^2 \approx 0
    \quad \text{while} \quad
    E_{(x,y) \sim P} \left[ \left( y - f(x; w_0, \ldots, w_m) \right)^2 \right] \gg 0

• When the training error no longer bears any relation to the generalization error we say that the function overfits the (training) data

Cross validation
• Cross-validation allows us to estimate the generalization error based on training examples alone
• We learn a model using a subset of the training data and estimate the generalization error using the rest of the data
• We choose the model (for example, the polynomial order) that minimizes the error on the held-out data
• Common strategies:
  - Leave one out cross validation
  - Leave out a bigger subset
  - Train and test sets
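The held-out strategy above can be sketched concretely. Assuming NumPy, this illustrative example fits polynomials of increasing order on a training split and picks the order with the smallest held-out squared error; the data (a noisy cubic) and the split sizes are made up for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples from a cubic: y = x^3 - 2x + noise
x = rng.uniform(-2.0, 2.0, size=120)
y = x**3 - 2.0 * x + rng.normal(0.0, 0.5, size=x.size)

# Hold out a third of the data for validation ("train and test sets" strategy)
train, held_out = np.arange(80), np.arange(80, 120)

def held_out_error(order):
    """Fit a polynomial of the given order on the training split
    and return the mean squared error on the held-out split."""
    coeffs = np.polyfit(x[train], y[train], deg=order)
    pred = np.polyval(coeffs, x[held_out])
    return np.mean((y[held_out] - pred) ** 2)

errors = {m: held_out_error(m) for m in range(1, 10)}
best_order = min(errors, key=errors.get)
print(best_order)
```

For leave-one-out cross validation the same loop would instead hold out a single point at a time and average the n resulting errors.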
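The multivariate closed form w = (X^T X)^{-1} X^T y from the slides above can also be checked numerically. This is a hedged sketch assuming NumPy; the three-feature setup loosely mirrors the Yahoo/Microsoft/Ebay example, and np.linalg.solve on the normal equations is used instead of forming the inverse explicitly, which is numerically preferable:

```python
import numpy as np

rng = np.random.default_rng(2)

# Three inputs (e.g. Yahoo, Microsoft, Ebay prices) plus an intercept
n, k = 500, 3
w_true = np.array([1.0, 0.5, -2.0, 3.0])       # [w0, w1, w2, w3]

features = rng.normal(size=(n, k))
X = np.hstack([np.ones((n, 1)), features])     # leading column of ones for w0
y = X @ w_true + rng.normal(0.0, 0.1, size=n)  # y = Xw + eps

# Normal equations: solve (X^T X) w = X^T y rather than inverting X^T X
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.round(w_hat, 2))
```

The recovered coefficients land close to w_true, just as the scalar examples recovered w ≈ 2.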