CMU MLG 10601 - Regression - D1210861


Pages 39


10-601 Machine Learning: Regression

Types of classifiers

We can divide the large variety of classification approaches into three major types:

1. Instance-based classifiers: use the observations directly, with no model (e.g. K nearest neighbors).
2. Generative: build a generative statistical model (e.g. Bayesian networks).
3. Discriminative: directly estimate a decision rule or boundary (e.g. decision trees).

Where we are

- Inputs -> Density estimator -> Probability
- Inputs -> Classifier -> Predict category
- Inputs -> Regressor -> Predict real number (today)

Choosing a restaurant

In everyday life we need to make decisions by taking into account lots of factors, for example reviews (out of 5 stars), distance, and cuisine (out of 10). The question is what weight we put on each of these factors: how important is each with respect to the others?

Sample observations, one row per restaurant:

4   30   21   7
2   15   12   8
5   27   53   9
3   20    5   6

Assume we would like to build a recommender system based on an individual's preferences. If we have many observations, we may be able to recover the weights.

Linear regression

Given an input x we would like to compute an output y. For example:

- Predict height from age
- Predict Google's price from Yahoo's price
- Predict distance from wall from sensors

In linear regression we assume that y and x are related by the following equation:

$$y = wx + \epsilon$$

where $w$ is a parameter and $\epsilon$ represents measurement or other noise. (Figure: observed values plotted as Y against X; y is what we are trying to predict.)

Our goal is to estimate $w$ from training data of $\langle x_i, y_i \rangle$ pairs. This can be done using a least squares approach:

$$\arg\min_w \sum_i (y_i - w x_i)^2$$

Why least squares?

- It minimizes the squared distance between the measurements and the predicted line.
- It has a nice probabilistic interpretation.
- It is easy to compute.

If the noise is Gaussian with mean 0, then least squares is also the maximum likelihood estimate of $w$.

Solving linear regression

You should be familiar with this by now. We just take the derivative with respect to $w$ and set it to 0:

$$\frac{\partial}{\partial w} \sum_i (y_i - w x_i)^2 = -2 \sum_i x_i (y_i - w x_i)$$

$$\Rightarrow \; 2 \sum_i x_i (y_i - w x_i) = 0$$

$$\Rightarrow \; \sum_i x_i y_i = \sum_i w x_i^2$$

$$\Rightarrow \; w = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$$

Regression examples

Generated w   Recovered w   Noise std
2             2.03          1
2             2.05          2
2             2.08          4

Bias term

So far we assumed that the line passes through the origin. What if it does not? No problem, simply change the model to

$$y = w_0 + w_1 x + \epsilon$$

We can use least squares to determine $w_0$ and $w_1$:

$$w_0 = \frac{\sum_i (y_i - w_1 x_i)}{n}$$

$$w_1 = \frac{\sum_i x_i (y_i - w_0)}{\sum_i x_i^2}$$

Just a second, we will soon give a simpler solution.

Multivariate regression

What if we have several inputs, such as the stock prices of Yahoo, Microsoft and Ebay for the Google prediction task? This becomes a multivariate regression problem. Again, it is easy to model:

$$y = w_0 + w_1 x_1 + \dots + w_k x_k + \epsilon$$

Here $y$ is Google's stock price and the inputs $x_j$ are the other companies' stock prices (e.g. Yahoo's and Microsoft's).

Not all functions can be approximated using the input values directly, for example

$$y = 10 + 3 x_1^2 - 2 x_2^2 + \epsilon$$

In some cases we would like to use polynomial or other terms based on the input data. Are these still linear regression problems? Yes. As long as the model is linear in the coefficients, it is still a linear regression problem.

Non-linear basis functions

So far we only used the observed values directly. However, linear regression can be applied in the same way to functions of these values. As long as these functions can be computed directly from the observed values, the model is still linear in the parameters and the problem remains a linear regression problem:

$$y = w_0 + w_1 x_1^2 + \dots + w_k x_k^2 + \epsilon$$

What type of functions can we use? A few common examples:

- Polynomial: $\phi_j(x) = x^j$ for $j = 0, \dots, k$
- Gaussian: $\phi_j(x) = \exp\left(-\frac{(x - \mu_j)^2}{2\sigma_j^2}\right)$
- Sigmoid: $\phi_j(x) = \frac{1}{1 + \exp(-s_j x)}$

Any function of the input values can be used. The solution for the parameters of the regression remains the same.

General linear regression problem

Using our new notation for the basis functions, linear regression can be written as

$$y = \sum_{j=0}^{k} w_j \phi_j(x)$$

where $\phi_j(x)$ can be either $x_j$ for multivariate regression or one of the non-linear basis functions we defined. Once again we can use least squares to find the optimal solution.

LMS for the general linear regression problem

From here on a superscript $i$ indexes the training example. Our goal is to minimize the following loss function:

$$J(w) = \sum_i \Big( y^i - \sum_j w_j \phi_j(x^i) \Big)^2$$

where $w$ is a vector of dimension $k+1$, $\phi(x^i)$ is a vector of dimension $k+1$, and $y^i$ is a scalar.

Moving to vector notation we get

$$J(w) = \sum_i \big( y^i - w^T \phi(x^i) \big)^2$$

We take the derivative with respect to $w$:

$$\frac{\partial}{\partial w} \sum_i \big( y^i - w^T \phi(x^i) \big)^2 = -2 \sum_i \big( y^i - w^T \phi(x^i) \big) \phi(x^i)^T$$

Equating to 0 we get

$$2 \sum_i \big( y^i - w^T \phi(x^i) \big) \phi(x^i)^T = 0 \;\Rightarrow\; \sum_i y^i \phi(x^i)^T = w^T \Big[ \sum_i \phi(x^i) \phi(x^i)^T \Big]$$

Define

$$\Phi = \begin{pmatrix} \phi_0(x^1) & \phi_1(x^1) & \cdots & \phi_k(x^1) \\ \phi_0(x^2) & \phi_1(x^2) & \cdots & \phi_k(x^2) \\ \vdots & \vdots & & \vdots \\ \phi_0(x^n) & \phi_1(x^n) & \cdots & \phi_k(x^n) \end{pmatrix}$$

With this definition, $\sum_i y^i \phi(x^i)^T = y^T \Phi$ and $\sum_i \phi(x^i) \phi(x^i)^T = \Phi^T \Phi$, so solving for $w$ we get

$$w = (\Phi^T \Phi)^{-1} \Phi^T y$$

where $y$ is a vector with $n$ entries, $w$ is a vector with $k+1$ entries, and $\Phi$ is an $n$ by $k+1$ matrix. This solution is also known as the pseudo-inverse.

Example: Polynomial regression

A probabilistic interpretation

Our least squares minimization solution can also be motivated by a probabilistic interpretation of the regression problem:

$$y = w^T \phi(x) + \epsilon$$

where the noise terms are independent and the noise has a normal distribution with mean 0 and unknown variance $\sigma^2$. Then $p(y \mid w, x)$ has a …
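The closed-form solutions derived in these slides can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names (fit_through_origin, design_matrix, fit_basis) and the noise-free toy data are hypothetical and chosen only for illustration. It uses np.linalg.lstsq in place of forming the inverse of the Gram matrix explicitly, which solves the same least squares problem but is usually better behaved numerically.

```python
import numpy as np


def fit_through_origin(x, y):
    """Least squares w for y = w*x + noise: w = sum(x_i*y_i) / sum(x_i^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.dot(x, y) / np.dot(x, x))


def design_matrix(x, basis_functions):
    """Build the n-by-(k+1) matrix Phi with Phi[i, j] = phi_j(x_i)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([phi(x) for phi in basis_functions])


def fit_basis(x, y, basis_functions):
    """Minimize ||y - Phi w||^2 over w (the pseudo-inverse solution)."""
    Phi = design_matrix(x, basis_functions)
    w, *_ = np.linalg.lstsq(Phi, np.asarray(y, dtype=float), rcond=None)
    return w


# Hypothetical noise-free data generated with w = 2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
w_hat = fit_through_origin(x, 2.0 * x)

# Polynomial basis phi_j(x) = x^j for j = 0, 1, 2.
poly_basis = [lambda t: np.ones_like(t), lambda t: t, lambda t: t ** 2]
y_quad = 10.0 + 3.0 * x ** 2  # quadratic in x, still linear in the weights
w_poly = fit_basis(x, y_quad, poly_basis)

print("through-origin estimate:", w_hat)
print("polynomial weights:", w_poly)
```

Because the toy data contain no noise, the recovered weights should match the generating ones up to floating point error. Adding Gaussian noise to y would give estimates that scatter around the true values, as in the regression examples above.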
