CMU CS 10601 - Recitation

Linear regression · Logistic regression · Decision trees
10-601 Machine Learning
Material from Ziv Bar-Joseph's lecture slides and Christopher Bishop's textbook

Linear regression
• Given an input x we would like to compute an output y.
• In linear regression we assume that y and x are related by the equation
  y = w x + \varepsilon
  where w is a parameter and \varepsilon represents measurement or other noise.
[Figure: scatter plot of observed (x, y) values and the line we are trying to predict]
• Our goal is to estimate w from training data of <x_i, y_i> pairs.
• This can be done using a least squares approach.
• Why least squares?
  - It minimizes the squared distance between the measurements and the predicted line.
  - It has a nice probabilistic interpretation.
  - It is easy to compute.

Linear regression
  \hat{w} = \arg\min_w \sum_i (y_i - w x_i)^2
[Figure: fitted line y = wx through the observed values]
If the noise is Gaussian with mean 0, then least squares is also the maximum likelihood estimate of w.

Non-linear basis functions
• So far we only used the observed values directly.
• However, linear regression can be applied in the same way to functions of these values.
• As long as these functions can be computed directly from the observed values, the model is still linear in the parameters and the problem remains a linear regression problem.
• What type of functions can we use?

Non-linear basis functions
• What type of functions can we use? A few common examples:
  - Polynomial: \phi_j(x) = x^j for j = 0, \dots, n
  - Gaussian: \phi_j(x) = \exp\left(-\frac{(x - \mu_j)^2}{2\sigma_j^2}\right)
  - Sigmoid: \phi_j(x) = \frac{1}{1 + \exp(-s_j x)}
• Any function of the input values can be used; the solution for the parameters of the regression remains the same.

General linear regression problem
• Using the new notation for the basis functions, linear regression can be written as
  y = \sum_{j=0}^{n} w_j \phi_j(x)
• Here \phi_j(x) can be either x_j (for multivariate regression) or one of the non-linear bases defined above.
• Once again we can use least squares to find the optimal solution.

LMS for the general linear regression problem
Our goal is to minimize the following loss function:
  J(w) = \sum_i \Big( y_i - \sum_j w_j \phi_j(x_i) \Big)^2
Moving to vector notation we get:
  J(w) = \sum_i \big( y_i - w^T \phi(x_i) \big)^2
Taking the derivative with respect to w:
  \frac{\partial}{\partial w} \sum_i \big( y_i - w^T \phi(x_i) \big)^2 = -2 \sum_i \big( y_i - w^T \phi(x_i) \big) \phi(x_i)^T
Setting this to 0 we get:
  \sum_i y_i \phi(x_i)^T = w^T \sum_i \phi(x_i) \phi(x_i)^T

LMS for the general linear regression problem
Define the design matrix
  \Phi = \begin{pmatrix} \phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_m(x_1) \\ \phi_0(x_2) & \phi_1(x_2) & \cdots & \phi_m(x_2) \\ \vdots & & & \vdots \\ \phi_0(x_n) & \phi_1(x_n) & \cdots & \phi_m(x_n) \end{pmatrix}
Then solving for w we get:
  w = (\Phi^T \Phi)^{-1} \Phi^T y
(A numerical sketch of this closed form appears at the end of these notes.)

Other types of linear regression
• Linear regression is a useful model for many problems.
• However, the parameters we learn for this model are global; they are the same regardless of the value of the input x.
• Extensions of linear regression adjust their parameters based on the region of the input space we are dealing with.

Splines
• Instead of fitting one function to the entire region, fit a set of piecewise (usually cubic) polynomials satisfying continuity and smoothness constraints.
• This results in smooth and flexible functions without too many parameters.
• The regions need to be defined in advance (usually uniformly), for example:
  y = a_1 x^3 + b_1 x^2 + c_1 x + d_1
  y = a_2 x^3 + b_2 x^2 + c_2 x + d_2
  y = a_3 x^3 + b_3 x^2 + c_3 x + d_3

Splines
• The polynomials are not independent.
• For cubic splines we require that, at each border point, adjacent polynomials agree on the value, the first derivative, and the second derivative.
• How many free parameters do we actually have? (A least-squares spline fit is sketched just below.)
  y = a_1 x^3 + b_1 x^2 + c_1 x + d_1
  y = a_2 x^3 + b_2 x^2 + c_2 x + d_2
  y = a_3 x^3 + b_3 x^2 + c_3 x + d_3
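The following is a minimal numerical sketch of a cubic regression spline using the truncated power basis, one standard construction for piecewise cubics with the continuity constraints above (it is not necessarily the construction the slides have in mind). The knot locations, toy data, and function names are illustrative assumptions; because the basis itself enforces continuity of the value and the first two derivatives at each knot, the fit reduces to ordinary least squares.

```python
# Sketch: cubic regression spline via the truncated power basis (assumed setup).
import numpy as np

def spline_design(x, knots):
    """Columns: 1, x, x^2, x^3, and (x - k)_+^3 for each knot k.
    A least-squares fit in this basis is piecewise cubic and automatically
    continuous, with continuous first and second derivatives at the knots."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(x) + 0.3 * rng.standard_normal(x.size)    # noisy toy target

knots = np.linspace(2.0, 8.0, 4)                      # predefined regions
Phi = spline_design(x, knots)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)           # least-squares coefficients

x_new = np.linspace(0.0, 10.0, 5)                     # predictions within the support
y_new = spline_design(x_new, knots) @ w
```

This construction also answers the free-parameter question on the slide: each interior knot contributes one extra coefficient, so K cubic pieces have 4 + (K - 1) free parameters rather than 4K once the continuity and smoothness constraints are imposed.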
Splines
• Splines sometimes include additional requirements for the first and last polynomial (for example, requiring them to start at 0).
• Once splines are fitted to the data they can be used to predict new values in the same way as regular linear regression, though they are limited to the support regions over which they have been defined.
• Note the range of functions that can be represented with a relatively small number of polynomials (the example uses 5).

Locally weighted models
• Splines rely on a fixed region for each polynomial, and all points within a region receive the same weight.
• An alternative is to set the region based on the density of the input data and to give points closer to the point we are trying to estimate a higher weight.

Weighted regression
• For a point x we use a weight function \chi_x centered at x to assign weights to points in x's vicinity.
• We then solve the following weighted regression problem (a weighted least-squares sketch appears at the end of these notes):
  \min_w \sum_i \chi_x(x_i) \big( y_i - w^T \phi(x_i) \big)^2
• The solution has the same form as our general solution (a weight is given for every input).
[Figure: weights decrease with distance from x, e.g. \chi_x(x_1) = 0.3, \chi_x(x_2) = 0.7, \chi_x(x) = 0.9]

Determining the weights
• There are a number of ways to determine the weights.
• One option is to use a Gaussian centered at x:
  \chi_x(x_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - x_i)^2}{2\sigma^2}\right)
  where \sigma^2 is a parameter that should be selected by the user.
• More on these weights when we discuss kernels.

Bayesian linear regression
• Frequentist setting
  - Use MLE to calculate a single estimate of the weights, as seen previously.
• Bayesian setting
  - Calculate the posterior distribution of the weights.

Bayesian linear regression
[Figure from Bishop: posterior over the weights of y = w_0 + w_1 x after no observations, 1 observation, 2 observations, and 20 observations]
As we observe more data, our estimate of the weights becomes more sharply peaked.

Logistic regression

The sigmoid function
• To classify using regression models we replace the linear function with the sigmoid function:
  g(h) = \frac{1}{1 + e^{-h}}
  which is always between 0 and 1.
• Using the sigmoid we set (for binary classification problems):
  p(y = 1 \mid x; w) = g(w^T x) = \frac{e^{w^T x}}{1 + e^{w^T x}}
  p(y = 0 \mid x; w) = 1 - g(w^T x) = \frac{1}{1 + e^{w^T x}}

Regularization
• As with other estimation problems, we may not have enough data to learn good models.
• One way to overcome this is to 'regularize' the model: impose additional constraints on the parameters we are fitting.
• For example, let's assume that w_i comes from a Gaussian distribution with mean …
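As a companion to the closed-form solution w = (\Phi^T \Phi)^{-1} \Phi^T y derived earlier, here is a minimal sketch of general linear regression with a Gaussian basis. The basis centres, width, and toy data are assumptions chosen for illustration; the solve step is exactly the normal equations from the slides.

```python
# Sketch: general linear regression with Gaussian basis functions (assumed setup).
import numpy as np

def gaussian_basis(x, centres, width=1.0):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 * width^2)), plus a constant column."""
    phi = np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2.0 * width ** 2))
    return np.column_stack([np.ones_like(x), phi])

rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, 100)
y = np.sin(2.0 * x) + 0.2 * rng.standard_normal(x.size)   # noisy toy target

centres = np.linspace(-3.0, 3.0, 8)
Phi = gaussian_basis(x, centres)

# Normal equations w = (Phi^T Phi)^{-1} Phi^T y; solve() avoids forming the inverse.
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
y_hat = Phi @ w
```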
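The weighted regression slides can be sketched the same way: for a query point x_0, each training point gets a Gaussian weight \chi_{x_0}(x_i) (the 1/\sqrt{2\pi\sigma^2} factor is the same for every point, so it does not change the argmin), and we solve a weighted version of the normal equations. The bandwidth \sigma, the linear basis [1, x], and the toy data below are assumptions.

```python
# Sketch: locally weighted regression at a single query point (assumed setup).
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 10.0, 150))
y = np.sin(x) + 0.2 * rng.standard_normal(x.size)

def predict_at(x0, sigma=0.5):
    chi = np.exp(-(x0 - x) ** 2 / (2.0 * sigma ** 2))   # Gaussian weights around x0
    Phi = np.column_stack([np.ones_like(x), x])          # simple linear basis [1, x]
    W = np.diag(chi)
    # Weighted normal equations: w = (Phi^T W Phi)^{-1} Phi^T W y
    w = np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ y)
    return np.array([1.0, x0]) @ w

y0 = predict_at(5.0)   # locally weighted prediction at x0 = 5
```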
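Finally, a tiny sketch of the sigmoid mapping used by logistic regression for binary classification. The weight vector and input below are made-up values, not a fitted model; the point is only that the sigmoid turns the linear score w^T x into a probability in (0, 1).

```python
# Sketch: class probabilities from the sigmoid (assumed weights and input).
import numpy as np

def sigmoid(h):
    """g(h) = 1 / (1 + e^{-h}); always strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-h))

w = np.array([0.5, -1.2, 2.0])     # assumed weight vector
x = np.array([1.0, 0.3, -0.7])     # assumed input

p_y1 = sigmoid(w @ x)              # p(y = 1 | x; w)
p_y0 = 1.0 - p_y1                  # p(y = 0 | x; w) = 1 / (1 + e^{w^T x})
print(p_y1, p_y0)
```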

