CMU CS 10601 - Recitation

Linear regression · Logistic regression · Decision trees
10-601 Machine Learning
Material from Ziv Bar-Joseph's lecture slides and Christopher Bishop's textbook

Linear regression
• Given an input x we would like to compute an output y.
• In linear regression we assume that y and x are related by the equation
  y = w x + \varepsilon
  where w is a parameter and \varepsilon represents measurement or other noise.
[Figure: scatter plot of observed (x, y) values and the line we are trying to predict]
• Our goal is to estimate w from training data of <x_i, y_i> pairs.
• This can be done using a least squares approach.
• Why least squares?
  - It minimizes the squared distance between the measurements and the predicted line.
  - It has a nice probabilistic interpretation.
  - It is easy to compute.

Linear regression
  \hat{w} = \arg\min_w \sum_i (y_i - w x_i)^2
[Figure: fitted line y = wx through the observed values]
If the noise is Gaussian with mean 0, then least squares is also the maximum likelihood estimate of w.

Non-linear basis functions
• So far we only used the observed values directly.
• However, linear regression can be applied in the same way to functions of these values.
• As long as these functions can be computed directly from the observed values, the model is still linear in the parameters and the problem remains a linear regression problem.
• What type of functions can we use?

Non-linear basis functions
• What type of functions can we use? A few common examples:
  - Polynomial: \phi_j(x) = x^j for j = 0, \dots, n
  - Gaussian: \phi_j(x) = \exp\left(-\frac{(x - \mu_j)^2}{2\sigma_j^2}\right)
  - Sigmoid: \phi_j(x) = \frac{1}{1 + \exp(-s_j x)}
• Any function of the input values can be used; the solution for the parameters of the regression remains the same.

General linear regression problem
• Using the new notation for the basis functions, linear regression can be written as
  y = \sum_{j=0}^{n} w_j \phi_j(x)
• Here \phi_j(x) can be either x_j (for multivariate regression) or one of the non-linear bases defined above.
• Once again we can use least squares to find the optimal solution.

LMS for the general linear regression problem
Our goal is to minimize the following loss function:
  J(w) = \sum_i \Big( y_i - \sum_j w_j \phi_j(x_i) \Big)^2
Moving to vector notation we get:
  J(w) = \sum_i \big( y_i - w^T \phi(x_i) \big)^2
Taking the derivative with respect to w:
  \frac{\partial}{\partial w} \sum_i \big( y_i - w^T \phi(x_i) \big)^2 = -2 \sum_i \big( y_i - w^T \phi(x_i) \big) \phi(x_i)^T
Setting this to 0 we get:
  \sum_i y_i \phi(x_i)^T = w^T \sum_i \phi(x_i) \phi(x_i)^T

LMS for the general linear regression problem
Define the design matrix
  \Phi = \begin{pmatrix} \phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_m(x_1) \\ \phi_0(x_2) & \phi_1(x_2) & \cdots & \phi_m(x_2) \\ \vdots & & & \vdots \\ \phi_0(x_n) & \phi_1(x_n) & \cdots & \phi_m(x_n) \end{pmatrix}
Then solving for w we get:
  w = (\Phi^T \Phi)^{-1} \Phi^T y
(A numerical sketch of this closed form appears at the end of these notes.)

Other types of linear regression
• Linear regression is a useful model for many problems.
• However, the parameters we learn for this model are global; they are the same regardless of the value of the input x.
• Extensions of linear regression adjust their parameters based on the region of the input space we are dealing with.

Splines
• Instead of fitting one function to the entire region, fit a set of piecewise (usually cubic) polynomials satisfying continuity and smoothness constraints.
• This results in smooth and flexible functions without too many parameters.
• The regions need to be defined in advance (usually uniformly), for example:
  y = a_1 x^3 + b_1 x^2 + c_1 x + d_1
  y = a_2 x^3 + b_2 x^2 + c_2 x + d_2
  y = a_3 x^3 + b_3 x^2 + c_3 x + d_3

Splines
• The polynomials are not independent.
• For cubic splines we require that, at each border point, adjacent polynomials agree on the value, the first derivative, and the second derivative.
• How many free parameters do we actually have? (A least-squares spline fit is sketched just below.)
  y = a_1 x^3 + b_1 x^2 + c_1 x + d_1
  y = a_2 x^3 + b_2 x^2 + c_2 x + d_2
  y = a_3 x^3 + b_3 x^2 + c_3 x + d_3
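The following is a minimal numerical sketch of a cubic regression spline using the truncated power basis, one standard construction for piecewise cubics with the continuity constraints above (it is not necessarily the construction the slides have in mind). The knot locations, toy data, and function names are illustrative assumptions; because the basis itself enforces continuity of the value and the first two derivatives at each knot, the fit reduces to ordinary least squares.

```python
# Sketch: cubic regression spline via the truncated power basis (assumed setup).
import numpy as np

def spline_design(x, knots):
    """Columns: 1, x, x^2, x^3, and (x - k)_+^3 for each knot k.
    A least-squares fit in this basis is piecewise cubic and automatically
    continuous, with continuous first and second derivatives at the knots."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(x) + 0.3 * rng.standard_normal(x.size)    # noisy toy target

knots = np.linspace(2.0, 8.0, 4)                      # predefined regions
Phi = spline_design(x, knots)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)           # least-squares coefficients

x_new = np.linspace(0.0, 10.0, 5)                     # predictions within the support
y_new = spline_design(x_new, knots) @ w
```

This construction also answers the free-parameter question on the slide: each interior knot contributes one extra coefficient, so K cubic pieces have 4 + (K - 1) free parameters rather than 4K once the continuity and smoothness constraints are imposed.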
Splines
• Splines sometimes include additional requirements for the first and last polynomial (for example, requiring them to start at 0).
• Once splines are fitted to the data they can be used to predict new values in the same way as regular linear regression, though they are limited to the support regions over which they have been defined.
• Note the range of functions that can be represented with a relatively small number of polynomials (the example uses 5).

Locally weighted models
• Splines rely on a fixed region for each polynomial, and all points within a region receive the same weight.
• An alternative is to set the region based on the density of the input data and to give points closer to the point we are trying to estimate a higher weight.

Weighted regression
• For a point x we use a weight function \chi_x centered at x to assign weights to points in x's vicinity.
• We then solve the following weighted regression problem (a weighted least-squares sketch appears at the end of these notes):
  \min_w \sum_i \chi_x(x_i) \big( y_i - w^T \phi(x_i) \big)^2
• The solution has the same form as our general solution (a weight is given for every input).
[Figure: weights decrease with distance from x, e.g. \chi_x(x_1) = 0.3, \chi_x(x_2) = 0.7, \chi_x(x) = 0.9]

Determining the weights
• There are a number of ways to determine the weights.
• One option is to use a Gaussian centered at x:
  \chi_x(x_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - x_i)^2}{2\sigma^2}\right)
  where \sigma^2 is a parameter that should be selected by the user.
• More on these weights when we discuss kernels.

Bayesian linear regression
• Frequentist setting
  - Use MLE to calculate a single estimate of the weights, as seen previously.
• Bayesian setting
  - Calculate the posterior distribution of the weights.

Bayesian linear regression
[Figure from Bishop: posterior over the weights of y = w_0 + w_1 x after no observations, 1 observation, 2 observations, and 20 observations]
As we observe more data, our estimate of the weights becomes more sharply peaked.

Logistic regression

The sigmoid function
• To classify using regression models we replace the linear function with the sigmoid function:
  g(h) = \frac{1}{1 + e^{-h}}
  which is always between 0 and 1.
• Using the sigmoid we set (for binary classification problems):
  p(y = 1 \mid x; w) = g(w^T x) = \frac{e^{w^T x}}{1 + e^{w^T x}}
  p(y = 0 \mid x; w) = 1 - g(w^T x) = \frac{1}{1 + e^{w^T x}}

Regularization
• As with other estimation problems, we may not have enough data to learn good models.
• One way to overcome this is to 'regularize' the model: impose additional constraints on the parameters we are fitting.
• For example, let's assume that w_i comes from a Gaussian distribution with mean …
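As a companion to the closed-form solution w = (\Phi^T \Phi)^{-1} \Phi^T y derived earlier, here is a minimal sketch of general linear regression with a Gaussian basis. The basis centres, width, and toy data are assumptions chosen for illustration; the solve step is exactly the normal equations from the slides.

```python
# Sketch: general linear regression with Gaussian basis functions (assumed setup).
import numpy as np

def gaussian_basis(x, centres, width=1.0):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 * width^2)), plus a constant column."""
    phi = np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2.0 * width ** 2))
    return np.column_stack([np.ones_like(x), phi])

rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, 100)
y = np.sin(2.0 * x) + 0.2 * rng.standard_normal(x.size)   # noisy toy target

centres = np.linspace(-3.0, 3.0, 8)
Phi = gaussian_basis(x, centres)

# Normal equations w = (Phi^T Phi)^{-1} Phi^T y; solve() avoids forming the inverse.
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
y_hat = Phi @ w
```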
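The weighted regression slides can be sketched the same way: for a query point x_0, each training point gets a Gaussian weight \chi_{x_0}(x_i) (the 1/\sqrt{2\pi\sigma^2} factor is the same for every point, so it does not change the argmin), and we solve a weighted version of the normal equations. The bandwidth \sigma, the linear basis [1, x], and the toy data below are assumptions.

```python
# Sketch: locally weighted regression at a single query point (assumed setup).
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 10.0, 150))
y = np.sin(x) + 0.2 * rng.standard_normal(x.size)

def predict_at(x0, sigma=0.5):
    chi = np.exp(-(x0 - x) ** 2 / (2.0 * sigma ** 2))   # Gaussian weights around x0
    Phi = np.column_stack([np.ones_like(x), x])          # simple linear basis [1, x]
    W = np.diag(chi)
    # Weighted normal equations: w = (Phi^T W Phi)^{-1} Phi^T W y
    w = np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ y)
    return np.array([1.0, x0]) @ w

y0 = predict_at(5.0)   # locally weighted prediction at x0 = 5
```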
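Finally, a tiny sketch of the sigmoid mapping used by logistic regression for binary classification. The weight vector and input below are made-up values, not a fitted model; the point is only that the sigmoid turns the linear score w^T x into a probability in (0, 1).

```python
# Sketch: class probabilities from the sigmoid (assumed weights and input).
import numpy as np

def sigmoid(h):
    """g(h) = 1 / (1 + e^{-h}); always strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-h))

w = np.array([0.5, -1.2, 2.0])     # assumed weight vector
x = np.array([1.0, 0.3, -0.7])     # assumed input

p_y1 = sigmoid(w @ x)              # p(y = 1 | x; w)
p_y0 = 1.0 - p_y1                  # p(y = 0 | x; w) = 1 / (1 + e^{w^T x})
print(p_y1, p_y0)
```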

