Princeton COS 424 - Linear Regression

COS 424: Interaction with Data
Lecturer: David Blei                                    Lecture #15
Scribe: Xiaobai Chen, Jialu Huang                       April 1, 2008

Linear Regression (1)

What is regression?

Regression is the problem of predicting a real-valued variable y from input data x. [Figure: a scatter plot of example (x, y) pairs.] The question is how to find a linear function so that, whenever we are given an x, we can predict the value of y.

Other examples to which linear regression may be applied include:
1. Predicting a person's height from his or her weight.
2. Predicting the score of a game between two basketball teams, given statistics about the teams.
3. Predicting how much people earn after graduation from their grades at school.
4. Predicting how much people spend on their cars from their salary.

We now consider multiple inputs: x is a vector, and each element represents a different feature. The response is assumed to be a linear function of the input:

    f(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p.

One thing to point out is that linear regression is more flexible than it might appear: we may use any function of x as an input, such as x^2, x^3, etc. Its simplicity and flexibility make linear regression one of the most important and widely used statistical prediction techniques. Of course, there are also problems that linear regression cannot solve.

With multiple inputs, the regression function is a hyperplane, as shown in the following graph. [Figure: a hyperplane fit to data in two input dimensions.] As an example of where polynomial regression helps: if we fit curved data with a straight line, the result obviously does not fit the data points at all, but a polynomial fit matches the data points much better. [Figures: a poor straight-line fit and a good polynomial fit to the same data.]

Fitting a regression

We now consider how to fit a regression: we want to find the coefficients that let us predict y_new from x_new.

We start with a simple case: a single input feature, with \beta_0 = 0 for now. A reasonable approach is to minimize the sum of squared distances between each prediction and the truth, so the objective function is the residual sum of squares (RSS):

    RSS(\beta) = \frac{1}{2} \sum_{n=1}^{N} (y_n - \beta x_n)^2.

Its derivative is:

    \frac{d\,RSS}{d\beta} = -\sum_{n=1}^{N} x_n (y_n - \beta x_n).

Setting the derivative to zero, the optimal value is:

    \hat{\beta} = \frac{\sum_{n=1}^{N} x_n y_n}{\sum_{n=1}^{N} x_n^2}.

With this optimal value we can plot the fitted line through the given data. [Figure: the given data with the fitted line.] Based on this line, when we need to predict a new output from a new input, we just use the point on the line at that input: \hat{y}_{new} = \hat{\beta} x_{new}.

Now we move on to multiple inputs. To simplify notation, let x be a (p+1)-vector and set x_{p+1} = 1, so the intercept is absorbed into \beta. Now the RSS is:

    RSS(\beta) = \frac{1}{2} \sum_{n=1}^{N} (y_n - \beta^\top x_n)^2.

The derivative with respect to \beta_i is:

    \frac{\partial RSS}{\partial \beta_i} = -\sum_{n=1}^{N} x_{n,i} (y_n - \beta^\top x_n).

As a vector, the gradient is:

    \nabla_\beta RSS = -\sum_{n=1}^{N} x_n (y_n - \beta^\top x_n).

In general, define the design matrix X as the N x (p+1) matrix whose rows are the inputs, the response vector y as the N-vector of outputs, and the parameter vector \beta as a (p+1)-vector. Then the gradient of the RSS is:

    \nabla_\beta RSS = -X^\top (y - X\beta).

Setting the gradient to the 0-vector and solving for \beta, we get:

    \hat{\beta} = (X^\top X)^{-1} X^\top y.

This works as long as X^\top X is invertible, i.e., X is of full rank.

Question: what is the probabilistic interpretation of linear regression?

Linear regression assumes that the outputs are drawn from a Normal distribution whose mean is a linear function of the coefficients and the input:

    y \mid x \sim \mathcal{N}(\beta^\top x, \sigma^2).

This is like putting a Gaussian "bump" around the mean, which is a linear function of the input. We find the parameter vector \beta that maximizes the conditional likelihood. The conditional log likelihood of the data is:

    \ell(\beta) = \sum_{n=1}^{N} \log p(y_n \mid x_n, \beta)
               = -\frac{N}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{n=1}^{N} (y_n - \beta^\top x_n)^2.

Maximizing the conditional log likelihood with respect to \beta is the same as minimizing the residual sum of squares, so the maximum likelihood estimates are identical to the estimates we obtained earlier. We are actually estimating the conditional expectation of y given x, E[y | x] = \beta^\top x.
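As a concrete illustration of the fitting procedure above, the following is a minimal NumPy sketch (not part of the original notes; the synthetic data, seed, and variable names are my own choices) that forms a design matrix with a constant column, solves the normal equations for the estimate, and predicts at a new input:

    import numpy as np

    # Synthetic data for illustration only: y is roughly 2x + 1 plus Gaussian noise.
    rng = np.random.default_rng(0)
    N = 50
    x = rng.uniform(0, 10, size=N)
    y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=N)

    # Design matrix: append a constant column so the intercept is absorbed into beta,
    # matching the lecture's trick of setting the last feature to 1.
    X = np.column_stack([x, np.ones(N)])        # shape (N, p+1)

    # Normal equations: beta_hat = (X^T X)^{-1} X^T y.
    # Solving the linear system is preferable to forming the inverse explicitly.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

    # Prediction at a new input is the point on the fitted hyperplane (here, a line).
    x_new = 4.2
    y_hat_new = np.array([x_new, 1.0]) @ beta_hat
    print(beta_hat, y_hat_new)

As noted above, this requires X^\top X to be invertible, i.e., X of full rank.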
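To illustrate the earlier point that any function of x may be used as a feature, here is a similar sketch (again my own illustration; the data and the choice of a cubic are hypothetical) that fits a polynomial by plain linear regression on transformed inputs:

    import numpy as np

    # Synthetic data that is clearly nonlinear in x.
    rng = np.random.default_rng(1)
    x = rng.uniform(-3, 3, size=40)
    y = 0.5 * x**3 - x + rng.normal(scale=1.0, size=40)

    # The model stays linear in beta; only the features are nonlinear functions of x.
    X = np.column_stack([x**3, x**2, x, np.ones_like(x)])
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

    # Predict at a new input using the same feature expansion.
    x_new = 1.5
    y_hat_new = np.array([x_new**3, x_new**2, x_new, 1.0]) @ beta_hat
    print(beta_hat, y_hat_new)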
A real-world example: given the eruption time, we try to predict the waiting time until the next eruption. [Figure: eruption time vs. waiting time with a fitted line.]

Important aside: the bias-variance trade-off

Consider a random data set drawn from a linear regression model with true parameter \beta. We can view the maximum likelihood estimate \hat{\beta} as a random variable whose distribution is governed by the distribution of the data set. Suppose we observe a new input x, and consider the mean squared error of our estimate of the response \beta^\top x. Here \beta^\top x is not random and \hat{\beta}^\top x is random, so the expectation is taken over the distribution of the data set:

    MSE = E[(\hat{\beta}^\top x - \beta^\top x)^2].

We can decompose the MSE as:

    MSE = E[(\hat{\beta}^\top x - E[\hat{\beta}^\top x])^2] + (E[\hat{\beta}^\top x] - \beta^\top x)^2.

The second term is the squared bias, (E[\hat{\beta}^\top x] - \beta^\top x)^2; an estimate for which this term is zero is an unbiased estimate. The first term is the variance, E[(\hat{\beta}^\top x - E[\hat{\beta}^\top x])^2]; it reflects how sensitive the estimate is to the randomness inherent in the data set.
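The decomposition above can be checked numerically. The following is a small simulation sketch (my own, not from the notes; the true coefficients, noise level, and sample sizes are arbitrary choices) that repeatedly draws data sets from a fixed linear model, refits the estimate on each, and splits the MSE of the prediction at one test input into variance plus squared bias:

    import numpy as np

    # Simulate the bias-variance decomposition for the least squares estimate.
    rng = np.random.default_rng(2)
    beta_true = np.array([2.0, 1.0])        # true slope and intercept
    x_test = np.array([4.0, 1.0])           # a fixed new input (with the constant feature)
    N, trials = 30, 5000

    preds = np.empty(trials)
    for t in range(trials):
        x = rng.uniform(0, 10, size=N)
        X = np.column_stack([x, np.ones(N)])
        y = X @ beta_true + rng.normal(scale=1.0, size=N)
        beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
        preds[t] = x_test @ beta_hat        # prediction beta_hat^T x for this data set

    target = x_test @ beta_true
    mse = np.mean((preds - target) ** 2)
    variance = np.var(preds)
    bias_sq = (np.mean(preds) - target) ** 2
    print(mse, variance + bias_sq)          # the two agree up to simulation noise

In this simulation the squared-bias term comes out near zero, since least squares is unbiased for a correctly specified linear model, so the MSE is essentially all variance.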

