STEVENS MA 331 - Lecture 8 Simple Linear Regression


Lecture 8: Simple Linear Regression (cont.)

Section 10.1. Objectives:
- Statistical model for linear regression
- Data for simple linear regression
- Estimation of the parameters
- Confidence intervals and significance tests
- Confidence intervals for the mean response vs. prediction intervals (for a future observation)

Settings of Simple Linear Regression
- We now think of the least-squares regression line computed from the sample as an estimate of the true regression line for the population.
- The notation differs from Ch. 2: think b0 = a, b1 = b.

    Type of line            Least-squares regression equation    Slope    y-intercept
    Ch. 2  (general)        ŷ = a + bx                           b        a
    Ch. 10 (sample)         ŷ = b0 + b1x                         b1       b0
    Ch. 10 (population)     μy = β0 + β1x                        β1       β0

The statistical model for simple linear regression
- Data: n observations of the form (x1, y1), (x2, y2), ..., (xn, yn).
- Model: yi = β0 + β1xi + εi.
- The deviations εi are assumed to be independent and normally distributed with mean 0 and constant standard deviation σ.
- The parameters of the model are β0, β1, and σ.

ANOVA vs. linear regression: ANOVA compares groups with the same SD and different means; linear regression describes many such groups whose means depend linearly on a quantitative variable x.

Example 10.1, page 636: see the R sketch below.

Verifying the conditions for inference
- Look at the errors: they are supposed to be independent, normal, and have the same variance.
- The errors are estimated by the residuals, y − ŷ.

Residual plot: the spread of the residuals is reasonably random, with no clear pattern, so the relationship is indeed linear. But we see one low residual (3.8, −4) and one potentially influential point (2.5, 0.5).

Normal quantile plot of the residuals: the plot is fairly straight, supporting the assumption of normally distributed residuals. The data are therefore okay for inference.

In general:
- Residuals randomly scattered → no evidence against the model.
- A curved pattern → the relationship is not linear.
- A change in variability across the plot → σ is not equal for all values of x.
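The lecture refers to R code for Example 10.1; a minimal sketch of the fit and of the checks of the conditions is given here. Only the predictor name LOGMPH and the model name model.2_logmodel appear in these notes; the data frame cars.df, the response name FUEL, and the file name are hypothetical placeholders.

    # Hypothetical names: cars.df, FUEL, and the file name are placeholders;
    # only LOGMPH and model.2_logmodel appear in the lecture notes.
    cars.df <- read.csv("example_10_1.csv")          # placeholder file name

    model.2_logmodel <- lm(FUEL ~ LOGMPH, data = cars.df)
    summary(model.2_logmodel)                        # b0, b1, standard errors, t tests

    # Conditions for inference: residuals should look independent, roughly
    # normal, and have constant spread.
    res <- resid(model.2_logmodel)
    plot(fitted(model.2_logmodel), res,
         xlab = "Fitted values", ylab = "Residuals") # want random scatter, no pattern
    abline(h = 0, lty = 2)
    qqnorm(res)                                      # want a roughly straight line
    qqline(res)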
Confidence interval for regression parameters
Estimating the regression parameters β0 and β1 is a case of one-sample inference with unknown population variance, so we rely on the t distribution with n − 2 degrees of freedom.

A level C confidence interval for the slope β1 uses the standard error of the least-squares slope:
    b1 ± t* SEb1
A level C confidence interval for the intercept β0 uses the standard error of the least-squares intercept:
    b0 ± t* SEb0
Here t* is the critical value of the t(n − 2) distribution with area C between −t* and +t*.

Significance test for the slope
We can test the hypothesis H0: β1 = 0 against a one-sided or two-sided alternative. We calculate the statistic
    t = b1 / SEb1,
which has the t(n − 2) distribution under H0, and use it to find the P-value of the test. Note: software typically reports two-sided P-values.

Testing the hypothesis of no relationship
To look for evidence of a significant straight-line relationship between the variables x and y in the population from which the data were drawn, we test whether the regression slope parameter is zero:
    H0: β1 = 0   vs.   Ha: β1 ≠ 0
Because the least-squares slope satisfies b1 = r (sy / sx), testing H0: β1 = 0 also tests the hypothesis of no correlation between x and y in the population. Note: a test of hypothesis for β0 is usually irrelevant (β0 describes the mean response at x = 0, which is often not even an achievable value).

Using technology
Statistical software carries out all of the computations for regression analysis. In the software output for the car speed / gas efficiency example, the t test for the regression slope is highly significant (P < 0.001): there is a significant relationship between average car speed and gas efficiency. To obtain confidence intervals for the slope and intercept, use the R function confint().

Exercise: calculate (manually) a confidence interval for the mean increase in gas consumption with every unit increase in logmph, and compare it with the software result:
    confint(model.2_logmodel)
                2.5 %     97.5 %
    LOGMPH   7.165435   8.583055

Confidence interval for μy
We can also calculate a confidence interval for the population mean μy of all responses y when x takes the value x* (within the range of the data tested). This interval is centered on ŷ, the unbiased estimate of μy. The true value of the population mean μy at a given value of x will be inside the confidence interval for C% of the intervals calculated from many different random samples.

The level C confidence interval for the mean response μy at a given value x* of x is
    ŷ ± t* SEμ̂
where t* is the critical value of the t(n − 2) distribution with area C between −t* and +t*. A separate confidence interval is calculated for μy at each value that x takes; graphically, this series of confidence intervals appears as a continuous band on either side of ŷ (the 95% confidence band for μy).

Inference for prediction
One use of regression is to predict the value of y, ŷ = b0 + b1x, for any value of x within the range of the data tested. But the regression equation depends on the particular sample drawn, so more reliable predictions require statistical inference. To estimate an individual response y for a given value of x, we use a prediction interval: if we sampled repeatedly, the values of y obtained at a particular x would vary as N(0, σ) deviations around the mean response μy.

The level C prediction interval for a single observation of y when x takes the value x* is
    ŷ ± t* SEŷ
where t* is again the critical value of the t(n − 2) distribution with area C between −t* and +t*. The prediction interval mainly reflects the error coming from the normal distribution of the deviations εi, so it is wider than the confidence interval for μy; graphically, the series of prediction intervals appears as a continuous band on either side of ŷ (the 95% prediction band for ŷ).
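Both intervals can be computed in R with predict() on the fitted model. A minimal sketch, reusing the hypothetical names from the fitting sketch above; the value x* is taken as the median of the observed LOGMPH values purely for illustration.

    model.2_logmodel <- lm(FUEL ~ LOGMPH, data = cars.df)    # hypothetical names, as above
    new.x <- data.frame(LOGMPH = median(cars.df$LOGMPH))     # an x* inside the range of the data

    # Level C = 95% confidence interval for the mean response mu_y at x*:
    predict(model.2_logmodel, newdata = new.x, interval = "confidence", level = 0.95)

    # Level C = 95% prediction interval for a single future observation y at x*:
    predict(model.2_logmodel, newdata = new.x, interval = "prediction", level = 0.95)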

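The confidence and prediction bands described above can be drawn by evaluating predict() over a grid of x values. A base-R sketch, again under the same hypothetical names:

    model.2_logmodel <- lm(FUEL ~ LOGMPH, data = cars.df)    # hypothetical names, as above

    grid <- data.frame(LOGMPH = seq(min(cars.df$LOGMPH), max(cars.df$LOGMPH),
                                    length.out = 100))
    conf <- predict(model.2_logmodel, newdata = grid, interval = "confidence")
    pred <- predict(model.2_logmodel, newdata = grid, interval = "prediction")

    plot(FUEL ~ LOGMPH, data = cars.df)              # scatterplot of the data
    lines(grid$LOGMPH, conf[, "fit"])                # fitted regression line
    lines(grid$LOGMPH, conf[, "lwr"], lty = 2)       # 95% confidence band for mu_y
    lines(grid$LOGMPH, conf[, "upr"], lty = 2)
    lines(grid$LOGMPH, pred[, "lwr"], lty = 3)       # 95% prediction band for y (wider)
    lines(grid$LOGMPH, pred[, "upr"], lty = 3)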

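Returning to the exercise on the slope: a manual 95% confidence interval b1 ± t* SEb1 can be checked against confint(). A sketch with the same hypothetical names; only the variable LOGMPH and the confint() output quoted in the notes come from the lecture.

    model.2_logmodel <- lm(FUEL ~ LOGMPH, data = cars.df)     # hypothetical names, as above

    coefs  <- summary(model.2_logmodel)$coefficients
    b1     <- coefs["LOGMPH", "Estimate"]
    se.b1  <- coefs["LOGMPH", "Std. Error"]
    t.star <- qt(0.975, df = df.residual(model.2_logmodel))   # t(n - 2) critical value for 95%

    c(lower = b1 - t.star * se.b1, upper = b1 + t.star * se.b1)  # manual 95% CI for the slope
    confint(model.2_logmodel)     # lecture reports LOGMPH: 7.165435 to 8.583055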