Duke STA 101 - Regression revisited


Outline:
- Regression revisited
- Statistical modeling
- Linear regression
- Regression terminology
- Some notation
- Sample regression line
- Motivating example
- Mathematics of regression model
- The mechanics of regression
- Estimating intercept and slope
- Root mean square error (RMSE)
- JMP output
- Residuals are used to compute RMSE
- Significance tests and CIs
- CI for slope
- CI slope
- CI of slope
- Hypothesis test for existence of linear relationship
- Hypothesis test
- How well does regression model fit data?
- Check the regression fit to the data
- Diagnosing residual plots
- Possible patterns in residual plots
- Residual plot
- One number summary of regression fit
- Interpretation in tree example
- Caution about R²
- Predictions from regression
- Recall warnings

FPP 11 and 12 and a little more
Regression revisited

Statistical modeling
- Often researchers seek to explain or predict one variable from others.
- In most contexts, it is impossible to do this perfectly: there is too much we don't know.
- Use mathematical models that describe relationships as best we can.
- Incorporate chance error into models so that we can incorporate uncertainty in our explanations and predictions.

Linear regression
- Linear regression is probably the most common statistical model.
- The idea is like the regression lines from Chapter 10.
- But the slope and intercept from a regression line are estimates of a true line (just as a sample mean is an estimate of a population mean).
- Hence, we can make inferences (confidence intervals and hypothesis tests) for the true slope and true intercept.

Linear regression
- Often relationships are described reasonably well by a linear trend.
- Linear regression allows us to estimate these trends.
- Plan of attack:
  - Pose regression model and investigate assumptions
  - Estimate regression parameters from data
  - Use hypothesis testing and confidence interval ideas to determine whether the relationship between two variables could have occurred by chance alone
  - Regression with multiple predictors

Regression terminology
- Typically, we label the outcome variable as Y and the predictor as X.
- Synonyms for outcome variable: response variable, dependent variable
- Synonyms for predictor variable: explanatory variable, independent variable, covariate

Some notation
- Recall the regression line (least squares line) notation from earlier in the class.
- α denotes the population intercept.
- β denotes the population slope.
- $y = \alpha + \beta x$

Sample regression line
- If we collect a sample from some population and use sample values to calculate a regression line, then there is uncertainty associated with the sample slope and intercept estimates.
- The following notation is used to denote the sample regression line: $\hat{y} = a + bx$

Motivating example
- A forest service official needs to determine the total volume of lumber on a piece of forest land.
- Any ideas on how she might do this that don't require cutting down lots of trees?
- She hopes that predicting the volume of wood from tree diameter for individual trees will help determine the total volume for the piece of forest land.
- She investigates: "Can the volume of wood for a tree be predicted by its diameter?"

Motivating example
- First she randomly samples 31 trees and measures the diameter of each tree and then its volume.
- Then she constructs a scatter plot of the data collected and checks for a linear pattern.
- Is the relationship linear?
- We know how to estimate the slope and intercept of the line that "best" fits the data.
- But...

Motivating example
- What would happen if the forest service agent took another sample of 31 trees?
- Would the slope change?
- Would the intercept change?
- What about a third sample of 31 trees?
- a and b are statistics and depend on the sample.
- We know how to compute them.
- They are also estimates of a population intercept and slope.

Mathematics of regression model
- To accommodate the added uncertainty associated with the regression line, we add one more term to the model:
  $y_i = \alpha + \beta x_i + \varepsilon_i$, where $\varepsilon_i$ comes from $N(0, \sigma_\varepsilon)$
- This model specification has four assumptions:
  1. the average value of Y for each X falls on the line
  2. the deviations don't depend on X
  3. the deviations from the straight line follow a normal curve
  4. all units are independent

The mechanics of regression
- Questions we aim to answer:
  - How do we perform statistical inference on the intercept and slope of the regression line?
  - What is a typical deviation from the regression line?
  - How do we know the regression line explains the data well?

Estimating intercept and slope
- From early in the semester, recall that the intercept and slope estimates for the line of "best" fit are
  $b = r \frac{SD_y}{SD_x} \approx 0.97 \left( \frac{16.46}{3.14} \right) \approx 5.07$
  $a = \bar{y} - b\bar{x} \approx 30.14 - 5.07(13.25) \approx -37.02$
  $\hat{y} = -37.02 + 5.07x$

Root mean square error (RMSE)
- What is the typical deviation from the regression line for a given x?
- The typical deviation is denoted by $\sigma_\varepsilon$.
- The root mean square error (RMSE) is a measure of the typical deviation from the regression line for a given x.
- For the trees data this is 4.28.
- A tree with a diameter of 15 inches can be expected to have a volume of -37.02 + 5.07(15) = 39.03 cubic inches, give or take about 4.28 cubic inches.

JMP output
[Figure: JMP output]

Residuals are used to compute RMSE
- The deviation of each $y_i$ from the line is called a residual, which we denote by $d_i$:
  $d_i = y_i - \hat{y}_i = y_i - (a + bx_i)$
- An estimate of $\sigma_\varepsilon$ that is used in most software packages is denoted by $s_\varepsilon$:
  $s_\varepsilon = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} d_i^2}$

Significance tests and CIs
- Going back to the example of trees sampled from the plot of land.
- The sampled trees are one possible random sample from all trees in the plot of land.
- Questions:
  - What is a likely range for the population regression slope?
  - Does the sample regression slope provide enough evidence to say with conviction that the population slope doesn't equal zero?
  - Why zero?

CI for slope
- Est. ± multiplier × SE
- Same old friend in a new hat.
- We will use the sample slope as the estimate.
- The multiplier is found from a t-distribution with (n - 2) degrees of freedom.
- The SE of the slope (not to be confused with RMSE) is
  $SE_b = \frac{s_\varepsilon}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

CI slope
- A 95% confidence interval for the population slope between diameter of tree and volume is
  $b \pm \text{multiplier} \times SE_b$
  5.07
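
The mechanics described in these slides (least squares slope and intercept, residuals, RMSE with n - 2 in the denominator, the SE of the slope, and an "estimate ± multiplier × SE" interval) can be sketched in Python. This is only a minimal illustration under stated assumptions, not the course's JMP workflow: the five data points are made up and are not the trees data, the function names are hypothetical, and the t multiplier is typed in by hand from a t-table (3.182 for 3 degrees of freedom at 95%) rather than computed.

```python
import math

def fit_line(x, y):
    """Least squares intercept a and slope b for y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx            # algebraically the same as r * SD_y / SD_x
    a = y_bar - b * x_bar
    return a, b

def rmse(x, y, a, b):
    """s_eps: square root of the sum of squared residuals over (n - 2)."""
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return math.sqrt(sum(d ** 2 for d in residuals) / (len(x) - 2))

def slope_se(x, s_eps):
    """SE of the slope: s_eps / sqrt(sum of (x_i - x_bar)^2)."""
    x_bar = sum(x) / len(x)
    return s_eps / math.sqrt(sum((xi - x_bar) ** 2 for xi in x))

def slope_ci(b, se_b, multiplier):
    """Interval b +/- multiplier * SE_b; multiplier comes from a t-table."""
    return b - multiplier * se_b, b + multiplier * se_b

# Hypothetical data (NOT the trees data), chosen only to exercise the formulas.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
s_eps = rmse(x, y, a, b)
se_b = slope_se(x, s_eps)

# Assumed multiplier: t-table value for n - 2 = 3 degrees of freedom, 95% level.
lo, hi = slope_ci(b, se_b, multiplier=3.182)

print(f"y-hat = {a:.2f} + {b:.2f} x")
print(f"RMSE = {s_eps:.3f}, SE of slope = {se_b:.3f}")
print(f"95% CI for slope: ({lo:.2f}, {hi:.2f})")
```

Passing the multiplier in as an argument keeps the sketch free of third-party dependencies; with a statistics library available, the t quantile could be computed instead of read from a table.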

