Prediction concerning the response Y

Slides:
• Simple linear regression model
• Three different research questions
• Example: Skin cancer mortality and latitude
• “Point estimators”
• It is dangerous to “extrapolate” beyond scope of model.
• Confidence interval for the population mean response E(Yh)
• Again, what are we estimating?
• (1-α)100% t-interval for mean response E(Yh)
• Implications on precision
• Comments on assumptions
• Prediction interval for a new response Yh(new)
• Again, what are we predicting?
• (1-α)100% prediction interval for new response Yh(new)
• Prediction of Yh(new) if mean E(Y) is known
• Prediction of Yh(new) if mean E(Y) is not known
• Summary of prediction issues
• Variation of the prediction
• Confidence intervals and prediction intervals for response in Minitab
• A plot of the confidence interval and prediction interval in Minitab

Simple linear regression model

[Figure: college entrance test score (Y) plotted against high school GPA (x), showing the population regression line $\mu_Y = E(Y) = \beta_0 + \beta_1 x$ and the model $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$.]

Three different research questions

• What is the mean response, $E(Y_h)$, for a given value, $x_h$, of the predictor variable?
• What would one predict a new observation, $Y_{h(new)}$, to be for a given value, $x_h$, of the predictor variable?
• What would one predict the mean of m new observations, $\bar{Y}_{h(new)}$, to be for a given value, $x_h$, of the predictor variable?

Example: Skin cancer mortality and latitude

• What is the expected (mean) mortality rate for all locations at 40° N latitude?
• What is the predicted mortality rate for 1 new randomly selected location at 40° N?
• What is the predicted mortality rate for 10 new randomly selected locations at 40° N?

[Figure: Regression Plot of Mortality against Latitude.
Mortality = 389.189 - 5.97764 Latitude
S = 19.1150   R-Sq = 68.0 %   R-Sq(adj) = 67.3 %]

“Point estimators”

$\hat{Y}_h = b_0 + b_1 x_h$ is the best point estimator in each case.

That is, it is:
• the best guess of the mean response at $x_h$
• the best guess of a new observation at $x_h$
• the best guess of a mean of m new observations at $x_h$

But, as always, to be confident in the answer to our research question, we should put an interval around our best guess.

It is dangerous to “extrapolate” beyond scope of model.

[Figure: Regression Plot of colonies against conc, straight-line fit.
colonies = 16.0667 + 1.61576 conc
S = 2.67546   R-Sq = 66.8 %   R-Sq(adj) = 63.5 %]

[Figure: Regression Plot of colonies against conc, quadratic fit over a wider range of conc.
colonies = 15.0205 + 3.22113 conc - 0.276956 conc**2
S = 2.74819   R-Sq = 69.6 %   R-Sq(adj) = 64.5 %]

Confidence interval for the population mean response E(Yh)

Again, what are we estimating?

[Figure: college entrance test score (Y) plotted against high school GPA (x), showing the population regression line $\mu_Y = E(Y) = \beta_0 + \beta_1 x$ and the model $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$.]

(1-α)100% t-interval for mean response E(Yh)

Formula in notation:

$$\hat{y}_h \pm t_{(\alpha/2,\, n-2)} \times \sqrt{MSE \left( \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right)}$$

Formula in words:

Sample estimate ± (t-multiplier × standard error)

Implications on precision

• The greater the spread in the $x_i$ values, the narrower the confidence interval, the more precise the prediction of $E(Y_h)$.
• Given the same set of $x_i$ values, the further $x_h$ is from the (sample) mean of the $x_i$, the wider the confidence interval, the less precise the prediction of $E(Y_h)$.

Predicted Values for New Observations

New Obs     Fit   SE Fit        95.0% CI           95.0% PI
      1  150.08     2.75   (144.6,155.6)   (111.2,188.93)
      2  221.82     7.42   (206.9,236.8)   (180.6,263.07) X

X denotes a row with X values away from the center

Values of Predictors for New Observations

New Obs  Latitude
      1      40.0
      2      28.0

Mean of Lat = 39.533

Comments on assumptions

• $x_h$ is a value within scope of model, but it is not necessary that it is one of the x values in the data set.
• The confidence interval formula for $E(Y_h)$ works okay even if the error terms are only approximately normally distributed.
• If you have a large sample, the error terms can even deviate substantially from normality without greatly affecting appropriateness of the confidence interval.

Prediction interval for a new response Yh(new)

Again, what are we predicting?

[Figure: college entrance test score (Y) plotted against high school GPA (x), showing the population regression line $\mu_Y = E(Y) = \beta_0 + \beta_1 x$ and the model $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$.]

(1-α)100% prediction interval for new response Yh(new)

Formula in notation:

$$\hat{y}_h \pm t_{(\alpha/2,\, n-2)} \times \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right)}$$

Formula in words:

Sample prediction ± (t-multiplier × standard error)

Prediction of Yh(new) if mean E(Y) is known

Assume [equations lost in extraction; surviving values: 25, 25] so [equation lost in extraction].

[Figure: Normal curve over Number of hours, horizontal axis marked 47, 52, 57, 62, 67, 72, 77, vertical axis from 0.00 to 0.08, with area 0.997 marked.]

Prediction of Yh(new) if mean E(Y) is not known

Summary of prediction issues

• We cannot be certain of the mean of the distribution of Y.
• Prediction limits for $Y_{h(new)}$ must take into account:
  – variation in the possible mean of the distribution of Y
  – variation in the responses Y within the probability distribution

Variation of the prediction

The variation in the prediction of a new response depends on two components:
1. the variation due to estimating the mean $E(Y_h)$ with $\hat{y}_h$
2. the variation in Y within the probability distribution

$$\sigma^2 + \sigma^2(\hat{Y}_h)$$

which is estimated by:

$$MSE + MSE \left[ \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right] = MSE \left[ 1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]$$

(1-α)100% prediction interval for new response Yh(new)

Formula in notation:

$$\hat{y}_h \pm t_{(\alpha/2,\, n-2)} \times \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right)}$$

Formula in words:

Sample prediction ± (t-multiplier × standard error)

Confidence intervals and prediction intervals for response in Minitab

• Stat >> Regression >> Regression …
• Specify response and predictor(s).
• Select Options…
  – In “Prediction intervals for new observations” box, specify either the X value or a column name containing multiple X values.
  – Specify confidence level (default is 95%).
• Click on OK. Click on OK.
• Results appear in session window.

Predicted Values for New Observations

New Obs     Fit   SE Fit        95.0% CI           95.0% PI
      1  150.08     2.75   (144.6,155.6)   (111.2,188.93)
      2  221.82     7.42   (206.9,236.8)   (180.6,263.07) X

X denotes a row with X values away from the center

Values of Predictors for New