UVA STAT 2120 - Topic_11

Inference for Regression
Inference about the Regression Model and Using the Regression Line, with Details
Sections 10.1, 10.2, 10.3

Basic components of regression setup
- Target of inference: the linear dependency of a response variable on one or more explanatory variables
  - One explanatory variable ⇒ simple linear regression (SLR)
  - One or more ⇒ multiple regression
- The least-squares regression line describes this dependency in the data.
- A population regression line describes its underlying idealization, and is involved in describing the probabilities of observing certain values in the sample measurements.

Brief review of least-squares regression
- The least-squares regression line makes the sum of squared prediction errors as small as possible.
- The slope is b1 = r(sy/sx) and the intercept is b0 = ȳ − b1x̄.
- Predictions are made by plugging values of x into ŷ = b0 + b1x.
- The residuals describe the leftover variation in y after fitting the least-squares regression line.
- The coefficient of determination, r², measures the proportion of variability in y that is explained by x.

Formal setup for inference in regression
- The data arise as n pairs of measurements, (x1, y1), …, (xn, yn), where (xi, yi) are the measurements on the i-th individual.
- The statistical model is yi = β0 + β1xi + εi, where
  - μy = β0 + β1xi is the mean response when x = xi
  - the εi are independent, and each εi is N(0, σ)
- The least-squares regression line ŷ = b0 + b1x is the sample estimate of μy = β0 + β1xi.
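A minimal numerical sketch of the least-squares formulas reviewed above. The five (x, y) pairs are made-up illustrative values, not data from the course:

```python
import math

# Illustrative (x, y) pairs -- made-up data, not the wage/LOS sample.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 2.9, 4.1, 4.9, 6.2]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Sample standard deviations and the sample correlation r.
sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
r = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ((n - 1) * sx * sy)

# Least-squares slope and intercept: b1 = r * (sy/sx), b0 = ybar - b1 * xbar.
b1 = r * sy / sx
b0 = ybar - b1 * xbar

# Predictions come from plugging an x value into yhat = b0 + b1 * x.
yhat_at_3 = b0 + b1 * 3.0
```

Because x = 3 is the mean of the x values here, the prediction at x = 3 equals ȳ, a general property of the least-squares line.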
Example: Wages and experience
Do wages rise with experience?
In a study of employment trends, wage (y, in $/week) and length of service (LOS = x, in months) measurements were obtained from n = 59 workers in similar customer-service positions.

Wages LOS   Wages LOS   Wages LOS   Wages LOS   Wages LOS   Wages LOS
389    94   403    76   443   222   486    60   547   228   443   104
395    48   378    48   353    58   393     7   347    27   566   343
295    20   488    30   499   153   316    57   327     7   436   156
377    60   391   108   322    16   384    78   320    74   321    25
315    45   312    10   393    96   369    83   443    24   547    36
316    39   418    68   277    98   529    66   261    13   362    60
307    65   516    24   272   124   332    97   450    95
[Three further rows of the table were garbled in extraction and are omitted here.]

Example: Wages and experience (continued)
- Summary statistics: [not legible in the preview]
- Least-squares regression line and scatterplot: [figure not included in the preview]

Sampling framework
- Idea: each value of x defines a subpopulation
- Multiple, independent SRSs:
  - each SRS is drawn from a distinct subpopulation
  - y = response variable = measurement of interest
  - x = explanatory variable = subpopulation and sample labels
- One SRS, with multiple measurements:
  - measure (xi, yi) on the i-th individual, but treat the xi as fixed quantities
- The model describes the conditional distribution of y given its associated subpopulation

Comments on the statistical model
yi = β0 + β1xi + εi, with independent εi, each N(0, σ)
- Data = Fit + Residual
- Linearity: μy = β0 + β1x connects the subpopulation means
- Constant spread: σ does not depend on x
- Normality: response measurements are bell-shaped within each subpopulation

Residuals and residual standard deviation
- Unknown population quantities:
  - the random variables εi are the residual deviations
  - the parameter σ is the residual standard deviation
- Analogous quantities calculated from the sample:
  - the i-th (sample) residual is ei = yi − ŷi
  - the regression standard error is s = √( Σei² / (n − 2) )
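The residual and regression-standard-error definitions above can be sketched as follows; the data and fitted coefficients are made-up illustrative values:

```python
import math

# Made-up sample and its fitted least-squares line (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 2.9, 4.1, 4.9, 6.2]
b0, b1 = 0.90, 1.04          # least-squares intercept and slope for these data

n = len(x)
yhat = [b0 + b1 * xi for xi in x]

# i-th sample residual: e_i = y_i - yhat_i (observed minus fitted).
e = [yi - yhi for yi, yhi in zip(y, yhat)]

# Regression standard error: s = sqrt( sum(e_i^2) / (n - 2) ).
# Dividing by n - 2 accounts for the two estimated coefficients b0 and b1.
s = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))
```

Note that the residuals of a least-squares fit always sum to zero; s estimates the residual standard deviation σ of the model.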
Properties of the slope estimate
Suppose (x1, y1), …, (xn, yn) satisfy the assumptions of the statistical model for SLR.
- Mean: the sampling distribution of b1 is centered at β1
- Standard deviation: σb1 = σ / √( Σ(xi − x̄)² )
- Standard error: SEb1 = s / √( Σ(xi − x̄)² )

Some computational formulas
- Regression standard error: s = √( Σ(yi − ŷi)² / (n − 2) )
- Standard error for the slope: SEb1 = s / √( Σ(xi − x̄)² )

Example: Wages and experience (continued)
- Regression standard error: s = 82.2
- Standard error for the slope: SEb1 = 0.21

The t test and CI for slope in SLR
- Assumptions: the statistical model for SLR
- Hypotheses: H0: β1 = 0 versus a one- or two-sided Ha
- Test statistic: t = b1 / SEb1
- P-value, where T is t(n − 2):
  - P(T ≥ −t) for Ha: β1 < 0
  - P(T ≥ t) for Ha: β1 > 0
  - 2P(T ≥ |t|) for Ha: β1 ≠ 0
- CI: for confidence level C, the interval is b1 ± t*·SEb1, where t* is such that P(T ≥ t*) = (1 − C)/2

Example: Wages and experience (continued)
- Hypotheses: H0: β1 = 0 versus Ha: β1 > 0
- Summary statistics: b1 = 0.59, s = 82.2, and SEb1 = 0.21
- Test statistic: t = b1 / SEb1 = 0.59 / 0.21 = 2.85 (using unrounded values of b1 and SEb1)
- P-value: P(T ≥ 2.85) = 0.003, with k = n − 2 = 57 d.f.
- Decision: reject H0 at significance level α = 0.05, and conclude that wages rise with experience

Example: Wages and experience (continued)
How much do wages rise with experience?
- 95% CI: P(T ≥ 2.00) = 0.025, using k = n − 2 = 57 d.f. ⇒ t* = 2.00, and the interval is
  b1 ± t*·SEb1 = 0.59 ± (2.00)(0.21) = 0.59 ± 0.41 = (0.18, 1.00)
  (the margin of error reflects unrounded values of b1 and SEb1)
- Conclude an increase in weekly salary of between $0.18 and $1.00 per month of service, on average

Robustness
- A moderate lack of Normality may be tolerated, especially for large n
- Outliers or influential observations may be problematic
- Basic tool: residual plots

Example: Wages and experience (continued)
[Residual plot not included in the preview]
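The slope test and confidence interval above can be sketched directly from the example's rounded summaries. With these rounded inputs the t statistic and interval come out slightly different from the slides' 2.85 and (0.18, 1.00), which carry unrounded values:

```python
# t test and 95% CI for the slope, using the rounded wage-example summaries
# (b1 = 0.59, SE_b1 = 0.21, and t* = 2.00 for 57 d.f. from a t table).
b1 = 0.59        # estimated slope ($/week per month of service)
se_b1 = 0.21     # standard error of the slope
t_star = 2.00    # critical value with P(T >= t*) = 0.025 under t(57)

# Test statistic for H0: beta1 = 0.
t = b1 / se_b1

# 95% confidence interval: b1 +- t* * SE_b1.
lo = b1 - t_star * se_b1
hi = b1 + t_star * se_b1
```

Getting the P-value itself requires the t(57) distribution function (e.g. from a table or a statistics library), which is why the sketch stops at the test statistic and interval.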
Connections to correlation
- One SRS, with multiple measurements: (xi, yi) are paired measurements from one SRS
- Idea: treat x as random and work with correlation
  - r is the sample correlation
  - ρ is the population correlation
- A test of H0: ρ = 0 may be carried out with identical calculations as a test of H0: β1 = 0
  … but the CI formulas for ρ and β1 are very different
- Different interpretations: correlation is for two-way relationships; regression is for one-way relationships

Uncertainty in predicted values
Plugging x into ŷ = b0 + b1x provides a prediction of the response. Two possible interpretations:
- ŷ is an estimate of the subpopulation mean μy = β0 + β1x
- ŷ is a prediction of an unobserved response, y, from a subpopulation with mean μy = β0 + β1x
Note: there is more uncertainty under the second interpretation, since the target y is itself a random quantity rather than a fixed parameter.
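The two kinds of uncertainty can be sketched with the standard SLR standard-error formulas for a mean response and for an individual prediction; the x values, s, and x* below are made-up illustrative numbers:

```python
import math

# Standard errors for the two uses of yhat = b0 + b1*x at a new value xstar:
# estimating the subpopulation mean mu_y, versus predicting one new response.
# Made-up illustrative inputs:
x = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
s = 0.13         # regression standard error (illustrative value)
xstar = 4.0      # the x value at which we predict

# SE for the estimated mean response at xstar.
se_mean = s * math.sqrt(1 / n + (xstar - xbar) ** 2 / sxx)

# SE for predicting a single new response at xstar: the extra "1" under the
# square root is the target's own variability, which is why prediction
# intervals are always wider than confidence intervals for the mean.
se_pred = s * math.sqrt(1 + 1 / n + (xstar - xbar) ** 2 / sxx)
```

The two standard errors differ by exactly s² on the squared scale (se_pred² = se_mean² + s²), which quantifies the note above: predicting a random individual response carries all the uncertainty of estimating the mean, plus the response's own spread.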

