UW-Madison STAT 333 - 333disc05


Discussion 5, Stat 333

Review

1. Prediction interval (wider than the confidence interval).

   [Figure: Snowfall Data; axes: hours, inches]

2. Use a prediction interval to assess a possible outlier:
   (a) Fit a linear model and identify the most “suspect” point.
   (b) Remove that point and re-fit the model.
   (c) Calculate the prediction interval (with Bonferroni correction) for ŷ_pred at the x-value of the removed point.
   (d) See whether the y-value of the possible outlier is inside or outside the prediction interval.

3. Bonferroni correction: In testing the outlier, we are really selecting from n possible points, i.e. essentially performing n tests, so there is a multiple-comparison issue. With the Bonferroni correction, we use α* = α/c where c = n. For example, in the snowfall data, c = 9 and α = 0.05, so α* = 0.05/9 ≈ 0.0056.

4. Assumptions: (1) Correct model. (2) Independence. (3) Homogeneous variance. (4) Normality.
   A residual plot is most useful for checking (1) and (3).
   Normal scores are useful for checking (4).
   Generally, it is very hard to assess (2) by statistical methods.

5. Outlier types:
   - Regression outliers: unusual Y values (|standardized residual| > 2.5)
   - High leverage points: unusual X values (consider a log transformation of X)

6. Possible “fix”: use transformations (better with scientific knowledge).

7. Geometry of linear models (linear space).

Practice Problems

A recent research topic in the treatment of HIV is to determine whether the genotype of a patient’s HIV can be used to decide what type of treatment the patient should receive if he/she is failing the current therapy. For this purpose, a scoring system called “Genotypic Sensitivity Score (GSS)” has been developed. The data are available in HIV.txt on learn UW. The first column is the GSS and the second column is the patient viral load (VL), which measures the amount of virus in the blood at a future time point. Fit a simple linear regression model with VL as the outcome and GSS as the predictor.

(a) Plot the data.
Assess whether or not there is a potential outlier. (Perform and interpret a formal test.)

[Figure: scatterplot of viral load (VL) against Genotypic Sensitivity Score (GSS)]

> HIV=read.table('HIV.txt',header=T) # plot(VL~GSS, data=HIV)
> out=lm(VL~GSS, data=HIV)
> summary(out)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    13520      17207   0.786    0.436
GSS             1691       1972   0.858    0.396

Residual standard error: 50140 on 45 degrees of freedom
Multiple R-squared: 0.01609, Adjusted R-squared: -0.005778
F-statistic: 0.7357 on 1 and 45 DF, p-value: 0.3956

> plot(out, 1:2) # residual plot and Q-Q plot

[Figure: Residuals vs Fitted and Normal Q-Q plots; labeled points: 27, 16, 6]

> HIV[27,]
   GSS     VL
27 6.6 306251
> predict(out, newdata=data.frame(GSS=6.6),interval="prediction", level=(1-0.05/47))
       fit       lwr      upr
1 24681.01 -152848.5 202210.5
> # Remove the 27th observation and re-fit the model.
> HIV2=HIV[-27,]
> out2=lm(VL~GSS, data=HIV2)
> predict(out2, newdata=data.frame(GSS=6.6),interval="prediction", level=(1-0.05/47))
       fit       lwr      upr
1 17789.73 -77726.71 113306.2

The observed VL for observation 27 (306251) lies above the upper limit of the Bonferroni-corrected prediction interval (113306.2), so the point is flagged as an outlier.

(b) If you identified any outliers in the above step, compare the model fit with and without the outlier included in the data set.

> HIV2=HIV[-27,]
> out2=lm(VL~GSS, data=HIV2)
> summary(out2)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)     2802       9296   0.301   0.7646
GSS             2271       1060   2.142   0.0378 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 26930 on 44 degrees of freedom
Multiple R-squared: 0.09443, Adjusted R-squared: 0.07384
F-statistic: 4.588 on 1 and 44 DF, p-value: 0.03777

(c) Do the data (possibly with outliers removed) satisfy the usual regression assumptions?
Provide supporting diagnostic plots.

[Figure: Residuals vs Fitted and Normal Q-Q plots; labeled points: 16, 6, 21]

(d) Can you think of a transformation to apply for the linear model assumptions to be satisfied? If yes, reanalyze the data after transformation.

[Figure: With all data. log(viral load) (log(VL)) against Genotypic Sensitivity Score (GSS), with Residuals vs Fitted and Normal Q-Q plots; labeled points: 19, 9, 12]

[Figure: Without 27th obs. log(viral load) (log(VL)) against Genotypic Sensitivity Score (GSS), with Residuals vs Fitted and Normal Q-Q plots; labeled points: 19, 9, 12]

> HIV$logVL=log(HIV$VL)
> out.log=lm(logVL~GSS, data=HIV)
> summary(out.log)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  9.17162    0.65197  14.068   <2e-16 ***
GSS         -0.02372    0.07470  -0.317    0.752
---
Signif. codes: 0
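
The outlier check in Review item 2 can also be sketched end to end on simulated data. This is a minimal illustration, not the handout's analysis: the data, the seed, the outlier planted at row 10, and the names dat, fit, fit2, suspect and pred_int are all assumptions made for the example. It uses only lm(), rstandard() and predict() with interval="prediction".

```r
# Simulated data (hypothetical; NOT the HIV or snowfall data from the handout).
set.seed(333)
n <- 30
x <- runif(n, 1, 15)
y <- 5 + 2 * x + rnorm(n, sd = 1)
y[10] <- y[10] + 25                      # plant one gross outlier at row 10
dat <- data.frame(x = x, y = y)

# (a) Fit the model and flag the point with the largest |standardized residual|.
fit <- lm(y ~ x, data = dat)
suspect <- unname(which.max(abs(rstandard(fit))))

# (b) Remove that point and re-fit.
fit2 <- lm(y ~ x, data = dat[-suspect, ])

# (c) Bonferroni-corrected prediction interval at the removed x-value:
#     alpha* = alpha / n, so the level is 1 - alpha / n.
alpha <- 0.05
pred_int <- predict(fit2,
                    newdata  = data.frame(x = dat$x[suspect]),
                    interval = "prediction",
                    level    = 1 - alpha / n)

# (d) The point is declared an outlier if its y-value falls outside the interval.
y_obs <- dat$y[suspect]
is_outlier <- y_obs < pred_int[1, "lwr"] || y_obs > pred_int[1, "upr"]

print(suspect)
print(pred_int)
print(is_outlier)
```

Using level = 1 - alpha/n widens the interval relative to an uncorrected 95% interval, so a point chosen after inspecting all n residuals must be more extreme before it is declared an outlier.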

