Stanford STATS 203 - Lecture Notes - D93047

School name Stanford University

Course Stats 203- Introduction to Regression Models and Analysis of Variance

Statistics 203
Introduction to Regression and Analysis of Variance
Take Home Final
Due Friday, March 18
Prof. J. Taylor

Q. 1) The director of admissions of a small college administered a newly designed entrance test to 20 students selected at random from the new freshman class in a study to determine whether a student’s grade point average (GPA) at the end of the freshman year (Y) can be predicted from the entrance test score (X). The results of the study can be found at
http://www-stat.stanford.edu/~jtaylo/courses/stats203/data/GPA.table

(a) Fit a simple linear regression to the model.
(b) Plot the estimated regression function along with the data in R. Does the line appear to fit well?
(c) Fit a robust regression model to the data. Does this improve the fit? Plot the fitted values on the same plot.
(d) Subsequently, a recording error shows that the 6th test score should read 6.2 rather than 3.2. Refit the OLS model. Compare the estimated coefficients and standard errors in the robust and “corrected” OLS models. Plot the new fitted values on this plot as well.
(e) Obtain an approximate confidence interval for the mean freshman GPA of students with entrance test score X = 5.0. Which model do you prefer to use? Why?

Q. 2) A psychologist conducted a study to examine the nature of the relation, if any, between an employee’s emotional stability X and the employee’s ability to perform in a task group Y. Emotional stability was measured by a written test for which the higher the score, the greater the emotional stability. Ability to perform in a task group (Y = 1 if able, Y = 0 if unable) was evaluated by the supervisor.
The results can be found at
http://www-stat.stanford.edu/~jtaylo/courses/stats203/data/stability.table

(a) Fit a logistic regression model to the data and state the fitted response function.
(b) Obtain $\exp(\hat{\beta}_1)$ and interpret this number ($\beta_1$ corresponds to stability).
(c) What is the estimated probability that employees with an emotional stability test score of 550 will be able to perform in a task group?
(d) Estimate the emotional stability test score for which 70% of the employees with this test score are expected to be able to perform in a task group.
(e) Plot the standard diagnostics of the model. Do any observations seem to be highly influential? Are there any outliers? What about the qqplot?
(f) Obtain a 95% confidence interval for $\exp(\beta_1)$.
(g) Obtain joint confidence intervals for the mean response $\pi_h$ for persons with emotional stability test scores $X_h = 550$ and $X_h = 625$, respectively, with an approximate 90% (joint) confidence coefficient. Interpret your intervals.

Q. 3) The carbonation level of a soft drink beverage is affected by the temperature of the product and the filler operating pressure. Twelve observations were obtained and the resulting data are shown below. The data can be found at
http://www-stat.stanford.edu/~jtaylo/courses/stats203/data/carbonation.table

(a) Fit a second order polynomial regression model to the data, including quadratic effects for each main effect as well as an interaction.
(b) Compare the full model to one with just an intercept at level $\alpha = 0.05$.
(c) Does the interaction term contribute significantly to the model?
(d) What about the quadratic terms?

Q. 4) This question studies the Satterthwaite approximation for the distribution of a weighted sum of $\chi^2$ random variables. Suppose we are in a two-sample setting:
$Y_{ij} = \mu_i + \varepsilon_{ij}, \quad 1 \le i \le 2, \quad 1 \le j \le n_i$
where the errors $\varepsilon_{ij} \sim N(0, \sigma_i^2)$ are independent but may not have equal variance.

(a) Consider a T-test to test whether $\mu_1 = \mu_2$. What appears in the numerator?
(b) What is the variance of the numerator?
Propose an unbiased estimate of this variance, to be used in the denominator.
(c) Use Satterthwaite’s approximation, which we covered in
http://www-stat.stanford.edu/~jtaylo/courses/stats203/notes/fixed+random.pdf
to approximate the distribution of the denominator.
(d) Compute the T-statistic to test whether $\mu_1 = \mu_2$ using the data in
http://www-stat.stanford.edu/~jtaylo/courses/stats203/data/ttest.table
where group is the grouping variable.
(e) Verify that your results agree with t.test when using the var.equal=F option.

Q. 5) This question studies binomial regression, a generalization of binary regression. The data
http://www-stat.stanford.edu/~jtaylo/courses/stats203/data/florida2000.table
contains a subset of the Florida election results of 2000. They were taken from a subset of the entire national county by county data that can be found at
http://wand.stanford.edu/elections/us/fl/
The columns have the following meaning:
• buchanan – the number of votes for Pat Buchanan by county;
• total – the total number of votes cast by county;
• prep96 – the number of votes for the Republican candidate in 1996 by county;
• pperot96 – the number of votes for Ross Perot in 1996 by county;
• demographics – a summary of demographic information from the 1999 census by county.
We will build a logistic model for the number of votes for Pat Buchanan, which was the subject of much debate due to the “butterfly ballot” (see the paper on the above website).

(a) Given that $N_i$ votes were cast in the i-th county, propose an additive logistic model for the distribution of
$Y_i = B_i / N_i$,
the proportion of votes cast for Pat Buchanan in this county. Include variables for the Perot and Republican votes as well as the demographic variables.
(b) What is the link function of the model?
(c) What is the variance function of the model? Is it the same as in a binary regression model? How is it different?
(d) Using glm, fit the above model. (Hint: you may need to use the weights argument).
(e) Plot the standard diagnostics.
Are there any particularly noteworthy residuals, in terms of influence and absolute value?

Q. 6) Consider the Latin square ANOVA model discussed in the class on experimental design with r treatments and two blocking variables each with r categories:
$Y_{ijk} = \mu_{\cdot\cdot\cdot} + \rho_i + \kappa_j + \tau_k + \varepsilon_{ijk}, \quad 1 \le i, j, k \le r.$
Remember, there are only $r^2$ observations. That is, there is only one observation for each pair (i, j) of blocking variables. The ANOVA table can be found in the notes
http://www-stat.stanford.edu/~jtaylo/courses/stats203/notes/design.pdf
Of interest is to determine whether the treatment has any main effect, i.e. whether
$H_0 : \tau_1 = \cdots = \tau_r = 0$
is true or not.

(a) Compute the power of the appropriate F-test to test $H_0$, as a function of
$\phi = \frac{r \sum_{k=1}^{r} \tau_k^2}{\sigma^2}$
which, up to a factor of $\sigma^2$, is the non-centrality parameter in SSTR.
(b) Write an R function to compute the
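
For part (a) of Q. 6, one way to set up the power calculation is through a noncentral F distribution. The sketch below is not taken from the course notes. It assumes the usual Latin square ANOVA table, with $r - 1$ treatment degrees of freedom and $(r-1)(r-2)$ error degrees of freedom, treatment effects constrained to sum to zero, and i.i.d. $N(0, \sigma^2)$ errors. The test level $\alpha$, the quantile notation $F_{1-\alpha;\,a,\,b}$, and the name $\mathrm{Power}$ are introduced here for illustration only.

```latex
% Assumptions (not stated in the question): \sum_k \tau_k = 0, errors i.i.d. N(0, \sigma^2),
% and the noncentrality convention E[\chi^2_{d}(\phi)] = d + \phi.
\begin{align*}
\frac{SSTR}{\sigma^2} &\sim \chi^2_{r-1}(\phi),
  \qquad \phi = \frac{r \sum_{k=1}^{r} \tau_k^2}{\sigma^2}, \\
\frac{SSE}{\sigma^2} &\sim \chi^2_{(r-1)(r-2)}
  \quad \text{independently of } SSTR, \\
F = \frac{SSTR/(r-1)}{SSE/\big((r-1)(r-2)\big)}
  &\sim F_{r-1,\,(r-1)(r-2)}(\phi), \\
\mathrm{Power}(\phi) &= P\!\left( F_{r-1,\,(r-1)(r-2)}(\phi)
  > F_{1-\alpha;\; r-1,\,(r-1)(r-2)} \right).
\end{align*}
```

Under $H_0$ we have $\phi = 0$, so the power reduces to $\alpha$, and it increases with $\phi$. The design needs $r \ge 3$ for the error degrees of freedom to be positive. In R, the critical value and the noncentral tail probability can be evaluated with qf and with pf using its ncp argument.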
