STOR 557, Fall 2023
Midterm Two, October 31, 2023

Open book, in-class exam; time limit 75 minutes.

You are allowed to consult course notes and text (printed or e-read), homework assignments, and any personal notes you have made during the course. Other outside materials are not permitted. Computers or iPads may be used only for the purpose of accessing pre-stored course notes; they are not to be used for computations during the exam. A hand-held calculator is permitted. Answers should preferably be written in a university examination book ("blue book"). You may consult the instructor if the wording is unclear or if you think there might be an error, but the instructor will not give hints how to solve the exam. The university Honor Code is in effect at all times.

The whole exam is worth 100 points (60 points for question 1, 40 for question 2). Statistical tables are provided.

1. Alcohol consumption is widely believed to be a causative factor in oesophageal cancer. A study of 975 hospital patients in France classified the patients according to whether or not they had oesophageal cancer, their alcohol consumption (high or low), and their age group. The raw data are as follows:

                    Cancer              No Cancer
    Age Group   High Alc  Low Alc   High Alc  Low Alc
    25-34           1        0          9       106
    35-44           4        5         26       164
    45-54          25       21         29       138
    55-64          42       34         27       139
    65-74          19       36         18        88
    75+             5        8          0        31
    Total          96      104        109       666

Note that the last row of the data is derived by simply adding up the numbers in the first six rows, and therefore represents the combined table of oesophageal cancer against alcohol consumption without taking account of age.

(a) Based on the last row of the table, the Pearson (continuity-corrected) chi-square statistic for the null hypothesis that alcohol consumption and incidence of oesophageal cancer are independent is 108.22. Is this a statistically significant result, and what do you conclude? [7 points]

(b) The same calculation was repeated separately for each of the first six rows of the above table, with the resulting chi-square statistics of 2.19, 4.18, 24.1, 37.0, 5.36 and 9.9. State which of these values is statistically significant and summarize your conclusion in words. [7 points]

(c) Are the results in (a) and (b) an example of Simpson's paradox? What are the advantages and disadvantages of each approach? [6 points]

(d) The data in the above table were combined into a dataframe OC with a numerical variable Count and factor variables Cancer, Alcohol and Agegp. The following code was run:

    ct3 <- xtabs(Count ~ Cancer + Alcohol + Agegp, OC)
    apply(ct3, 3, function(x) {x[1,1] * x[2,2] / (x[1,2] * x[2,1])})
    mantelhaen.test(ct3, exact = T)

The results were as follows:

           1        2        3        4        5        6
         Inf 5.046154 5.665025 6.359477 2.580247      Inf

        Exact conditional test of independence in 2 x 2 x k tables

    data:  ct3
    S = 666, p-value < 2.2e-16
    alternative hypothesis: true common odds ratio is not equal to 1
    95 percent confidence interval:
     3.572140 7.758317
    sample estimates:
    common odds ratio
             5.250951

Explain what this code is doing and how you interpret the result. Why are two of the numbers in the second row stated as Inf? How does the common odds ratio of 5.250951 relate to the six numbers (including the two labelled Inf) in the second row? [15 points]

(e) A series of Poisson regression models was fitted to the data, as follows:

    g1 <- glm(Count ~ Alcohol + Cancer + Agegp, family = poisson, OC)
    g2 <- glm(Count ~ Alcohol * Cancer + Agegp, family = poisson, OC)
    g3 <- glm(Count ~ Alcohol * Agegp + Agegp * Cancer, family = poisson, OC)
    g4 <- glm(Count ~ (Alcohol + Cancer + Agegp)^2, family = poisson, OC)
    g5 <- glm(Count ~ Alcohol * Cancer * Agegp, family = poisson, OC)

A summary of the results is given as follows:

    Model   Residual Deviance   Residual DF
    g1           224.2              16
    g2           145.9              15
    g3            90.56              6
    g4            11.04              5
    g5             0                 0

Which do you conclude is the best of the five models? Give relevant calculations to support your conclusion. [15 points]

(f) State in words what you conclude about the relationships among age, alcohol consumption and incidence of oesophageal cancer. [10 points]

2. Twelve hospital patients were given an experimental dietary regime. After each week, their levels of plasma ascorbic acid were measured. The results are illustrated graphically in Figure 1. Each broken-line curve represents one patient. It can be seen that most patients demonstrated a rise followed by a fall in their levels of ascorbic acid, but visually there appears to be significant variability among patients.

[Figure 1: Weekly Variations in Plasma Ascorbic Acid for 12 Patients]

(a) The data was collected in a dataframe Diet with variables Time (factor), Person (factor) and Amount (numeric). A simple analysis of variance through the R function

    a1 <- aov(Amount ~ Time + Person, Diet)

produced the following output:

    > summary(a1)
                Df Sum Sq Mean Sq F value   Pr(>F)
    Time         6  6.165  1.0274  15.241 7.25e-11 ***
    Person      11  3.682  0.3347   4.966 1.42e-05 ***
    Residuals   66  4.449  0.0674
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Briefly interpret the results of this analysis. What is its main limitation? [8 points]

For the next three parts, I want you to write brief R code for each of the following operations.

(b) Fit the same model as a mixed model with Time treated as a fixed effect and Person as a random effect. [8 points]

(c) Within this model, write R code for two ways to test the significance of Time. [10 points]

(d) Within this model, write R code for one way to test the significance of the random effect of Person. [8 points]

(e) When the test of part (d) is applied to the model that includes both Time as a fixed effect and Person as a random effect, the p-value appears as < 2.2e-16, which is 0 for all practical purposes. However, when the same test is applied to the model from which Time has been omitted, so that the random effect due to Person is the only non-constant term in the model, the p-value is about 0.017. How would you interpret the discrepancy between these two results? [6 points]

Sketch Solutions

Note: There is no single right answer to questions like these. Credit will be given for answers that pick up key points in the analysis that may not necessarily agree with those here.

1. (a) The distribution of the Pearson statistic, if the null hypothesis of no association is true, is asymptotically $\chi^2_1$, but the value 108.22 is way beyond the percentage points for that distribution; you didn't have the means to calculate this exactly in the exam, but you can …
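
As an illustration of the comparison described in 1(a), and not a reconstruction of the truncated sentence above, the following R sketch recomputes the continuity-corrected statistic from the Total row of the table and looks up the corresponding chi-square tail probability. The object names `total_row` and `res` are hypothetical, and the choice of the 5% and 0.1% points is an assumption made only to show the scale of the reference distribution.

```r
# Total row of the table in question 1: cancer status by alcohol consumption.
# The names total_row and res are illustrative, not taken from the exam.
total_row <- matrix(c(96, 109, 104, 666), nrow = 2,
                    dimnames = list(Cancer  = c("Yes", "No"),
                                    Alcohol = c("High", "Low")))

# Pearson chi-square test with Yates' continuity correction
# (correct = TRUE is the default for 2 x 2 tables).
res <- chisq.test(total_row, correct = TRUE)
print(res$statistic)

# Upper 5% and 0.1% points of the chi-square distribution with 1 df,
# for comparison with the statistic quoted in the question.
print(qchisq(c(0.95, 0.999), df = 1))

# Upper-tail probability of 108.22 under the null hypothesis.
print(pchisq(108.22, df = 1, lower.tail = FALSE))
```

The same `qchisq` call can be used to judge the six age-specific statistics in 1(b), since each of those is also referred to a chi-square distribution with 1 degree of freedom.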


UNC-Chapel Hill STOR 557 - Midterm Two
