CMU STA 36402-36608 - Homework

Homework 4: It’s Not the Heat that Gets to You, It’s the Sustained Conjunction of Heat with Elevated Levels of Atmospheric Pollutants

36-402, Advanced Data Analysis

Due at 10:30 am on Tuesday, 14 February 2012

The data set chicago, in the package gamair, contains data on the relationship between air pollution and the death rate in Chicago from 1 January 1987 to 31 December 2000. The seven variables are: the total number of (non-accidental) deaths each day (death); the median density over the city of large pollutant particles (pm10median); the median density of smaller pollutant particles (pm25median); the median concentration of ozone (O3) in the air (o3median); the median concentration of sulfur dioxide (SO2) in the air (so2median); the time in days (time); and the mean daily temperature (tmpd).

We will model how the death rate changes with pollution and temperature. Epidemiologists tell us that risk factors usually multiply together rather than adding, so we will fit additive models to the logarithm of the number of deaths. (You may use either the gam or the mgcv package for fitting additive models.)

1. (5 points total) Load the data set and run summary on it.
   (a) (1) Is temperature given in degrees Fahrenheit or degrees Celsius?
   (b) (2) The pollution variables are negative at least half the time. What might this mean?
   (c) (2) We will ignore the pm25median variable in the rest of this problem set. Why is this reasonable?

2. (10 points) Fit a spline smoothing of log(death) on time. (You can use either smooth.spline or gam.)
   (a) (3) Plot the smoothing spline along with the actual values.
   (b) (4) There should be four large outliers, right next to each other in time. When are they? For full credit, give calendar dates, not day numbers. (Hint: day 0 was 31 December 1993.)
   (c) (3) How many degrees of freedom did your smoothing spline have? Add curves to the plot which would result from using 10, 50, 100 and 2000 degrees of freedom.
       (Make sure these differ in color and/or line-style.) What happens to the spline curves as you change the degrees of freedom?

3. (15 points) Use gam to fit an additive model for log(death) on pm10median, o3median, so2median, tmpd and time. Use spline smoothing for each of these predictor variables.
   (a) (7) Plot the partial response functions, with partial residuals. Describe the partial response functions in words.
   (b) (4) Plot the fitted values as a function of time, along with the actual values of log(death).
   (c) (4) Are the outliers still there? Are they any better?

4. (15 points) It is medically implausible to suppose that deaths on day t are only due to heat or pollution on that day, and not on earlier ones.
   (a) (8) Suppose that on any given day, we want to know the average value of some variable over today and the previous k days. Explain how the following code computes that.

           lag.mean <- function(x, window) {
             n <- length(x)
             y <- rep(0, n-window)
             for (t in 0:window) {
               y <- y + x[(t+1):(n-window+t)]
             }
             return(y/(window+1))
           }

       In particular, how is k related to the arguments?
   (b) (7) Create a new data frame with the same column names as chicago, but where, on each day, the value of the pollution concentrations and temperature is the average of that day’s value with the previous three days. How many rows should this data frame have? Make sure that the time and death columns are properly aligned with the new, time-averaged predictor variables. How can you check that this is working properly?

5. (10 points) Fit an additive model, as in problem 3, with the time-averaged pollution and temperature variables. (Do not average time or death.)
   (a) (5) Plot the partial response functions and their partial residuals.
   (b) (5) Plot the fitted values as a function of time, and the actual values. What has happened to the outliers?

6. (15 points) Variable examination
   (a) (4) Find the rows in the data frame (with the time-averaged values) corresponding to the large-death outliers.
       Look at all variables for them, and for three days on either side. Now compare this to the same stretch of time a year earlier. Which two variables, aside from death, are unusually high or low around the outliers?
   (b) (7) Re-fit the model from problem 5, with an interaction between the two variables you just picked out. Plot the partial response functions.
   (c) (4) Plot the fitted values versus time. What has happened to the outliers?

7. (20 points) Using the last model you fit, we will consider the predicted impact of a 2° Celsius increase in temperature on log(death), taking the last full year of the data as a baseline. [1]
   (a) (1) Prepare a data frame containing only the last full year of the data.
   (b) (1) Modify this data frame to increase all temperatures by 2°C.
   (c) (3) Find the new predicted values of log(death), the old predicted values of log(death), and the average increase over the year.
   (d) (5) Find a standard error for this average predicted increase, using the standard errors for the prediction on each day, and assuming no correlation among them. Also give the corresponding Gaussian 95% confidence interval.
   (e) (5) Find the predicted change in the number of deaths (not the change in log(death)) from a 2°C warming over the course of a whole year. Hint: remember that $e^{\bar{x}} \neq \overline{e^{x}}$.
   (f) (5 points) Explain how you could use bootstrapping to give a 95% confidence interval for the average increase in log(death) over the year. More credit will be given for more precise, complete and clear explanations.
   (g) (Extra credit, 5 points) Implement your bootstrapping scheme and give the confidence interval.

8. (10 points) Give, and explain, a reason this estimate of what would happen if Chicago warmed by 2°C might be systematically flawed. (Do not repeat the problems mentioned in the footnote. Doubts that such warming will happen do not count.)
   For full credit, suggest ways of improving the estimates.

[1] 2°C is in the middle range of current projections for the global average effect of climate change by the end of this century (http://www.ipcc.ch/publications_and_data/ar4/wg1/en/contents.html). Of course it’s unrealistic to suppose that would be an even shift throughout the year, or for that matter that Chicago would necessarily warm by the average amount. In fact, some of the models (http://www.ipcc.ch/publications_and_data/ar4/wg1/en/ch11s11-5-3.html, Figure 11.11) have 4°C of warming in the middle of their prediction intervals for central
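
A note on problem 4: one way to see what lag.mean does, and to check alignment as problem 4(b) asks, is to run it on a short vector whose moving averages can be worked out by hand. The sketch below is only an illustration, not a solution. The toy vector x, the window value and the helper name direct are assumptions introduced here and do not come from the chicago data.

```r
# lag.mean exactly as given in problem 4(a)
lag.mean <- function(x, window) {
  n <- length(x)
  y <- rep(0, n - window)
  for (t in 0:window) {
    y <- y + x[(t + 1):(n - window + t)]
  }
  return(y / (window + 1))
}

# Hypothetical toy input (not from the chicago data)
x <- c(2, 4, 6, 8, 10, 12)
window <- 2

# Direct computation for comparison: the mean of each run of
# window + 1 consecutive values, taken one run at a time
direct <- sapply(1:(length(x) - window),
                 function(i) mean(x[i:(i + window)]))

result <- lag.mean(x, window)
print(result)
print(direct)

# The vectorized loop and the direct computation should agree
stopifnot(isTRUE(all.equal(result, direct)))
```

Comparing the length of result with the length of x, and noting which entries of x feed into the first entry of result, may help when deciding how to align the time and death columns.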

