More Checking Models

Bret Larget
Departments of Botany and of Statistics
University of Wisconsin-Madison
April 8, 2008

Comparing Fake and Real Data

Checking the Probability of 0

- The Poisson and binomial distributions are both suitable for count data.
- In some situations, the proportion of observed zeros can be larger than the model predicts.
- One solution is to use a model with overdispersion.
- An alternative is to use logistic regression to model presence/absence and then a different model for the counts when present.
- Here we will examine the use of simulation to check the goodness of fit of real data to data simulated under a Poisson model.
- We will see if using a model with overdispersion is sufficient when a Poisson model fails.

Case Study

- The Dusky Sound is a large fjord in southwest New Zealand noted for abundant wildlife.
- Biologists conducted a study to examine the relationship between the seaweed Ecklonia radiata and the sea urchin Evechinus chloroticus in the seas of the fjord.
- The goal is to predict seaweed abundance from urchin abundance and other variables.
- Other predictors include the distance to the mouth of the fjord (in km), the distance to the nearest point of land (in km), and the fetch, a sum of distances (in km) to the nearest land along several radial directions.
- Higher fetch values are associated with windier conditions and greater waves.
- There are 103 separate sample locations.
- The numbers of seaweed and urchins are counted in a 25 m by 1 m area.

Data

    > sw <- read.table("seaweed.txt", header = TRUE)
    > str(sw)
    'data.frame':   103 obs. of  5 variables:
     $ seaweed: int  0 0 0 2 2 1 1 0 0 3 ...
     $ urchin : int  5 9 1 1 0 3 2 5 11 2 ...
     $ fjord  : num  11.5 13.9 7.9 6.7 3.9 1.7 7.5 5.6 10.2 ...
     $ land   : num  0.6 0.7 1 0.7 0.8 1 0.5 1.1 0.8 0.2 ...
     $ fetch  : num  28.1 39.7 39.4 33.5 17.4 29.4 27.5 38.9 33.6 36.2 ...

Data Summaries

    > summary(sw)
        seaweed           urchin           fjord            land            fetch
     Min.   : 0.000   Min.   : 0.000   Min.   : 1.30   Min.   :0.2000   Min.   :17.40
     1st Qu.: 0.000   1st Qu.: 1.000   1st Qu.: 7.30   1st Qu.:0.6000   1st Qu.:28.50
     Median : 0.000   Median : 2.000   Median : 9.70   Median :0.7000   Median :32.30
     Mean   : 4.194   Mean   : 2.748   Mean   : 9.58   Mean   :0.7058   Mean   :32.30
     3rd Qu.: 2.000   3rd Qu.: 4.000   3rd Qu.:12.00   3rd Qu.:0.8000   3rd Qu.:36.05
     Max.   :50.000   Max.   :11.000   Max.   :18.40   Max.   :1.2000   Max.   :48.30

Figure commands

    > library(lattice)
    > fig1 <- xyplot(seaweed ~ urchin, data = sw, jitter.x = TRUE, pch = 16)
    > fig2 <- xyplot(seaweed ~ fjord, data = sw, jitter.x = TRUE, pch = 16)
    > fig3 <- xyplot(seaweed ~ land, data = sw, jitter.x = TRUE, pch = 16)
    > fig4 <- xyplot(seaweed ~ fetch, data = sw, jitter.x = TRUE, pch = 16)
    > print(fig1, split = c(1, 1, 2, 1))
    > print(fig2, split = c(2, 1, 2, 1), newpage = FALSE)
    > print(fig3, split = c(1, 1, 2, 1))
    > print(fig4, split = c(2, 1, 2, 1), newpage = FALSE)

Scatter plots

[Figure: scatter plots of seaweed against urchin and fjord]

Scatter plots

[Figure: scatter plots of seaweed against land and fetch]

More Figure Commands

    > fig5 <- histogram(~urchin, data = sw)
    > fig6 <- densityplot(~fjord, data = sw, lwd = 2)
    > fig7 <- densityplot(~land, data = sw, lwd = 2)
    > fig8 <- densityplot(~fetch, data = sw, lwd = 2)
    > print(fig5, split = c(1, 1, 2, 1))
    > print(fig6, split = c(2, 1, 2, 1), newpage = FALSE)
    > print(fig7, split = c(1, 1, 2, 1))
    > print(fig8, split = c(2, 1, 2, 1), newpage = FALSE)

Plots

[Figure: fat plotted against initial weight, lactation, and age, with panels for control, low, medium, and high]

Plots

[Figure: density plots of land and fetch]

Transformations

- Consider transformations of the urchin variable to decrease the effect of the skewness.
- As there are zeros, we cannot take logarithms directly.
- Square root is a possibility.
- We can also try adding one and then taking logs.
- The graphs show that both transformed variables are more symmetric.
- We will use the log(1 + x) transformation to follow the authors.

    > fig9 <- densityplot(~sqrt(urchin), data = sw, lwd = 2)
    > fig10 <- densityplot(~log(1 + urchin), data = sw, lwd = 2)
    > print(fig9, split = c(1, 1, 2, 1))
    > print(fig10, split = c(2, 1, 2, 1), newpage = FALSE)

Plots

[Figure: density plots of sqrt(urchin) and log(1 + urchin)]

Fitting a Poisson Regression

    > library(arm)  # provides display()
    > fit1.glm <- glm(seaweed ~ log(1 + urchin) + fjord + land + fetch,
    +     data = sw, family = poisson)
    > display(fit1.glm)
    glm(formula = seaweed ~ log(1 + urchin) + fjord + land + fetch,
        family = poisson, data = sw)
                    coef.est coef.se
    (Intercept)     1.89     0.46
    log(1 + urchin) 1.91     0.09
    fjord           0.35     0.02
    land            0.75     0.30
    fetch           0.01     0.01
    ---
      n = 103, k = 5
      residual deviance = 491.8, null deviance = 1407.1 (difference = 915.3)

[The extracted text carries no minus signs, so the signs of the coefficient estimates cannot be confirmed; magnitudes are shown as extracted.]

Residual Plot

    > print(xyplot(residuals(fit1.glm) ~ fitted(fit1.glm), pch = 16))

[Figure: residuals(fit1.glm) plotted against fitted(fit1.glm)]

Checking 0 probability

- Biological count data often have an excess of zeros relative to what a standard model predicts.
- We can examine this by simulation, comparing the number of zeros in simulated data to the number in the real data.
- We can do this by generating Poisson random variables whose means are the fitted values.

Checking 0 probability

    > count0 <- function(x) return(sum(x == 0))
    > with(sw, count0(seaweed))

[Figure: histogram, vertical axis "Percent of Total"] …
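The zero-count check described above can be sketched end to end as follows. This is a minimal illustration, not the author's code. Because `seaweed.txt` is not available here, the sketch builds a small synthetic, zero-heavy data set. The data frame `toy`, the indicator `present`, and the names `n.sims`, `sim.zeros`, and `p.upper` are all assumptions introduced for the example. Only `count0` and the `log(1 + urchin)` predictor come from the slides.

```r
# Hypothetical stand-in for the seaweed data (NOT the real data):
# counts are Poisson where seaweed is "present" and forced to zero elsewhere.
set.seed(1)
n <- 103
urchin <- rpois(n, lambda = 3)
present <- rbinom(n, size = 1, prob = 0.5)
seaweed <- present * rpois(n, lambda = exp(2 - 0.8 * log(1 + urchin)))
toy <- data.frame(seaweed, urchin)

# Count the zeros in a vector (the helper defined in the slides).
count0 <- function(x) sum(x == 0)

# Fit a Poisson regression, then simulate replicate data sets whose
# means are the fitted values, recording the zero count in each.
fit <- glm(seaweed ~ log(1 + urchin), data = toy, family = poisson)
mu <- fitted(fit)
n.sims <- 1000
sim.zeros <- replicate(n.sims, count0(rpois(n, lambda = mu)))

# Compare the observed zero count with the simulated distribution.
obs.zeros <- count0(toy$seaweed)
p.upper <- mean(sim.zeros >= obs.zeros)
cat("observed zeros:", obs.zeros, "\n")
cat("range of simulated zeros:", range(sim.zeros), "\n")
cat("proportion of simulations with at least as many zeros:", p.upper, "\n")
```

If the observed count lies far in the upper tail of `sim.zeros`, the Poisson model is predicting too few zeros. With the real data, `toy` would be replaced by `sw` and `fit` by `fit1.glm`.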