Introduction | Comparing Fake and Real Data | Checking Probability of 0

More Checking Models

Bret Larget
Departments of Botany and of Statistics
University of Wisconsin–Madison
April 8, 2008

Checking the Probability of 0

The Poisson and binomial distributions are both suitable for count data.
In some situations, the proportion of observed zeros can be larger than the model predicts.
One solution is to use a model with overdispersion.
An alternative is to use logistic regression to model presence/absence and then a different model for the counts when present.
Here we will use simulation to check goodness of fit by comparing the real data with data simulated under a Poisson model.
We will see whether a model with overdispersion is sufficient when a Poisson model fails.

Case Study

The Dusky Sound is a large fjord in southwest New Zealand noted for abundant wildlife.
Biologists conducted a study to examine the relationship between the seaweed Ecklonia radiata and the sea urchin Evechinus chloroticus in the seas of the fjord.
The goal is to predict seaweed abundance from urchin abundance and other variables.
Other predictors include the distance to the mouth of the fjord (in km), the distance to the nearest point of land (in km), and the fetch, a sum of distances in km to the nearest land along several radial directions.
Higher fetch values are associated with windier conditions and larger waves.
There are 103 separate sample locations.
The numbers of seaweed and urchins are counted in a 25 m by 1 m area.

Data

> sw = read.table("seaweed.txt", header = TRUE)
> str(sw)
'data.frame': 103 obs. of 5 variables:
 $ seaweed: int 0 0 0 2 2 1 1 0 0 3 ...
 $ urchin : int 5 9 1 1 0 3 2 5 11 2 ...
 $ fjord  : num 11.5 13 9.7 9.6 7.3 9.1 7 7.5 5.6 10.2 ...
 $ land   : num 0.6 0.7 1 0.7 0.8 1 0.5 1.1 0.8 0.2 ...
 $ fetch  : num 28.1 39.7 39.4 33.5 17.4 29.4 27.5 38.9 33.6 36.2 ...

Data Summaries

> summary(sw)
    seaweed           urchin           fjord            land
 Min.   : 0.000   Min.   : 0.000   Min.   : 1.30   Min.   :0.2000
 1st Qu.: 0.000   1st Qu.: 1.000   1st Qu.: 7.30   1st Qu.:0.6000
 Median : 0.000   Median : 2.000   Median : 9.70   Median :0.7000
 Mean   : 4.194   Mean   : 2.748   Mean   : 9.58   Mean   :0.7058
 3rd Qu.: 2.000   3rd Qu.: 4.000   3rd Qu.:12.00   3rd Qu.:0.8000
 Max.   :50.000   Max.   :11.000   Max.   :18.40   Max.   :1.2000
     fetch
 Min.   :17.40
 1st Qu.:28.50
 Median :32.30
 Mean   :32.30
 3rd Qu.:36.05
 Max.   :48.30

Figure commands

> library(lattice)
> # Scatter plots of seaweed against each predictor, jittered horizontally
> fig1 = xyplot(seaweed ~ urchin, data = sw, jitter.x = TRUE,
+     pch = 16)
> fig2 = xyplot(seaweed ~ fjord, data = sw, jitter.x = TRUE,
+     pch = 16)
> fig3 = xyplot(seaweed ~ land, data = sw, jitter.x = TRUE,
+     pch = 16)
> fig4 = xyplot(seaweed ~ fetch, data = sw, jitter.x = TRUE,
+     pch = 16)
> # Two plots side by side per page; newpage = FALSE keeps the second on the same page
> print(fig1, split = c(1, 1, 2, 1))
> print(fig2, split = c(2, 1, 2, 1), newpage = FALSE)
> print(fig3, split = c(1, 1, 2, 1))
> print(fig4, split = c(2, 1, 2, 1), newpage = FALSE)

Scatter plots

[Figure: scatter plots of seaweed against urchin and of seaweed against fjord]

Scatter plots

[Figure: scatter plots of seaweed against land and of seaweed against fetch]
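The check described under "Checking the Probability of 0" can be sketched in R as follows. This is only an illustration under stated assumptions, not the analysis from the lecture: the file seaweed.txt is not reproduced here, so the counts below are made-up stand-in data, and the names y, fit, mu.hat, prop0.fake, prop0.real, and p.value are hypothetical. The sketch also simulates from the fitted means only and ignores uncertainty in the estimated coefficients.

```r
# Stand-in data (an assumption): overdispersed counts with many zeros,
# generated from a negative binomial model. These are NOT the seaweed data.
set.seed(1)
n <- 103
urchin <- rpois(n, lambda = 3)
y <- rnbinom(n, mu = exp(1.5 - 0.2 * urchin), size = 0.3)

# Fit a Poisson regression to the stand-in counts.
fit <- glm(y ~ urchin, family = poisson)
mu.hat <- fitted(fit)

# Simulate fake data sets from the fitted Poisson model and record
# the proportion of zeros in each one.
n.sims <- 1000
prop0.fake <- replicate(n.sims, mean(rpois(n, mu.hat) == 0))

# Compare the observed proportion of zeros with the simulated ones.
prop0.real <- mean(y == 0)
p.value <- mean(prop0.fake >= prop0.real)

cat("observed proportion of zeros:", round(prop0.real, 3), "\n")
cat("range of proportions in fake data:", round(range(prop0.fake), 3), "\n")
cat("fraction of fake data sets with at least as many zeros:", p.value, "\n")
```

If the observed proportion of zeros lies far outside the range seen in the fake data sets, the Poisson model is predicting too few zeros, which is the situation the overdispersed and two-part models are meant to address.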
More Figure Commands

> # Distribution of each predictor: histogram for the counts, density plots for the rest
> fig5 = histogram(~urchin, data = sw)
> fig6 = densityplot(~fjord, data = sw, lwd = 2)
> fig7 = densityplot(~land, data = sw, lwd = 2)
> fig8 = densityplot(~fetch, data = sw, lwd = 2)
> print(fig5, split = c(1, 1, 2, 1))
> print(fig6, split = c(2, 1, 2, 1), newpage = FALSE)
> print(fig7, split = c(1, 1, 2, 1))
> print(fig8, split = c(2, 1, 2, 1), newpage = FALSE)

Plots

[Figure: three scatter plots of fat against initial.weight, lactation, and age; legend: control, low, medium, high]

Plots

[Figure: density plots of land and of fetch]

Transformations

Consider transformations of the urchin variable to decrease the effect of the skewness.
As there are zeros, we cannot take logarithms directly.
Square root is a
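A small sketch of why zeros rule out a direct logarithm while the square root remains usable. It uses only the first ten urchin counts shown in the str() output; the names log.urchin and sqrt.urchin are introduced here for illustration.

```r
# First ten urchin counts, copied from the str(sw) output.
urchin <- c(5, 9, 1, 1, 0, 3, 2, 5, 11, 2)

# The logarithm of a zero count is -Inf, so it cannot be used directly.
log.urchin <- log(urchin)
print(log.urchin)

# The square root is defined at zero and still compresses the larger counts.
sqrt.urchin <- sqrt(urchin)
print(sqrt.urchin)

# Check which transformed values are finite.
print(all(is.finite(log.urchin)))
print(all(is.finite(sqrt.urchin)))
```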