Poisson Regression
Bret Larget
Departments of Botany and of Statistics
University of Wisconsin-Madison
May 1, 2007
Statistics 572 (Spring 2007)

Introduction

Poisson Regression

- Poisson regression is a form of generalized linear model in which the response variable is modeled as having a Poisson distribution.
- The Poisson distribution models random variables with non-negative integer values.
- For large means, the Poisson distribution is well approximated by the normal distribution.
- In biological applications, the Poisson distribution can be useful for variables that are often small integers, including zero.
- The Poisson distribution is often used to model rare events.

The Poisson Distribution

The Poisson distribution arises in many biological contexts. Examples of random variables for which a Poisson distribution might be reasonable include:

- the number of bacterial colonies in a Petri dish;
- the number of trees in an area of land;
- the number of offspring an individual has;
- the number of nucleotide base substitutions in a gene over a period of time.

Probability Mass Function

The probability mass function of the Poisson distribution with mean $\mu$ is

$$P(Y = k) = \frac{e^{-\mu} \mu^{k}}{k!} \quad \text{for } k = 0, 1, 2, \ldots$$

The Poisson distribution is discrete, like the binomial distribution, but has only a single parameter $\mu$, which is both the mean and the variance. In R, you can compute Poisson probabilities with the function dpois(). For example, if $\mu = 10$, we can find $P(Y = 12) = \frac{e^{-10} 10^{12}}{12!}$ with the command

    > dpois(12, 10)
    [1] 0.09478033

Poisson Approximation to the Binomial

One way that the Poisson distribution can arise is as an approximation to the binomial distribution when $p$ is small. The approximation is quite good for large enough $n$. If $p$ is
small, then the binomial probability of exactly $k$ successes is approximately the same as the Poisson probability of $k$ with $\mu = np$. Here is an example with $p = 0.01$ and $n = 10$.

    > dbinom(0:4, 10, 0.01)
    [1] 9.043821e-01 9.135172e-02 4.152351e-03 1.118478e-04 1.977108e-06
    > dpois(0:4, 10 * 0.01)
    [1] 9.048374e-01 9.048374e-02 4.524187e-03 1.508062e-04 3.770156e-06

This approximation is most useful when $n$ is large, so that the binomial coefficients are very large.

The Poisson Process

The Poisson process arises naturally under assumptions that are often reasonable. For the following, think of points as being exact times or locations. The assumptions are:

- The chance of two simultaneous points is negligible.
- The expected value of the random number of points in a region is proportional to the size of the region.
- The random numbers of points in non-overlapping regions are independent.

Under these assumptions, the random variable that counts the number of points has a Poisson distribution. If the expected rate of points is $\lambda$ points per unit length (or area), then the number of points in an interval (or region) of size $t$ has a Poisson distribution with mean $\mu = \lambda t$.

Example

Suppose that, at a location, a particular species of plant is distributed according to a Poisson process with expected density 0.2 individuals per square meter. In a nine-square-meter quadrat, what is the probability of no individuals?

Solution: The number of individuals has a Poisson distribution with mean $\mu = 9 \times 0.2 = 1.8$. The probability of no individuals is

$$P(Y = 0 \mid \mu = 1.8) = \frac{e^{-1.8} (1.8)^{0}}{0!} \approx 0.165299$$

In R, we can compute this as

    > dpois(0, 1.8)
    [1] 0.1652989

Poisson Regression

Poisson regression is a natural choice when the response variable is a small integer. The explanatory variables model the mean of the response variable. Since the mean
must be positive, but the linear combination $\eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$ can take on any value, we need to use a link function for the parameter $\mu$. The standard link function is the natural logarithm,

$$\log(\mu) = \eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$

so that $\mu = \exp(\eta)$.

Example: Aberrant Crypt Foci

Aberrant crypt foci (ACF) are abnormal collections of tube-like structures that are precursors to tumors. In an experiment, researchers exposed 22 rats to a carcinogen and then counted the number of ACFs in the rat colons. There were three treatment groups based on time since first exposure to the carcinogen: 6, 12, or 18 weeks. The data are in the DAAG data set ACF1, with variables count and endtime.

    > library(DAAG)
    > str(ACF1)
    'data.frame':   22 obs. of  2 variables:
     $ count  : num  1 3 5 1 2 1 1 3 1 2 ...
     $ endtime: num  6 6 6 6 6 6 6 12 12 12 ...

Example: Plot of Data

    > attach(ACF1)
    > plot(count ~ endtime, pch = 16)

[Figure: scatterplot of count (vertical axis) against endtime (horizontal axis)]

Example: Linear Predictor

    > acf1.glm = glm(count ~ endtime, family = poisson)
    > summary(acf1.glm)

    Call:
    glm(formula = count ~ endtime, family = poisson)

    Deviance Residuals:
         Min        1Q    Median        3Q       Max
    -2.46204  -0.47851  -0.07943   0.38159   2.26332

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept) -0.32152    0.40046  -0.803    0.422
    endtime      0.11920    0.02642   4.511 6.44e-06 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for poisson family taken to be 1)

        Null deviance: 51.105  on 21  degrees of freedom
    Residual deviance: 28.369  on 20  degrees of freedom
    AIC: 92.21

    Number of Fisher Scoring iterations: 5

Example: Quadratic Predictor

    > acf2.glm = glm(count ~ endtime + I(endtime^2), family = poisson)
    > summary(acf2.glm)

    Call:
    glm(formula = count ~ endtime + I(endtime^2), family = poisson)

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -2.0616  -0.7834  -0.2808   0.4510   2.1693

    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
    (Intercept)   1.722364   1.092494   1.577    0.115
    endtime      -0.262356   0.199685  -1.314    0.189
    I(endtime^2)  0.015137   0.007954   1.903    0.057 .
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for poisson family taken to be 1)

        Null deviance: 51.105  on 21  degrees of freedom
    Residual deviance: 24.515  on 19  degrees of freedom
    AIC: 90.354

    Number of Fisher Scoring iterations: 5

Example: Plots of Fitted
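The fitted means behind such plots come from inverting the log link, $\mu = \exp(\eta)$. The following is a minimal sketch of that calculation done by hand, using only the coefficient estimates printed in the two summaries above. The names b_lin, b_quad, weeks, and fitted_means are illustrative and do not appear in the original slides; with the fitted model objects available, predict(acf1.glm, type = "response") would give the same kind of result directly.

```r
# Coefficient estimates copied from the summary() output of the two models.
# (Illustrative names; not from the original slides.)
b_lin  <- c(intercept = -0.32152, endtime = 0.11920)
b_quad <- c(intercept = 1.722364, endtime = -0.262356, endtime2 = 0.015137)

# The three exposure times in the experiment, in weeks.
weeks <- c(6, 12, 18)

# Linear predictors on the log scale.
eta_lin  <- b_lin[["intercept"]] + b_lin[["endtime"]] * weeks
eta_quad <- b_quad[["intercept"]] + b_quad[["endtime"]] * weeks +
  b_quad[["endtime2"]] * weeks^2

# Invert the log link to obtain the fitted Poisson means.
fitted_means <- data.frame(
  endtime   = weeks,
  linear    = exp(eta_lin),
  quadratic = exp(eta_quad)
)
print(fitted_means)
```

With these coefficients, the linear predictor gives fitted means of roughly 1.5, 3.0, and 6.2 ACFs at 6, 12, and 18 weeks, and the quadratic predictor gives roughly 2.0, 2.1, and 6.7.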