DOC PREVIEW
PSU STAT 544 - Poisson Regression

This preview shows page 1-2-3-4-5-6 out of 19 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Stat 544, Lecture 15 1'&$%PoissonRegressionLast time we began to discuss Poisson regression,which assumesyi∼ Poisson(μi),i=1,...,N.The canonical link is the log,log μi= η = xTiβ.A Poisson model with a log link is commonly called aloglinear model. Under the Poisson model, thevariance function is V (μ)=μ. When overdispersion isevident, it is usually handled by a scale parameter,Var(yi)=φμi, which makes it a quasilikelihoodmodel. To fit this model, we iteratively solveβ =“XTWX”−1XTWz,where W = Diag(μi) is the matrix of weights andzi= ηi+(yi− μi)/μi.Stat 544, Lecture 15 2'&$%The estimated covariance matrix forˆβ is (XTWX)−1,the Pearson residuals areri=yi− μi√μi,Pearson goodness-of-fit statistic is X2=PNi=1r2i,and the deviance isG2=2Xi:yi=0jyilogyiμi− (yi− μi)ff+2Xi:yi=0μi.These fit statistics can be reliable compared to χ2N−pif no more than 20% of the estimated μi’s are lessthan 5.0 and none are less than 1.0.Computations. Below is an R function that fitsPoisson regression with a log link, and a function forprinting out the results. These are simple revisions ofthe functions that we used for logistic regression.Starting values are obtained from an OLS regressionof log(yi+1/2) on xi.#######################################################################poisson.regression <- function(x,y,maxits=20,eps=1e-10,beta.start){# Fit a Poisson model with a log linkif( missing(beta.start) ){Stat 544, Lecture 15 3'&$%ystar <- log(y+.5)newbeta <- lsfit(x,ystar,intercept=F)$coef}else{newbeta <- beta.start}if(is.vector(x)) x <- matrix(x,ncol=1)iter <- 0converged <- Fwhile( (!converged) & (iter<maxits) ){iter <- iter+1cat(iter); cat("...")beta <- newbetaeta <- as.vector(x%*%beta)mu <- exp(eta)z <- eta + (y - mu)/muweights <- mutmp <- lsfit(x,z,wt=weights,intercept=F)newbeta <- tmp$coefconverged <- all(abs(newbeta-beta)<eps)}cat("\n")eta <- as.vector(x%*%beta)mu <- exp(eta)loglik <- sum( y*log(mu) - mu)xtwxinv <- ls.diag(tmp)$cov.unscaled# compute X2, Pearson residuals, influencer <- (y-mu)/sqrt(mu)X2 <- sum(r^2)xstar <- diag(sqrt(mu))%*%xhatstar <- xstar%*%solve(t(xstar)%*%xstar)%*%t(xstar)leverage <- diag(hatstar)# compute deviance contributions and G2deviance <- y*log(y/mu)-(y-mu)if(any(y==0))deviance[y==0] <- mu[y==0]G2 <- 2*sum(deviance)result <- list(beta=newbeta, cov.beta=xtwxinv, iter=iter,converged=converged, loglik=loglik, residuals=r,X2=X2, G2=G2, df=nrow(x)-ncol(x), leverage=leverage, fitted=mu,eta=eta, z=z)result}#######################################################################poisson.print <- function(result, scale=1.0){if(result$converged){cat(paste("The Newton-Raphson algorithm converged in",format(result$iter),"iterations."))}else{Stat 544, Lecture 15 4'&$%cat(paste("Algorithm failed to converge by",format(result$iter),"iterations"))}cat("\n\n")coef <- result$betaSE <- sqrt(scale*diag(result$cov.beta))tstat <- coef/SEpval <- 2*pnorm(-abs(tstat))coef.table <- cbind(coef,SE,round(tstat,2),round(pval,3))dimnames(coef.table) <- list(names(result$beta),c("coef","SE","coef/SE","pval"))print(coef.table)cat("\n")cat(paste("Loglikelihood =",result$loglik))cat("\n")cat(paste("Pearson’s X^2 =",result$X2))cat("\n")cat(paste(" Deviance G^2 =",result$G2))cat("\n")cat(paste(" df =",format(result$df)))cat("\n")if(scale!=1.0){cat(paste(" Scaled X^2 =",result$X2/scale))cat("\n")cat(paste(" Scaled G^2 =",result$G2/scale))cat("\n")}tmp <- sum(result$fitted<5)pct <- round(100*tmp/length(y),2)cat(paste(format(pct),"% of observations have expected counts below 5.0",sep=""))cat("\n")tmp <- min(result$fitted)cat(paste("The minimum expected cell count is ",format(tmp),sep=""))cat("\n")invisible()}#######################################################################Stat 544, Lecture 15 5'&$%Example: The data below come from the famous19th century study by Vladislav Bortkiewicz offatalities in the Prussian army due to horse kicks.The table shows the total number of deaths acrossfourteen army corps from 1875 to 1894:Year 75 76 77 78 79 80 81 82 83 84Deaths 3 5 7 9 10 18 6 14 11 9Year 85 86 87 88 89 90 91 92 93 94Deaths 5 11 15 6 11 17 12 15 8 4This example appears in many textbooks to illustratethe Poisson distribution. Sometimes four of the corpsare removed. Sometimes the data are disaggregatedby corps, and a test is then done to see if the countsfrom the 200 corps-years follow a simple Poissondistribution. For illustration, we will examine thedata by year to look for a trend over time.First we enter the data and plot the log(yi) versusyear:> year <- 75:94> y <- c(3,5,7,9,10,18,6,14,11,9,5,11,15,6,11,17,12,15,8,4)> plot(year,log(y))Stat 544, Lecture 15 6'&$%••••••••••••••••••••Apply loess() to look for trends:> tmp <- loess(log(y) ~ year)> plot(year, log(y))> lines( year, predict(tmp) )••••••••••••••••••••If we are going to believe that there is a trend overtime, we may need a fourth-degree polynomial tocapture it. Let’s see how the null model fits:> x <- rep(1,length(y))> result <- poisson.regression(x,y)Stat 544, Lecture 15 7'&$%1...2...3...4...> poisson.print(result)The Newton-Raphson algorithm converged in 4 iterations.coef SE coef/SE pvalX1 2.282382 0.07142857 31.95 0Loglikelihood = 251.346947592599Pearson’s X^2 = 37.4693877550946Deviance G^2 = 38.502814220699df=190% of observations have expected counts below 5.0The minimum expected cell count is 9.8> 1-pchisq(result$X2,result$df)[1] 0.00692793The model doesn’t fit. Possible reasons are1. the time trend is real,2. other omitted covariates (e.g. indicators for armycorps), or3. overdispersion.With our present information, we cannot distinguish(2) from (3). But we can look for a time trend.Let’s include effects for time, up to the fourth power.If we simply include year, year2,year3and year4inthe model, these terms will be highly collinear,possibly causing numerical problems. An easy way tosolve that problem is to (at least approximately)recenter the year at zero before raising it to higherStat 544, Lecture 15 8'&$%powers; the overall fit will be the same, but the termswill then be (at least approximately) uncorrelated.> time <- year-85 # center it at 1885 to reduce collinearity> x <- cbind(int=1, t=time, t2=time^2, t3=time^3, t4=time^4)> result <- poisson.regression(x,y)1...2...3...4...5...> poisson.print(result)The Newton-Raphson algorithm converged in 5 iterations.coef SE coef/SE pvalint 2.2195221444


View Full Document

PSU STAT 544 - Poisson Regression

Download Poisson Regression
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Poisson Regression and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Poisson Regression 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?