More on data transformations

No recipes, but some advice.

If the primary problem is non-linearity, look at a scatter plot of the data to suggest plausible transformations.

It is possible to use transformations other than ln(x) and ln(y).

Try fitting $\mu_Y = \beta_0 + \beta_1 e^{-x}$ if the trend in your data follows either of these patterns.

[Figure: sketched trend patterns, y versus x]

Try fitting $\mu_Y = \beta_0 + \beta_1 \left(\frac{1}{x}\right)$ if the trend in your data follows either of these patterns.

[Figure: sketched trend patterns, y versus x]

Try fitting $\mu_{\ln Y} = \beta_0 + \beta_1 x$ (transform y only) if the trend in your data follows either of these patterns.

[Figure: sketched trend patterns, y versus x]

Try fitting $\mu_Y = \beta_0 + \beta_1 \ln(x)$ (transform x only) if the trend in your data follows either of these patterns.

[Figure: sketched trend patterns, y versus x]

Try fitting $\mu_{\ln Y} = \beta_0 + \beta_1 \ln(x)$ (transform both x and y) if the trend in your data follows any of these patterns.

[Figure: sketched trend patterns, y versus x]

If the variances are unequal and/or the error terms are not normally distributed, try a “power transformation” on y.

Family of power transformations

A power transformation on y transforms the response by raising it to some power λ. That is:

$$y^* = y^{\lambda}$$

Most commonly, for interpretation reasons, λ is a number between -1 and 2, such as -1, -0.5, 0, 0.5, (1), 1.5, and 2. The value λ = 1 leaves y unchanged.

When λ = 0, the transformation is taken to be the natural log transformation. That is:

$$y^* = \ln(y)$$

If the variances are unequal, try “stabilizing the variance” by transforming y.

If the response y is a Poisson count…

A common (now archaic?) recommendation is to transform the response using the square-root transformation:

$$y^* = \sqrt{y} = y^{1/2}$$

and stay within the linear regression framework. Perhaps, now, the advice should be to use Poisson regression.

If the response y is a binomial proportion…

A common (now archaic?) recommendation is to transform the response using the arcsine transformation:

$$\hat{p}^* = \sin^{-1}\left(\sqrt{\hat{p}}\right)$$

and stay within the linear regression framework. Perhaps, now, the advice should be to use a form of logistic regression.

If the response y isn’t anything special…

A common recommendation is to try the natural log transformation:

$$y^* = \ln(y)$$

or the reciprocal transformation:

$$y^* = \frac{1}{y}$$

It’s okay to remove some data points to make the transformation work better.

Just make sure you report the scope of the model.

It’s better to give up some model fit than to lose clear interpretations.

Just make sure you report that that’s what you
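The scatter-plot advice for non-linearity can be made concrete by fitting a straight line against each candidate transformed predictor and checking which one straightens the trend. The sketch below is a minimal, standard-library-only illustration under stated assumptions: the data are hypothetical (generated from $\mu_Y = 2 + 3(1/x)$ plus a small alternating perturbation), and `fit_line` is a hypothetical helper that does ordinary least squares. Only predictor transformations are compared, because R² values computed on different response scales (y versus ln y) are not directly comparable.

```python
import math

def fit_line(u, v):
    """Ordinary least squares for v = b0 + b1*u; returns (b0, b1, R^2)."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    b1 = s_uv / s_uu
    b0 = v_bar - b1 * u_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(u, v))
    sst = sum((b - v_bar) ** 2 for b in v)
    return b0, b1, 1 - sse / sst

# Hypothetical data: mu_Y = 2 + 3*(1/x), plus a small alternating perturbation
x = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
y = [2 + 3 / xi + (0.05 if i % 2 == 0 else -0.05) for i, xi in enumerate(x)]

# Candidate transformations of the predictor; y stays on its original scale
candidates = {
    "x": lambda t: t,
    "1/x": lambda t: 1 / t,
    "exp(-x)": lambda t: math.exp(-t),
    "ln(x)": math.log,
}

results = {name: fit_line([f(xi) for xi in x], y) for name, f in candidates.items()}
for name, (b0, b1, r2) in results.items():
    print(f"mu_Y = {b0:.3f} + {b1:.3f} * {name}    R^2 = {r2:.4f}")
```

R² is only a rough screen here; a residual plot for each candidate fit is still the better check that the transformation actually removed the curvature.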
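The family of power transformations can be sketched in a few lines, assuming a strictly positive response (ln y and negative powers fail at y ≤ 0). `power_transform` and `box_cox` are hypothetical helper names, not taken from the notes. The last loop illustrates one standard justification for the λ = 0 convention: the Box-Cox scaled form (y^λ - 1)/λ tends to ln(y) as λ shrinks toward 0.

```python
import math

def power_transform(y, lam):
    """Return y**lam for each y, or ln(y) when lam == 0.

    Assumes every y is strictly positive: ln(y) and negative powers are
    undefined or infinite at y <= 0.
    """
    if any(v <= 0 for v in y):
        raise ValueError("this sketch assumes y > 0")
    if lam == 0:
        return [math.log(v) for v in y]
    return [v ** lam for v in y]

def box_cox(v, lam):
    """Box-Cox scaled form (v**lam - 1)/lam, with ln(v) at lam == 0."""
    return math.log(v) if lam == 0 else (v ** lam - 1) / lam

# Commonly used lambda values; lambda = 1 leaves y unchanged
ladder = [-1, -0.5, 0, 0.5, 1, 1.5, 2]
y = [1.0, 4.0, 9.0, 16.0]
for lam in ladder:
    print(lam, [round(v, 3) for v in power_transform(y, lam)])

# As lambda shrinks toward 0, the Box-Cox form approaches ln(y)
for lam in (0.1, 0.01, 0.001):
    print(lam, box_cox(9.0, lam), math.log(9.0))
```

The plain power y^λ and the Box-Cox form differ only by a linear rescaling for λ ≠ 0, so either gives the same fitted regression up to the scale of the coefficients.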
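Why do the square-root and arcsine transformations “stabilize the variance”? For a Poisson count the variance equals the mean, and for a binomial proportion p̂ = y/n the variance is p(1 - p)/n, so in both cases the spread changes with the mean. By the delta method, Var(√y) ≈ 1/4 and Var(sin⁻¹√p̂) ≈ 1/(4n), roughly free of the mean. The sketch below checks this numerically by summing over the exact probability mass functions rather than simulating; `poisson_variances` and `binomial_variances` are hypothetical helpers, and the chosen means and proportions are illustrative only.

```python
import math

def poisson_pmf(k, mu):
    # Computed on the log scale to avoid overflow for large k
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def poisson_variances(mu):
    """Variance of Y and of sqrt(Y) for Y ~ Poisson(mu), via a truncated exact sum."""
    upper = int(mu + 20 * math.sqrt(mu) + 20)  # probability beyond this is negligible
    ks = range(upper + 1)
    probs = [poisson_pmf(k, mu) for k in ks]

    def var(g):
        m = sum(p * g(k) for k, p in zip(ks, probs))
        return sum(p * (g(k) - m) ** 2 for k, p in zip(ks, probs))

    return var(lambda k: k), var(math.sqrt)

def binomial_variances(n, p):
    """Variance of p_hat = Y/n and of arcsin(sqrt(p_hat)) for Y ~ Binomial(n, p)."""
    probs = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

    def var(g):
        m = sum(pk * g(k / n) for k, pk in enumerate(probs))
        return sum(pk * (g(k / n) - m) ** 2 for k, pk in enumerate(probs))

    return var(lambda ph: ph), var(lambda ph: math.asin(math.sqrt(ph)))

for mu in (5, 20, 80):
    raw, root = poisson_variances(mu)
    print(f"Poisson mu={mu}: Var(y)={raw:.2f}  Var(sqrt(y))={root:.3f}")

n = 50
for p in (0.2, 0.5, 0.8):
    raw, arc = binomial_variances(n, p)
    print(f"Binomial n={n}, p={p}: Var(p_hat)={raw:.4f}  Var(arcsin(sqrt(p_hat)))={arc:.4f}")
```

This only illustrates why the older advice worked; as the section itself suggests, fitting a Poisson or logistic regression directly is often the better modern choice, since it models the mean-variance relationship instead of transforming it away.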