
Winter, 2014    Friday, Jan. 31
Stat 418 – Day 14
Generalized Linear Model (Ch. 3)

Last Time:
- When we have homogeneous association for X and Y, we can test the statistical significance of that association, after adjusting for Z, using the Cochran-Mantel-Haenszel (CMH) test.
  o H0: θXY(1) = θXY(2) = … = θXY(K) = 1
  o Chi-square statistic with 1 degree of freedom
  o Can first test for homogeneous association using the Breslow-Day test (SAS)
    H0: θXY(1) = θXY(2) = … = θXY(K)
    Chi-square statistic with K − 1 degrees of freedom
  o But CMH should be reasonable as long as the conditional odds ratios don’t differ drastically
  o Can be generalized to I × J × K tables
- If the association is not homogeneous, then you consider the “interaction”: the change in the (X, Y) odds ratio at different values of Z

Review: We have seen this idea of “adjusting for other variables” in multiple regression.
Model: Y = β0 + β1 x1 + β2 x2
(a) Adjustments to formula:
(b) Interpretation of β1:
(c) Assumptions about Y:
(d) How would we fit “nonparallel lines”?
(e) How would we fit a quadratic or exponential relationship?

All generalized linear models (GLMs) have three components:
1) The random component, which identifies the response variable Y and assumes a probability distribution for it.
2) The systematic component, which specifies the explanatory variables for the model.
3) The link function, which specifies a function of the expected value of Y that the GLM relates to the explanatory variables through a prediction equation having linear form.

(f) Identify these components for multiple regression.
(g) What are the main ways a GLM differs from this?

Example 1: Recall the Donner party data. One conjecture was that the males in the party tended to be older than the females. We would like to estimate the association between survival and gender, after adjusting for age, as well as predict the survival probability based on different characteristics.

(a) Suppose we want to fit a regression model of survival vs. age. Can we do this? What if we code survival as (0 = died, 1 = survived)?

(b) Open donner.jmp, which now has a “SurvivalCode” column. Choose Analyze > Fit Y by X and use SurvivalCode as the Y, Response and age as the X, Factor. Press OK. From the Bivariate pull-down menu, choose Fit Line. Does this regression line do a reasonable job of modeling these data? What are some limitations of the regression line here?

(c) Now choose Analyze > Distribution, entering the Survived variable in the Y, Columns box and the age group variable in the By box. (You will probably want to stack the output.) Do you see a pattern in how the proportion of non-survivors changes across the age groups? How might we model such a pattern?

(d) Explain how a graph of these proportions vs. the age groups would behave.

A logistic regression function models the probability of success as a function of x through the equation:

$$\pi(x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}} \quad \text{for } -\infty < x < \infty$$

(e) What happens to this function as x → ∞? What happens to this function as x → −∞?

(f) Consider this function for different values of β. What is the effect of increasing β for a fixed value of α? You can use PlotLogistic.xls to graph up to three functions with different β values (β = 1, 2, 6, with α = −15).
Then consider this function for different values of α. What is the effect of increasing α for a fixed value of β? Use PlotLogistic.xls to graph up to three functions with different α values (α = −8, −10, −12, with β = 1). What if you change the x-values to range from −20 to 0 with α values of 5, 10, and 15?

(g) What will this function look like if β is negative? If β equals zero?

We have found that it is often better to work with the odds than with the conditional probability.

(h) If the probability is π(x), what are the odds at x?

(i) Using the equation for π(x) above, write out an equivalent expression for the log-odds at x. [Hint: Start with what you just told me about odds.]

(j) How does this model fit the three requirements of a Generalized Linear Model?

(k) Use the expression you found to provide an interpretation of the coefficient β in the logistic model. [Hint: Think in terms of odds instead of log-odds. Think about x vs. x + 1.]

(l) In the JMP data window, select Analyze > Fit Y by X, and enter the survived (categorical) variable as the Y, Response and age as the X, Factor.
- Use the output in the Parameter Estimates table to write out the logistic regression equation.
- Does β̂ have the expected sign?
- Is the coefficient statistically significant?

Practice Problem: Repeat (l), but this time also add gender to the By box. Compare the output for males and females.
(a) Does the association between survival and age appear to be the same for males and females (homogeneous association)? [Hint: From each Logistic Fit pull-down menu, select Odds Ratio.]
(b) Are both relationships significant?
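For readers without PlotLogistic.xls, the behavior of the logistic function in Example 1 can be explored with a short Python sketch. This is not part of the original JMP/Excel workflow. The function names (logistic, odds, midpoint) are illustrative assumptions, and the α and β values are the ones suggested in part (f).

```python
import math


def logistic(x, alpha, beta):
    """Logistic function: exp(alpha + beta*x) / (1 + exp(alpha + beta*x))."""
    z = alpha + beta * x
    # Use an algebraically equivalent form for z >= 0 to avoid overflow in exp.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def odds(p):
    """Odds corresponding to a probability p (requires p < 1)."""
    return p / (1.0 - p)


def midpoint(alpha, beta):
    """x at which the probability equals 0.5, i.e. alpha + beta*x = 0 (beta != 0)."""
    return -alpha / beta


# Tabulate the curve for the beta values from part (f), with alpha fixed at -15.
alpha = -15
for beta in (1, 2, 6):
    values = [round(logistic(x, alpha, beta), 4) for x in range(0, 21, 5)]
    print(f"beta={beta}: midpoint x={midpoint(alpha, beta):.2f}, "
          f"pi(x) at x=0,5,...,20 -> {values}")

# Compare the odds at x and x + 1 for one curve (see the hint in part (k)).
x, beta = 14, 1
ratio = odds(logistic(x + 1, alpha, beta)) / odds(logistic(x, alpha, beta))
print(f"odds(x+1) / odds(x) = {ratio:.4f}; exp(beta) = {math.exp(beta):.4f}")
```

Changing the values of alpha and beta in the loop gives a numerical counterpart to the graphs requested in parts (f) and (g).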



Cal Poly STAT 418 - Generalized Linear Model
