Lecture 12
Model Building
BMTRY 701
Biostatistical Methods II

The real world regression
Real-world datasets will have a large number of covariates!
There will be a number of covariates to consider for inclusion in the model.
The inclusion/exclusion of covariates
• will not always be obvious
• will be affected by multicollinearity
• will depend on the questions of interest
• will depend on the scientific ‘precedents’ in that area
The model building process is important for determining a “final model”.

The “final model”
At the end of the analytic process, there is generally one model from which you make inferences.
• It is usually a multiple regression model.
• It is not logical to make inferences based on more than one model.
Recall the ‘principle of parsimony’.

Principle of Parsimony
Also known as Occam’s Razor.
The principle states that the explanation of any phenomenon should make as few assumptions as possible, eliminating those that make no difference in the observable predictions of the explanatory hypothesis or theory. The principle recommends selecting the hypothesis that introduces the fewest assumptions and postulates the fewest entities.
Translation for regression:
• The model that explains the greatest variance with the fewest possible covariates is best!
• The addition of each covariate should be weighed against the increase in complexity of the model.

General Process of Model Building
1. Exploratory data analysis
2. Choose initial model
3. Fit model
4. Check model assumptions
5. Repeat 2–4 as needed
6. Interpret findings

Exploratory Data Analysis
Consider the covariates and the outcome variables.
• Look at each covariate and the outcome:
  • What forms do they take?
  • Might transformations need to be made?
• Look at the relationships between Y and each X:
  • Are the relationships linear?
  • What form should a covariate take to enter the model (e.g., categorical? spline? quadratic?)
• Look at the relationships between the X’s:
  • Is there strong correlation between some covariates?

Exploratory Data Analysis
Individual variable analysis
• histograms
• boxplots
• dotplots (by categories?)
Two-way associations
• scatterplots
• color-coded by a third variable?
• SIMPLE LINEAR REGRESSIONS
For categorical variables
• tables
• color-code other graphical displays

SENIC

SENIC example
We need a scientific question/hypothesis!!
Examples:
• What factors are predictive of length of stay?
• Is the number of beds strongly related to length of stay?
• Is there a difference in length of stay by region?
• How do infection risk and number of cultures relate to length of stay? Is it possible to reduce the length of stay by reducing infection risk and number of cultures?

Work through the exploration…

Next step: Pick an initial model
Use the information that you learned in the exploratory step.
Some guidelines
• Covariates not associated with the outcome in SLR models will probably not be associated in MLR models.
• Choose a screening threshold: covariates with p < 0.10 (or p < 0.20) in SLR are included in the initial MLR.
Recall multicollinearity
• You might want to spend some extra time learning about the interrelationships between two variables and the outcome.

Next step: Pick an initial model
There are many approaches to the initial model.
My approach: start big, and then pare down.
• The initial model includes all of the covariates and potentially their interactions.
• Fit the model with all of the covariates of interest.
• Remove ONE AT A TIME, based on insignificant p-values and model coefficients:
  • find the least significant covariate (largest p-value)
  • refit the model without it
  • look at the model:
    • What happened to the other coefficients?
    • What happened to R²?
• These are not hard-and-fast rules!

SENIC
What is an appropriate initial model?
Are there any interactions to consider?
Work through the model…

Check model assumptions
Based on a reasonable model (in terms of ‘significance’ of covariates), check the assumptions:
• residual plots
• other diagnostics
Recall your assumptions:
• independence of errors
• homoscedasticity/constant variance
• normality of errors

Does it fit?
If so… go to the next step.
If not, deal with misspecifications:
• transform Y?
• another type of regression?!
• transform X?
• consider more exploration (e.g., smoothers to inform about relationships)
• outlier problems?
• Then, refit all over again…

Last step
Interpret results.
Oddly, this step often leads you back to refitting.
Sometimes trying to summarize results causes you to think of additional modeling considerations:
• adding another variable
• using a different parameterization
• using a different reference level for a categorical variable

SENIC
What is the final model?
How to present it?

Other model building issues: Stepwise approaches
“Stepwise” approaches are computer-driven: you give the computer a set of covariates and it finds an ‘optimal’ model.
There are “forward” and “backward” versions.
Problems:
• Models are only ‘stepwise’ optimal: each step makes the best single change, which need not lead to the best overall model.
• They ignore the magnitude of β and simply focus on the p-value!
• You need to set criteria for optimality, which are not always obvious.
• They give you no ability to give different variables different priorities.
• They can have problematic interpretations: e.g., a main effect is removed, but the interaction is included.
• Stepwise forward and backward can give different models.

Is stepwise ever a good idea?
If you have a very large set of predictors that are somewhat ‘interchangeable’.
Example: gene expression microarrays
• You may have >10,000 genes to select from.
• Automated procedures can find an optimal set that describes a large amount of variation in the outcome of interest (e.g., cancer vs. no cancer).
• It would be physically impossible to use manual model fitting.
• Specialized software exists for this (a standard ‘lm’-type approach will not work).

Stepwise Approaches
I don’t condone it, but…
In R: step(reg)

Other model building issues: R²
Some people use the increase in R² as a criterion for inclusion/exclusion of a covariate.
Not that common in
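The ‘Exploratory Data Analysis’ slides and the SLR screening guideline (keep covariates with SLR p < 0.10 or 0.20) can be sketched in R as follows. The SENIC data are not bundled with base R, so this sketch uses the built-in `mtcars` data as a stand-in, with `mpg` playing the role of the outcome. The covariate choices are illustrative only. Each SLR p-value comes from `anova()`, so a multi-level factor such as `cyl` is tested jointly rather than level by level.

```r
# Stand-in data: R's built-in mtcars (the SENIC data are not bundled with base R).
dat <- mtcars
dat$cyl <- factor(dat$cyl)                                   # treat as categorical
dat$am  <- factor(dat$am, labels = c("automatic", "manual"))

# Individual variables: forms, ranges, need for transformation?
summary(dat[, c("mpg", "wt", "hp", "qsec", "cyl", "am")])
hist(dat$mpg, main = "Outcome", xlab = "mpg")
boxplot(mpg ~ cyl, data = dat)                               # outcome by category

# Two-way association, color-coded by a third variable
plot(mpg ~ wt, data = dat, col = as.integer(dat$am), pch = 19)
legend("topright", legend = levels(dat$am), col = 1:2, pch = 19)

# Relationships among the X's: large |r| flags possible multicollinearity
round(cor(dat[, c("wt", "hp", "disp", "qsec")]), 2)

# One simple linear regression per covariate; anova() tests a factor's levels jointly
covariates <- c("wt", "hp", "disp", "qsec", "cyl", "am")
slr_p <- sapply(covariates, function(x) {
  fit <- lm(reformulate(x, response = "mpg"), data = dat)
  anova(fit)[1, "Pr(>F)"]
})
round(slr_p, 4)

# Liberal screening threshold (0.20) for entry into the initial MLR
keep <- names(slr_p)[slr_p < 0.20]
keep
```

A liberal threshold is deliberate here: screening only decides what enters the *initial* model. The one-at-a-time paring down happens afterwards.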
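The “start big, and then pare down” approach removes one covariate at a time, refits, and checks what happened to the other coefficients and to R². The loop below is a minimal sketch of that bookkeeping. `backward_by_p` is a hypothetical helper written for this illustration; it is not part of the lecture or of any package. It again uses `mtcars` as a stand-in for SENIC. It uses `drop1(..., test = "F")`, which only offers terms that can be dropped without breaking the model hierarchy. As the slide says, these are not hard-and-fast rules, so the printed output is meant to support judgment, not replace it.

```r
# Stand-in data: built-in mtcars, with cylinder count treated as categorical.
dat <- mtcars
dat$cyl <- factor(dat$cyl)

# Hypothetical helper (not from the lecture or any package): backward elimination
# that drops ONE term per pass, always the one with the largest partial F-test p-value.
backward_by_p <- function(terms, response, data, alpha = 0.10) {
  fit <- lm(reformulate(terms, response = response), data = data)
  repeat {
    # drop1() only offers terms that can be removed without breaking hierarchy,
    # so a main effect stays while an interaction containing it is in the model.
    tests <- drop1(fit, test = "F")
    p <- setNames(tests[["Pr(>F)"]][-1], rownames(tests)[-1])   # row 1 is "<none>"
    if (length(terms) == 1 || max(p) < alpha) return(fit)

    worst <- names(which.max(p))
    new_fit <- lm(reformulate(setdiff(terms, worst), response = response), data = data)

    # Report what the slide asks about: R^2 and the remaining coefficients.
    cat(sprintf("\nDropped %s (p = %.3f); R^2 %.3f -> %.3f\n", worst, max(p),
                summary(fit)$r.squared, summary(new_fit)$r.squared))
    shared <- intersect(names(coef(fit)), names(coef(new_fit)))
    print(round(cbind(before = coef(fit)[shared], after = coef(new_fit)[shared]), 3))

    terms <- setdiff(terms, worst)
    fit <- new_fit
  }
}

final <- backward_by_p(c("wt", "hp", "qsec", "disp", "cyl", "wt:hp"), "mpg", dat)
summary(final)
```

The `alpha = 0.10` stopping rule is an assumption chosen to match the lecture’s screening range. Large swings in the “before/after” coefficients when a term is dropped are exactly the multicollinearity signal the slides suggest watching for.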
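The ‘Check model assumptions’ slide calls for residual plots to assess constant variance and normality of errors. Here is a short sketch using a stand-in model on the built-in `mtcars` data. Independence of errors usually cannot be judged from these plots alone; it depends on how the data were collected.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in model on built-in data

op <- par(mfrow = c(1, 2))
# Linearity and constant variance: look for curvature or a funnel shape
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
# Normality of errors: points should fall near the reference line
qqnorm(resid(fit))
qqline(resid(fit))
par(op)

# Independence is mostly a design question; if the rows have a collection
# order, plotting residuals against that order can reveal dependence.

# A formal normality test, as a complement to (not a replacement for) the QQ plot
sw <- shapiro.test(resid(fit))
sw

# plot(fit) draws R's standard set of lm diagnostic plots in one call
```

If these plots point to a misspecification, the options on the ‘Does it fit?’ slide apply: transform Y or X, explore further with smoothers, or look into outliers, and then refit.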
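For completeness, here is what the `step(reg)` call on the ‘Stepwise Approaches’ slide does. R’s `step()` compares candidate models by AIC by default, not by p-values. Called with only a fitted model, it starts from that model and searches downward from it. The sketch below runs backward and forward searches on the built-in `mtcars` data, as a stand-in, so you can check the slide’s point that the two directions need not agree. It also shows the BIC-type penalty obtained with `k = log(n)`.

```r
dat <- mtcars
full  <- lm(mpg ~ wt + hp + qsec + disp + drat + am, data = dat)
empty <- lm(mpg ~ 1, data = dat)

# Backward: at each step drop the term whose removal lowers AIC most; stop when none does
back <- step(full, direction = "backward", trace = 0)

# Forward: start from the intercept-only model and add terms from the upper scope
fwd <- step(empty, scope = list(lower = formula(empty), upper = formula(full)),
            direction = "forward", trace = 0)

formula(back)
formula(fwd)    # compare: the two directions are not guaranteed to agree

# Default criterion is AIC (k = 2); k = log(n) gives a BIC-type, harsher penalty
back_bic <- step(full, direction = "backward", trace = 0, k = log(nrow(dat)))
formula(back_bic)
```

Setting `trace = 0` hides the step-by-step AIC tables. Leave it at the default to see each candidate change that was considered.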
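Why is the raw increase in R² a weak inclusion criterion? In ordinary least squares, adding a covariate to a nested model can never lower R². The bigger model therefore always “wins”, even when the new covariate is pure noise. Adjusted R², AIC and BIC charge a price per added parameter, in the spirit of the principle of parsimony. In this sketch, `noise` is a simulated covariate with no relationship to the outcome by construction, and `mtcars` is again a stand-in dataset.

```r
set.seed(12)
dat <- mtcars
dat$noise <- rnorm(nrow(dat))   # pure noise, unrelated to mpg by construction

small  <- lm(mpg ~ wt, data = dat)
bigger <- lm(mpg ~ wt + noise, data = dat)

# R^2 cannot drop when a covariate is added; the penalized criteria can get worse
fit_stats <- function(m) c(R2 = summary(m)$r.squared,
                           adjR2 = summary(m)$adj.r.squared,
                           AIC = AIC(m), BIC = BIC(m))
round(sapply(list(small = small, bigger = bigger), fit_stats), 4)

anova(small, bigger)   # partial F-test for the added covariate
```

Higher is better for R² and adjusted R². Lower is better for AIC and BIC.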