Chapter 12
Multiple Regression

Example - Effect of Birth weight on Body Size in Early Adolescence
Least Squares Estimation
Analysis of Variance
Testing for the Overall Model - F-test
Testing Individual Partial Coefficients - t-tests
Comparing Regression Models
Models with Dummy Variables
Example - Deep Cervical Infections
Example - Weather and Spinal Patients
Modeling Interactions
Logistic Regression
Logistic Regression with 1 Predictor
Example - Rizatriptan for Migraine
Example - Rizatriptan for Migraine (SPSS)
Odds Ratio
95% Confidence Interval for Odds Ratio
Example - Rizatriptan for Migraine
Multiple Logistic Regression
Example - ED in Older Dutch Men

Multiple Regression
• Numeric Response variable (y)
• k Numeric predictor variables (k < n)
• Model: $Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$
• Partial Regression Coefficients: $\beta_i \equiv$ effect (on the mean response) of increasing the ith predictor variable by 1 unit, holding all other predictors constant
• Model Assumptions (Involving Error terms $\varepsilon$)
– Normally distributed with mean 0
– Constant Variance $\sigma^2$
– Independent (Problematic when data are series in time/space)

Example - Effect of Birth weight on Body Size in Early Adolescence
• Response: Height at Early adolescence (n = 250 cases)
• Predictors (k = 6 explanatory variables)
• Adolescent Age (x1, in years, 11-14)
• Tanner stage (x2, units not given)
• Gender (x3 = 1 if male, 0 if female)
• Gestational age (x4, in weeks at birth)
• Birth length (x5, units not given)
• Birthweight Group (x6 = 1,...,6: <1500g (1), 1500-1999g (2), 2000-2499g (3), 2500-2999g (4), 3000-3499g (5), >3500g (6))
Source: Falkner, et al (2004)

Least Squares Estimation
• Population Model for mean response: $E(Y) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$
• Least Squares Fitted (predicted) equation, minimizing SSE: $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k$, with $SSE = \sum (Y - \hat{Y})^2$
• All statistical software packages/spreadsheets can compute least squares estimates and their standard errors

Analysis of Variance
• Direct extension to ANOVA based on simple linear
regression
• Only adjustments are to degrees of freedom:
– DFR = k, DFE = n-(k+1)

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square             F
Model                 SSR              k                    MSR = SSR/k             F = MSR/MSE
Error                 SSE              n-(k+1)              MSE = SSE/(n-(k+1))
Total                 TSS              n-1

$R^2 = \dfrac{SSR}{TSS} = \dfrac{TSS - SSE}{TSS}$

Testing for the Overall Model - F-test
• Tests whether any of the explanatory variables are associated with the response
• H0: $\beta_1 = \cdots = \beta_k = 0$ (None of the xs associated with y)
• HA: Not all $\beta_i = 0$
• T.S.: $F_{obs} = \dfrac{MSR}{MSE} = \dfrac{R^2/k}{(1-R^2)/(n-(k+1))}$
• R.R.: $F_{obs} \ge F_{\alpha,\,k,\,n-(k+1)}$
• P-val: $P(F \ge F_{obs})$

Example - Effect of Birth weight on Body Size in Early Adolescence
• Authors did not print ANOVA, but did provide the following:
• n = 250, k = 6, $R^2 = 0.26$
• H0: $\beta_1 = \cdots = \beta_6 = 0$ HA: Not all $\beta_i = 0$
• T.S.: $F_{obs} = \dfrac{MSR}{MSE} = \dfrac{R^2/k}{(1-R^2)/(n-(k+1))} = \dfrac{0.26/6}{(1-0.26)/(250-(6+1))} = \dfrac{.0433}{.0030} = 14.2$
• R.R.: $F_{obs} \ge F_{\alpha,\,6,\,243} = 2.13$
• P-val: $P(F \ge 14.2)$

Testing Individual Partial Coefficients - t-tests
• Wish to determine whether the response is associated with a single explanatory variable, after controlling for the others
• H0: $\beta_i = 0$ HA: $\beta_i \ne 0$ (2-sided alternative)
• T.S.: $t_{obs} = \dfrac{\hat{\beta}_i}{s_{b_i}}$
• R.R.: $|t_{obs}| \ge t_{\alpha/2,\,n-(k+1)}$
• P-val: $2P(t \ge |t_{obs}|)$

Example - Effect of Birth weight on Body Size in Early Adolescence

Variable          b       SE_b    t = b/SE_b   P-val (z)
Adolescent Age    2.86    0.99    2.89         .0038
Tanner Stage      3.41    0.89    3.83         <.001
Male              0.08    1.26    0.06         .9522
Gestational Age   -0.11   0.21    -0.52        .6030
Birth Length      0.44    0.19    2.32         .0204
Birth Wt Grp      -0.78   0.64    -1.22        .2224

Controlling for all other predictors, adolescent age, Tanner stage, and birth length are associated with adolescent height measurement.

Comparing Regression Models
• Conflicting Goals: Explaining variation in Y while keeping model as simple as possible (parsimony)
• We can test whether a subset of k-g predictors (including possibly cross-product terms) can be dropped from a model that contains the remaining g predictors.
H0: $\beta_{g+1} = \cdots = \beta_k = 0$
– Complete Model: Contains all k predictors
– Reduced Model: Eliminates the predictors from H0
– Fit both models, obtaining sums of squares for each (or $R^2$ from each):
• Complete: $SSR_c$, $SSE_c$ ($R_c^2$)
• Reduced: $SSR_r$, $SSE_r$ ($R_r^2$)

Comparing Regression Models
• H0: $\beta_{g+1} = \cdots = \beta_k = 0$ (After removing the effects of $X_1, \ldots, X_g$, none of the other predictors are associated with Y)
• Ha: H0 is false
• TS: $F_{obs} = \dfrac{(SSR_c - SSR_r)/(k-g)}{SSE_c/[n-(k+1)]} = \dfrac{(R_c^2 - R_r^2)/(k-g)}{(1-R_c^2)/[n-(k+1)]}$
• RR: $F_{obs} \ge F_{\alpha,\,k-g,\,n-(k+1)}$
• P-value: $P(F \ge F_{obs})$, based on the F-distribution with k-g and n-(k+1) d.f.

Models with Dummy Variables
• Some models have both numeric and categorical explanatory variables (Recall gender in example)
• If a categorical variable has m levels, need to create m-1 dummy variables that take on the value 1 if the level of interest is present, 0 otherwise.
• The baseline level of the categorical variable is the one for which all m-1 dummy variables are set to 0
• The regression coefficient corresponding to a dummy variable is the difference between the mean for that level and the mean for the baseline group, controlling for all numeric predictors

Example - Deep Cervical Infections
• Subjects - Patients with deep neck infections
• Response (Y) - Length of Stay in hospital
• Predictors: (One numeric, 11 Dichotomous)
– Age (x1)
– Gender (x2 = 1 if female, 0 if male)
– Fever (x3 = 1 if Body Temp > 38°C, 0 if not)
– Neck swelling (x4 = 1 if Present, 0 if absent)
– Neck Pain (x5 = 1 if Present, 0 if absent)
– Trismus (x6 = 1 if Present, 0 if absent)
– Underlying Disease (x7 = 1 if Present, 0 if absent)
– Respiration Difficulty (x8 = 1 if Present, 0 if absent)
– Complication (x9 = 1 if Present, 0 if absent)
– WBC > 15000/mm³ (x10 = 1 if Present, 0 if absent)
– CRP > 100 µg/ml (x11 = 1 if Present, 0 if absent)
Source: Wang, et al (2003)

Example - Weather and Spinal Patients
• Subjects - Visitors to National Spinal Network in 23 cities Completing SF-36 Form
• Response - Physical Function subscale (1 of 10 reported)
• Predictors:
– Patient’s age (x1)
– Gender (x2 = 1 if female, 0 if male)
– High temperature on day of
visit (x3)
– Low temperature on day of visit (x4)
– Dew point (x5)
– Wet bulb (x6)
– Total precipitation (x7)
– Barometric Pressure (x7)
– Length of sunlight (x8)
– Moon Phase (new, wax crescent, 1st Qtr, wax gibbous, full moon, wan gibbous, last Qtr, wan
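
The F statistics from the "Testing for the Overall Model" and "Comparing Regression Models" slides can be checked numerically from the reported $R^2$ values. The following is a minimal Python sketch, not the authors' code: the function names `overall_f` and `partial_f` are illustrative, and the critical value 2.13 is simply the one quoted on the birth-weight slide rather than being recomputed here.

```python
def overall_f(r2, n, k):
    """F statistic for H0: beta_1 = ... = beta_k = 0, computed from R^2."""
    return (r2 / k) / ((1.0 - r2) / (n - (k + 1)))


def partial_f(r2_complete, r2_reduced, n, k, g):
    """F statistic for dropping k - g predictors from a complete model
    with k predictors, leaving a reduced model with g predictors."""
    numerator = (r2_complete - r2_reduced) / (k - g)
    denominator = (1.0 - r2_complete) / (n - (k + 1))
    return numerator / denominator


# Values reported on the birth-weight slide: n = 250, k = 6, R^2 = 0.26.
f_obs = overall_f(0.26, n=250, k=6)
f_crit = 2.13  # critical value F(alpha; 6, 243) as quoted on the slide
print(f"F_obs = {f_obs:.1f}, reject H0: {f_obs >= f_crit}")
```

With these inputs the sketch gives an F statistic of about 14.2, matching the slide. The overall test is the special case of the model comparison with g = 0 and a reduced-model $R^2$ of 0, so `partial_f(0.26, 0.0, 250, 6, 0)` returns the same value.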
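
The t statistics in the birth-weight coefficient table can be reproduced from the reported b and SE_b columns. The sketch below is a hedged illustration using only the Python standard library. It assumes, based on the column label "P-val (z)", that the P-values come from the standard normal distribution applied to the t statistic rounded to two decimals, and it assumes a 0.05 significance level, which the slides do not state. The names `coefficients`, `two_sided_z_pvalue`, and `results` are illustrative.

```python
import math

# (b, SE_b) pairs copied from the coefficient table in the birth-weight example.
coefficients = {
    "Adolescent Age": (2.86, 0.99),
    "Tanner Stage": (3.41, 0.89),
    "Male": (0.08, 1.26),
    "Gestational Age": (-0.11, 0.21),
    "Birth Length": (0.44, 0.19),
    "Birth Wt Grp": (-0.78, 0.64),
}


def two_sided_z_pvalue(z):
    """Two-sided P-value under the standard normal: 2 * P(Z >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


results = {}
for name, (b, se) in coefficients.items():
    t = round(b / se, 2)  # assumption: the slide rounds t before looking up P
    results[name] = (t, two_sided_z_pvalue(t))

# Assumed significance level of 0.05 (not stated on the slides).
significant = [name for name, (t, p) in results.items() if p < 0.05]

for name, (t, p) in results.items():
    print(f"{name:16s} t = {t:6.2f}  P = {p:.4f}")
print("Significant at 0.05:", significant)
```

Under these assumptions the predictors flagged as significant are adolescent age, Tanner stage, and birth length, which agrees with the conclusion stated on the slide. With n-(k+1) = 243 error degrees of freedom, the normal approximation is close to the t-distribution, which is presumably why the table reports z-based P-values.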