Chapter 2: Inferences in Regression Analysis

2012 Christopher R. Bilder

Population model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $\beta_0$ and $\beta_1$ are parameters, the $X_i$ are known constants, and $\varepsilon_i \sim$ independent $N(0, \sigma^2)$.

2.1 Inferences concerning $\beta_1$

Is X linearly related to Y? Perform a hypothesis test to answer this question.

Suppose $\beta_1 = 0$. The population model is $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$. Then
$$Y_i = \beta_0 + 0 \cdot X_i + \varepsilon_i = \beta_0 + \varepsilon_i$$

Example plot: Suppose $\beta_0 = 3$.
[Figure: Y (0 to 6) vs. X (0 to 8) under the model with $\beta_1 = 0$ and $\beta_0 = 3$]

As X changes, E(Y) does not change. X is not linearly related to Y.

Use $b_1$ to determine whether $\beta_1 = 0$ using a hypothesis test. How? Use the sampling distribution of $b_1$. If we repeated the process of taking a sample from the population an infinite number of times (calculating $b_1$ each time), the average of the $b_1$'s would be $\beta_1$, the variance of the $b_1$'s would be $\sigma^2 / \sum_{i=1}^n (X_i - \bar{X})^2$, and
$$b_1 \sim N\left(\beta_1, \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2}\right),$$
i.e., $E(b_1) = \beta_1$ and $Var(b_1) = \sigma^2(b_1) = \sigma^2 / \sum_{i=1}^n (X_i - \bar{X})^2$. (NOTE: Different notation for variance!)

Proof: We already saw that $E(b_1) = \beta_1$ in Chapter 1. Let's examine the variance now. Remember that
$$b_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2}.$$
Also, note that the numerator can be simplified to be
$$\begin{aligned}
\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y}) &= \sum_{i=1}^n (X_i - \bar{X})Y_i - \sum_{i=1}^n (X_i - \bar{X})\bar{Y} \\
&= \sum_{i=1}^n (X_i - \bar{X})Y_i - \bar{Y}\sum_{i=1}^n (X_i - \bar{X}) \\
&= \sum_{i=1}^n (X_i - \bar{X})Y_i - \bar{Y}\left(\sum_{i=1}^n X_i - n\bar{X}\right) \\
&= \sum_{i=1}^n (X_i - \bar{X})Y_i - \bar{Y} \cdot 0 \\
&= \sum_{i=1}^n (X_i - \bar{X})Y_i
\end{aligned}$$
Then
$$\begin{aligned}
Var(b_1) &= Var\left(\frac{\sum_{i=1}^n (X_i - \bar{X})Y_i}{\sum_{i=1}^n (X_i - \bar{X})^2}\right) \\
&= \frac{1}{\left[\sum_{i=1}^n (X_i - \bar{X})^2\right]^2} Var\left(\sum_{i=1}^n (X_i - \bar{X})Y_i\right) \\
&= \frac{1}{\left[\sum_{i=1}^n (X_i - \bar{X})^2\right]^2} \sum_{i=1}^n (X_i - \bar{X})^2 Var(Y_i) \\
&= \frac{1}{\left[\sum_{i=1}^n (X_i - \bar{X})^2\right]^2} \sum_{i=1}^n (X_i - \bar{X})^2 \sigma^2 \\
&= \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2}
\end{aligned}$$
Remember that $Var\left(\sum a_i Y_i\right) = \sum a_i^2 Var(Y_i)$ for independent $Y_i$ (see p. 646 equation A.3 or Chapter 4 of my STAT 380 notes).

Notice that the variance of $b_1$ contains a parameter, $\sigma^2$.
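The "repeat the sampling process many times" description of $b_1$'s sampling distribution can be checked by simulation. The following is a minimal R sketch; the values of $\beta_0$, $\beta_1$, $\sigma$, the $X_i$, the seed, and the number of simulated samples are hypothetical choices for illustration, not values from these notes.

```r
# Minimal sketch: approximate the sampling distribution of b1 by simulation.
# All values below (beta0, beta1, sigma, X, n.sim, seed) are hypothetical.
set.seed(2012)
beta0 <- 3
beta1 <- 0.5
sigma <- 2
X <- 1:10                 # known constants X_i
n.sim <- 10000            # number of repeated samples

SXX <- sum((X - mean(X))^2)   # sum of (X_i - Xbar)^2; equals 82.5 here

b1 <- replicate(n.sim, {
  # One sample from the population model with independent N(0, sigma^2) errors
  Y <- beta0 + beta1 * X + rnorm(n = length(X), mean = 0, sd = sigma)
  # b1 via the simplified numerator sum((X_i - Xbar) * Y_i)
  sum((X - mean(X)) * Y) / SXX
})

mean(b1)        # should be close to beta1 = 0.5
var(b1)         # should be close to the theoretical variance below
sigma^2 / SXX   # sigma^2 / sum((X_i - Xbar)^2), about 0.0485
```

Increasing n.sim makes the simulated mean and variance of the $b_1$'s agree more closely with $\beta_1$ and $\sigma^2 / \sum (X_i - \bar{X})^2$.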
To find the estimated variance of $b_1$, $\sigma^2$ is replaced by its estimate, MSE:
$$\widehat{Var}(b_1) = s^2(b_1) = \frac{MSE}{\sum_{i=1}^n (X_i - \bar{X})^2}$$

Sampling distribution of $t^* = (b_1 - \beta_1)/\sqrt{\widehat{Var}(b_1)}$

Purpose: Find a test statistic for a test of $\beta_1 = 0$.

Note: The standardized quantity $(b_1 - \beta_1)/\sqrt{Var(b_1)}$ is distributed as a $N(0,1)$ random variable. Quick review: a standardized quantity is
$$\frac{\text{statistic} - E(\text{statistic})}{\sqrt{Var(\text{statistic})}}$$
Since $\sigma^2$ is unknown, we generally cannot use this quantity for hypothesis testing. Instead, use the "studentized" version of the above quantity,
$$t^* = \frac{b_1 - \beta_1}{\sqrt{\widehat{Var}(b_1)}},$$
since it contains no unknown parameters ($\beta_1$ will be specified in the hypothesis test). Then $t^* \sim t(n-2)$, where $t(n-2)$ represents a random variable with a Student's t-distribution (or just t-distribution) with $n-2$ degrees of freedom. For a proof, see p. 45 (or p. 11.44 of my STAT 380 Chapter 11 notes at www.chrisbilder.com/stat380/schedule.htm).

Hypothesis test for $\beta_1 = 0$:
1) State $H_0$ and $H_a$:
   $H_0: \beta_1 = 0$ (no linear relationship)
   $H_a: \beta_1 \neq 0$ (linear relationship)
2) Test statistic: $t^* = (b_1 - \beta_1)/\sqrt{\widehat{Var}(b_1)}$, which equals $b_1/\sqrt{\widehat{Var}(b_1)}$ under $H_0$
3) Critical value: $t(1-\alpha/2; n-2)$, where $\alpha$ is the type I error level; note that this is the $1-\alpha/2$ quantile from a t-distribution with $n-2$ degrees of freedom.
4) Reject or don't reject $H_0$: reject $H_0$ if $|t^*| > t(1-\alpha/2; n-2)$.
   [Figure: t-distribution density centered at 0; "Reject $H_0$" regions in both tails beyond the critical values, "Don't reject $H_0$" region between them]
5) Conclusion:
   Reject $H_0$: X is linearly related to Y
   Don't reject $H_0$: There is not sufficient evidence to show that X is linearly related to Y
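The five steps above (critical-value approach) can be sketched as a small R function. This is an illustrative sketch, not code from the course files: the function name slope.t.test() and the small data set are hypothetical, and MSE is computed as SSE/(n - 2).

```r
# Hypothetical helper (not part of base R): two-sided test of
# H0: beta1 = 0 vs. Ha: beta1 != 0 using the critical-value approach.
slope.t.test <- function(X, Y, alpha = 0.05) {
  n <- length(Y)
  SXX <- sum((X - mean(X))^2)
  b1 <- sum((X - mean(X)) * (Y - mean(Y))) / SXX
  b0 <- mean(Y) - b1 * mean(X)
  MSE <- sum((Y - (b0 + b1 * X))^2) / (n - 2)  # SSE / (n - 2)
  se.b1 <- sqrt(MSE / SXX)                      # sqrt of estimated Var(b1)
  t.star <- (b1 - 0) / se.b1                    # beta1 = 0 under H0
  crit <- qt(p = 1 - alpha/2, df = n - 2)       # t(1 - alpha/2; n - 2)
  list(b1 = b1, se.b1 = se.b1, t.star = t.star, crit = crit,
       reject.H0 = abs(t.star) > crit)
}

# Small made-up data set for illustration
X <- c(1, 2, 3, 4, 5)
Y <- c(1, 1, 2, 2, 4)
res <- slope.t.test(X = X, Y = Y, alpha = 0.05)
res$t.star      # about 3.66
res$crit        # about 3.18
res$reject.H0   # TRUE
```

With lm(), the same $t^*$ appears in the "t value" column of summary(lm(Y ~ X)) in the row for X, so the function above is only meant to make each step explicit.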
In the conclusion of step 5, replace X and Y with what they represent in the problem.

Using a p-value:
1) Same
2) State the p-value: p-value = $2P(t(n-2) > |t^*|)$, where $t(n-2)$ denotes a t random variable with $n-2$ degrees of freedom.
   [Figure: t-distribution density centered at 0, with the area beyond $|t^*|$ marked]
   Note: The p-value gives the probability of observing a test statistic at least as large in absolute value as $|t^*|$, assuming the null hypothesis is true.
3) State $\alpha$
4) Reject $H_0$ if p-value $\le \alpha$; don't reject $H_0$ if p-value $> \alpha$
5) Same

How can you find p-values and critical values in R for the t-distribution?

In the summary() output for an object resulting from using lm(), the p-value for a test of $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$ is given. The more general way to find it is with the pt() function, which finds the probability that the random variable $t(n-2)$ is less than a particular value. For example, if the test statistic was 1.96 with 10 degrees of freedom, we obtain $P(t(10) < 1.96)$:

> pt(q = 1.96, df = 10)
[1] 0.9607819

Since the p-value is $2P(t(n-2) > |t^*|)$, we can change this to

> 2*(1-pt(q = abs(1.96), df = 10))
[1] 0.07843624

To find a critical value, use the qt() function:

> qt(p = 0.95, df = 10)
[1] 1.812461

Thus, $P(t(10) < 1.81) = 0.95$; i.e., 1.81 is the 0.95 quantile from a t-distribution with 10 degrees of freedom.

$(1-\alpha)100\%$ confidence interval (C.I.) for $\beta_1$:

The "usual" type of t-distribution based confidence interval is
$$\text{Estimator} \pm t(1-\alpha/2, df) \times (\text{S.E. of estimator})$$
where df = degrees of freedom and S.E. is the standard error. For $\beta_1$:
$$b_1 \pm t(1-\alpha/2, n-2)\sqrt{\widehat{Var}(b_1)}$$
This can be rewritten as:
$$b_1 - t(1-\alpha/2, n-2)\sqrt{\widehat{Var}(b_1)} < \beta_1 < b_1 + t(1-\alpha/2, n-2)\sqrt{\widehat{Var}(b_1)}$$
See p. 2.20 for a review of how knowing the distribution of $t^*$ can be used to find a C.I.

Example: Sales and advertising

Is advertising linearly related to sales? Use $\alpha = 0.05$.
1) $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$
2) $t^* = (b_1 - \beta_1)/\sqrt{\widehat{Var}(b_1)} = 0.70/0.1915 = 3.6556$

   X   Y   (X - X̄)²
   1   1   4
   2   1   1
   3   2   0
   4   2   1
   5   4   4
   Sum of (X - X̄)²: 10
   Note: $\sqrt{\widehat{Var}(b_1)} = \sqrt{MSE / \sum_{i=1}^n (X_i - \bar{X})^2} = \sqrt{0.3667/10} = 0.1915$
3) $t(1-0.05/2; 5-2) = t(0.975; 3) = 3.182$

> qt(p = 1-0.05/2, df = 3)
[1] 3.182446

4) Since 3.6556 > 3.182, reject $H_0$.
   [Figure: t-distribution with 3 degrees of freedom; "Reject $H_0$" regions beyond -3.182 and 3.182, "Don't reject $H_0$" region between them; observed $t^* = 3.66$ marked]
5) Advertising is linearly related to sales.

See t-dist_program.R for an R program that shows how to draw a plot similar to the one above.

Example: HS and College GPA (HS_college_GPA_ch2.R)

Suppose $\alpha = 0.01$ for the hypothesis test.

> #Fit the simple linear regression model and save the results in mod.fit
> mod.fit<-lm(formula = College.GPA ~