Section 16. Linear constraints in multiple linear regression. Analysis of variance.

Multiple linear regression with general linear constraints. Let us consider a multiple linear regression $Y = X\beta + \varepsilon$ and suppose that we want to test a hypothesis given by a set of $s$ linear equations, in matrix form:
$$H_0 : A\beta = c,$$
where $A$ is an $s \times p$ matrix and $c$ is an $s \times 1$ vector. We will assume that $s \le p$ and that the matrix $A$ has rank $s$. This generalizes two types of hypotheses from the previous lecture, where we considered only one linear combination of the parameters (the $s = 1$ case) or tested a hypothesis about all parameters simultaneously (the $s = p$ case).

To test this general hypothesis, a natural idea is to compare how far $A\hat\beta$ is from $c$, and to do this we need to find the distribution of $A\hat\beta$. Clearly, this distribution is normal with mean $A\beta$ and covariance
$$\mathbb{E}\, A(\hat\beta - \beta)(\hat\beta - \beta)^T A^T = A\,\mathrm{Cov}(\hat\beta)\,A^T = \sigma^2 A (X^T X)^{-1} A^T = \sigma^2 D,$$
where we introduced the notation $D := A(X^T X)^{-1} A^T$. The matrix $D$ is a symmetric positive definite invertible $s \times s$ matrix and, therefore, we can take its square root $D^{1/2}$. It is easy to check that the covariance of $D^{-1/2} A(\hat\beta - \beta)$ is $\sigma^2 I$. This implies that
$$\frac{1}{\sigma^2} \bigl|D^{-1/2} A(\hat\beta - \beta)\bigr|^2 = \frac{1}{\sigma^2} (A(\hat\beta - \beta))^T D^{-1} A(\hat\beta - \beta) \sim \chi^2_s.$$
Under the null hypothesis, $A\beta = c$, we get
$$\frac{1}{\sigma^2} (A\hat\beta - c)^T D^{-1} (A\hat\beta - c) \sim \chi^2_s. \tag{16.0.1}$$
Since $n\hat\sigma^2/\sigma^2 \sim \chi^2_{n-p}$ is independent of $\hat\beta$, we get
$$\frac{\dfrac{1}{s\sigma^2}(A\hat\beta - c)^T D^{-1}(A\hat\beta - c)}{\dfrac{n\hat\sigma^2}{(n-p)\sigma^2}} = \frac{n-p}{ns\hat\sigma^2}\, (A\hat\beta - c)^T D^{-1}(A\hat\beta - c) \sim F_{s,n-p}. \tag{16.0.2}$$
This is enough to test the hypothesis $H_0$. However, in a variety of applications a different, equivalent representation of (16.0.1) is more useful. It is given in terms of the MLE $\hat\beta_A$ of $\beta$ that satisfies the constraint in $H_0$. In other words, $\hat\beta_A$ is obtained by solving
$$|Y - X\beta|^2 \to \min \quad \text{subject to the constraint } A\beta = c. \tag{16.0.3}$$

Lemma. If $\hat\beta_A$ is the solution of (16.0.3), then the left-hand side of (16.0.1) is equal to
$$\frac{1}{\sigma^2} \bigl|X(\hat\beta_A - \hat\beta)\bigr|^2. \tag{16.0.4}$$

Proof. First, let us find the constrained MLE $\hat\beta_A$ explicitly. By the method of Lagrange multipliers we need to solve the system of two equations
$$A\beta = c, \qquad \frac{\partial}{\partial \beta}\Bigl( |Y - X\beta|^2 + (A\beta - c)^T \lambda \Bigr) = 0,$$
where $\lambda$ is an $s \times 1$ vector. The second equation is
$$-2X^T Y + 2X^T X \beta + A^T \lambda = 0.$$
Solving this for $\beta$ gives
$$\hat\beta_A = (X^T X)^{-1} X^T Y - \tfrac{1}{2}(X^T X)^{-1} A^T \lambda = \hat\beta - \tfrac{1}{2}(X^T X)^{-1} A^T \lambda.$$
Since $\hat\beta_A$ must satisfy the linear constraint, we get
$$c = A\hat\beta_A = A\hat\beta - \tfrac{1}{2} A(X^T X)^{-1} A^T \lambda = A\hat\beta - \tfrac{1}{2} D\lambda.$$
Solving this for $\lambda$, $\lambda = 2D^{-1}(A\hat\beta - c)$, we get
$$\hat\beta_A = \hat\beta - (X^T X)^{-1} A^T D^{-1}(A\hat\beta - c)$$
and, therefore,
$$X(\hat\beta_A - \hat\beta) = -X(X^T X)^{-1} A^T D^{-1}(A\hat\beta - c).$$
We can use this formula to compute
$$\begin{aligned}
\bigl|X(\hat\beta_A - \hat\beta)\bigr|^2 &= (X(\hat\beta_A - \hat\beta))^T X(\hat\beta_A - \hat\beta) \\
&= (A\hat\beta - c)^T D^{-1} A(X^T X)^{-1} X^T X (X^T X)^{-1} A^T D^{-1} (A\hat\beta - c) \\
&= (A\hat\beta - c)^T D^{-1} A(X^T X)^{-1} A^T D^{-1} (A\hat\beta - c) \\
&= (A\hat\beta - c)^T D^{-1} D D^{-1} (A\hat\beta - c) \\
&= (A\hat\beta - c)^T D^{-1} (A\hat\beta - c).
\end{aligned}$$
Comparing with (16.0.1) proves the Lemma.

Using (16.0.2) and the Lemma, we get that under the null hypothesis $H_0$:
$$\frac{n-p}{ns\hat\sigma^2}\, \bigl|X(\hat\beta_A - \hat\beta)\bigr|^2 \sim F_{s,n-p}. \tag{16.0.5}$$

There are many different models that are special cases of multiple linear regression, and many hypotheses about these models can be written as general linear constraints. We will describe one such model in detail, the one-way layout in analysis of variance, and then describe a couple of other models without going into details, since the idea will be clear.
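Since every quantity above is explicit, the equivalence of the two forms (16.0.2) and (16.0.5) is easy to check numerically. The following is a minimal Python/NumPy sketch, not part of the original notes: the simulated data, the particular choice of $A$ and $c$, and all variable names are illustrative assumptions.

```python
# Sketch: F-test of H0: A beta = c, via (16.0.2) and via the Lemma / (16.0.5).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n, p, s = 50, 4, 2
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, 1.0, 3.0, 3.0])
Y = X @ beta_true + 0.5 * rng.normal(size=n)

# Constraint H0: A beta = c, chosen so that H0 holds for beta_true.
A = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
c = A @ beta_true

# Unconstrained MLE and the MLE of sigma^2 (the 1/n convention of the notes).
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
sigma2_hat = np.sum((Y - X @ beta_hat) ** 2) / n

# Direct form (16.0.2): (n-p)/(n s sigma2_hat) * (A b - c)' D^{-1} (A b - c).
D = A @ XtX_inv @ A.T
r = A @ beta_hat - c
F_direct = (n - p) / (n * s * sigma2_hat) * (r @ np.linalg.solve(D, r))

# Constrained MLE from the closed form in the Lemma's proof:
# beta_hat_A = beta_hat - (X'X)^{-1} A' D^{-1} (A beta_hat - c).
beta_hat_A = beta_hat - XtX_inv @ A.T @ np.linalg.solve(D, r)

# Equivalent form (16.0.5): (n-p)/(n s sigma2_hat) * |X(beta_hat_A - beta_hat)|^2.
F_lemma = (n - p) / (n * s * sigma2_hat) * np.sum((X @ (beta_hat_A - beta_hat)) ** 2)

print(F_direct, F_lemma)               # the two forms agree
print(stats.f.sf(F_direct, s, n - p))  # p-value from F_{s, n-p}
```

Because the two expressions are algebraically identical, `F_direct` and `F_lemma` agree to machine precision; and since the data are simulated under $H_0$, the printed p-value is uniformly distributed over repeated simulations.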
Analysis of variance: one-way layout. Suppose that we are given $p$ independent samples
$$Y_{11}, \ldots, Y_{1n_1} \sim N(\mu_1, \sigma^2), \quad \ldots, \quad Y_{p1}, \ldots, Y_{pn_p} \sim N(\mu_p, \sigma^2)$$
of sizes $n_1, \ldots, n_p$ correspondingly. We assume that the variances of all the distributions are equal. We would like to test the hypothesis that the means of all the distributions are equal:
$$H_0 : \mu_1 = \ldots = \mu_p.$$
This problem is in fact a special case of multiple linear regression and of testing a hypothesis given by linear equations. We can write
$$Y_{ki} = \mu_k + \varepsilon_{ki}, \quad \text{where } \varepsilon_{ki} \sim N(0, \sigma^2), \text{ for } k = 1, \ldots, p,\; i = 1, \ldots, n_k.$$
Let us consider the $n \times 1$ vector, where $n = n_1 + \ldots + n_p$,
$$Y = (Y_{11}, \ldots, Y_{1n_1}, \ldots, Y_{p1}, \ldots, Y_{pn_p})^T$$
and the $p \times 1$ parameter vector $\mu = (\mu_1, \ldots, \mu_p)^T$. Then we can write all the equations in matrix form $Y = X\mu + \varepsilon$, where $X$ is the following $n \times p$ matrix:
$$X = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix},$$
where the blocks have $n_1, \ldots, n_p$ rows. Basically, the predictor matrix $X$ consists of indicators of the group to which each observation belongs. The hypothesis $H_0$ can be written in matrix form as $A\mu = 0$ for the $(p-1) \times p$ matrix
$$A = \begin{pmatrix}
1 & 0 & \cdots & 0 & -1 \\
0 & 1 & \cdots & 0 & -1 \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & 1 & -1
\end{pmatrix}.$$
We need to compute the statistic in (16.0.5), which will have the distribution $F_{p-1,n-p}$. First of all,
$$X^T X = \begin{pmatrix}
n_1 & 0 & \cdots & 0 \\
0 & n_2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & n_p
\end{pmatrix}.$$
Since $\hat\mu = (X^T X)^{-1} X^T Y$, it is easy to see that for each $i \le p$,
$$\hat\mu_i = \frac{1}{n_i} \sum_{k=1}^{n_i} Y_{ik} = \bar{Y}_i,$$
the average of the $i$th sample. We also get
$$\hat\sigma^2 = \frac{1}{n}\,|Y - X\hat\mu|^2 = \frac{1}{n} \sum_{i=1}^{p} \sum_{k=1}^{n_i} (Y_{ik} - \bar{Y}_i)^2.$$
To find the MLE $\hat\mu_A$ under the linear constraint $A\mu = 0$ we simply need to minimize $|Y - X\mu|^2$ over vectors $\mu = (\mu_1, \ldots, \mu_1)^T$ with all equal coordinates. But, obviously, $X\mu$ is then the $n \times 1$ vector $(\mu_1, \ldots, \mu_1)^T$, so we need to minimize
$$\sum_{i=1}^{p} \sum_{k=1}^{n_i} (Y_{ik} - \mu_1)^2 \;\ldots$$
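To make the one-way layout concrete, here is a minimal Python sketch, again not part of the original notes; the simulated samples and all names are illustrative. It builds the indicator design matrix and the constraint matrix $A$, evaluates the statistic (16.0.5) with $s = p - 1$, and cross-checks against `scipy.stats.f_oneway`, which computes the same classical one-way ANOVA F statistic.

```python
# Sketch: the one-way layout as a regression with indicator columns,
# tested with the F statistic (16.0.5).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Three simulated samples of sizes n_1, n_2, n_3 with a common variance;
# H0 is true here since all the means are equal.
sizes = [8, 10, 12]
samples = [5.0 + rng.normal(size=nk) for nk in sizes]
Y = np.concatenate(samples)
n, p = len(Y), len(sizes)

# Indicator design matrix: column k is 1 exactly on the rows of sample k,
# so X'X = diag(n_1, ..., n_p).
X = np.zeros((n, p))
row = 0
for k, nk in enumerate(sizes):
    X[row:row + nk, k] = 1.0
    row += nk

# Constraint matrix A of shape (p-1) x p encoding mu_1 = ... = mu_p.
A = np.hstack([np.eye(p - 1), -np.ones((p - 1, 1))])

mu_hat = np.array([smpl.mean() for smpl in samples])  # = (X'X)^{-1} X'Y
sigma2_hat = np.sum((Y - X @ mu_hat) ** 2) / n        # MLE of sigma^2 (1/n convention)

# Constrained MLE from the Lemma's closed form; with this A, every coordinate
# comes out equal to the grand mean of Y.
XtX_inv = np.diag(1.0 / np.array(sizes, dtype=float))
D = A @ XtX_inv @ A.T
mu_hat_A = mu_hat - XtX_inv @ A.T @ np.linalg.solve(D, A @ mu_hat)
assert np.allclose(mu_hat_A, Y.mean())

# F statistic (16.0.5) with s = p - 1 constraints, ~ F_{p-1, n-p} under H0.
s = p - 1
F = (n - p) / (n * s * sigma2_hat) * np.sum((X @ (mu_hat_A - mu_hat)) ** 2)
print(F, stats.f.sf(F, s, n - p))

# Cross-check: this is exactly the classical one-way ANOVA F statistic.
print(stats.f_oneway(*samples))
```

Here $|X(\hat\mu_A - \hat\mu)|^2 = \sum_k n_k (\bar{Y} - \bar{Y}_k)^2$ is the between-group sum of squares and $n\hat\sigma^2$ is the within-group sum of squares, so (16.0.5) reduces to the familiar ratio of between- to within-group mean squares.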