Chapter Thirteen: Inference about Comparing Two Populations

Inference about the Difference between Two Means

• In Chapter 9 (Sampling Distributions) we studied the problem of comparing the means of two populations.
• For example, suppose we want to compare the average income for college graduates and college dropouts. Let
  • X1 = income for a college graduate
  • X2 = income for a college dropout
• And let E(X1) = µ1 and E(X2) = µ2.
• Suppose the parameter of interest to us is µ1 − µ2.

• Suppose we have an iid sample for X1: X1i, i = 1,…,n1 (the sample size for X1 is denoted by n1).
• And we have a separate iid sample for X2: X2i, i = 1,…,n2 (the sample size for X2 is denoted by n2).
• The sample sizes can be different. That is, we may have n1 ≠ n2.
• Finally, assume that both samples are independent of each other.

• The corresponding sample means are X̄1 = (1/n1)·Σ X1i and X̄2 = (1/n2)·Σ X2i.
• Then, the statistic of interest is the difference between these means. That is, X̄1 − X̄2.
• What is the expected value of X̄1 − X̄2? We already know that E(X̄1) = µ1 and E(X̄2) = µ2.
• Therefore, E(X̄1 − X̄2) = µ1 − µ2.

• Next, what is the variance of X̄1 − X̄2? Independence between both samples means that Cov(X̄1, X̄2) = 0.
• Next, recall from our previous lectures that, if Z1 and Z2 are two random variables with zero covariance, then V(Z1 − Z2) = V(Z1) + V(Z2).
• Therefore, denoting V(X1) = σ1² and V(X2) = σ2²,
  V(X̄1 − X̄2) = σ1²/n1 + σ2²/n2.

• In summary, E(X̄1 − X̄2) = µ1 − µ2 and V(X̄1 − X̄2) = σ1²/n1 + σ2²/n2.
• What about the sampling distribution of X̄1 − X̄2?
• The same type of results for a single mean extend to this case:
• Case 1: If X1 and X2 are both Normally distributed, then X̄1 − X̄2 is exactly Normally distributed, as
  X̄1 − X̄2 ~ N(µ1 − µ2, σ1²/n1 + σ2²/n2).
• Case 2: Otherwise, by the Central Limit Theorem, the above distribution holds approximately, and this approximation is more accurate if n1 and n2 are relatively large.

• Thus, if both σ1² and σ2² were known, all inference on µ1 − µ2 would be based on the statistic
  Z = [(X̄1 − X̄2) − (µ1 − µ2)] / √(σ1²/n1 + σ2²/n2),
which would be either exactly distributed as a Standard Normal (if X1 and X2 are Normally distributed), or approximately distributed as a Standard Normal (by virtue of the Central Limit Theorem).

• Here we focus on the more realistic case where both σ1² and σ2² are unknown.
• In this setting, the construction of the
test‐statistic and its distribution depend on two possible cases:
• Case I: σ1² = σ2²
• Case II: σ1² ≠ σ2²
• We examine each case separately.

Case I: Inference for µ1 − µ2 when σ1² = σ2²

• If we maintain that σ1² = σ2², inference on µ1 − µ2 is based on the following statistic:
  t = [(X̄1 − X̄2) − (µ1 − µ2)] / √(sp²·(1/n1 + 1/n2)),
where
  sp² = [(n1 − 1)·s1² + (n2 − 1)·s2²] / (n1 + n2 − 2).
• sp² is called the pooled variance estimator. It is valid if σ1² = σ2².
• If both X1 and X2 are Normally distributed, then the statistic t described above is distributed as a Student‐t random variable with n1 + n2 − 2 degrees of freedom.

• As we have done previously, we will maintain this distribution as the approximate distribution for t even if X1 and X2 are not Normally distributed, keeping in mind that it would hold only approximately, and that the accuracy of this approximation depends on how much the distributions of X1 and X2 differ from Normal, and on the sample sizes.

• From here, if σ1² = σ2², a Confidence Interval for µ1 − µ2 with coverage probability 1 − α is given by
  [ (X̄1 − X̄2) − t(α/2, ν)·sp·√(1/n1 + 1/n2) , (X̄1 − X̄2) + t(α/2, ν)·sp·√(1/n1 + 1/n2) ],
where ν = n1 + n2 − 2.

• Hypothesis testing is done as before:
• Fix a significance level α and let ν = n1 + n2 − 2.
• Our rejection rules are:
  • Reject H0: µ1 − µ2 = µ1* − µ2* in favor of H1: µ1 − µ2 > µ1* − µ2* if t > t(α, ν)
  • Reject H0: µ1 − µ2 = µ1* − µ2* in favor of H1: µ1 − µ2 < µ1* − µ2* if t < −t(α, ν)
  • Reject H0: µ1 − µ2 = µ1* − µ2* in favor of H1: µ1 − µ2 ≠ µ1* − µ2* if |t| > t(α/2, ν)

• P‐values are also obtained as previously…
• Let T be a t‐random variable with degrees of freedom given by ν = n1 + n2 − 2.
• And let 't' be the value obtained for our test‐statistic in the data observed. Then:
  • If H0: µ1 − µ2 = µ1* − µ2* vs. H1: µ1 − µ2 > µ1* − µ2*: p‐value = P(T > t)
  • If H0: µ1 − µ2 = µ1* − µ2* vs. H1: µ1 − µ2 < µ1* − µ2*: p‐value = P(T < t)
  • If H0: µ1 − µ2 = µ1* − µ2* vs. H1: µ1 − µ2 ≠ µ1* − µ2*: p‐value = 2·P(T > |t|)

Case II: Inference for µ1 − µ2 when σ1² ≠ σ2²

• If σ1² ≠ σ2², we employ a different formula for our test‐statistic. Let
  t = [(X̄1 − X̄2) − (µ1 − µ2)] / √(s1²/n1 + s2²/n2).
Now, even if X1 and X2 are Normally distributed, t will not be exactly Student‐t distributed.
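The Case I procedure above (pooled‐variance test statistic, p‐value, and confidence interval) can be sketched in Python. This is a minimal illustration with made‐up simulated data, not data from the income example; the variable names are my own. SciPy's built‐in `ttest_ind` with `equal_var=True` implements the same pooled test and is used here only as a cross‐check.

```python
import numpy as np
from scipy import stats

# Illustrative (simulated) samples; sample sizes may differ.
rng = np.random.default_rng(0)
x1 = rng.normal(50, 10, size=40)   # hypothetical sample for X1
x2 = rng.normal(45, 10, size=35)   # hypothetical sample for X2

n1, n2 = len(x1), len(x2)
xbar1, xbar2 = x1.mean(), x2.mean()
s1sq, s2sq = x1.var(ddof=1), x2.var(ddof=1)

# Pooled variance estimator (valid when sigma1^2 = sigma2^2).
sp_sq = ((n1 - 1) * s1sq + (n2 - 1) * s2sq) / (n1 + n2 - 2)
nu = n1 + n2 - 2  # degrees of freedom

# Test statistic for H0: mu1 - mu2 = 0.
t = (xbar1 - xbar2) / np.sqrt(sp_sq * (1 / n1 + 1 / n2))

# Two-sided p-value: 2 * P(T > |t|).
p_value = 2 * stats.t.sf(abs(t), df=nu)

# Confidence interval with coverage probability 1 - alpha.
alpha = 0.05
half_width = stats.t.ppf(1 - alpha / 2, df=nu) * np.sqrt(sp_sq * (1 / n1 + 1 / n2))
ci = (xbar1 - xbar2 - half_width, xbar1 - xbar2 + half_width)

# Cross-check against SciPy's pooled two-sample t-test.
t_ref, p_ref = stats.ttest_ind(x1, x2, equal_var=True)
```

The hand‐computed statistic and p‐value should agree with SciPy's to numerical precision, which is a useful sanity check when translating the formulas into code.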
However, it is approximately distributed as a Student‐t with degrees of freedom given by:
  ν = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ].

• From here, if σ1² ≠ σ2², a Confidence Interval for µ1 − µ2 with coverage probability 1 − α is given by
  (X̄1 − X̄2) ± t(α/2, ν)·√(s1²/n1 + s2²/n2).
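The Case II (unequal‐variances, or Welch) procedure can be sketched the same way. Again the data are made‐up and the variable names are my own; SciPy's `ttest_ind` with `equal_var=False` implements this test and serves as a cross‐check.

```python
import numpy as np
from scipy import stats

# Illustrative (simulated) samples with clearly unequal variances.
rng = np.random.default_rng(1)
x1 = rng.normal(50, 15, size=30)
x2 = rng.normal(45, 5, size=50)

n1, n2 = len(x1), len(x2)
s1sq, s2sq = x1.var(ddof=1), x2.var(ddof=1)
se_sq = s1sq / n1 + s2sq / n2  # estimated variance of xbar1 - xbar2

# Welch approximate degrees of freedom.
nu = se_sq**2 / ((s1sq / n1) ** 2 / (n1 - 1) + (s2sq / n2) ** 2 / (n2 - 1))

# Test statistic for H0: mu1 - mu2 = 0, and two-sided p-value.
t = (x1.mean() - x2.mean()) / np.sqrt(se_sq)
p_value = 2 * stats.t.sf(abs(t), df=nu)

# 95% confidence interval for mu1 - mu2.
half_width = stats.t.ppf(0.975, df=nu) * np.sqrt(se_sq)
ci = (x1.mean() - x2.mean() - half_width,
      x1.mean() - x2.mean() + half_width)

# Cross-check against SciPy's Welch test.
t_ref, p_ref = stats.ttest_ind(x1, x2, equal_var=False)
```

Note that the Welch degrees of freedom ν is generally not an integer; it always lies between min(n1 − 1, n2 − 1) and n1 + n2 − 2.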