Chapter 15
Comparing Two Binomial Populations

In this chapter and the next, our data are presented in a 2 × 2 contingency table. As you will learn, however, not all 2 × 2 contingency tables are analyzed the same way. Thus, I begin with an introductory section.

15.1 The Ubiquitous 2 × 2 Table

I have always liked the word ubiquitous, and who can argue taste? According to dictionary.com, the definition of ubiquitous is: "Existing or being everywhere, especially at the same time; omnipresent."

Table 15.1 is a partial recreation of Table 8.5 on page 170 in Chapter 8 of these notes. Let me explain why I refer to this table as ubiquitous. You will learn of many different scientific scenarios that yield data of the form presented in Table 15.1. Depending on the scenario, you will learn the different appropriate ways to summarize and analyze the data in this table.

I say that Table 15.1 is only a partial recreation of Chapter 8's table because of the following changes:

1. In Chapter 8, the rows were treatments 1 and 2; in Table 15.1, they are simply called rows 1 and 2. In some scenarios these rows will represent treatments, and in some scenarios they won't.

2. In Chapter 8, the columns were the two possible responses, success and failure; in Table 15.1, they are simply called columns 1 and 2. In some scenarios these columns will represent the response, and in some scenarios they won't.

3. The table in Chapter 8 also included row proportions that served two purposes: they described/summarized the data, and they were the basis (via the computation of $x = \hat{p}_1 - \hat{p}_2$) for finding the observed value of the test statistic $X$ for Fisher's test. Again, in some scenarios in this and the next chapter we will compute row proportions, and in some scenarios we won't.

Table 15.1: The general notation for the ubiquitous 2 × 2 contingency table of data.

             Column
Row        1        2        Total
1          $a$      $b$      $n_1$
2          $c$      $d$      $n_2$
Total      $m_1$    $m_2$    $n$

Here are the main features to note about the ubiquitous 2 × 2 contingency table of data.

• The values $a$, $b$, $c$ and $d$ are called the cell counts. They
are necessarily nonnegative integers.

• The values $n_1$ and $n_2$ are the row totals of the cell counts.

• The values $m_1$ and $m_2$ are the column totals of the cell counts.

• The value $n$ is the sum of the four cell counts; alternatively, it is the sum of the row totals (or, equivalently, of the column totals).

Thus, there are nine counts in the 2 × 2 contingency table, all of which are determined by the four cell counts.

15.2 Comparing Two Populations; the Four Types of Studies

The first appearance of the ubiquitous 2 × 2 contingency table was in Chapter 8, for a CRD with a dichotomous response. Recall that Fisher's Test is used to evaluate the Skeptic's Argument. As stated at the beginning of this Part II of these notes, a limitation of the Skeptic's Argument is that it is concerned only with the units under study. In this section, you will learn how to extend the results of Chapter 8 to populations. In addition, we will extend results to observational studies that, as you may recall from Chapter 1, do not involve randomization.

In Chapter 8, the units can be trials or subjects. The listing below summarizes the studies of Chapter 8.

• Units are subjects: The infidelity study, the prisoner study and the artificial Headache Study 2.

• Units are trials: The golf putting study.

The idea of a population depends on the type of unit. In particular:

• When units are subjects, we have a finite population. The members of the finite population comprise all potential subjects of interest to the researcher.

• When units are trials, we assume that they are Bernoulli Trials.

The number four in the title of this section is obtained by multiplying 2 by 2. When we compare two populations, both populations can be Bernoulli trials or both can be finite populations. In addition, as we shall discuss soon, a study can be observational or experimental. Combining these two dichotomies, we get four types of study; for example, an observational study on finite populations. It turns out that the mathematical formulas are identical for the four types of studies, but the interpretation of our
analysis depends on the type of study.

We begin with an observational study on two finite populations. This is a real study that was published in 1988.¹

Example 15.1 (The Dating Study.) The first finite population is undergraduate men at the University of Wisconsin-Madison, and the second population is undergraduate men at Texas A&M University. Each man's response is his answer to the following question:

If a woman is interested in dating you, do you generally prefer for her: to ask you out; to hint that she wants to go out with you; or to wait for you to act?

The response "ask" is labeled a success, and either of the other responses is labeled a failure. The purpose of the study is to compare the proportion of successes at Wisconsin with the proportion of successes at Texas A&M.

These two populations obviously fit our definition of finite populations. Why is it called observational? The dichotomy of observational/experimental refers to the control available to the researcher. Suppose that Matt is a member of one of these populations. As a researcher, I have control over whether I have Matt in my study, but I do not have control over the population to which he belongs.

Consistent with our usage in Chapter 1, the variable that determines a subject's population is called the study factor. In the current example, the study factor is school attended, and it has two levels: Wisconsin and Texas A&M. This is an observational factor, sometimes called, for obvious reasons, a classification factor, because each subject is classified according to his school.

Table 15.2 presents the data from this Dating Study. Please note the following decisions that I made in creating this table.

1. Similar to our tables in Chapter 8, the columns are for the response and the rows are for the levels of the study factor, i.e., the populations. Note that because the researcher did not assign men to university by randomization, we do not refer to the rows as treatments.

2. As in Chapter 8, I find the row proportions to be of interest. In
particular, we see that 56% of the Wisconsin men sampled are successes, compared to only 31% of the Texas men.

Next, we have an experimental study on two finite populations. Below is my slight alteration of an actual study of Crohn's disease that was published in 1988.² I have taken the original sample sizes, 37 and 34, and made them both 40. My values of the $\hat{p}$'s differ from the original ones by less than 0.005. Why did I make these changes?

1. I can't believe that a 25 year …
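
Returning to the notation of Table 15.1, the claim that all nine counts are determined by the four cell counts, and the computation of the row proportions, can be made concrete with a short sketch. The class name `TwoByTwo`, its method names, and the cell counts 30, 20, 10 and 40 are assumptions introduced purely for illustration; they are not data from the Dating Study or any other study in these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical helper: a 2 x 2 table in the notation of Table 15.1."""
    a: int  # row 1, column 1
    b: int  # row 1, column 2
    c: int  # row 2, column 1
    d: int  # row 2, column 2

    def __post_init__(self):
        # Cell counts are necessarily nonnegative integers.
        for name in ("a", "b", "c", "d"):
            value = getattr(self, name)
            if not isinstance(value, int) or value < 0:
                raise ValueError(f"cell count {name} must be a nonnegative integer")

    @property
    def n1(self):  # total of row 1
        return self.a + self.b

    @property
    def n2(self):  # total of row 2
        return self.c + self.d

    @property
    def m1(self):  # total of column 1
        return self.a + self.c

    @property
    def m2(self):  # total of column 2
        return self.b + self.d

    @property
    def n(self):  # sum of the four cell counts
        return self.a + self.b + self.c + self.d

    def row_proportions(self):
        """Proportion of each row that falls in column 1.

        Only meaningful in scenarios where column 1 is the 'success'
        response; both row totals must be positive.
        """
        return self.a / self.n1, self.c / self.n2


# Made-up cell counts, for illustration only.
table = TwoByTwo(a=30, b=20, c=10, d=40)
p1_hat, p2_hat = table.row_proportions()
x = p1_hat - p2_hat  # the difference of row proportions used in Chapter 8
print(table.n1, table.n2, table.m1, table.m2, table.n)
print(p1_hat, p2_hat, x)
```

Deriving the five totals as properties, rather than storing them, mirrors the point that only the four cell counts are free. Whether `row_proportions` is the right summary depends on the scenario, as noted in Section 15.1.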