PSU STAT 544 - Analysis of Three Way Tables


Stat 544, Lecture 7

Analysis of Three-Way Tables

Introducing three-way tables

Suppose that we have three categorical variables, A, B, and C, where

A takes possible values 1, 2, ..., I,
B takes possible values 1, 2, ..., J,
C takes possible values 1, 2, ..., K.

If we collect the triplet (A, B, C) for each unit in a sample of n units, then the data can be summarized as a three-dimensional table. Let $n_{ijk}$ be the number of units having A = i, B = j, and C = k. Then the vector of cell counts $(n_{111}, n_{112}, \ldots, n_{IJK})^T$ can be arranged into a table whose dimensions are $I \times J \times K$. As before, we will use "+" to indicate summation over a subscript; for example,

$$n_{i+k} = \sum_{j=1}^{J} n_{ijk}, \qquad n_{++k} = \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ijk}.$$

To display this table, we must use a set of two-way tables. For example, the A × B × C table could be displayed by showing B × C tables for each level of A.

If the n units in the sample are independently drawn from the same population, then the vector of cell counts has a multinomial distribution with index n and parameter

$$\pi = (\pi_{111}, \pi_{112}, \ldots, \pi_{IJK})^T.$$

Under the unrestricted (saturated) multinomial model, there are no constraints on $\pi$ other than $\pi_{+++} = 1$, and the ML estimates are $\hat{\pi}_{ijk} = n_{ijk}/n$. The saturated model always fits the data perfectly; the estimated expected frequency $n\hat{\pi}_{ijk}$ equals the observed frequency $n_{ijk}$ for every cell, yielding $X^2 = G^2 = 0$ with zero df. Fitting a saturated model might not reveal any special structure that may exist in the relationships among A, B, and C. To investigate these relationships, we will propose simpler models and perform tests to see whether these simpler models fit the data.

Mutual independence. The simplest model that one might propose is

$$P(A = i, B = j, C = k) = P(A = i)\, P(B = j)\, P(C = k)$$

for all i, j, k. Define

$$\alpha_i = P(A = i), \quad i = 1, 2, \ldots, I,$$
$$\beta_j = P(B = j), \quad j = 1, 2, \ldots, J,$$
$$\gamma_k = P(C = k), \quad k = 1, 2, \ldots, K,$$

so that $\pi_{ijk} = \alpha_i \beta_j \gamma_k$ for all i, j, k.
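As a rough illustration of the notation so far, the sketch below stores a three-way table as nested lists, forms two of the "+" margins, computes the saturated ML estimates, and builds cell probabilities under mutual independence. The counts and the function names (`n_i_plus_k`, `n_plus_plus_k`, `independence_probs`) are made up for illustration and are not part of the lecture.

```python
# Hypothetical 2 x 2 x 2 table: n[i][j][k] is the count with A=i, B=j, C=k
# (indices are zero-based here, unlike the 1-based notation in the text).
n = [[[10, 20], [30, 40]],
     [[15, 25], [35, 45]]]
I, J, K = len(n), len(n[0]), len(n[0][0])


def n_i_plus_k(i, k):
    """n_{i+k}: sum the counts over the B subscript j."""
    return sum(n[i][j][k] for j in range(J))


def n_plus_plus_k(k):
    """n_{++k}: sum the counts over both i and j."""
    return sum(n[i][j][k] for i in range(I) for j in range(J))


# Sample size n_{+++}.
total = sum(n_plus_plus_k(k) for k in range(K))

# Saturated model: ML estimate of each cell probability is n_ijk / n.
pi_hat = [[[n[i][j][k] / total for k in range(K)]
           for j in range(J)]
          for i in range(I)]


def independence_probs(alpha, beta, gamma):
    """Cell probabilities pi_ijk = alpha_i * beta_j * gamma_k."""
    return [[[a * b * g for g in gamma] for b in beta] for a in alpha]
```

Under the saturated model the fitted counts `total * pi_hat[i][j][k]` simply reproduce the observed counts, which is why that model cannot reveal any structure on its own.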
The unknown parameters are

$$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_I), \quad \beta = (\beta_1, \beta_2, \ldots, \beta_J), \quad \gamma = (\gamma_1, \gamma_2, \ldots, \gamma_K).$$

Because each of these vectors must add up to one, the number of free parameters in the model is $(I - 1) + (J - 1) + (K - 1)$.

Notice that under the independence model,

$$(n_{1++}, n_{2++}, \ldots, n_{I++}) \sim \text{Mult}(n, \alpha),$$
$$(n_{+1+}, n_{+2+}, \ldots, n_{+J+}) \sim \text{Mult}(n, \beta),$$
$$(n_{++1}, n_{++2}, \ldots, n_{++K}) \sim \text{Mult}(n, \gamma),$$

and these three vectors are mutually independent. Thus the three parameter vectors $\alpha$, $\beta$, and $\gamma$ can be estimated independently of one another. The ML estimates are given by

$$\hat{\alpha}_i = n_{i++}/n, \quad i = 1, 2, \ldots, I,$$
$$\hat{\beta}_j = n_{+j+}/n, \quad j = 1, 2, \ldots, J,$$
$$\hat{\gamma}_k = n_{++k}/n, \quad k = 1, 2, \ldots, K.$$

It follows that the estimates of the expected cell counts are

$$\hat{E}_{ijk} = n\, \hat{\alpha}_i \hat{\beta}_j \hat{\gamma}_k = \frac{n_{i++}\, n_{+j+}\, n_{++k}}{n^2}.$$

To test the null hypothesis of full independence against the alternative of the saturated model, we calculate the expected counts $\hat{E}_{ijk}$ and find $X^2$ or $G^2$ in the usual manner,

$$X^2 = \sum_i \sum_j \sum_k \frac{(\hat{E}_{ijk} - n_{ijk})^2}{\hat{E}_{ijk}}, \qquad G^2 = 2 \sum_i \sum_j \sum_k n_{ijk} \log\left(\frac{n_{ijk}}{\hat{E}_{ijk}}\right).$$

The degrees of freedom for this test are

$$\nu = (IJK - 1) - [\,(I - 1) + (J - 1) + (K - 1)\,].$$

Graphically, we can express the model of complete independence as:

[Graph: nodes A, B, and C with no edges]

In this graph, the lack of connections between the nodes indicates that no relationships exist among A, B, and C. In the notation of loglinear models, this model is expressed as (A, B, C).

In terms of odds ratios, the model (A, B, C) implies that all of the odds ratios in the marginal tables A × B, B × C, and A × C are equal to 1.

Two variables independent of a third. The model

[Graph: nodes A, B, and C with a single edge joining A and B]

indicates that A and B are jointly independent of C. The line linking A and B indicates that A and B are possibly related, but not necessarily so. Therefore, the model of complete independence is a special case of this one. In loglinear notation, this model is (AB, C).

If the model of complete independence (A, B, C) fits a data set, then the model (AB, C) will also fit, as will (AC, B) and (BC, A).
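For the test of mutual independence (A, B, C) described above, a minimal sketch of the computation might look like the following. The function name `mutual_independence_test` and the example counts are hypothetical; the code assumes every one-way margin is positive and uses the usual convention that a cell with zero observed count contributes nothing to $G^2$.

```python
import math


def mutual_independence_test(n):
    """Return (X2, G2, df) for the model (A, B, C).

    n[i][j][k] is the observed count with A=i, B=j, C=k.
    Assumes all one-way margins are positive, so every expected count is > 0.
    """
    I, J, K = len(n), len(n[0]), len(n[0][0])
    cells = [(i, j, k) for i in range(I) for j in range(J) for k in range(K)]
    total = sum(n[i][j][k] for i, j, k in cells)

    # One-way margins n_{i++}, n_{+j+}, n_{++k}.
    a = [sum(n[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
    b = [sum(n[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
    c = [sum(n[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]

    x2 = 0.0
    g2 = 0.0
    for i, j, k in cells:
        expected = a[i] * b[j] * c[k] / total ** 2
        observed = n[i][j][k]
        x2 += (expected - observed) ** 2 / expected
        if observed > 0:  # convention: 0 * log(0) is treated as 0
            g2 += 2.0 * observed * math.log(observed / expected)

    df = (I * J * K - 1) - ((I - 1) + (J - 1) + (K - 1))
    return x2, g2, df


# Hypothetical counts, for illustration only.
example = [[[10, 20], [30, 40]],
           [[15, 25], [35, 45]]]
x2, g2, df = mutual_independence_test(example)
```

Each statistic would then be compared with a chi-square distribution on `df` degrees of freedom; that step is left out here to keep the sketch within the standard library.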
When the model of complete independence (A, B, C) fits, we prefer it to (AB, C), (AC, B), and (BC, A) because it is more parsimonious. Our goal is to find the simplest model that fits the data.

Under (AB, C), the probabilities satisfy

$$\pi_{ijk} = P(A = i, B = j)\, P(C = k) = (\alpha\beta)_{ij}\, \gamma_k,$$

where $\sum_i \sum_j (\alpha\beta)_{ij} = 1$ and $\sum_k \gamma_k = 1$. The number of free parameters is $(IJ - 1) + (K - 1)$, and their ML estimates are $\widehat{(\alpha\beta)}_{ij} = n_{ij+}/n$ and $\hat{\gamma}_k = n_{++k}/n$. The estimated expected frequencies are

$$\hat{E}_{ijk} = \frac{n_{ij+}\, n_{++k}}{n}.$$

Notice the similarity between this formula and the one for the model of independence in a two-way table, $\hat{E}_{ij} = n_{i+} n_{+j}/n$. If we view A and B as a single categorical variable with IJ levels, then the goodness-of-fit test for (AB, C) is equivalent to the test of independence between the combined variable AB and C.

Conditional independence. The model (AB, AC),

[Graph: nodes A, B, and C with edges joining A to B and A to C]

indicates that A and B may be related; A and C may be related; and that B and C may be related, but only through their mutual associations with A. In other words, any relationship between B and C can be "fully explained" by A.

In terms of odds ratios, this model implies that if we look at the B × C tables at each level of A = 1, ..., I, the odds ratios in these tables are all equal to 1.

Notice that the odds ratios in the marginal B × C table, collapsed or summed over A, are not necessarily 1. The conditional BC odds ratios at the levels of A = 1, ..., I can be quite different from the marginal odds ratios. In extreme cases, the marginal relationship between B and C can be in the opposite direction from their conditional relationship given A; this is known as Simpson's paradox.

Under the conditional independence model, the probabilities can be written as

$$\pi_{ijk} = P(A = i)\, P(B = j, C = k \mid A = i) = P(A = i)\, P(B = j \mid A = i)\, P(C = k \mid A = i) = \alpha_i\, \beta_{j(i)}\, \gamma_{k(i)},$$

where $\sum_i \alpha_i = 1$, $\sum_j \beta_{j(i)} = 1$ for each i, and $\sum_k \gamma_{k(i)} = 1$ for each i.
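To make the factorization $\pi_{ijk} = \alpha_i \beta_{j(i)} \gamma_{k(i)}$ concrete, the sketch below builds a 2 × 2 × 2 probability table from made-up parameter values and compares the conditional BC odds ratios with the marginal one. All numbers and names (`cond_indep_probs`, `odds_ratio`) are hypothetical; the example shows conditional and marginal odds ratios disagreeing, which is milder than a full Simpson's-paradox reversal.

```python
def cond_indep_probs(alpha, beta_given_a, gamma_given_a):
    """pi[i][j][k] = alpha[i] * beta_given_a[i][j] * gamma_given_a[i][k]."""
    return [[[alpha[i] * bj * gk for gk in gamma_given_a[i]]
             for bj in beta_given_a[i]]
            for i in range(len(alpha))]


def odds_ratio(t):
    """Odds ratio of a 2 x 2 table t with strictly positive cells."""
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])


# Hypothetical parameter values, chosen only for illustration.
alpha = [0.5, 0.5]                        # P(A = i)
beta_given_a = [[0.9, 0.1], [0.2, 0.8]]   # P(B = j | A = i), each row sums to 1
gamma_given_a = [[0.8, 0.2], [0.3, 0.7]]  # P(C = k | A = i), each row sums to 1

pi = cond_indep_probs(alpha, beta_given_a, gamma_given_a)

# B x C odds ratio within each level of A: equal to 1 up to rounding error.
conditional_ors = [odds_ratio(pi[i]) for i in range(2)]

# Marginal B x C table, summed over A, and its odds ratio (roughly 4.4 here).
marginal_bc = [[sum(pi[i][j][k] for i in range(2)) for k in range(2)]
               for j in range(2)]
marginal_or = odds_ratio(marginal_bc)
```

Because both B and C depend on A in these made-up values, collapsing over A induces a marginal BC association even though none exists within either level of A.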
The number of free parameters is

$$(I - 1) + I(J - 1) + I(K - 1).$$

The ML estimates of these parameters are

$$\hat{\alpha}_i = n_{i++}/n, \qquad \hat{\beta}_{j(i)} = n_{ij+}/n_{i++}, \qquad \hat{\gamma}_{k(i)} = n_{i+k}/n_{i++}.$$

The estimated expected frequencies are

$$\hat{E}_{ijk} = \frac{n_{ij+}\, n_{i+k}}{n_{i++}}.$$

Notice again the similarity to the formula for independence in a two-way table. The test for conditional independence of B and C given A is equivalent to separating the table by levels of A = 1, ..., I, and testing for independence within each level. The overall $X^2$ or $G^2$ statistics are found by summing the individual test statistics for BC independence given A. The total degrees of freedom for this test must be $I(J - 1)(K - 1)$.

Homogeneous association. If we take the
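Returning to the conditional independence test described above, one possible sketch sums the two-way independence statistics over the levels of A. The function name `conditional_independence_test` and the example counts are hypothetical; the code assumes every $n_{ij+}$ and $n_{i+k}$ is positive and treats a zero observed count as contributing nothing to $G^2$.

```python
import math


def conditional_independence_test(n):
    """Return (X2, G2, df) for the model (AB, AC).

    n[i][j][k] is the observed count with A=i, B=j, C=k.
    Assumes all margins n_{ij+} and n_{i+k} are positive.
    """
    I, J, K = len(n), len(n[0]), len(n[0][0])
    x2 = 0.0
    g2 = 0.0
    for i in range(I):
        # Margins of the B x C slice at A = i.
        n_ij = [sum(n[i][j]) for j in range(J)]                       # n_{ij+}
        n_ik = [sum(n[i][j][k] for j in range(J)) for k in range(K)]  # n_{i+k}
        n_i = sum(n_ij)                                               # n_{i++}
        for j in range(J):
            for k in range(K):
                expected = n_ij[j] * n_ik[k] / n_i
                observed = n[i][j][k]
                x2 += (expected - observed) ** 2 / expected
                if observed > 0:  # convention: 0 * log(0) is treated as 0
                    g2 += 2.0 * observed * math.log(observed / expected)
    df = I * (J - 1) * (K - 1)
    return x2, g2, df


# Hypothetical counts, for illustration only.
example = [[[10, 20], [30, 40]],
           [[15, 25], [35, 45]]]
x2, g2, df = conditional_independence_test(example)
```

The degrees of freedom agree with subtracting the $(I - 1) + I(J - 1) + I(K - 1)$ free parameters from the $IJK - 1$ of the saturated model.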

