
Multiple Testing in Large-Scale Contingency Tables: Inferring Pair-Wise Amino Acid Patterns in β-Sheets

Seoung Bum Kim and Kwok-Leung Tsui
School of Industrial and Systems Engineering
Georgia Institute of Technology
Atlanta, GA 30332, USA
E-mail: [email protected]
E-mail: [email protected]

Borodovsky
Schools of Biology and Biomedical Engineering
Georgia Institute of Technology
Atlanta, GA 30332, USA
E-mail: [email protected]

12, 2005

ABSTRACT

One of the most common test procedures using two-way contingency tables is a test of independence between two categorizations. Current significance tests such as χ² tests or likelihood ratio tests assess overall independence but provide limited information about the nature of the association in the contingency tables. This study examines the feasibility of using multiple testing procedures for inference on the independence of categories in each cell of a contingency table. In the simulation study, we compare the performance of various multiple testing procedures in a contingency table setup and demonstrate the relationship among the proportion of true null hypotheses, type I error, power, and false discovery rate. Finally, we apply the proposed methodology to identify the patterns of pair-wise associations of amino acids involved in β-sheet bridges in proteins. We identify a number of amino acid pairs that exhibit either strong or weak association. These patterns provide useful information for algorithms that predict the secondary as well as tertiary structure of proteins.

Keywords: β-strands; Contingency table; False discovery rate; Multiple testing.

1. Introduction

One of the most common test procedures applied to two-way contingency tables is a test of independence (or association) between two categorizations.
In general, the test of independence uses χ² tests or likelihood ratio tests, which can be called “globally significant tests.” The basic idea of these tests is as follows: if the sum of all the differences between observed and expected frequencies of all cells in a contingency table is small in a statistical sense, independence between the two categorizations is not rejected; if the sum of the differences is large, independence is rejected. However, the global tests can hardly identify the independence of individual cells in a contingency table, since their statistics are constructed from all cells. The issue of identifying independence in individual cells is especially important in large-scale contingency tables, where the number of cells ≫ 4. Agresti (2002) pointed out several limitations of the global tests. He reviewed follow-up methods to global tests, such as partitioning of χ² and a method based on standardized and adjusted residuals, that allow further investigation of the associations in contingency tables. Partitioning of χ² is a method for exploring the associations by dividing large tables into smaller ones. Lancaster (1949) showed that any r × c table can be reduced to (r−1) × (c−1) independent 2 × 2 tables, so that the interpretation of each small table is straightforward. In large-scale contingency tables, however, this method becomes too complicated because it generates too many 2 × 2 tables. For example, a 10 × 10 table produces 81 tables of size 2 × 2, which makes the extraction of meaningful information cumbersome. Haberman (1973) defined the Standardized and Adjusted Residual (STAR) statistic for each cell and showed that this statistic is asymptotically standard normal under the null hypothesis of independence of categories in individual cells. Therefore, STAR statistics whose absolute values exceed a certain threshold indicate lack of fit to the null distribution in that cell (Agresti, 2002).
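The cell-wise residual described above can be sketched in a few lines. The visible text does not spell out the formula, so this sketch assumes the standard adjusted residual, (n_ij − μ_ij) / sqrt(μ_ij (1 − p_i+)(1 − p_+j)), with μ_ij = n_i+ n_+j / n the expected count under independence and p_i+, p_+j the row and column proportions. The function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def adjusted_residuals(table):
    """Adjusted residual for every cell of a two-way table of counts.

    Assumes every row and column total is positive and smaller than the
    grand total, so that no variance term is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            # Estimated variance of (observed - expected) under independence.
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            residual_row.append((observed - expected) / math.sqrt(variance))
        residuals.append(residual_row)
    return residuals


def two_sided_p_value(z):
    """Two-sided p-value of z under the standard normal null distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical 3 x 3 table of counts, for illustration only.
counts = [
    [20, 10, 5],
    [10, 25, 10],
    [5, 10, 30],
]
for row in adjusted_residuals(counts):
    print(["%.2f (p=%.3g)" % (z, two_sided_p_value(z)) for z in row])
```

Each cell yields one residual and one p-value, so an r × c table gives r × c simultaneous tests. Comparing every p-value with a fixed cutoff ignores that multiplicity.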
The STAR method is simple but does not provide an objective way to determine a threshold, since the threshold depends upon the number of degrees of freedom in a contingency table. Also, under the simultaneous consideration of all cells in a contingency table, the STAR method produces many false positives (Agresti, 2002). Another method was also introduced by Haberman (1973), who utilized a normal probability plot of STAR values that provides a nice graphical representation. However, the interpretation of a normal probability plot is frequently subjective, particularly when the number of cells to be tested is large. Therefore, there is a need for a method able to systematically and objectively identify the independence of each cell in contingency tables. In this study, we propose a procedure for testing independence of categories in individual cells of a contingency table based on a multiple testing framework.

In multiple testing problems, family-wise error rates have been used under simultaneous consideration to avoid the multiplicity effect. Applying a single-test procedure to each hypothesis in a multiple testing problem inflates the false positive rate. More precisely, the probability that at least one of the tests leads to rejection of H0 when H0 holds rises rapidly toward 1 as the number of hypotheses increases. A convenient new definition of error rate, called the False Discovery Rate (FDR), was proposed by Benjamini and Hochberg (1995). The FDR is the expected proportion of false positives among all the hypotheses rejected. The FDR has been used in microarray analysis to find co-expressed genes (Tusher et al., 2001; Efron et al., 2001; Efron and Tibshirani, 2002; Dudoit et al., 2003) as well as in a genetic study identifying drugs that cause mutations in the viral genome (Efron, 2004). As extensions of the original FDR, Storey (2002, 2003) and Storey et al. (2004) introduced the positive False Discovery Rate (pFDR), and Efron et al. (2001) proposed the Local False Discovery Rate (Local FDR). Moreover, the case when the hypotheses are dependent was considered by Yekutieli and Benjamini (1999) and Benjamini and Yekutieli (2001).

We first review some of the multiple testing procedures and present their application to the statistical inference of individual cells in contingency tables, the main topic of this paper. In addition, we perform simulation studies to compare the proportion of true null hypotheses, type I error, power, and FDR of different multiple testing procedures in contingency tables. Finally, the proposed procedure is applied to identify the patterns of pair-wise associations of amino acids involved in β-sheet bridges.

2. Control Procedures in Multiple Testing

2.1 The Family-Wise Error Rate

In a multiple hypothesis test,
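The multiplicity effect and the FDR criterion discussed in the Introduction can be illustrated with a short sketch. The family-wise error computation assumes that all m null hypotheses are true and that the tests are independent, in which case the probability of at least one false rejection is 1 − (1 − α)^m. The second function is the step-up procedure of Benjamini and Hochberg (1995) as it is commonly stated; the text visible here cites that work but does not spell out the procedure. Function names and p-values are hypothetical.

```python
def fwer_independent(alpha, m):
    """P(at least one false rejection) for m independent true nulls,
    each tested at level alpha (independence is an assumption)."""
    return 1 - (1 - alpha) ** m


def benjamini_hochberg(p_values, q):
    """Indices of hypotheses rejected by the BH step-up procedure at level q.

    Finds the largest rank k with p_(k) <= k * q / m and rejects the k
    hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


# Family-wise error rate when each test uses alpha = 0.05.
for m in (1, 10, 100):
    print(m, round(fwer_independent(0.05, m), 4))

# Hypothetical p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p_values, q=0.05))
```

Because the procedure is step-up, a hypothesis whose own p-value exceeds its rank threshold can still be rejected when a larger p-value passes its threshold.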

