Berkeley PUBPOL 279 - Some Practical Guidelines for Effective Sample Size Determination


Some Practical Guidelines for Effective Sample Size Determination

Russell V. LENTH

Sample size determination is often an important step in planning a statistical study, and it is usually a difficult one. Among the important hurdles to be surpassed, one must obtain an estimate of one or more error variances and specify an effect size of importance. There is the temptation to take some shortcuts. This article offers some suggestions for successful and meaningful sample size determination. Also discussed is the possibility that sample size may not be the main issue, that the real goal is to design a high-quality study. Finally, criticism is made of some ill-advised shortcuts relating to power and sample size.

KEY WORDS: Cohen's effect measures; Equivalence testing; Observed power; Power; Retrospective power; Study design.

1. SAMPLE SIZE AND POWER

Statistical studies (surveys, experiments, observational studies, etc.) are always better when they are carefully planned. Good planning has many aspects. The problem should be carefully defined and operationalized. Experimental or observational units must be selected from the appropriate population. The study must be randomized correctly. The procedures must be followed carefully. Reliable instruments should be used to obtain measurements.

Finally, the study must be of adequate size, relative to the goals of the study. It must be “big enough” that an effect of such magnitude as to be of scientific significance will also be statistically significant. It is just as important, however, that the study not be “too big,” where an effect of little scientific importance is nevertheless statistically detectable.
Sample size is important for economic reasons: An undersized study can be a waste of resources for not having the capability to produce useful results, while an oversized one uses more resources than are necessary. In an experiment involving human or animal subjects, sample size is a pivotal issue for ethical reasons. An undersized experiment exposes the subjects to potentially harmful treatments without advancing knowledge. In an oversized experiment, an unnecessary number of subjects are exposed to a potentially harmful treatment, or are denied a potentially beneficial one.

Russell V. Lenth is Associate Professor, Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242 (E-mail: [email protected]). The author thanks John Castelloe, Kate Cowles, Steve Simon, two referees, the editor, and an associate editor for their helpful comments on earlier drafts of this article. Much of this work was done with the support of the Obermann Center for Advanced Studies at the University of Iowa.

For such an important issue, there is a surprisingly small amount of published literature. Important general references include Mace (1964), Kraemer and Thiemann (1987), Cohen (1988), Desu and Raghavarao (1990), Lipsey (1990), Shuster (1990), and Odeh and Fox (1991). There are numerous articles, especially in biostatistics journals, concerning sample size determination for specific tests. Also of interest are studies of the extent to which sample size is adequate or inadequate in published studies; see Freiman, Chalmers, Smith, and Kuebler (1986) and Thornley and Adams (1998). There is a growing amount of software for sample size determination, including nQuery Advisor (Elashoff 2000), PASS (Hintze 2000), UnifyPow (O'Brien 1998), and Power and Precision (Borenstein, Rothstein, and Cohen 1997). Web resources include a comprehensive list of power-analysis software (Thomas 1998) and online calculators such as Lenth (2000).
Wheeler (1974) provided some useful approximations for use in linear models; Castelloe (2000) gave an up-to-date overview of computational methods.

There are several approaches to sample size. For example, one can specify the desired width of a confidence interval and determine the sample size that achieves that goal; or a Bayesian approach can be used where we optimize some utility function, perhaps one that involves both precision of estimation and cost. One of the most popular approaches to sample size determination involves studying the power of a test of hypothesis. It is the approach emphasized here, although much of the discussion is applicable in other contexts. The power approach involves these elements:

1. Specify a hypothesis test on a parameter $\theta$ (along with the underlying probability model for the data).
2. Specify the significance level $\alpha$ of the test.
3. Specify an effect size $\tilde{\theta}$ that reflects an alternative of scientific interest.
4. Obtain historical values or estimates of other parameters needed to compute the power function of the test.
5. Specify a target value $\tilde{\pi}$ of the power of the test when $\theta = \tilde{\theta}$.

Notationally, the power of the test is a function $\pi(\theta; n, \alpha, \ldots)$, where $n$ is the sample size and the “$\ldots$” part refers to the additional parameters mentioned in Step 4. The required sample size is the smallest integer $n$ such that $\pi(\tilde{\theta}; n, \alpha, \ldots) \ge \tilde{\pi}$.

1.1 Example

To illustrate, suppose that we plan to conduct a simple two-sample experiment comparing a treatment with a control. The response variable is systolic blood pressure (SBP), measured using a standard sphygmomanometer. The treatment is supposed to reduce blood pressure; so we set up a one-sided test of $H_0: \mu_T = \mu_C$ versus $H_1: \mu_T < \mu_C$, where $\mu_T$ is the mean SBP for the treatment group and $\mu_C$ is the mean SBP for the control group.
Here, the parameter $\theta = \mu_T - \mu_C$ is the effect being tested; thus, in the above framework we would write $H_0: \theta = 0$ and $H_1: \theta < 0$.

© 2001 American Statistical Association. The American Statistician, August 2001, Vol. 55, No. 3.

Figure 1. Software solution (Java applet in Lenth 2000) to the sample size problem in the blood-pressure example.

The goals of the experiment specify that we want to be able to detect a situation where the treatment mean is 15 mm Hg lower than the control group; that is, the required effect size is $\tilde{\theta} = -15$. We specify that such an effect be detected with 80% power ($\tilde{\pi} = .80$) when the significance level is $\alpha = .05$. Past experience with similar experiments, with similar sphygmomanometers and similar subjects, suggests that the data will be approximately normally distributed with a standard deviation of $\sigma = 20$ mm Hg. We plan to use a two-sample pooled t test with equal numbers $n$ of subjects in each group.
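The inputs of the blood-pressure example (effect size of -15 mm Hg, standard deviation of 20 mm Hg, one-sided level .05, target power .80) are enough to search for the smallest adequate n. The following is a minimal sketch of that search, not the article's applet: it uses only the Python standard library and evaluates the noncentral t tail probability by Simpson integration. The function names (`t_upper_tail`, `t_critical`, `power_two_sample`, `required_n`) and the numerical settings (integration range, step count, bisection bounds) are assumptions made for illustration.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def t_upper_tail(c, df, delta=0.0, steps=400):
    """P(T > c) for a t variable with df >= 2 degrees of freedom and
    noncentrality delta (delta = 0 gives the ordinary central t)."""
    # Write T = (Z + delta) / S with Z ~ N(0, 1) and S = sqrt(chi2_df / df).
    # Then P(T > c) = E[PHI(delta - c * S)], integrated over the density of S.
    log_norm = -(df / 2.0) * math.log(2.0) - math.lgamma(df / 2.0)

    def integrand(s):
        if s <= 0.0:
            return 0.0  # the density of S vanishes at 0 when df >= 2
        v = df * s * s  # chi-square value corresponding to s
        log_density = (log_norm + (df / 2.0 - 1.0) * math.log(v) - v / 2.0
                       + math.log(2.0 * df * s))  # Jacobian dv/ds = 2*df*s
        return PHI(delta - c * s) * math.exp(log_density)

    upper = 6.0  # assumed cutoff: S has negligible mass beyond 6 for df >= 2
    h = upper / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0


def t_critical(alpha, df):
    """Upper-alpha critical value of the central t distribution (bisection)."""
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # tail still too heavy, so the critical value is larger
        else:
            hi = mid
    return (lo + hi) / 2.0


def power_two_sample(n, effect, sigma, alpha):
    """Power of the one-sided pooled t test with n subjects per group."""
    df = 2 * n - 2
    # By symmetry, only the magnitude of the effect matters for a
    # one-sided test in the direction of the effect.
    delta = abs(effect) / (sigma * math.sqrt(2.0 / n))
    return t_upper_tail(t_critical(alpha, df), df, delta)


def required_n(effect, sigma, alpha, target_power, n_max=1000):
    """Smallest n per group whose power reaches the target."""
    for n in range(2, n_max + 1):
        if power_two_sample(n, effect, sigma, alpha) >= target_power:
            return n
    raise ValueError("target power not reached for n <= n_max")


n = required_n(effect=-15.0, sigma=20.0, alpha=0.05, target_power=0.80)
print(n, round(power_two_sample(n, -15.0, 20.0, 0.05), 3))
```

Under these inputs the search stops at 23 subjects per group, the first n whose power is at least .80. The linear search mirrors the definition of the required sample size as the smallest integer n meeting the target; a dedicated statistics library would normally replace the hand-rolled integration.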

