STATISTICS IN MEDICINE
Statist. Med. 2006; 25:917–932
Published online 11 October 2005 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/sim.2251

Efficient group sequential designs when there are several effect sizes under consideration

Christopher Jennison¹ and Bruce W. Turnbull²,∗,†

¹ Department of Mathematical Sciences, University of Bath, Bath, BA2 7AY, U.K.
² School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY 14853, U.S.A.

∗ Correspondence to: B. W. Turnbull, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY 14853, U.S.A.
† E-mail: [email protected]
Contract/grant sponsor: National Institutes of Health; contract/grant number: R01 CA66218
Received April 2004; Accepted March 2005

SUMMARY

We consider the construction of efficient group sequential designs where the goal is a low expected sample size not only at the null hypothesis and the alternative (taken to be the minimal clinically meaningful effect size), but also at more optimistic anticipated effect sizes. Pre-specified Type I error rate and power requirements can be achieved both by standard group sequential tests and by more recently proposed adaptive procedures. We investigate four nested classes of designs: (A) group sequential tests with equal group sizes and stopping boundaries determined by a monomial error spending function (the 'ρ-family'); (B) as A but the initial group size is allowed to be different from the others; (C) group sequential tests with arbitrary group sizes and arbitrary boundaries, fixed in advance; (D) adaptive tests: as C but at each analysis, future group sizes and critical values are updated depending on the current value of the test statistic. By examining the performance of optimal procedures within each class, we conclude that class B provides simple and efficient designs with efficiency close to that of the more complex designs of classes C and D. We provide tables and figures illustrating the performances of optimal designs within each class and defining the optimal procedures of classes A and B. Copyright © 2005 John Wiley & Sons, Ltd.

KEY WORDS: clinical trial; group sequential test; sample size re-estimation; adaptive design; flexible design; optimal design; error spending function

1. INTRODUCTION

Along with practical considerations, the sample size for a clinical trial is determined by setting up null and alternate hypotheses concerning a primary parameter of interest, θ, and then specifying a Type I error rate α and power 1−β to be controlled at a given treatment effect size θ = δ. Usually, traditional values of α and β are used (e.g. α = 0.025, 0.05; β = 0.05, 0.1, 0.2); however, there can be much debate over the choice of δ. Some textbooks advocate that δ should be chosen to represent the minimum 'clinically relevant' or 'commercially viable' effect size; see, for example, References [1, p. 170], [2, p. 149]. Others such as Shun et al. [3] say that δ can be taken to be the anticipated effect size, a value based on expectations from prior experimental, observational and theoretical evidence. Pocock [4] suggests that either approach might be taken: on pp. 125 and 132, δ is to be a 'realistic value', while in the example on p. 128, it is to be a 'clinically relevant' difference that is 'important to detect'. In Section 3.5 of the ICH Guidance E9 [5], it is also stated that δ is to be based on a judgement concerning either the minimal clinically relevant effect size or the 'anticipated' effect.

The choice of δ is crucial because, for example, a halving in the chosen effect size will lead to a quadrupling in the sample size for a fixed sample test (and in the maximum sample size for a group sequential test). Using the lower sample size appropriate to a high treatment effect will leave the trial underpowered to detect a smaller but still important effect. Because of this, Shun et al.
[3] and others have proposed that the trial be designed using the higher effect size (and corresponding lower sample size), but that sample size be re-estimated at an interim analysis based on the emerging observed treatment difference. This has been termed the 'start small then ask for more' strategy [6]. Liu and Chi [7] present formal two-stage designs in which the first stage sample size is sufficient to provide specified power at an expected effect size but additional observations in the second stage increase power at smaller effect sizes and guarantee an overall power requirement at a minimal clinically significant treatment effect.

There have been several accounts in the literature of studies in which sample size has been adapted in order to increase power at lower effect sizes. Cui et al. [8] report on a placebo controlled myocardial infarction prevention trial with a sample size of 600 subjects per treatment arm, this number being based on a planned effect size of a 50 per cent reduction in incidence and 95 per cent power. However, midway through the trial, only about a 25 per cent reduction in incidence was observed, a reduction which was still of clinical and commercial importance. Because of the low conditional power at this stage, the sponsor of the trial submitted a proposal to expand the sample size. In recent years, classes of procedures termed 'flexible', 'adaptive', 'self-designing' or 'variance spending' have been developed which enable such sample size re-estimation to be done while preserving the Type I error rate α. See References [8–14] among others.

Remarks by some authors, e.g. Shen and Fisher [15] and Shun et al. [3], suggest a desire to set a specific power, 1−β, at whatever is the true value of the effect size parameter. This aim may lead to adaptive designs with a power curve rising sharply from α at θ = 0, then remaining almost flat at 1−β.
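For comparison, the behaviour of a fixed sample test can be checked numerically. The sketch below is illustrative only and rests on assumptions not stated in the text: a one-sided z-test of a zero treatment effect, normally distributed responses with known variance, and two equal treatment arms. The function names are hypothetical. It uses the standard requirement that the information equal ((z_α + z_β)/δ)², where δ denotes the effect size at which power is set, so that halving δ quadruples the sample size, and it evaluates the power curve, which keeps rising towards one instead of levelling off.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def required_information(delta, alpha=0.025, beta=0.1):
    """Information needed by a one-sided fixed sample z-test with Type I
    error rate alpha and power 1 - beta at effect size delta (assumed model)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = _STD_NORMAL.inv_cdf(1 - beta)
    return ((z_alpha + z_beta) / delta) ** 2


def per_arm_sample_size(delta, sigma=1.0, alpha=0.025, beta=0.1):
    """Per-arm sample size for a two-arm comparison of normal means with
    known variance sigma**2, using n = 2 * sigma**2 * information."""
    return 2 * sigma ** 2 * required_information(delta, alpha, beta)


def fixed_sample_power(theta, information, alpha=0.025):
    """Power of the one-sided fixed sample z-test when the true effect is theta."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    return _STD_NORMAL.cdf(theta * information ** 0.5 - z_alpha)


if __name__ == "__main__":
    # Halving the effect size from 1.0 to 0.5 multiplies the sample size by four.
    print(per_arm_sample_size(0.5) / per_arm_sample_size(1.0))

    # Power curve of the test sized for delta = 1.0, at several true effects.
    info = required_information(1.0)
    for theta in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(theta, round(fixed_sample_power(theta, info), 4))
```

Under these assumptions the power equals the Type I error rate at a zero effect, reaches 1−β at the design effect size, and is very close to one at twice that value.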
In consequence, significant risk of a negative outcome remains even when the effect size is high and power close to one could easily have been attained.

All the above discussion supports the view that a clinical trial should guarantee power at effect sizes of clinical or commercial interest. Smaller effects are not pertinent since, as Shih [16, p. 517] states '... trials need to consider sample size to detect a difference that is clinically meaningful, not merely to find a statistical significance.' Limitations occur when the sample