ACADEMIA AND CLINIC

Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy

Steven N. Goodman, MD, PhD

An important problem exists in the interpretation of modern medical research data: Biological understanding and previous research play little formal role in the interpretation of quantitative results. This phenomenon is manifest in the discussion sections of research articles and ultimately can affect the reliability of conclusions. The standard statistical approach has created this situation by promoting the illusion that conclusions can be produced with certain "error rates," without consideration of information from outside the experiment. This statistical approach, the key components of which are P values and hypothesis tests, is widely perceived as a mathematically coherent approach to inference. There is little appreciation in the medical community that the methodology is an amalgam of incompatible elements, whose utility for scientific inference has been the subject of intense debate among statisticians for almost 70 years. This article introduces some of the key elements of that debate and traces the appeal and adverse impact of this methodology to the P value fallacy, the mistaken idea that a single number can capture both the long-run outcomes of an experiment and the evidential meaning of a single result. This argument is made as a prelude to the suggestion that another measure of evidence should be used: the Bayes factor, which properly separates issues of long-run behavior from evidential strength and allows the integration of background knowledge with statistical findings.

The past decade has seen the rise of evidence-based medicine, a movement that has focused attention on the importance of using clinical studies for empirical demonstration of the efficacy of medical interventions. Increasingly, physicians are being called on to assess such studies to help them make clinical decisions and understand the rationale behind recommended practices. This type of assessment
requires an understanding of research methods that until recently was not expected of physicians.

These research methods include statistical techniques used to assist in drawing conclusions. However, the methods of statistical inference in current use are not "evidence-based" and thus have contributed to a widespread misperception. The misperception is that, absent any consideration of biological plausibility and prior evidence, statistical methods can provide a number that by itself reflects a probability of reaching erroneous conclusions. This belief has damaged the quality of scientific reasoning and discourse, primarily by making it difficult to understand how the strength of the evidence in a particular study can be related to and combined with the strength of other evidence (from other laboratory or clinical studies, scientific reasoning, or clinical experience). This results in many knowledge claims that do not stand the test of time (1, 2).

A pair of articles in this issue examines this problem in some depth and proposes a partial solution. In this article, I explore the historical and logical foundations of the dominant school of medical statistics, sometimes referred to as frequentist statistics, which might be described as error-based. I explicate the logical fallacy at the heart of this system and the reason that it maintains such a tenacious hold on the minds of investigators, policymakers, and journal editors. In the second article (3), I present an evidence-based approach derived from Bayesian statistical methods, an alternative perspective that has been one of the most active areas of biostatistical development during the past 20 years. Bayesian methods have started to make inroads into medical journals; Annals, for example, has included a section on Bayesian data interpretation in its Information for Authors section since 1 July 1997.

The perspective on Bayesian methods offered here will differ somewhat from that in previous presentations in other medical journals. It will focus not on the controversial use of these methods in measuring "belief" but rather on how they measure the weight of quantitative evidence. We will see how reporting an index called the Bayes factor (which in its simplest form is also called a likelihood ratio) instead of the P value can facilitate the integration of statistical summaries and biological knowledge and lead to a better understanding of the role of scientific judgment in the interpretation of medical research.

This paper is also available at http://www.acponline.org.
Ann Intern Med. 1999;130:995-1004. Annals of Internal Medicine, 15 June 1999.
From Johns Hopkins University School of Medicine, Baltimore, Maryland. For the current author address, see end of text.
See related article on pp 1005-1013 and editorial comment on pp 1019-1021.
© 1999 American College of Physicians-American Society of Internal Medicine

An Example of the Problem

A recent randomized, controlled trial of hydrocortisone treatment for the chronic fatigue syndrome showed a treatment effect that neared the threshold for statistical significance, P = 0.06 (4). The discussion section began, "hydrocortisone treatment was associated with an improvement in symptoms. This is the first such study to demonstrate improvement with a drug treatment of the chronic fatigue syndrome" (4).

What is remarkable about this paper is how unremarkable it is. It is typical of many medical research reports in that a conclusion based on the findings is stated at the beginning of the discussion. Later in the discussion, such issues as biological mechanism, effect magnitude, and supporting studies are presented. But a conclusion is stated before the actual discussion, as though it is derived directly from the results, a mere linguistic transformation of P = 0.06. This is a natural consequence of a statistical method that has almost eliminated our ability to distinguish between statistical results and scientific conclusions. We will see how this is a natural outgrowth of the "P value fallacy."

Philosophical Preliminaries

[Figure 1. The parallels between the processes of induction and deduction in medical inference (top) and statistical inference (bottom). Δ = treatment difference.]

To begin our exploration of the P value fallacy, we must consider the basic elements of reasoning. The process that we use to link underlying knowledge to the observed world is called inferential reasoning, of which there are two logical types: deductive inference and inductive inference. In deductive inference, we start with a given