ACADEMIA AND CLINIC

# Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy

**Steven N. Goodman, MD, PhD**

An important problem exists in the interpretation of modern medical research data: Biological understanding and previous research play little formal role in the interpretation of quantitative results. This phenomenon is manifest in the discussion sections of research articles and ultimately can affect the reliability of conclusions. The standard statistical approach has created this situation by promoting the illusion that conclusions can be produced with certain "error rates," without consideration of information from outside the experiment. This statistical approach, the key components of which are P values and hypothesis tests, is widely perceived as a mathematically coherent approach to inference. There is little appreciation in the medical community that the methodology is an amalgam of incompatible elements, whose utility for scientific inference has been the subject of intense debate among statisticians for almost 70 years. This article introduces some of the key elements of that debate and traces the appeal and adverse impact of this methodology to the P value fallacy, the mistaken idea that a single number can capture both the long-run outcomes of an experiment and the evidential meaning of a single result. This argument is made as a prelude to the suggestion that another measure of evidence should be used—the Bayes factor, which properly separates issues of long-run behavior from evidential strength and allows the integration of background knowledge with statistical findings.

This paper is also available at http://www.acponline.org.

Ann Intern Med. 1999;130:995-1004.

From Johns Hopkins University School of Medicine, Baltimore, Maryland.
For the current author address, see end of text.

The past decade has seen the rise of evidence-based medicine, a movement that has focused attention on the importance of using clinical studies for empirical demonstration of the efficacy of medical interventions. Increasingly, physicians are being called on to assess such studies to help them make clinical decisions and understand the rationale behind recommended practices. This type of assessment requires an understanding of research methods that until recently was not expected of physicians. These research methods include statistical techniques used to assist in drawing conclusions. However, the methods of statistical inference in current use are not "evidence-based" and thus have contributed to a widespread misperception. The misperception is that absent any consideration of biological plausibility and prior evidence, statistical methods can provide a number that by itself reflects a probability of reaching erroneous conclusions. This belief has damaged the quality of scientific reasoning and discourse, primarily by making it difficult to understand how the strength of the evidence in a particular study can be related to and combined with the strength of other evidence (from other laboratory or clinical studies, scientific reasoning, or clinical experience). This results in many knowledge claims that do not stand the test of time (1, 2).

A pair of articles in this issue examines this problem in some depth and proposes a partial solution. In this article, I explore the historical and logical foundations of the dominant school of medical statistics, sometimes referred to as frequentist statistics, which might be described as error-based. I explicate the logical fallacy at the heart of this system and the reason that it maintains such a tenacious hold on the minds of investigators, policymakers, and journal editors.
In the second article (3), I present an evidence-based approach derived from Bayesian statistical methods, an alternative perspective that has been one of the most active areas of biostatistical development during the past 20 years. Bayesian methods have started to make inroads into medical journals; Annals, for example, has included a section on Bayesian data interpretation in its Information for Authors section since 1 July 1997.

*See related article on pp 1005-1013 and editorial comment on pp 1019-1021.*

The perspective on Bayesian methods offered here will differ somewhat from that in previous presentations in other medical journals. It will focus not on the controversial use of these methods in measuring "belief" but rather on how they measure the weight of quantitative evidence. We will see how reporting an index called the Bayes factor (which in its simplest form is also called a likelihood ratio) instead of the P value can facilitate the integration of statistical summaries and biological knowledge and lead to a better understanding of the role of scientific judgment in the interpretation of medical research.

## An Example of the Problem

A recent randomized, controlled trial of hydrocortisone treatment for the chronic fatigue syndrome showed a treatment effect that neared the threshold for statistical significance, P = 0.06 (4). The discussion section began, ". . . hydrocortisone treatment was associated with an improvement in symptoms . . . This is the first such study . . . to demonstrate improvement with a drug treatment of [the chronic fatigue syndrome]" (4).

What is remarkable about this paper is how unremarkable it is. It is typical of many medical research reports in that a conclusion based on the findings is stated at the beginning of the discussion. Later in the discussion, such issues as biological mechanism, effect magnitude, and supporting studies are presented.
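As an aside, the contrast between a P value and the Bayes factor mentioned above can be made concrete with a small numerical sketch. One simple version, the "minimum Bayes factor" for a normally distributed test statistic, takes the form exp(-z²/2); this formula comes from the Bayesian literature discussed in the companion article, and the sketch below is purely illustrative, not part of the hydrocortisone trial's actual analysis.

```python
# Illustrative sketch: convert a two-sided P value to the minimum
# Bayes factor exp(-z^2 / 2), the strongest support against the null
# hypothesis that any normal-mean alternative can claim for the
# observed z-statistic. (Assumes a normally distributed test statistic;
# shown only to make the P value / Bayes factor contrast concrete.)
from math import exp
from statistics import NormalDist

def min_bayes_factor(p_two_sided: float) -> float:
    """Minimum Bayes factor (null vs. alternative) for a two-sided P value."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)  # z-score implied by P
    return exp(-z * z / 2)

# The trial's P = 0.06 corresponds to z of about 1.88 and a minimum
# Bayes factor of about 0.17: even at its weakest for the null, the
# evidence makes the data only about 6 times as probable under the
# best-supported alternative as under the null hypothesis.
print(round(min_bayes_factor(0.06), 2))  # → 0.17
```

Note that the Bayes factor, unlike the P value, is a relative measure: it compares how well two hypotheses predict the observed data, which is what allows it to be combined with prior evidence.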
But a conclusion is stated before the actual discussion, as though it is derived directly from the results, a mere linguistic transformation of P = 0.06. This is a natural consequence of a statistical method that has almost eliminated our ability to distinguish between statistical results and scientific conclusions. We will see how this is a natural outgrowth of the "P value fallacy."

## Philosophical Preliminaries

To begin our exploration of the P value fallacy, we must consider the basic elements of reasoning. The process that we use to link underlying knowledge to the observed world is called inferential reasoning, of which there are two logical types: deductive inference and inductive inference. In deductive inference, we start with a given hypothesis (a statement about how nature works) and predict what we should see if that hypothesis were true.
