JOURNAL OF COUNSELING & DEVELOPMENT • WINTER 2002 • VOLUME 80

"Statistical," "Practical," and "Clinical": How Many Kinds of Significance Do Counselors Need to Consider?

Bruce Thompson

The present article reviews and distinguishes 3 related but different types of significance: "statistical," "practical," and "clinical." A framework for conceptualizing the many "practical" effect size indices is described. Several effect size indices that counseling researchers can use, or that counselors reading the literature may encounter, are summarized. A way of estimating "corrected" intervention effects is proposed. It is suggested that readers should expect authors to report indices of "practical" or "clinical" significance, or both, within their research reports; and it is noted that indeed some journals now require such reports.

Bruce Thompson is a professor and a distinguished research scholar in the Department of Educational Psychology at Texas A&M University, College Station; an adjunct professor of family and community medicine at Baylor College of Medicine, Houston, Texas; and a Visiting Distinguished Fellow at the University Institute for Advanced Study at La Trobe University in Melbourne, Australia. Correspondence regarding this article should be sent to Bruce Thompson, TAMU Department of Educational Psychology, College Station, TX 77843-4225 (Web URL: http://www.coe.tamu.edu/~bthompson).

RESEARCH

Statistical significance tests have a long history dating back at least to the 1700s. In 1710 a Scottish physician, John Arbuthnot, published his statistical analysis of 82 years of London birth rates as regards gender (Hacking, 1965). Similar applications emerged sporadically over the course of the next two centuries. But statistical testing did not become ubiquitous until the early 1900s. In 1900, Karl Pearson developed the chi-square goodness-of-fit test. In 1908, William S. Gossett published his t test under the pseudonym "Student" because of the employment restrictions of the Dublin-based Guinness brewery in which he worked. In 1918, Ronald Fisher first articulated the analysis of variance (ANOVA) logic. Snedecor (1934) subsequently proposed an ANOVA test statistic, which he named "F" in honor of Fisher, who of course subsequently became "Sir" Ronald Fisher. But it was with the 1925 first publication of Fisher's book Statistical Methods for Research Workers and the 1935 publication of his book The Design of Experiments that statistical testing was really popularized.

Huberty (1993; Huberty & Pike, 1999) provided authoritative details on this history. However, it is noteworthy that criticisms of statistical testing are virtually as old as the method itself (cf. Berkson, 1938). For example, in his critique of the mindless use of statistical tests titled "Mathematical vs. Scientific Significance," Boring (1919) argued some 80 years ago,

The case is one of many where statistical ability, divorced from a scientific intimacy with the fundamental observations, leads nowhere. (p. 338)

Statistical tests have been subjected to both intermittent (e.g., Carver, 1978; Meehl, 1978) and contemporary criticisms (cf. Cohen, 1994; Schmidt, 1996). For example, Tryon (1998) recently lamented,

[T]he fact that statistical experts and investigators publishing in the best journals cannot consistently interpret the results of these analyses is extremely disturbing. Seventy-two years of education have resulted in minuscule, if any, progress toward correcting this situation. It is difficult to estimate the handicap that widespread, incorrect, and intractable use of a primary data analytic method has on a scientific discipline, but the deleterious effects are doubtless substantial. (p. 796)

Anderson, Burnham, and Thompson (2000) provided a chart summarizing the frequencies of publications of such criticisms across both decades and diverse disciplines.

Such criticism has stimulated defenders to articulate views that are also thoughtful. Noteworthy examples include Abelson (1997), Cortina and Dunlap (1997), and Frick (1996). The most balanced and comprehensive treatment of diverse perspectives is provided by Harlow, Mulaik, and Steiger (1997; for reviews of this book, see Levin, 1998; Thompson, 1998).

PURPOSE OF THE PRESENT ARTICLE

The purpose of the present review is not to argue whether statistical significance tests should be banned (cf. Schmidt & Hunter, 1997) or not banned (cf. Abelson, 1997). These various views have been repeatedly presented in the literature. Instead, this article has three purposes. First, the article seeks to clarify the distinction between three "kinds" of significance: "statistical," "practical," and "clinical." Second, various indices of practical and clinical significance are briefly reviewed. Finally, it is argued that counselors should not consider only statistical significance when conducting inquiries or evaluating research reports.

Practical or clinical significance, or both, will usually be relevant in most counseling research projects and should be explicitly and directly addressed. Authors should always report one or more of the indices of "practical" or "clinical" significance, or both. Readers should expect them. And it is argued in this article that editors should require them.

THREE KINDS OF SIGNIFICANCE

"Statistical" Significance

What "statistical" significance tests do. Statistical significance estimates the probability (p_CALCULATED) of sample results deviating as much or more than do the actual sample results from those specified by the null hypothesis for the population, given the sample size (Cohen, 1994). In other words, these tests do not evaluate the probability that sample results describe the population; if these statistical tests did that, they would bear on whether the sample results are replicable. Instead, the tests assume that the null exactly describes the population and then test the sample's probability (Thompson, 1996).

Of course, this logic is a bit convoluted and does not tell us what we want to know regarding population values and the likelihood of result replication for future samples drawn from the same population. Thus Cohen (1994) concluded that the statistical significance test "does not tell us what we want to know, and we so much want to know what we want to know that,
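The logic described above, that p_CALCULATED is the probability of the sample result assuming the null exactly describes the population, and that it depends on sample size, can be sketched with a small Monte Carlo simulation. This is an illustrative sketch only; the function name p_calculated, the null population (mean 0, SD 1), and the observed mean difference of 0.3 are hypothetical choices and do not come from the article.

```python
# Hypothetical sketch of what p_CALCULATED estimates: the probability,
# ASSUMING the null hypothesis exactly describes the population, of
# drawing a sample at least as deviant as the one actually observed.
import random
import statistics

random.seed(0)  # fixed seed so the simulation is repeatable

def p_calculated(observed_mean, n, n_sims=10_000):
    """Estimate P(|sample mean| >= |observed_mean|) under a null
    population with mean 0 and SD 1, for samples of size n."""
    hits = 0
    for _ in range(n_sims):
        sample = [random.gauss(0.0, 1.0) for _ in range(n)]
        if abs(statistics.fmean(sample)) >= abs(observed_mean):
            hits += 1
    return hits / n_sims

# The identical observed effect (a mean of 0.3) yields ever smaller
# p values as n grows: p reflects sample size, not effect size alone.
for n in (10, 50, 200):
    print(n, p_calculated(0.3, n))
```

Note that the simulation never computes the probability that the null is true given the data; it only computes how surprising the data would be if the null were true, which is exactly the distinction Cohen (1994) and Thompson (1996) draw.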

