Why Tables are Really Much Better than Graphs[1]

Andrew Gelman[2]

28 October 2009

Abstract. The statistical community is divided when it comes to graphical methods and models. Graphics researchers tend to disparage models and to focus on direct representations of data, mediated perhaps by research on perceptions but certainly not by probability distributions. From the other side, modelers tend to think of graphics as a cute toy for exploring raw data but not much help when it comes to the serious business of modeling. In order to better understand the benefits and limitations of graphs in statistical analysis, this article presents a series of criticisms of graphical methods in the voice of a hypothetical old-school analytical statistician or social scientist. We hope to elicit elaborations and extensions of these and other arguments on the limitations of graphics, along with responses from graphical researchers who might have different perceptions of these issues.

[1] To appear (with discussion) in Journal of Computational and Graphical Statistics. We thank the Institute of Education Sciences for partial support of this work.
[2] Department of Statistics and Department of Political Science, Columbia University, [email protected], http://www.stat.columbia.edu/~gelman/

The benefits and limitations of statistical graphics

My purpose in writing this article is to elicit lively discussion of the uses of graphical methods in statistical analysis. Graphs tend to be ignored or underused in much of the literature of statistics and applied fields (see, for example, Gelman, Dodhia, and Pasarica, 2002, and Kastellec and Leoni, 2008), and the literature on graphical methods is small and is mostly separate from the rest of statistics.
The related field of data visualization has become increasingly prominent in digital communication and the arts, but there the focus is typically on eye-catching design rather than on conveying statistical information.[3] Here I would like to stimulate consideration of the connections between graphics and more formal statistical analysis, along with a serious discussion of the drawbacks of visual presentation of quantitative information: if graphs really are so wonderful and underused in applied statistics (as I in fact believe), what is holding people back from integrating them much more into data analysis?

Following the revolution begun by Tukey (1970, 1977) and continued by Chambers et al. (1983), Cleveland (1985), and Tufte (1983, 1990), graphical methods for exploratory data analysis have generally been recognized to be a useful first step in any statistical study. Beyond this, though, there is disagreement, with the dominant strain of applied researchers (at least in social science) feeling that, when the serious models come out, it's time to put the graphical toys away. From the other direction, researchers in statistical graphics often disparage models and focus on direct representations of data. There is a lot of valuable work combining analytical modeling and graphical display (for a classical example, consider Daniel, 1959), but in much of the published work in political science, economics, sociology, and other areas, graphics have little if any serious role, being used to display some simple data summaries and never seen again, with important findings displayed in tabular form.

Those of us who believe graphing to be important and even essential to research would be well advised to think hard about why visual displays are not used more extensively in serious applied research. To this end, this article presents a series of attacks on graphical methods in the voice of a hypothetical old-school analytical statistician or social scientist.
Although this originated as an April Fool's blog entry (Gelman, 2009a), I believe these are strong arguments to be taken seriously, and ultimately accepted in some settings and refuted in others. I welcome elaboration and discussion of these points by statisticians and statistically-minded researchers in applied fields. I have my own answers to some of these objections but do not present them here, in the interest of presenting an open forum for discussion.

[3] For example, when the influential (and interesting) Flowing Data blog published a list of the "5 best data visualization projects of the year" (Yau, 2008), a debate ensued over whether data visualizations should aim for transparent communication of information (Gelman, 2009b,c) or for visual novelty and beauty (Yau, 2009).

The arguments I lay out are, briefly: that graphs are a distraction from more serious analysis; that graphs can mislead in displaying compelling patterns that are not statistically significant and that could easily enough be consistent with chance variation; that diagnostic plots could be useful in the development of a model but don't belong in final reports; that, when they take the place of tables, graphs place the careful reader one step further away from the numerical inferences that are the essence of rigorous scientific inquiry; and that the effort spent making flashy graphics would be better spent on the substance of the problem being studied.

Some problems with graphs

Graphs are gimmicks, substituting fancy displays for careful analysis and rigorous reasoning. It's basically a tradeoff: the snazzier your display, the more you can get away with a crappy underlying analysis. Conversely, a good analysis doesn't need a fancy graph to sell itself. The best quantitative research has an underlying clarity and a substantive importance whose results are best presented in a sober, serious tabular display.
And the best quantitative researchers trust their peers enough to present their estimates and standard errors directly, with no tricks, for all to see and evaluate. Let's leave the dot plots, pie charts, moving zip charts, and all the rest to the folks in the marketing department and the art directors of Newsweek and USA Today. As scientists we're doing actual research and we want to see, and present, the hard numbers.

To get a sense of what's at stake here, consider two sorts of analyses. At one extreme are controlled experiments with clean estimates and p-values, and well-specified regressions with robust standard errors, where the p-values really mean something. At the other extreme are descriptive data summaries—often augmented with models such as multilevel regressions chock full of probability distributions
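The second argument listed above, that a graph can display a compelling pattern that is consistent with chance variation, can be made concrete with a small simulation. The sketch below is my own illustration, not from the article, and the sample sizes are arbitrary: if an analyst screens many pure-noise predictors and plots only the strongest-looking relationship, the selected scatterplot will look persuasive even though there is nothing there.

```python
# Sketch (not from the article): screening many noise predictors and
# keeping the best-looking correlation produces an impressive-seeming
# pattern from pure chance variation.
import random

random.seed(1)

n = 20   # observations per variable (arbitrary choice for illustration)
k = 50   # number of candidate noise predictors screened

y = [random.gauss(0, 1) for _ in range(n)]


def corr(x, z):
    """Pearson correlation of two equal-length lists."""
    m = len(x)
    mx, mz = sum(x) / m, sum(z) / m
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sz = sum((b - mz) ** 2 for b in z) ** 0.5
    return sum((a - mx) * (b - mz) for a, b in zip(x, z)) / (sx * sz)


best = 0.0
for _ in range(k):
    x = [random.gauss(0, 1) for _ in range(n)]  # pure noise predictor
    r = corr(x, y)
    if abs(r) > abs(best):
        best = r

# The winning correlation, the one an analyst would be tempted to graph:
print(round(abs(best), 2))
```

With n = 20, the two-sided 5% critical value for a correlation is roughly 0.44, and the largest of 50 noise correlations will often exceed it; a scatterplot of that selected pair would show exactly the kind of compelling-but-spurious pattern the hypothetical critic warns about.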

