
Chapter 6

Fragments

6.0.1 The Normal distribution for data: do textbooks give a misleading impression of its ubiquity?

Almost all textbooks on introductory statistics have a chapter on the Normal distribution. It is mainly motivated as a prelude to the mathematical theory of probability distributions for sample averages, and that theory, in proper context, is uncontroversial. Let’s consider instead the setting where we have observational data, of the type¹ that can be represented in a histogram.

Question. For what kinds of data is it empirically true, or reasonable to suppose, that histograms are approximately Normal?

Giving a serious answer to this intrinsically rather vague question is surprisingly hard. What do textbooks say? I read the relevant chapter in three textbooks: [48] Chapter 6; [45] Chapter 5; [16] Chapter 9. What they explicitly say in the chapter is brief and unobjectionable. In fact [16] says nothing; [48] says

    The normal distribution is used as a model for a variety of physical measurements since it has been discovered that many such measurements have distributions that are normally distributed or at least approximately so

and [45] adroitly sidesteps the issue by quoting a folklore phrase:

    The “fuzzy” central limit theorem says that data which are influenced by many small and unrelated random effects are approximately normally distributed.

Having said this (and no more) about the general question, each textbook proceeds to in-text examples and student exercises, many of which involve data having a Normal distribution. The subjects of the examples and exercises in those three chapters are listed (essentially completely) in Table xxx. My supposition is that, for a student, the memory of the one explicit sentence is crowded out by the many examples and exercises, so the student comes away, consciously or unconsciously, with the idea that data on these kinds of subjects follow the Normal distribution.

¹ Briefly: the same attribute of different “individuals”, interpreted broadly.

Notes on Table xxx.
I have categorized these textbook examples as follows, even though the author’s intention is often unclear.

• cited: the author appears to have some specific data set in mind, though it is typically cited as e.g. “the National Health Survey” rather than more precisely.
• asserted: the author appears to be saying that, as a known general fact, data of this type is approximately normally distributed.
• assumed: the author has either completely hypothetical data, or real data with a given mean and s.d. but no given distribution, and tells the reader to assume a Normal distribution for the purpose of doing a calculation.

I excluded a few examples: quantities standardized to Normal (IQ scores); classical data (Quételet); numerical data intended for student computer-aided analysis. Much of the human data is “broken down by age and sex”, in the classic phrase.

cited / asserted / assumed | example | type
√ √ | height | human physiology
√ | weight |
√ | cholesterol level |
√ | blood pressure |
√ | gestation time |
√ | body temperature |
√ | brain weight |
√ | skull breadth |
√ | eye-contact time |
√ | reduction in oxygen consumption during transcendental meditation |
√ | SAT (and similar exam) scores | human behavior
√ | farm laborer wages |
√ | family food expenses |
√ | recreational shopping expenses (teenage) |
√ | TV hours watched (child) |
√ | time in shower |
√ | 10k race times |
√ | baseball batting average |
√ | household paper recycling quantity |
√ | in-home furniture assembly time |
√ | military service point scores |
√ | store complaints per day |
√ | horse gestation time | non-human physiology
√ | rattlesnake length |
√ | scorpion length |
√ | grapefruit weight |
√ | TV (and similar) lifetimes | product quality
√ | electrical resistance of product |
√ | auto tire life (miles) |
√ | weight in package |
√ | thermometer inaccuracy |
√ | coil springs strength |
√ | weight of quarters (25c) |
√ | yearly major earthquakes | geophysics
√ | annual rainfall Iowa |
√ | inter-eruption times, Old Faithful |
√ | inflight radiation exposure | miscellaneous
√ | mice: number of fights |
√ | repeated weighings |

Table xxx. Textbook [16, 45, 48] examples of the Normal distribution for data.

6.0.2 The Normal distribution for data: is it empirically true?

    Everyone believes in the [normal] law of errors: the mathematicians, because they think it is an experimental fact; and the experimenters, because they suppose it is a theorem of mathematics.
    Oft-quoted remark, attributed by Poincaré to Gabriel Lippmann.

xxx I don’t have anything new to say.

6.0.3 xxx Normal - our data

auto.1 is data on prices (scaled to average = 100) quoted for repairs by 421 Bay Area auto repair shops. xxx Do you expect the histogram to be approximately normal?

[Histogram of auto.1 repair prices, scaled to average = 100.]

Table xxx. Prices charged by 421 auto repair shops. Data
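One informal way to approach the auto.1 question, and the “fuzzy” central limit theorem quoted in 6.0.1, is to compare a data set with two simple Normal benchmarks: about 68.3% of values within 1 s.d. of the mean, about 95.4% within 2 s.d., and skewness near 0. The sketch below assumes Python 3.8+ and only the standard library; the helper names `normality_summary` and `many_small_effects` are hypothetical, introduced here for illustration. No auto.1 data are used. Instead the sketch simulates the fuzzy-CLT setting under an assumed model in which each value is the sum of 50 small, independent, uniformly distributed effects. These benchmarks are crude checks, not a formal test of Normality.

```python
import math
import random
import statistics


def normality_summary(values):
    """Compare a sample with simple Normal benchmarks.

    Returns the fraction of values within 1 and 2 sample standard
    deviations of the mean (about 0.683 and 0.954 for Normal data)
    and the moment-based sample skewness (0 for Normal data).
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample s.d. (n - 1 denominator)
    pop_sd = statistics.pstdev(values)  # population s.d., used for skewness
    if pop_sd == 0:
        raise ValueError("all values are identical")
    within1 = sum(abs(x - mean) <= sd for x in values) / n
    within2 = sum(abs(x - mean) <= 2 * sd for x in values) / n
    skew = sum((x - mean) ** 3 for x in values) / (n * pop_sd ** 3)
    return {"within_1sd": within1, "within_2sd": within2, "skewness": skew}


def many_small_effects(n_effects, rng):
    """Hypothetical model: one value built from many small, independent, additive effects."""
    return sum(rng.uniform(-1, 1) for _ in range(n_effects))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the simulation is reproducible
    sample = [many_small_effects(50, rng) for _ in range(5000)]
    print(normality_summary(sample))
    # Exact Normal benchmarks: P(|Z| <= 1) and P(|Z| <= 2)
    print("Normal benchmarks:",
          round(math.erf(1 / math.sqrt(2)), 3),
          round(math.erf(2 / math.sqrt(2)), 3))
```

For a real data set such as auto.1 one would call `normality_summary` on the list of prices. Agreement with the benchmarks is necessary but not sufficient for approximate Normality; the histogram itself remains the primary check.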


Berkeley STAT 157 - Fragments
