
6A: Measures of Variation

The range of a data set is the difference between its highest and lowest data values:
range = highest value (max) minus lowest value (min)

Example: The range of the data set 1, 2, 8, 3, 1 is 8 (max) minus 1 (min), or 7.

The range of a data set is very sensitive to the presence of outliers:
1, 2, 2, 2, 3: range = 3 – 1 = 2
1, 2, 2, 2, 93: range = 93 – 1 = 92

The “five-number summary” of a distribution:
1, 4, 6, 7, 10, 11, 13, 16, 18, 19, 20
The median is 11.
Lower half (values below the median): 1, 4, 6, 7, 10
Upper half (values above the median): 13, 16, 18, 19, 20
1. Lowest value = 1
2. 1st quartile (median of lower half) = 6
3. 2nd quartile (median of list) = 11
4. 3rd quartile (median of upper half) = 18
5. Highest value = 20

For this example, the “box-and-whisker” plot is
[Box-and-whisker plot over a number line from 0 to 21: the box runs from the 1st quartile (6) to the 3rd quartile (18) with a line at the median (11), and the whiskers extend to the lowest value (1) and the highest value (20).]

The 1st quartile is also called the lower quartile, and the 3rd quartile is also called the upper quartile.

In a large data set,
25% of the numbers lie between the lowest value and the 1st quartile,
25% of the numbers lie between the 1st quartile and the 2nd quartile,
25% of the numbers lie between the 2nd quartile and the 3rd quartile, and
25% of the numbers lie between the 3rd quartile and the highest value.

Likewise:
10% of the numbers lie between the lowest value and the “1st decile”,
10% of the numbers lie between the 1st decile and the 2nd decile,
etc.

And:
1% of the numbers lie between the lowest value and the 1st percentile,
etc.

What fraction of Americans have annual incomes that fall between the 40th percentile and the 60th percentile?
Answer: 20%, or one-fifth.

Standard deviation = a common way of measuring the amount of variation in a distribution (abbreviation: “st. dev.” or “s.d.”)
Small s.d.: the distribution is tightly clustered about its mean value
Large s.d.: the distribution is broadly dispersed about its mean value

The standard deviation of a list of n numbers equals the square root of ((the sum of the squares of the deviations) divided by (n–1)), where each deviation is a value minus the mean.

Example: 1, 3, 4, 5, 7.
Mean: 4.
Deviations: 1–4 = –3, 3–4 = –1, 4–4 = 0, 5–4 = +1, and 7–4 = +3.
Squares of deviations: 9, 1, 0, 1, 9.
S.d. = sqrt((9+1+0+1+9)/4) = sqrt(5), or about 2.24.

Note that in this example, over half of the values lie within 2.24 (“one standard deviation”) of the mean, and all of the values lie within 4.48 (“two standard deviations”) of the mean.

The “range rule of thumb” says that the low value of a distribution is about two standard deviations below the mean, and the high value is about two standard deviations above the mean.
This is not a good rule when the data set is extremely large, or when there are outliers, or when the distribution is uneven.

Note that the standard deviation of a list of measured values is measured in the same units as the values themselves.

Example:
Data set: 4 ft, 5 ft, 6 ft
Mean: 5 ft
Deviations: –1 ft, 0 ft, +1 ft
Squared deviations: 1 sq ft, 0 sq ft, 1 sq ft
St. dev. = sqrt(((1+0+1) sq ft)/(3–1)) = sqrt(1 sq ft) = 1 ft

Why do we use standard deviation as a measure of variation, instead of other measures?
One answer: these other measures don’t obey anything like Chebyshev’s theorem.

Two special cases of Chebyshev’s theorem:
1. For any data set, at least 75% of all data values lie within 2 standard deviations of the mean.
2. For any data set, at least 89% of all data values lie within 3 standard deviations of the mean.

For many applications, Chebyshev’s theorem is unduly conservative: more commonly, over 99% of all data values lie within 3 standard deviations of the mean.

N.B. If you study statistics in greater depth, you’ll find that there are two formulas for standard deviation: in one of them (“sample standard deviation”) you divide by n–1, as above, and in the other (“population standard deviation”) you divide by n. For this class, we’ll only use sample standard deviation.

Does It Make Sense?
7. “Both exams had the same range, so they must have had the same median.”
8. “The highest exam score was in the upper quartile of the distribution.”
9. “For the 30 students who took the test, the high score was 80, the median was 74, and the low score was 40.”
10. “I examined the data carefully, and the range was greater than the standard deviation.”
11. “The standard deviation for the heights of a group of 5-year-old children is smaller than the standard deviation for the heights of a group of children who range in age from 3 to 15.”
12. “The mean gas mileage of the compact cars we tested was 34 miles per gallon, with a standard deviation of 5
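
The calculations in these notes (range, five-number summary, sample standard deviation, and the "within k standard deviations" counts) can be checked with a short Python sketch. The function names below are illustrative and do not come from the course. The quartile routine assumes the median-of-halves method used in the 11-value example, leaving the median out of both halves when the count is odd; other quartile conventions exist and may give slightly different answers.

```python
import math
import statistics


def data_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)


def five_number_summary(values):
    """Return (low, Q1, median, Q3, high) by the median-of-halves method.

    Assumption: with an odd count, the median is excluded from both halves,
    as in the notes' example. Needs at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # values below the median
    upper = data[-half:]   # values above the median
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])


def sample_std_dev(values):
    """Square root of (sum of squared deviations divided by n - 1)."""
    mean = sum(values) / len(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (len(values) - 1))


def fraction_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    mean = sum(values) / len(values)
    sd = sample_std_dev(values)
    return sum(abs(x - mean) <= k * sd for x in values) / len(values)


if __name__ == "__main__":
    print(data_range([1, 2, 8, 3, 1]))       # 7
    print(data_range([1, 2, 2, 2, 93]))      # 92
    print(five_number_summary([1, 4, 6, 7, 10, 11, 13, 16, 18, 19, 20]))
    # (1, 6, 11, 18, 20)

    example = [1, 3, 4, 5, 7]
    print(round(sample_std_dev(example), 2))  # 2.24
    print(fraction_within(example, 1))        # 0.6 (over half)
    print(fraction_within(example, 2))        # 1.0 (all values)
```

Python's standard library also provides `statistics.stdev` (divide by n–1) and `statistics.pstdev` (divide by n), which correspond to the sample and population formulas mentioned in the N.B. above.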



UW-Madison MATH 141 - Measures of Variation
