22S:30/105 Statistical Methods and Computing

Measures of Center, continued; Measures of Dispersion

Lecture 3, January 23, 2006

Kate Cowles
374 SH, 335-0727
kcowles@stat.uiowa.edu

The mean is meaningful only for quantitative data (either discrete or continuous).

Example regarding a discrete variable: we hear reports such as that the average number of children per family is 1.9.

The mean is not meaningful for nominal or ordinal data.

Exception: if a binary variable is coded as 0 and 1, then the arithmetic mean is the proportion of observations in the dataset that have value 1.

Example: An ecological study of a habitat in which 10 rare species of bird are known to have lived as of 1990. In 1999, a naturalist is sent to spend a day in the area and to record any members of these 10 species that she observes. A variable is coded as follows:

    1 = at least one member of the species was observed
    0 = no members of the species were observed

    species    1  2  3  4  5  6  7  8  9  10
    observed   1  0  1  1  1  0  1  1  1  1

The observed values sum to 8. The mean, x̄ = 8/10 = 0.8, indicates that 80% of the species were observed.

The median

The median is the 50th percentile of a set of observations. Values must be sorted from smallest to largest.

If the number of observations is odd, then the median is the middle value.

    75 80 82 88 95
    The median is 82.

If the number of observations is even, then the usual way to define the median is as the mean of the two middle values.

    75 80 82 88 95 97
    The median is (82 + 88)/2 = 85.

The median is not strongly affected by a few extreme values in the dataset.

    Example 1:  75 80 82 88 95    mean = 84, median = 82
    Example 2:  25 80 82 88 95    mean = 74, median = 82

The median is robust to extreme values.

The median can be used as a measure of center for ordinal data as well as for discrete and continuous data.

Example: The NYC poll

    city1yr    Frequency    Percent    Cumulative Frequency
    Worse         593        61.64            593
    Same          252        26.20            845
    Better        111        11.54            956

956 people answered this question regarding whether they thought the condition of the city in June 2003 was better, worse, or
the same as one year earlier.

If the values are sorted from smallest to largest (Worse, Same, Better), then the median will be the average of the 478th and 479th values. We can use the cumulative frequencies in the table to figure out what these have to be. They are both in category "Worse." Thus the median is "Worse."

The mode

The mode of a set of values is the value that occurs most frequently.

Example: in the NYC poll data, the mode of the city1yr variable is "Worse."

Example: There is no mode in the birthweights data because no value occurs more than once.

There may be more than one mode in a set of values.

The mode may be reported for all types of data.

When is each measure of central tendency appropriate?

Depending on data type:

    Nominal data: mode only (possible exception: binary data coded 0 and 1)
    Ordinal data: mode or median
    Quantitative data: mean, median, or mode

Depending on the shape of the distribution of values (quantitative variables):

    - if the shape is approximately symmetric and has only one mode
        - mean and median will be close in value
        - mean is typically reported
      Example: the body temperature data
      From a statistical computer package: mean = 98.24, median = 98.3

    - if the distribution is highly skewed
        - if skewed to the right, mean will be larger than median
        - if skewed to the left, mean will be smaller than median
        - mean may not be a typical value
      Example: the billionaire data
      From a statistical computer package: mean = 2.7 billion, median = 1.8 billion

    - if the distribution has more than one mode
        - neither the mean nor the median may be representative values
        - may be best to report all modes and/or to display a graph
        - may occur if two or more different subgroups are represented in the sample
      Example
      From a statistical computer package: mean = 69.0, median = 72.0

In getting the overall picture of quantitative data, the spread is just as important as the center of the data.

[Figure with two panels: Female doctors, Male doctors]
Number of Caesarian Sections Performed in a Single Year by Swiss Doctors

Numerical measures of dispersion:

    - the range
    - the interquartile range
    - the standard deviation

The range

The range is the difference between the largest and the smallest observations.

    For the male Swiss doctors:
        largest value = 86, smallest value = 20
        range = 86 - 20 = 66

    For the female Swiss doctors:
        largest value = 33, smallest value = 5
        range = 33 - 5 = 28

The range shows the full spread of the data, but may be exaggerated if the largest and/or smallest values are atypical (outliers).

    Example: the 1992 billionaire data
        With Bill Gates: range = 37 - 1 = 36 billion
        If Bill were deleted: range = 24 - 1 = 23 billion

    Example: the male Swiss doctors data
        With the largest two values: range = 86 - 20 = 66
        If the two largest values were deleted: range = 59 - 20 = 39

So additional measures are needed to give a more complete picture of the spread of values.

The quartiles and the interquartile range

The first quartile is the same as the 25th percentile: one quarter of the observations in a dataset have values less than or equal to the 1st quartile, and the other three quarters have values greater than or equal to the 1st quartile.

The third quartile is the same as the 75th percentile: three quarters of the observations in a dataset have values less than or equal to the 3rd quartile, and the other one quarter have values greater than or equal to the 3rd quartile.

The interquartile range (IQR) is the difference between the 3rd and 1st quartiles.

    For the male Swiss doctors:
        third quartile = 50, first quartile = 27
        IQR = 50 - 27 = 23

    For the female Swiss doctors:
        third quartile = 29, first quartile = 14
        IQR = 29 - 14 = 15

    For the 1992 billionaires:
        third quartile = 3 billion, first quartile = 1.3 billion
        IQR = 3 - 1.3 = 1.7 billion

The IQR is considered less sensitive to outliers than the range.

    Example: the 1992 billionaire data
        With Bill Gates: IQR = 3 - 1.3 = 1.7 billion
        If Bill were deleted: IQR = 2.9 - 1.3 = 1.6 billion

The five-number summary

The five-number summary provides a reasonably complete
numeric summary of the center and dispersion of a set of values.

The five-number summary consists of the minimum …

A caveat on the IQR's resistance to outliers: in a small dataset, deletion of a few outliers may affect the IQR substantially.
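
The hand calculations in these notes (mean, median, mode, range, and the median of an ordinal variable found from cumulative frequencies) can be checked with a short Python sketch. It is an illustration only and not part of the original lecture: the helper names value_range and ordinal_median are hypothetical, and only the standard library statistics module is assumed. The data are the example values quoted in the notes. Quartiles are left out because software packages use several slightly different conventions and the notes do not say which one was used.

```python
from statistics import mean, median

# Values from the median examples in the notes.
odd_values = [75, 80, 82, 88, 95]        # notes report mean 84, median 82
even_values = [75, 80, 82, 88, 95, 97]   # notes report median (82 + 88)/2 = 85
with_outlier = [25, 80, 82, 88, 95]      # notes report mean 74, median 82

# Bird study: 1 = species observed, 0 = species not observed.
observed = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1]

# NYC poll frequencies, listed in the order Worse < Same < Better.
poll = [("Worse", 593), ("Same", 252), ("Better", 111)]


def value_range(values):
    """Range: largest observation minus smallest observation."""
    return max(values) - min(values)


def ordinal_median(ordered_counts):
    """Median category from (category, frequency) pairs sorted low to high.

    Walks the cumulative frequencies to locate the middle observation(s).
    Returns a single category when both middle observations fall in it,
    otherwise a tuple of the two categories (categories cannot be averaged).
    """
    n = sum(freq for _, freq in ordered_counts)
    # 1-based positions of the middle observation(s).
    positions = [(n + 1) // 2] if n % 2 else [n // 2, n // 2 + 1]
    found = []
    for pos in positions:
        cumulative = 0
        for category, freq in ordered_counts:
            cumulative += freq
            if pos <= cumulative:
                found.append(category)
                break
    return found[0] if len(set(found)) == 1 else tuple(found)


# Mode of the poll: the category with the highest frequency.
mode_category = max(poll, key=lambda pair: pair[1])[0]

print("odd n:        mean", mean(odd_values), "median", median(odd_values))
print("even n:       median", median(even_values))
print("with outlier: mean", mean(with_outlier), "median", median(with_outlier))
print("proportion of species observed:", mean(observed))
print("poll median:", ordinal_median(poll), "| poll mode:", mode_category)
print("range:", value_range(odd_values), "vs", value_range(with_outlier))
```

Replacing 75 with 25 moves the mean and the range but leaves the median unchanged, which is the robustness property the notes describe.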