22S:105 Statistical Methods and Computing

Graphical Depiction of Qualitative and Quantitative Data and Measures of Central Tendency

Lecture 2
January 21, 2011

Kate Cowles
374 SH, 335-0727
kate-cowles@uiowa.edu


Bar charts for nominal and ordinal data

- present a frequency distribution in visual form
- categories that are possible values of the variable are listed on the horizontal axis
- bar heights represent either frequency or relative frequency of observations in that class


Continuing example of cereal data

[Figure: bar chart of FREQUENCY (axis ticks 0 to 30) by Manufacturer: American Home, General Mills, Kelloggs, Nabisco, Post, Quaker Oats, Ralston Purina]


Pie charts

- a slice for each possible value of the variable
- area of slice represents the proportion of the whole that the category makes up
- all categories must be included

[Figure: pie chart, FREQUENCY of mfr]

    Manufacturer      Frequency
    General Mills     22
    Kelloggs          23
    American Home      1
    Ralston Purina     8
    Nabisco            6
    Quaker Oats        8
    Post               9


Histograms for quantitative data

- presents a frequency distribution of discrete or continuous data in visual form
- range of possible values must be divided into intervals
  - easiest to work with if intervals are of equal width
- limits of intervals are shown on the horizontal axis
- vertical bar centered at midpoint of each interval
- area of each bar represents frequency associated with corresponding interval


Example 1: Histogram of body temperatures of 130 people

[Figure: histogram]


Example 2: Wealth in billions of dollars of the 209 billionaires in the world in 1992

[Figure: histogram]


Symmetric and skewed distributions

- symmetric: right and left sides of histogram are roughly mirror images
- skewed to the right: long tail on right side; some extremely large values
- skewed to the left: long tail on left side; some extremely small values

Outliers: individual values that deviate from the general pattern of the data


Stemplots for quantitative variables

- show overall shape of distribution
- feasible only for fairly small datasets
- give more detailed information than histograms

    Stem  Leaf                                      #
     100  8                                         1
     100  0                                         1
      99  59                                        2
      99  000001112223344                          15
      98  555666666666677777777888888888899        33
      98  00000000000111222222222233333444444444   38
      97  556666777888888899999                    21
      97  0111222344444                            13
      96  7789                                      4
      96  34                                        2


Example

Investigators suspected that Benzo(a)pyrene, or BaP, from a pipe foundry in Phillipsburg, NJ, might be contaminating household air. This dataset presents data from 14 different days on samples of indoor air from a house near the foundry and samples of outdoor air collected at the same times. The measures are concentrations of BaP-containing particles no larger than 10 micrograms. The two variables are:

- indoor air BaP
- outdoor air BaP

Reference: Lioy PL, Waldman JM, Greenberg A, Harkov R, and Pietarinen C (1988). The total human environmental exposure study (THEES) to Benzo(a)pyrene: Comparison of the inhalation and food pathways. Archives of Environmental Health 43: 304-312.

Variable: OUTDOOR

    Stem  Leaf    #
       7  8       1
       6  57      2
       5  06      2
       4  01      2
       3  58      2
       2  04557   5

    Multiply Stem.Leaf by 10**+1


Line plots or time plots

- Usually time is plotted on the x axis
- Some other variable that changes over time is plotted on the y axis
- Points are connected by lines

Example: High water mark for Amazon River at Iquitos, Peru, for years 1962-1978

[Figure: time plot]


Measures of central tendency for quantitative data

Before we can use data to draw conclusions, we must summarize the data to get the overall picture.

- Number of values may be so large that looking at them all at once loses meaning
- We may be interested in too many different variables to graph each one

Note: We often refer to the data we have collected as a sample because it probably does not include all the possible subjects of the type in which we are interested.

One useful measure is to define the center or middle of the data. Several different measures of central tendency are useful in different situations.


Example: a sample of birthweights (in grams) of live-born infants born at a private hospital in San Diego during a 1-week period

     i   x_i        i   x_i
     1   3265      11   2581
     2   3260      12   2841
     3   3245      13   3609
     4   3484      14   2838
     5   4146      15   3541
     6   3323      16   2759
     7   3649      17   3248
     8   3200      18   3314
     9   3031      19   3101
    10   2069      20   2834


The mean

The arithmetic mean, or average, of a set of values is calculated by adding up all the values and dividing by the number of values.

If we add up all the birthweights and divide by 20, we find that the mean is 3166.9 g.


Notation

Generically, we may refer to each value of a particular numeric variable in a dataset as $x_i$, where $i$ indexes observations. In the birthweights data, $x_1 = 3265$ and $x_{15} = 3541$.

So all the values for this variable may be referred to as $x_1, \ldots, x_n$, where $n$ is the total number of observations in the dataset.

We can use the summation sign to indicate a sum. The notation $\sum_{i=1}^{n} x_i$ is a short way of writing $x_1 + x_2 + \cdots + x_n$.

We can write the computation of the mean as

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

$\bar{x}$ is the standard notation for the mean if we are referring to the individual data values as $x_i$'s.


The mean is very sensitive to extreme values in the sample.

Example: the mean of the following numbers is 84:

    75 82 95 80 88

But the mean of the following numbers is 74:

    25 82 95 80 88
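
The mean computation and its sensitivity to an extreme value can be checked with a few lines of code. The following is a minimal Python sketch, not part of the original lecture; the helper name `arithmetic_mean` and the variable names are chosen here for illustration only. It applies the formula $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ to the birthweights sample and to the two five-number lists from the sensitivity example.

```python
def arithmetic_mean(values):
    """Return (1/n) * sum of x_i for a non-empty list of numbers.

    Hypothetical helper written for this illustration.
    """
    n = len(values)
    if n == 0:
        raise ValueError("mean of an empty sample is undefined")
    return sum(values) / n


# Birthweights (grams) of the 20 infants in the sample, in order x_1..x_20.
birthweights = [
    3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
    2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3101, 2834,
]

# Python lists are 0-indexed, so x_1 is birthweights[0] and x_15 is birthweights[14].
x_1 = birthweights[0]
x_15 = birthweights[14]

birthweight_mean = arithmetic_mean(birthweights)

# Sensitivity example: the two lists differ only in their first value.
original_values = [75, 82, 95, 80, 88]
with_extreme_value = [25, 82, 95, 80, 88]

mean_original = arithmetic_mean(original_values)
mean_extreme = arithmetic_mean(with_extreme_value)

print(f"n = {len(birthweights)}, mean birthweight = {birthweight_mean} g")
print(f"mean of {original_values} = {mean_original}")
print(f"mean of {with_extreme_value} = {mean_extreme}")
```

Changing a single value from 75 to 25 lowers the mean by 10, even though the other four values are unchanged, which is the sensitivity the example describes.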