Chapter 1: Overview and Descriptive Statistics

1.1 Populations, Samples, and Processes

Populations and Samples
A population is a well-defined collection of objects.
When information is available for the entire population, we have a census.
A subset of the population is a sample.

Data and Observations
Univariate data consists of observations on a single variable (multivariate: more than two variables).

Branches of Statistics
Descriptive Statistics: summary and description of collected data.
Inferential Statistics: generalizing from a sample to a population.

Relationship Between Probability and Inferential Statistics
[Diagram: Population and Sample, linked by Probability and Inferential Statistics]

1.2 Pictorial and Tabular Methods in Descriptive Statistics

Stem-and-Leaf Displays
1. Select one or more leading digits for the stem values. The trailing digits become the leaves.
2. List stem values in a vertical column.
3. Record the leaf for every observation.
4. Indicate the units for the stem and leaf on the display.

Stem-and-Leaf Example
Observed values: 9, 10, 15, 22, 9, 15, 16, 24, 11

  0 | 99
  1 | 10556
  2 | 24

Stem: tens digit. Leaf: units digit.

Stem-and-Leaf Displays
A stem-and-leaf display helps to identify:
- a typical value
- the extent of spread about a value
- the presence of gaps
- the extent of symmetry
- the number and location of peaks
- the presence of outlying values

Another stem-and-leaf example
The decimal point is 1 digit to the right of the |

  2 | 55
  3 | 05888
  4 | 03558888888
  5 | 00000000333335555577777
  6 | 00033333555588
  7 | 0003333335555588
  8 | 00000335588
  9 | 088

Dotplots
Represent data with dots.
Observed values: 9, 10, 15, 22, 9, 15, 16, 24, 11
[Figure: Dotplot of the observed values on a horizontal axis running from 5 to 25]

Types of Variables
A variable is discrete if its set of possible values constitutes a finite set or an infinite sequence.
A variable is continuous if its set of possible values consists of an entire interval on a number line.

Histograms: Discrete Data
Determine the frequency and relative frequency for each value of x. Then mark possible x values on a horizontal scale. Above each value, draw a rectangle whose height is the relative frequency of that value.

Example:
Students from a small college were asked how many charge cards they carry. x is the variable representing the number of cards, and the results are below.

Frequency Distribution

  x    # people    Rel. Freq.
  0       12         0.08
  1       42         0.28
  2       57         0.38
  3       24         0.16
  4        9         0.06
  5        4         0.03
  6        2         0.01

Histograms
[Figure: Histogram of the credit card results, relative frequency against x]

Histograms: Continuous Data, Equal Class Widths
Determine the frequency and relative frequency for each class. Then mark the class boundaries on a horizontal measurement axis. Above each class interval, draw a rectangle whose height is the relative frequency.

Histogram example
[Figure: Histogram of e1scores; horizontal axis e1scores (20 to 100), vertical axis Frequency (0 to 15)]

Histograms: Continuous Data, Unequal Widths
After determining frequencies and relative frequencies, calculate the height of each rectangle using

  $\text{rectangle height} = \dfrac{\text{relative frequency of the class}}{\text{class width}}$

The resulting heights are called densities, and the vertical scale is the density scale.

Histogram Shapes
- symmetric unimodal
- positively skewed
- bimodal
- negatively skewed

Histogram example (symmetric, slightly bimodal)
[Figure: Histogram of e1scores; horizontal axis e1scores (20 to 100), vertical axis Frequency (0 to 15)]

1.3 Measures of Location

The Mean
The average (mean) of the n numbers $x_1, x_2, \ldots, x_n$ is $\bar{x}$, where

  $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$

Population mean: $\mu$

Median
The sample median $\tilde{x}$ is the middle value in a set of data that is arranged in ascending order. For an even number of data points, the median is the average of the middle two.
Population median: $\tilde{\mu}$

Median example
In a class of 85 exam scores, the median $\tilde{x}$ is the 43rd number if the scores are listed in ascending order.
Note: In this case there are 42 above the median and 42 below the median.

  Position   40     41     42     43     44     45     46
  Score      57.5   57.5   60.0   60.0   60.0   62.5   62.5

Three Different Shapes for a Population Distribution
- symmetric: $\mu = \tilde{\mu}$
- negative skew: $\mu < \tilde{\mu}$
- positive skew: $\tilde{\mu} < \mu$

Slight positive skew
[Figure: Histogram of e1scores; Median = 60.0, Mean = 61.4]

1.4 Measures of Variability

Sample Variance
Variance is a measure of the spread of the data. The sample variance of the sample $x_1, x_2, \ldots, x_n$ of n values of X is given by

  $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = \dfrac{S_{xx}}{n - 1}$

We refer to $s^2$ as being based on n - 1 degrees of freedom.

Sample variance example
First find the sample mean: $\bar{x} = 61.35$
Next add up the squared deviations from the mean:

  $(62.5 - 61.35)^2 + (90.0 - 61.35)^2 + \cdots = 21,531.9$

Divide by n - 1, where n is the number of observations (in this case 85):

  $\dfrac{21,531.9}{84} = 256.3$

Standard Deviation
Standard deviation is a measure of the spread of the data using the same units as the data. The sample standard deviation is the square root of the sample variance:

  $s = \sqrt{s^2}$

Standard deviation example

  $s = \sqrt{s^2} = \sqrt{256.3} = 16.0$

Formula for $s^2$
An alternative expression for the numerator of $s^2$ is

  $S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}$

Formula for $s^2$: Shortcut example
First sum the scores: $\sum_{i=1}^{n} x_i = 5215$
Next sum the squares: $\sum_{i=1}^{n} x_i^2 = 341,487.5$
The numerator of the variance equals

  $341,487.5 - \dfrac{5215^2}{85} = 21,531.9$

Properties of $s^2$
Let $x_1, x_2, \ldots, x_n$ be any sample and c be any nonzero constant.
1. If $y_1 = x_1 + c, \ldots, y_n = x_n + c$, then $s_y^2 = s_x^2$.
2. If $y_1 = c x_1, \ldots, y_n = c x_n$, then $s_y^2 = c^2 s_x^2$,
where $s_x^2$ is the sample variance of the x's and $s_y^2$ is the sample variance of the y's.

Upper and Lower Fourths
After the n observations in a data set are ordered from smallest to largest, the lower (upper) fourth is the median of the smallest (largest) half of the data, where the median $\tilde{x}$ is included in both halves if n is odd.
A measure of the spread that is resistant to outliers is the fourth spread:

  $f_s$ = upper fourth - lower fourth

Third and first quartiles
After the n observations in a data set are ordered from smallest to largest, the first (third) quartile is the median of the smallest (largest) half of the data, where the median $\tilde{x}$ is included in both halves if n is odd.
A measure of the spread that is resistant to outliers is the interquartile range, or IQR:

  $f_s$ = 3rd quartile - 1st quartile

Outliers
Any observation farther than $1.5 f_s$ from the closest fourth is an outlier. An outlier is extreme if it is more than $3 f_s$ from the nearest
fourth, and it is mild otherwise.

Boxplots
[Figure: Boxplot of e1scores (axis 40 to 100) with the lower fourth, median, upper fourth, mild outliers, and extreme outliers labeled]

Boxplot example
[Figure: Boxplot of e1questions (axis 0.45 to 0.65)] …
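
The variance shortcut and the fourth-spread outlier rule can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the function names are hypothetical, and the data are the nine observed values from the stem-and-leaf example.

```python
from statistics import median


def sample_variance(xs):
    """Sample variance from the definition: S_xx / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    s_xx = sum((x - mean) ** 2 for x in xs)
    return s_xx / (n - 1)


def sample_variance_shortcut(xs):
    """Sample variance using S_xx = sum(x^2) - (sum(x))^2 / n."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    return s_xx / (n - 1)


def fourths(xs):
    """Lower and upper fourths; the median joins both halves when n is odd."""
    ordered = sorted(xs)
    n = len(ordered)
    half = (n + 1) // 2  # size of each half, counting the median if n is odd
    return median(ordered[:half]), median(ordered[n - half:])


def classify_outliers(xs):
    """Return (mild, extreme) outliers using the 1.5 f_s and 3 f_s cutoffs."""
    lower, upper = fourths(xs)
    fs = upper - lower  # fourth spread
    mild, extreme = [], []
    for x in sorted(xs):
        # Distance beyond the closest fourth (negative between the fourths).
        distance = max(lower - x, x - upper)
        if distance > 3 * fs:
            extreme.append(x)
        elif distance > 1.5 * fs:
            mild.append(x)
    return mild, extreme


# Observed values from the stem-and-leaf example.
data = [9, 10, 15, 22, 9, 15, 16, 24, 11]
print(sample_variance(data), sample_variance_shortcut(data))
print(fourths(data), classify_outliers(data))
```

For these nine values the two variance formulas agree (about 30.28), the fourths are 10 and 16, the fourth spread is 6, and no observation lies more than 1.5 × 6 = 9 beyond a fourth, so there are no outliers.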