Descriptive Statistics
BIOS 162: Lecture 2
Michael G. Hudgens
www.bios.unc.edu/~mhudgens

Descriptive Statistics
• Types of variables
• Measures of location
• Data displays

Measures of Location
• (Arithmetic) Mean
• Percentiles
• Median
• Mode
• Geometric mean

Arithmetic mean
• Data: $x_1, x_2, \ldots, x_n$
• Mean: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Example
• Duration of hospital stay in days: $x_1 = 5$, $x_2 = 10$, $x_3 = 6$, $x_4 = 11$
• Mean: $\bar{x} = \frac{1}{4}(5 + 10 + 6 + 11) = \frac{32}{4} = 8$

Reporting of decimals
• Report the mean with one more significant digit than the observations
• Example: if $x$ is measured in whole numbers and $\bar{x} = 6.345$, report $\bar{x} = 6.3$.

Properties of Mean
• Let $c$ be any constant
• If $y_i = x_i + c$ for $i = 1, 2, 3, \ldots, n$, then $\bar{y} = \bar{x} + c$
• If $y_i = c x_i$ for $i = 1, 2, 3, \ldots, n$, then $\bar{y} = c\bar{x}$

Properties of Mean - Example
• A sample of birth weights in a hospital found $\bar{y} = 3166.9$ grams
• 1 oz = 28.35 g
• Therefore the mean in ounces is $\bar{x} = \bar{y}/28.35 = 111.7$

Order statistics
• Data: $x_1, x_2, \ldots, x_n$
• Order the data from smallest to largest: $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$
• $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ are the "order statistics"
• Note: $x_{(1)} = \min\{x_1, x_2, \ldots, x_n\}$ and $x_{(n)} = \max\{x_1, x_2, \ldots, x_n\}$

Example
• Duration of hospital stay in days: $x_1 = 5$, $x_2 = 10$, $x_3 = 6$, $x_4 = 11$
• Order statistics: $x_{(1)} = 5$, $x_{(2)} = 6$, $x_{(3)} = 10$, $x_{(4)} = 11$

Percentiles
• The $100p$th percentile of a sample, for $0 < p < 1$:
  $\hat{\zeta}_p = \begin{cases} x_{([np]+1)} & \text{if } np \text{ is not an integer} \\ \left(x_{(np)} + x_{(np+1)}\right)/2 & \text{if } np \text{ is an integer} \end{cases}$
• Note: $[a]$ is the integer part of $a$

Example
• Suppose $n = 278$ and we want the 75th percentile
• $np = 278 \times 0.75 = 208.5$, which is not an integer, so $\hat{\zeta}_{.75} = x_{([208.5]+1)} = x_{(209)}$

Median
• The sample median is the 50th percentile:
  $\hat{\zeta}_{.5} = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd} \\ \left(x_{(n/2)} + x_{(n/2+1)}\right)/2 & \text{if } n \text{ is even} \end{cases}$

Example
• Duration of hospital stay in days: $x_1 = 5$, $x_2 = 10$, $x_3 = 6$, $x_4 = 11$
• Median: $\hat{\zeta}_{.5} = (x_{(2)} + x_{(3)})/2 = (6 + 10)/2 = 8$

Mode
• The mode is the most frequently occurring value in the dataset
• In the hospital stay example, there is no mode since all values occur equally often

Geometric Mean
• Data: $x_1, x_2, \ldots, x_n$
• Let $y_i = \log(x_i)$ for $i = 1, 2, \ldots, n$
• The geometric mean of $x$ is $\bar{x}_g = \exp(\bar{y})$
• $\bar{x}_g$ is used when data are of the form $c^k$
• Note: one can use any base for the logarithm, as long as the same base is used to transform back

Comments
• The mean is the most often used measure
• The median is better if there are influential observations (it is more robust to outliers)
• If the distribution is symmetric, the mean equals the median
• The mode is rarely used

Example
• Duration of hospital stay in days: $x_1 = 5$, $x_2 = 10$, $x_3 = 6$, $x_4 = 11$, so $\hat{\zeta}_{.5} = \bar{x} = 8$
• Alter the last observation: $x_1 = 5$, $x_2 = 10$, $x_3 = 6$, $x_4 = 50$, so $\hat{\zeta}_{.5} = 8$ but $\bar{x} = 71/4 = 17.75$

Measures of Spread
• Range
• Variance and standard deviation
• Interquartile range

Range
• Range: $r_a = x_{(n)} - x_{(1)}$
• Easy to calculate
• Sensitive to unusual observations (outliers)
• Usually, the larger $n$ is, the larger $r_a$

Variance and Standard Deviation
• Want to measure deviation from the mean
• Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$
• Standard deviation: $s = \sqrt{s^2}$

Standard Deviation
• The units of $s$ are the same as the units of $x_i$
• If $s$ is large, the data are spread over a wide range
• Report the standard deviation with two more significant digits than the original observations

Properties of the Standard Deviation
• If $c$ is a constant and $y_i = x_i + c$, then $s_y = s_x$
• If $y_i = c x_i$, then $s_y = |c| s_x$

Some approximations
• For roughly bell-shaped (approximately normal) data:
• The interval $\bar{x} \pm s$ will contain approximately 68% of the observations
• The interval $\bar{x} \pm 2s$ will contain approximately 95% of the observations
• Approximate $s$ by $s \approx \frac{\hat{\zeta}_{.75} - \hat{\zeta}_{.25}}{1.35}$
• Note: $\hat{\zeta}_{.75} - \hat{\zeta}_{.25}$ is called the interquartile range

Symmetry and Skewness
• If a distribution is symmetric, mean = median
• A unimodal distribution is right skewed if mean > mode; left skewed if mean < mode
• Skewness: $a_3 = \frac{\sum_i (x_i - \bar{x})^3}{\left\{\sum_i (x_i - \bar{x})^2\right\}^{3/2}}$
• If $a_3 = 0$ the distribution is symmetric; $a_3 > 0$ right skewed; $a_3 < 0$ left skewed

Frequency Table
• A frequency table gives the frequency of observations within a set of ordered intervals
• Intervals should be mutually exclusive and exhaustive
• 8 to 10 intervals is usually sufficient
• With the exception of the end intervals, the length of the intervals should be constant

Graphs
• Histogram
• Stem and leaf plot
• Box plot

Histogram
• Data are divided into intervals as in a frequency table
• A histogram is a bar graph with the area of each bar proportional to the frequency in the interval.

Histogram: Example
> par(mfcol=c(1,2))
> hist(liver$albumin, col="gray", xlab="Albumin (mg/dl)", breaks=7, freq=FALSE, main="")
> hist(liver$albumin, col="gray", xlab="Albumin (mg/dl)", breaks=30, freq=FALSE, main="")
[Figure: two histograms of albumin on the density scale, the first with breaks=7 and the second with breaks=30; x-axis: Albumin (mg/dl), y-axis: Density]

Stem and Leaf Plot
• The stem consists of the leading digits
• The leaves consist of the last digit
• Example: x = 496, stem = 49, leaf = 6
• Make a column of stems from smallest to largest
• To the right of each stem, list in a row the leaves, in ascending order.
• Note: there will be one leaf for each observation

Stem and Leaf Plot: Example
> stem(liver$albumin)
  The decimal point is 1 digit(s) to the left of the |
  18 |