Descriptive Statistics
Bios 662
Michael G. Hudgens, [email protected]
www.bios.unc.edu/~mhudgens
2007-08-21 17:01

Descriptive Statistics
• Types of variables
• Measures of location
• Measures of spread, shape
• Data displays

Types of Variables
• A variable is a quantity that may vary from object to object
• A sample or data set is a collection of values of one or more variables
• Types of variables
  – Quantitative variable - intrinsically numerical
    e.g., age, height, counts
  – Qualitative (categorical) - intrinsically nonnumerical
    e.g., gender, province, country

Types of Variables
• Qualitative (categorical) - intrinsically nonnumerical
  – Binary, dichotomous
    e.g., alive/dead, female/male
  – Ordinal - natural ordering
    e.g., diagnosis (certain, probable, unlikely, ...)
    e.g., attitude (strongly agree, agree, neutral, ...)
  – Nominal - no natural ordering
    e.g., religion, race
• In recording qualitative data, numerical values may be assigned

Measures of Location
• (Arithmetic) Mean
• Percentiles
• Median
• Mode
• Geometric mean

Arithmetic mean
• Data: $x_1, x_2, \ldots, x_n$
• Mean:
  $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Example
• Duration of hospital stay in days:
  $x_1 = 5,\ x_2 = 10,\ x_3 = 6,\ x_4 = 11$
• Mean:
  $\bar{x} = \frac{1}{4}(5 + 10 + 6 + 11) = \frac{32}{4} = 8$

Reporting of decimals
• Report the mean with one more significant digit than the observations
• Example: if $x$ is measured in whole numbers and $\bar{x} = 6.345$, report $\bar{x} = 6.3$

Properties of Mean
• Let $c$ be any constant
• If $y_i = x_i + c$ for $i = 1, 2, 3, \ldots, n$, then $\bar{y} = \bar{x} + c$
• If $y_i = c x_i$ for $i = 1, 2, 3, \ldots, n$, then $\bar{y} = c \bar{x}$

Properties of Mean - Example
• A sample of birth weights in a hospital found $\bar{y} = 3166.9$ grams
• 1 oz = 28.35 g
• Therefore the mean in ounces is
  $\bar{x} = \frac{\bar{y}}{28.35} = 111.7$

Order statistics
• Data: $x_1, x_2, \ldots, x_n$
• Order data from smallest to largest:
  $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$
• $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ are the order statistics
• Note
  $x_{(1)} = \min\{x_1, x_2, \ldots, x_n\}$
  $x_{(n)} = \max\{x_1, x_2, \ldots, x_n\}$

Example
• Duration of hospital stay in days:
  $x_1 = 5,\ x_2 = 10,\ x_3 = 6,\ x_4 = 11$
• Order statistics:
  $x_{(1)} = 5,\ x_{(2)} = 6,\ x_{(3)} = 10,\ x_{(4)} = 11$

Percentiles
• Intuitive definition: the $x$th percentile is the value such that $x$% of the observations are less than it
• Also known as a sample quantile

Percentiles: Text definition
• The $(p \times 100)$th percentile of a sample:
  $\hat{\zeta}_p =
  \begin{cases}
    y_{(np+p)} & \text{if } np + p \text{ is an integer} \\
    \{ y_{(\lfloor np+p \rfloor)} + y_{(\lceil np+p \rceil)} \} / 2 & \text{otherwise}
  \end{cases}$
  for $0 < p < 1$
• Note: $\lfloor y \rfloor$ is the greatest integer $\le y$, i.e., the floor function;
  $\lceil y \rceil$ is the smallest integer $\ge y$, i.e., the ceiling function
• Cf. Def 3.11 of text

Percentiles: General form
• General form (Hyndman and Fan, Am Stat 1996):
  $\hat{\zeta}_p = (1 - \gamma)\, y_{(j)} + \gamma\, y_{(j+1)}$
  where $j = \lfloor pn + m \rfloor$ for some $m \in \mathbb{R}$ and $0 \le \gamma \le 1$
• Let $g = pn + m - j$
• If $m = p$ and
  $\gamma =
  \begin{cases}
    0 & \text{if } g = 0 \\
    1/2 & \text{if } g > 0
  \end{cases}$
  then $j = \lfloor pn + p \rfloor$ and we recover the text definition

Percentiles: Software
• SAS Proc Univariate: 5 definitions of percentile
• R: 9 definitions
• Claim: none of these match the book definition

R “quantile()” function

    > ?quantile
    quantile                package:stats                R Documentation

    Sample Quantiles

    Description:
         The generic function 'quantile' produces sample quantiles
         corresponding to the given probabilities. The smallest observation
         corresponds to a probability of 0 and the largest to a probability
         of 1.

    Usage:
         quantile(x, ...)

         ## Default S3 method:
         quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE,
                  names = TRUE, type = 7, ...)

    Arguments:
           x: numeric vectors whose sample quantiles are wanted.
       probs: numeric vector of probabilities with values in [0,1].
       na.rm: logical; if true, any 'NA' and 'NaN''s are removed from 'x'
              before the quantiles are computed.
       names: logical; if true, the result has a 'names' attribute. Set to
              'FALSE' for speedup with many 'probs'.
        type: an integer between 1 and 9 selecting one of the nine quantile
              algorithms detailed below to be used.
         ...: further arguments passed to or from other methods.

    Types:
         'quantile' returns estimates of underlying distribution quantiles
         based on one or two order statistics from the supplied elements in
         'x' at probabilities in 'probs'. One of the nine quantile
         algorithms discussed in Hyndman and Fan (1996), selected by
         'type', is employed.

Percentiles: Class Definition
• The $(p \times 100)$th percentile of a sample:
  $\hat{\zeta}_p =
  \begin{cases}
    y_{(\lfloor np \rfloor + 1)} & \text{if } np \text{ is not an integer} \\
    \{ y_{(np)} + y_{(np+1)} \} / 2 & \text{if } np \text{ is an integer}
  \end{cases}$
  for $0 < p < 1$
• Definition 2 of R/Hyndman and Fan: $m = 0$ and
  $\gamma =
  \begin{cases}
    1 & \text{if } g > 0 \\
    1/2 & \text{if } g = 0
  \end{cases}$
• Definition 5 of SAS

Example
• Suppose $n = 278$ and we want the 75th percentile:
  $np = 278 \times .75 = 208.5$
  so that $\hat{\zeta}_{.75} = x_{(209)}$
• R

    > x <- 1:278
    > quantile(x,.75,type=2)
    75%
    209

Example: SAS

    data;
    infile "H:/WWW/bios/662/2007fall/percentile.txt";
    input x;
    proc univariate; var x; run;

    The UNIVARIATE Procedure
    Variable: x

    Quantiles (Definition 5)

    Quantile      Estimate
    75% Q3           209.0
    50% Median       139.5
    25% Q1            70.0
    10%               28.0
    5%                14.0
    1%                 3.0
    0% Min             1.0

Median
• The sample median is the 50th percentile:
  $\hat{\zeta}_{.5} =
  \begin{cases}
    y_{((n+1)/2)} & \text{if } n \text{ is odd} \\
    \{ y_{(n/2)} + y_{(n/2+1)} \} / 2 & \text{if } n \text{ is even}
  \end{cases}$

Example
• Duration of hospital stay in days:
  $x_1 = 5,\ x_2 = 10,\ x_3 = 6,\ x_4 = 11$
• Median:
  $\hat{\zeta}_{.5} = \{ x_{(2)} + x_{(3)} \} / 2 = (6 + 10)/2 = 8$

Mode
• The mode is the most frequently occurring value in the data set
• E.g., if
  $x_1 = 5,\ x_2 = 11,\ x_3 = 6,\ x_4 = 11$
  then the mode is 11

Geometric Mean
• Data: $x_1, x_2, \ldots, x_n$
• The geometric mean of $x$ is
  $\bar{x}_g = (x_1 x_2 \cdots x_n)^{1/n}$
• Let $y_i = \log(x_i)$ for $i = 1, 2, \ldots, n$. Then
  $\bar{x}_g = \exp(\bar{y})$
• $\bar{x}_g$ is used when data are of the form $c^k$
• E.g., suppose $x_1 = 10$ and $x_2 = 0.1$.
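
The measures of location above can be checked numerically. The following is a minimal sketch in Python (not the R or SAS used on the slides); the function names `percentile_class`, `percentile_text`, and `geometric_mean` are hypothetical and chosen only for illustration. Two choices are assumptions rather than part of the slides: p is read as an exact decimal via `Fraction(str(p))` so that the test "np is an integer" is not affected by floating-point round-off, and the text definition raises an error when np + p falls outside 1..n, a case the slides do not address.

```python
import math
from fractions import Fraction
from statistics import mean, multimode


def _exact(p):
    """Read p as an exact decimal (assumption: avoids float round-off in n*p)."""
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    return Fraction(str(p))


def percentile_class(data, p):
    """Class definition: y(floor(np)+1), or the average of y(np) and y(np+1)."""
    y = sorted(data)                  # order statistics y(1) <= ... <= y(n)
    if not y:
        raise ValueError("data must be non-empty")
    h = _exact(p) * len(y)            # np, computed exactly
    j = math.floor(h)
    if h == j:                        # np is an integer
        return (y[j - 1] + y[j]) / 2  # y(np), y(np+1) as 0-based indices
    return y[j]                       # y(floor(np)+1) as a 0-based index


def percentile_text(data, p):
    """Text definition: based on np + p instead of np."""
    y = sorted(data)
    n = len(y)
    h = _exact(p) * (n + 1)           # np + p, computed exactly
    lo, hi = math.floor(h), math.ceil(h)
    if lo < 1 or hi > n:              # assumption: not covered by the slides
        raise ValueError("np + p falls outside 1..n")
    # When np + p is an integer, lo == hi and this reduces to y(np+p).
    return (y[lo - 1] + y[hi - 1]) / 2


def geometric_mean(data):
    """exp of the mean of the logs; requires strictly positive values."""
    return math.exp(mean(math.log(x) for x in data))


stay = [5, 10, 6, 11]                 # hospital stay example
print(mean(stay))                     # arithmetic mean: 8
print(percentile_class(stay, 0.5))    # median: 8.0
print(multimode([5, 11, 6, 11]))      # mode example: [11]

x = range(1, 279)                     # n = 278, as in the R example
print(percentile_class(x, 0.75))      # 209, matching quantile(x, .75, type=2)
print(percentile_text(x, 0.75))       # 209.5, since np + p = 209.25

print(geometric_mean([10, 0.1]))      # approximately 1
```

For n = 278 and p = 0.75 the two definitions give different values (209 versus 209.5), which is consistent with the claim on the software slide that the book definition is not the one R and SAS report here.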

UNC-Chapel Hill BIOS 662 - Descriptive Statistics