UNC-Chapel Hill STOR 155 - Lecture 4- Displaying Distributions with Numbers (II)


5/14/10 Lecture 4

STOR 155 Introductory Statistics
Lecture 4: Displaying Distributions with Numbers (II)
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Slide 2: Numerical Summary for Distributions
• Center
  – Mean
  – Median
  – Mode
• Spread
  – Quartiles, IQR, five-number summary and boxplot
  – Standard deviation (starting from slide 14)

Slide 3: Example: 2004 Two-Seater Cars
• Highway mileages of the 21 two-seater cars:
  13 15 16 16 17 19 20 22 23 23 23 24 25 25 26 28 28 28 29 32 66
• Q1 = 18
• Q3 = 28
• IQR = Q3 - Q1 = 10
• 1.5*IQR = 15
• Q3 + 1.5*IQR = 43
• Q1 - 1.5*IQR = 3
• 66 lies above 43, so it is a suspected outlier.

Slide 4: The five-number summary
• To get a quick summary of both center and spread, use the five-number summary:
  Minimum, Q1, M, Q3, Maximum

Slide 5: Example: HWY Gas Mileage of 2004 Two-seater/Mini Cars
• Two-seater
  – Five-number summary: 13, 18, 23, 27, 32
    (these values match the mileage data above with the 66 observation excluded)
• Mini-compact
  – Five-number summary: 19, 23, 26, 29, 32

Slide 6: Boxplots
• A boxplot is a visual representation of the five-number summary.
• A boxplot consists of
  – A central box that spans the quartiles Q1 and Q3.
  – A line inside the box that marks the median M.
  – Lines that extend from the box out to the smallest and largest observations.

Slide 7: [Figure: Boxplots of highway/city gas mileages (Two-seaters/minicompacts)]

Slide 8: Pros and cons of Boxplots
• The location of the median line in the box indicates symmetry or asymmetry.
• Boxplots are best used for side-by-side comparison of more than one distribution at a glance.
• They are less detailed than histograms or stem plots.
• The box focuses attention on the central half of the data.

Slide 9: [Figure: Income for different Education Level]

Slide 10: Modified Boxplot
• The basic boxplot cannot reveal possible outliers.
• To modify it,
  – the two lines extend out from the central box only to the smallest and largest observations that are not suspected outliers.
  – Observations more than 1.5*IQR outside the box are plotted as individual points.

Slide 11: [Figure: Call length (seconds)]

Slide 12: [Figure: HG for count in a given time interval]

Slide 14: Sample Variance s^2
• Deviation from the mean: the difference between an observation and the sample mean,
  $x_i - \bar{x}$
• Sample variance s^2: essentially the average of the squared deviations of the observations from their mean, with divisor n - 1 rather than n:
  $s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

Slide 15: Sample Standard Deviation s
• Sample standard deviation s: the square root of the sample variance,
  $s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Slide 16: Toy Examples
• Data: -2, -1, 0, 1, 2
• What are the sample variance and the standard deviation?
• How about this data set? 40, 40, 40, 40, 40

Slide 17: Remarks on the definition of Standard Deviation (S.D.)
• The sum of the deviations of the observations from their mean is always 0.
• Why square the deviations rather than take absolute deviations?
  – The mean is a natural center under squaring.
  – S.D. is a natural measure of spread for the normal distributions.

Slide 18: Remarks on S.D.
• Why S.D. rather than variance?
  – S.D. is natural for measuring spread for normal distributions.
  – S.D. is in the original scale of the data.
• Why n - 1 rather than n?
  – Intuitively speaking, S.D. is not defined for n = 1.
  – The sum of the deviations is always 0, which means that if we know (n - 1) of them, we know the last one.
  – Only (n - 1) deviations can change freely.
  – n - 1 is called the degrees of freedom.

Slide 19: Properties of the standard deviation (S.D.) s
• s measures the spread about the mean;
• s should be used only when the mean is chosen to measure the center;
• s = 0 if and only if there is no spread;
  – When?
• s > 0 otherwise, and it increases as the observations become more spread out;
• s, like the mean, is not resistant, i.e. it is sensitive to outliers.

Slide 20: Example: 2004 Two-seater Cars
• Highway mileages of the 21 two-seater cars:
  13 15 16 16 17 19 20 22 23 23 23 24 25 25 26 28 28 28 29 32 66
• Gasoline-powered cars
  – Mean: 22.6
  – S.D. = 5.3
• All cars
  – Mean: 24.7
  – S.D. = 10.8

Slide 21: Three measures of spread
• The range is the spread of all the observations;
• The interquartile range is the spread of (roughly) the middle 50% of the observations;
• S.D. measures distance from the sample mean. It can be regarded as a "typical" distance of the observations from their mean.

Slide 22: The five-number summary vs Mean and S.D.
• The five-number summary is preferred for a skewed distribution or a distribution with strong outliers.
• $\bar{x}$ and s are preferred for reasonably symmetric distributions that are free of outliers.
• Always plot your data first.
• Use boxplots.

Slide 23: Changing the unit of measurement
• The same variable can be recorded in different units of measurement.
• Distance:
  – Miles (US) vs Kilometers (Elsewhere)
  – 1 mile = 1.6 km
  – 1 km = ? mile
• Temperature:
  – Fahrenheit (US) vs Celsius (Elsewhere)
  – 0 F = -17.8 C
  – 100 F = 37.8 C
  – 212 F = 100 C

Slide 24: Boiled Billy
• An Australian student, Billy, has recently been on a trip to the States. Soon after he arrived there, he caught a cold and had a fever.
• He went to see Doctor Z. Doctor Z measured his body temperature and told Billy, “Just relax! No big deal! It’s only a little above 100 degrees!”
• “100!!!”, Billy yelled, “How can you say it’s not a big deal? I am boiled…”

Slide 25: Linear Transformation
• A linear transformation changes the original variable x into a new variable x_new according to the equation
  $x_{new} = a + b x$
• Temperature: Celsius vs Fahrenheit
  – With x in Celsius and x_new in Fahrenheit,
    $x_{new} = 32 + \frac{9}{5} x$
  – How about the inverse transformation?

Slide 26: Effects of Linear Transformation
• The shape of a distribution remains unchanged, except that the direction of the skewness might change.
  – When?
• Measures of center and spread change.
  – Multiplying each observation by a positive number b multiplies both measures of center and measures of spread by b;
  – Adding the same number a to each observation adds a to measures of center and to percentiles, but does not change measures of spread.

Slide 27: Example: Salary Raise
• A sample was taken of the salaries of 20 employees of a large company. Suppose everyone will receive a $3000 increase. Then
• how will the standard deviation of the salaries change?
• How about
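The quartile and 1.5*IQR calculations for the two-seater mileages can be checked with a short Python sketch. It assumes the quartile convention that reproduces the lecture's numbers (Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle observation when n is odd). The function names are illustrative and do not come from the lecture.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, M, Q3, max) using the median-of-halves convention."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # observations below the median position
    upper = xs[half + n % 2:]    # observations above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def suspected_outliers(data):
    """Observations more than 1.5*IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Highway mileages of the 21 two-seater cars, as listed in the lecture.
mileages = [13, 15, 16, 16, 17, 19, 20, 22, 23, 23, 23,
            24, 25, 25, 26, 28, 28, 28, 29, 32, 66]

print(five_number_summary(mileages))       # all 21 cars
print(suspected_outliers(mileages))        # flags 66
print(five_number_summary(mileages[:-1]))  # with the 66 observation dropped
```

Other software may use a different quartile convention and return slightly different Q1 and Q3 for the same data, so the fences can shift a little between tools.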
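The definitions of the sample variance and sample standard deviation can be written out directly. The sketch below is a minimal illustration with helper names of my own choosing. It applies the n - 1 divisor to the toy data sets and to the two-seater mileages, so the reported means and standard deviations can be compared with the values quoted in the lecture.

```python
from math import sqrt

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_sd(data):
    """Square root of the sample variance."""
    return sqrt(sample_variance(data))

toy = [-2, -1, 0, 1, 2]
constant = [40, 40, 40, 40, 40]

# Two-seater highway mileages from the lecture; the last value (66) is
# the suspected outlier, so dropping it gives the gasoline-powered group.
all_cars = [13, 15, 16, 16, 17, 19, 20, 22, 23, 23, 23,
            24, 25, 25, 26, 28, 28, 28, 29, 32, 66]
gasoline = all_cars[:-1]

print(sample_variance(toy), sample_sd(toy))
print(sample_variance(constant), sample_sd(constant))  # no spread, so s = 0
for label, cars in [("gasoline", gasoline), ("all", all_cars)]:
    mean = sum(cars) / len(cars)
    print(label, round(mean, 1), round(sample_sd(cars), 1))
```

A single observation of 66 roughly doubles s, which illustrates the remark that s, like the mean, is not resistant.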
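The effect of a linear transformation on center and spread can also be checked numerically. In the sketch below, the temperature readings and salaries are made-up illustrative values, not data from the lecture, and the function names are hypothetical. Only the Celsius-to-Fahrenheit rule and the $3000 raise come from the slides.

```python
from statistics import mean, stdev

def linear_transform(data, a, b):
    """Apply x_new = a + b * x to every observation."""
    return [a + b * x for x in data]

def to_celsius(fahrenheit):
    """Inverse of x_new = 32 + (9/5) x."""
    return (fahrenheit - 32) * 5 / 9

# Hypothetical body temperatures in Celsius (illustrative only).
celsius = [36.5, 37.0, 37.8, 38.2, 39.0]
fahrenheit = linear_transform(celsius, a=32, b=9 / 5)

# Center is shifted and scaled; spread is only scaled by b.
print(mean(fahrenheit), 32 + 9 / 5 * mean(celsius))
print(stdev(fahrenheit), 9 / 5 * stdev(celsius))

# Hypothetical salaries (illustrative only); everyone gets a $3000 raise.
salaries = [42000, 48000, 51000, 60000, 75000]
raised = linear_transform(salaries, a=3000, b=1)
print(mean(raised) - mean(salaries))  # the mean moves by 3000
print(stdev(raised), stdev(salaries)) # the standard deviation does not move

print(round(to_celsius(100), 1))      # Billy's "100 degrees" in Celsius
```

Here `stdev` from the standard library uses the n - 1 divisor, so it matches the sample standard deviation s defined in the lecture.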

