5/12/11 Lecture 3

STOR 155 Introductory Statistics
Lecture 3: Displaying Distributions with Numbers
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Exploratory Data Analysis (EDA)
• Graphical Visualization: Shape
  – Bar Graph
  – Pie Chart
  – Stem plot
  – Histogram
  – Time plot (not for distribution, but for changing pattern over time)
• Numerical Summary: Center and Spread
  – Center:
    • Mean, Median and Mode
  – Spread:
    • Quartiles, Five-number summary and Boxplot
    • Standard Deviation

• What is the average highway (city) mileage?
• What is the "middle value" of highway (city) mileage?

Measuring center: the mean
• Mean = average value
• The sample mean $\bar{x}$: If the n observations in a sample are $x_1, x_2, \ldots, x_n$, then their mean is
  $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum x_i$

Measuring center: the median

Example: Fuel economy (miles per gallon) for 2004 two-seater cars
• Look at the Highway mileage (without Honda Insight):
  – Mean
  – Median
• How about with Honda Insight?
  – Mean
  – Median
• What can you say?

Example: Salary Survey of UNC Graduates
• Survey a certain number of graduates from UNC.
• A lot of departments are surveyed.
• Question:
  – Which department produces students that earn the most on average ten years after they got their degrees?
• Answer:
  – Geography!!!!??????
  – Michael Jordan

Mean vs Median
• Mean:
  – easy to calculate
  – easy to work with algebraically
  – highly affected by outliers
  – Not a resistant measure
• Median:
  – can be time consuming to calculate
  – more resistant to a few extreme observations (sometimes outliers)
  – robust

Mode
• The most frequent value in the data
• Important for categorical data
• Possible to have more than one mode

Mean, Median and Mode
• If the distribution is exactly symmetric and unimodal, the mean, the median and the mode are exactly the same.
• If the distribution is skewed, the three measures differ.
[Figure: skewed distributions with the positions of the mean, median and mode marked]

Which one to use?
• Different by definition
  – Mean and median are unique, and only for quantitative variables.
  – Mode may not be unique.
  – Mode is defined for categorical variables also.
• The choice depends on the shape of the distribution, the type of data and the purpose of your study
  – Skewed: median
  – Categorical: mode
  – Total quantity: mean
  – …

Numerical Summary for Distributions
• Center
  – Mean
  – Median
  – Mode
• Spread
  – Quartiles, Five-number summary and Boxplot
  – Standard Deviation

Why do we need "Spread"?
• Knowing the center of a distribution alone is not a good enough description of the data.
  – Two basketball players with the same shooting percentage may be very different in terms of consistency.
  – Two companies may have the same average salary, but very different distributions.
• We need to know the spread, or the variability of the values.

A raw measure: Range
• Range = maximum - minimum
• Depends only on two values
• Tends to increase with larger samples
• Affected by outliers
  – Not robust

Percentiles
• Percentiles are derived from the ordered data values.
• The pth percentile is the value such that p percent of the observations fall at or below it.
• The median = the 50th percentile.

Quartiles
• The sample quartiles are the values that divide the sorted sample into quarters, just as the median divides it in half.
• The most commonly used quantiles are
  – The median M = 50th percentile
  – The 1st (lower) quartile Q1 = 25th percentile
  – The 3rd (upper) quartile Q3 = 75th percentile

Calculations of Quartiles

Examples: 2004 Gasoline-powered Two-seater Cars
• Highway mileages of the 20 gasoline-powered two-seater cars:
  13 15 16 16 17 19 20 22 23 23 | 23 24 25 25 26 28 28 28 29 32
• Q1 = median of {13 15 16 16 17 19 20 22 23 23} =
• Q3 = median of {23 24 25 25 26 28 28 28 29 32} =

Interquartile Range: IQR
• IQR = Q3 - Q1
  – The range of the center half of the data
  – A resistant measure for spread
• IQR can be used to identify suspected outliers.
• Rule-of-thumb:
  – An observation is called a suspected outlier if it falls more than 1.5*IQR above Q3 or below Q1.

Examples: 2004 Gasoline-powered Two-seater Cars
• Highway mileages of the 20 gasoline-powered two-seater cars:
  13 15 16 16 17 19 20 22 23 23 | 23 24 25 25 26 28 28 28 29 32
• IQR = Q3 - Q1 =
• 1.5*IQR =
• Q3 + 1.5*IQR =
• Q1 - 1.5*IQR =
• Any suspected outliers?

Examples: 2004 Two-Seater Cars
• Highway mileages of the 21 two-seater cars:
  13 15 16 16 17 19 20 22 23 23 23 24 25 25 26 28 28 28 29 32 66
• Q1 =
• Q3 =
• IQR = Q3 - Q1 =
• 1.5*IQR =
• Q3 + 1.5*IQR =
• Q1 - 1.5*IQR =
• Any suspected outliers?

The five-number summary
• To get a quick summary of both center and spread, use the following five-number summary:
  Minimum  Q1  M  Q3  Maximum

Example: HWY Gas Mileage of 2004 Two-seater/Mini Cars
• Two-seater
  – Five-number summary:
    • 13, 18, 23, 27, 32
• Mini-compact (the other half of Fig. 1.10)
  – Five-number summary:
    • 19, 23, 26, 29, 32

Boxplots
• A visual representation of the five-number summary.
• A boxplot consists of
  – A central box that spans the quartiles Q1 and Q3.
  – A line inside the box that marks the median M.
  – Lines that extend from the box out to the smallest and largest observations.

Boxplots of highway/city gas mileages (Two-seaters/minicompacts)

Pros and cons of Boxplots
• Location of the median line in the box indicates symmetry/asymmetry.
• Best used for side-by-side comparison of more than one distribution at a glance.
• Less detailed than histograms or stem plots.
• The box focuses attention on the central half of the data.

Income for different Education Level

Modified Boxplot
• The current boxplot cannot reveal those possible outliers.
• To modify it,
  – the two lines extend out from the
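As a rough illustration of the mean-versus-median comparison in the fuel economy example, the Python sketch below computes the mean, median and mode(s) of the highway mileages listed in the slides, first for the 20 gasoline-powered cars and then with the extra value 66 from the 21-car list. Treating 66 as the Honda Insight is an assumption, and the names gas_only, all_cars and summarize_center are invented for this example.

```python
from statistics import mean, median, multimode

# Highway mileages (mpg) of the 20 gasoline-powered two-seaters from the slides.
gas_only = [13, 15, 16, 16, 17, 19, 20, 22, 23, 23,
            23, 24, 25, 25, 26, 28, 28, 28, 29, 32]

# The 21-car list adds one value, 66. Assumption: this is the Honda Insight,
# the car left out of the gasoline-only list.
all_cars = gas_only + [66]


def summarize_center(values):
    """Return (mean, median, list of modes) for a list of numbers."""
    return mean(values), median(values), multimode(values)


mean_20, median_20, modes_20 = summarize_center(gas_only)
mean_21, median_21, modes_21 = summarize_center(all_cars)

print(f"without 66: mean={mean_20:.2f}, median={median_20}, modes={modes_20}")
print(f"with 66:    mean={mean_21:.2f}, median={median_21}, modes={modes_21}")
```

Under these assumptions the mean rises from 22.6 to about 24.67 once 66 is included, while the median stays at 23, which is the sense in which the median is resistant to a single extreme observation. The data also have two modes, 23 and 28.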

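The quartile, IQR and suspected-outlier exercises for the two-seater data can be checked with a short Python sketch along the following lines. It takes Q1 and Q3 as the medians of the lower and upper halves of the sorted data, as in the 20-car example. For an odd number of observations it leaves the overall median out of both halves, which is an assumed convention, since the slides only show an even-sized case. The function names are invented for this example.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumption: when n is odd, the overall median is excluded from both
    halves. Requires at least two observations.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return median(lower), median(upper)


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q3 = quartiles(values)
    return min(values), q1, median(values), q3, max(values)


def suspected_outliers(values):
    """Values more than 1.5*IQR below Q1 or above Q3 (the rule of thumb)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]


# Highway mileages (mpg) from the slides: 20 gasoline-powered cars,
# then the 21-car list that also contains the value 66.
gas_only = [13, 15, 16, 16, 17, 19, 20, 22, 23, 23,
            23, 24, 25, 25, 26, 28, 28, 28, 29, 32]
all_cars = gas_only + [66]

print(five_number_summary(gas_only), suspected_outliers(gas_only))
print(five_number_summary(all_cars), suspected_outliers(all_cars))
```

For the 20-car list this gives the five-number summary 13, 18, 23, 27, 32 quoted in the slides, with IQR = 9 and no value outside the fences 4.5 and 40.5. With 66 included, and under the odd-n convention assumed here, Q3 becomes 28, the fences are 3 and 43, and 66 is flagged as a suspected outlier.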
