1/18/11 Lecture 2-2

STOR 155 Introductory Statistics
Lecture 2-2: Displaying Distributions with Graphs
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Recall
• Data:
  – Individuals
  – Variables
    • Categorical variables
    • Quantitative variables
• Distribution of variables
• Graphical tools for categorical data
  – Bar graph
  – Pie chart
• Graphical tools for quantitative data
  – Stemplot

Example: A study on litter size
• Data: (170 observations)
  4 6 5 6 7 3 6 4 4 6
  4 4 9 5 10 6 6 5 6 8
  2 7 7 7 9 3 7 5 7 7
  4 5 5 6 7 6 7 8 6 6
  7 6 6 7 5 4 5 6 6 1
  3 4 7 5 4 7 5 8 8 5
  6 8 5 5 4 9 6 7 3 7
  7 5 4 6 9 6 7 7 5 7
  3 7 6 5 3 7 10 5 6 8
  7 5 5 7 5 5 8 9 7 5
  7 5 5 5 6 3 7 8 7 7
  6 3 4 4 4 7 2 7 8 5
  8 6 6 5 6 4 7 5 5 6
  9 3 5 4 8 3 9 8 3 6
  5 4 7 8 4 8 6 8 5 6
  4 3 8 8 6 9 5 5 6 6
  7 6 8 6 11 6 5 6 6 3

Stem-and-leaf plot for pups
  0|122333333333333344… (35)
  0|555555555555555555555555... (132)
  1|001

Histogram
• Breaks the range of the values of a quantitative variable into intervals and displays only the count or percent of the observations that fall into each interval.
• You can choose any convenient number of intervals.
• Intervals must be of equal width (except at the two ends?)

Example: A study on litter size

Data analysis in action: show steps in doing HG …

Data analysis in action: count

Example: Call Center Data
• Financial firm call center
• Calls handled by Avi within 60 seconds
  – October: 666
  – December: 523

October Histogram
[Figure: histogram of calling time (x-axis, 6 to 60) against frequency (y-axis, 0 to 120)]

December Histogram
[Figure: histogram of calling time (x-axis, 6 to 60) against frequency (y-axis, 0 to 120)]

Notes for Making Histogram
• Choose the number of classes sensibly (Fig 1.4, 1.8).
• Intervals must be of equal width.
• Areas of the bars are proportional to the frequency.

Examining Distributions
• Overall Pattern
  – Shape
  – Center (numerical, Lecture 3)
    • midpoint
  – Spread (numerical, Lecture 3)
    • range
• Deviations
  – Outliers: values that fall outside the overall pattern.

Shapes of Distributions
• Graphs can help to determine shapes.
  – Modes: local peaks of a distribution.
    • Unimodal: one peak
    • Bimodal: two peaks
  – Symmetric or skewed?

Shakespeare’s Words: Unimodal

Tuition and fees: bimodal or trimodal

A bimodal histogram
[Figure: histogram with two peaks, each labeled "A modal class"]

Right skewed / Left skewed

Iowa Test of Basic Skills vocabulary scores

A study on litter size

Bell-shaped Histograms

Summary: Shapes of Distributions
• Symmetric:
  – histogram in which the right half is a mirror image of the left half.
• Skewed to the right:
  – histogram in which the right tail is more stretched out than the left (long tail to the right).
• Skewed to the left:
  – histogram in which the left tail is more stretched out than the right (long tail to the left).
• Number of modal classes:
  – the number of distinct peaks in a histogram.
• Bell-shaped:
  – histogram that looks like a bell.

Time plots
• A time plot of a variable plots each observation against the time at which it was measured.
  – Time: x-axis
  – Variable: y-axis
  – Examples: stock price, unemployment rate, daily temperature
  – Great for identifying changing patterns over time.
• What to look for
  – Trend
  – Seasonal variations
  – Major deviations

Example: Number of Suicides in USA (1900-1970)

Call Center: Daily Call Volume in Sep. 2002
[Figure: Time Plot of # of Calls for Agent By Date (in September); x-axis: Date (in September), 0 to 30; y-axis: # of Calls for Agent, 0 to 70000]

Outliers
• Observations that lie outside the overall pattern of a distribution.
• Possible reasons:
  – error in data entry (most likely reason)
    • Equipment failure
    • Human error
    • Missing value code
  – extraordinary individuals (Jordan’s salary)

Handling Outliers
• Detect outliers using graphical and numerical methods.
• Check the data to make sure they were entered correctly.
• Reduce the influence of an outlier:
  – delete the observation (BE CAREFUL!)
  – use transformations or robust methods.

Call Center: Daily Call Volume in Sep. 2002
[Figure: Time Plot of # of Calls for Agent By Date (in September); x-axis: Date (in September), 0 to 30; y-axis: # of Calls for Agent, 0 to 70000]

Take Home Message
• Examine distributions:
  – Overall pattern
    • Shape
      – Symmetric or skewed
      – How many modes?
      – Bell-shaped
  – Outliers
• Graphical tools for quantitative data
  – Histograms
  – Time
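
Sketch: counting observations for a histogram

The histogram rule from this lecture (break the range into equal-width intervals, then keep only the count in each interval) can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure used in class: the function name histogram_counts, the starting value 2, and the interval width 2 are assumptions chosen for the example, and only the first 20 litter sizes from the data slide are used.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall in each of n_bins equal-width
    intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)  # index of the interval containing v
        if not 0 <= idx < n_bins:
            raise ValueError(f"value {v} lies outside the chosen intervals")
        counts[idx] += 1
    return counts


# First 20 observations from the litter-size data slide.
litter = [4, 6, 5, 6, 7, 3, 6, 4, 4, 6,
          4, 4, 9, 5, 10, 6, 6, 5, 6, 8]

# Assumed (hypothetical) choice of intervals: [2,4), [4,6), ..., [10,12).
START, WIDTH, N_BINS = 2, 2, 5
counts = histogram_counts(litter, START, WIDTH, N_BINS)

# Text histogram: one star per observation in the interval.
for i, c in enumerate(counts):
    lo = START + i * WIDTH
    print(f"[{lo:2d}, {lo + WIDTH:2d})  {'*' * c}")
```

For these 20 values the counts are 1, 8, 8, 2, 1, which add up to 20. Because every interval has the same width, bar height and bar area carry the same information; changing WIDTH shows how the apparent shape depends on the number of classes.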