
Sept. 28, 2006, LEC 1, ECON 240A, Exploratory Data Analysis, L. Phillips

I. Introduction

At the beginning of the course we will study three branches of statistics: (1) data analysis, (2) probability, and (3) statistical inference. Data analysis is the gathering, display, and summary of data. We will use visual devices and quantitative measures to accomplish these tasks. Probability has its origins in gambling and the laws of chance. This topic is interesting in its own right, but we will also use probability as a means to better understand the binomial distribution, the central limit theorem, and the relationship between the binomial distribution and the normal distribution.

II. Data Description

One use of statistics is to describe data with summary measures. Two notions are central tendency and dispersion. There are several measures of central tendency. An intuitive and relatively easy measure to use is the mode, i.e. the data value that is observed most frequently. Of course, one issue is what to do if the data has two or three modes and so has multiple peaks. Another measure of central tendency is the median. The data can be sorted and ordered from the highest value to the lowest, and the data point in the middle is the median, with one half of the data values above and one half of the data values below. Another measure of central tendency, requiring some arithmetic, is the sample mean of the data: add up all the data values and divide by the number of observations, or data points.

III. Exploratory Data Analysis

John Tukey developed exploratory data analysis to visually describe the characteristics of data. Two visual tools useful for this purpose are the stem and leaf diagram and the box and whiskers plot. An example of the methodology of the stem and leaf plot is its application to weight data from males and females at Penn State, taken from Larry Gonick and Woollcott Smith, The Cartoon Guide to Statistics (1993).

Males:
140 145 160 190 155 165 150 190 195 138 160 155 153 145 170 175 175 170 180
135 170 157 130 185 190 155 170 155 215 150 145 155 155 150 155 150 180 160
135 160 130 155 150 148 155 150 140 180 190 145 150 164 140 142 136 123 155

Females:
140 120 130 138 121 125 116 145 150 112 125 130 120 130 131 120 118 125 135
125 118 122 115 102 115 150 110 116 108 95 125 133 110 150 108

For this illustration the data is pooled without regard to gender. The first step is to determine the range of the data, the minimum weight and the maximum weight: 95 and 215 pounds, respectively. The second step is to construct the stem, counting by tens from 9 for 90, 10 for 100, etc., out to 21 for 210.

     9 |
    10 |
    11 |
    12 |
    13 |
    14 |
    15 |
    16 |
    17 |
    18 |
    19 |
    20 |
    21 |

Figure 1: Stem of the Stem and Leaf Diagram

The third step is to construct the leaves. Use the second digit of 95, the lowest weight, which is placed after 9 on the stem. There are three weights between 100 and 110: 102, 108, and 108, so the digits following 10 on the stem are 2, 8, 8. This is a leaf attached to the stem at 10. Continuing in this fashion:

     9 | 5
    10 | 2 8 8
    11 | 6 2 8 8 5 5 0 6 0
    12 | 3 0 1 5 5 0 0 5 5 2 5
    13 | 8 5 0 5 0 6 0 8 0 0 1 5 3
    14 | 0 5 5 5 8 0 5 0 2 0 5
    15 | 5 0 5 3 7 5 5 0 5 5 0 5 0 5 0 5 0 0 5 0 0 0
    16 | 0 5 0 0 0 4
    17 | 0 5 5 0 0 0
    18 | 0 5 0 0
    19 | 0 0 5 0 0
    20 |
    21 | 5

Figure 2: Preliminary Leaves in the Stem and Leaf Diagram

The last step is to order the digits composing the leaves. This provides a visual description of the data, including the minimum, the maximum, the modes, and the median.

     9 | 5
    10 | 2 8 8
    11 | 0 0 2 5 5 6 6 8 8
    12 | 0 0 0 1 2 3 5 5 5 5 5
    13 | 0 0 0 0 0 1 3 5 5 5 6 8 8
    14 | 0 0 0 0 2 5 5 5 5 5 8
    15 | 0 0 0 0 0 0 0 0 0 0 3 5 5 5 5 5 5 5 5 5 5 7
    16 | 0 0 0 0 4 5
    17 | 0 0 0 0 5 5
    18 | 0 0 0 5
    19 | 0 0 0 0 5
    20 |
    21 | 5

Figure 3: Stem and Leaf Diagram

Of course, this back-of-the-envelope technology could be combined with using a computer to sort or order the data. In all there are 92 observations, or data points, so the median lies between the 46th and 47th observations, i.e. between 145 and 145, and the median is 145. Note that the data is bimodal, with ten 150's and ten 155's. The students have a reporting bias, tending to round off to zeros and fives.

IV. Dispersion

One measure of dispersion is the interquartile range (IQR). Sort the data and put the points into four groups with equal numbers of observations. There will be two groups above the median and two groups below the median. If the median is a data point, add it to both the upper group and the lower group. In the case of the weight data we had an even number of observations, and the median fell between two observations, the 46th and the 47th, which were both equal to 145. Next, find the median for the two high groups, i.e. the third quartile, with 25 percent of the observations above it. Also find the median for the two lowest groups, i.e. the first quartile, with 25 percent of the observations below it. The difference between the median for the highs and the median for the lows is the interquartile range.

Having already done the work for the weight data by constructing the stem and leaf diagram, we can use it to determine the first quartile of 125 pounds, which lies between the 23rd observation (125 pounds) and the 24th observation (125 pounds). The third quartile lies between the 23rd and 24th observations from the top, i.e. between 157 pounds and 155 pounds, so the third quartile is 156 pounds and the interquartile range is 156 minus 125, or 31 pounds.

John Tukey's box and …
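The notes remark that the hand construction could be combined with a computer that sorts the data. As a minimal sketch of that idea, the Python below rebuilds the ordered stem and leaf diagram for the pooled weight data and recomputes the median, the modes, the quartiles, and the interquartile range using the halving method described in the notes. The function names (stem_and_leaf, median, modes, quartiles) are illustrative choices, not part of the lecture, and the quartile rule here is the notes' median-of-halves rule; library quantile routines may interpolate differently and give slightly different values.

```python
from collections import Counter, defaultdict

# Weight data in pounds, copied from the lecture notes.
MALES = [
    140, 145, 160, 190, 155, 165, 150, 190, 195, 138, 160, 155, 153, 145, 170,
    175, 175, 170, 180, 135, 170, 157, 130, 185, 190, 155, 170, 155, 215, 150,
    145, 155, 155, 150, 155, 150, 180, 160, 135, 160, 130, 155, 150, 148, 155,
    150, 140, 180, 190, 145, 150, 164, 140, 142, 136, 123, 155,
]
FEMALES = [
    140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130, 120, 130, 131,
    120, 118, 125, 135, 125, 118, 122, 115, 102, 115, 150, 110, 116, 108, 95,
    125, 133, 110, 150, 108,
]


def stem_and_leaf(values):
    """Map each stem (tens) to its sorted leaves (units), keeping empty stems."""
    leaves = defaultdict(list)
    for v in values:
        leaves[v // 10].append(v % 10)
    lo, hi = min(leaves), max(leaves)
    return {stem: sorted(leaves[stem]) for stem in range(lo, hi + 1)}


def median(sorted_values):
    """Median of an already sorted list; averages the middle pair if n is even."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def modes(values):
    """All values tied for the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def quartiles(values):
    """First and third quartiles as medians of the lower and upper halves.

    When n is odd, the median observation is included in both halves,
    following the rule stated in the notes.
    """
    s = sorted(values)
    n = len(s)
    half = (n + 1) // 2
    return median(s[:half]), median(s[n - half:])


if __name__ == "__main__":
    pooled = MALES + FEMALES  # pooled without regard to gender
    for stem, lv in stem_and_leaf(pooled).items():
        print(f"{stem:>2} | {' '.join(str(d) for d in lv)}")

    q1, q3 = quartiles(pooled)
    print("n      =", len(pooled))
    print("median =", median(sorted(pooled)))
    print("modes  =", modes(pooled))
    print("Q1, Q3 =", q1, q3)
    print("IQR    =", q3 - q1)
```

Run on the pooled data, this should agree with the hand calculation in the notes: 92 observations, a median of 145, modes at 150 and 155, quartiles of 125 and 156, and an interquartile range of 31 pounds.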



UCSB ECON 240a - Exploratory Data Analysis
