UCSB ECON 240a - Exploratory Data Analysis

Sept. 23, 2003   LEC #1   ECON 240A   L. Phillips

Exploratory Data Analysis

I. Introduction

At the beginning of the course we will study three branches of statistics: (1) data analysis, (2) probability, and (3) statistical inference.

Data analysis is the gathering, display and summary of data. We will use visual devices and quantitative measures to accomplish these tasks.

Probability has its origins in gambling and the laws of chance. This topic is interesting in its own right, but we will also use probability as a means to better understand the binomial distribution, the central limit theorem, and the relationship between the binomial distribution and the normal distribution.

II. Data Description

One use of statistics is to describe data with summary measures. Two central notions are central tendency and dispersion.

There are several measures of central tendency. An intuitive and relatively easy measure to use is the mode, i.e. the data value that is observed most frequently. Of course, one issue is what to do if the data has two or three modes, i.e. multiple peaks.

Another measure of central tendency is the median. The data can be sorted from the highest value to the lowest, and the data point in the middle is the median, with one half of the data values above it and one half below.

A third measure of central tendency, requiring some arithmetic, is the sample mean of the data. Add up all the data values and divide by the number of observations or data points.

III. Exploratory Data Analysis

John Tukey developed exploratory data analysis to visually describe the characteristics of data.
Two visual tools useful for this purpose are the stem and leaf diagram and the box and whiskers plot.

An example of the methodology of the stem and leaf plot is its application to weight data (in pounds) from males and females at Penn State, taken from Larry Gonick & Woollcott Smith, The Cartoon Guide to Statistics (1993).

Males:
140 145 160 190 155 165 150 190 195 138 160 155 153 145 170
175 175 170 180 135 170 157 130 185 190 155 170 155 215 150
145 155 155 150 155 150 180 160 135 160 130 155 150 148 155
150 140 180 190 145 150 164 140 142 136 123 155

Females:
140 120 130 138 121 125 116 145 150 112 125 130 120 130 131
120 118 125 135 125 118 122 115 102 115 150 110 116 108  95
125 133 110 150 108

For this illustration, the data is pooled without regard to gender. The first step is to determine the range of the data: the minimum weight and the maximum weight are 95 and 215, respectively. The second step is to construct the stem, counting by tens from 9 for 90, 10 for 100, etc. out to 21 for 210.

------------------------------------------------------------
 9
10
11
12
13
14
15
16
17
18
19
20
21
Figure 1: Stem of the Stem and Leaf Diagram
------------------------------------------------------------

The third step is to construct the leaves: use the second digit of 95, the lowest weight, which is placed after 9 on the stem. There are three weights from 100 up to, but not including, 110: 102, 108, and 108, so the digits following 10 on the stem are 2, 8, 8. This is a leaf attached to the stem at 10. Continuing in this fashion:

------------------------------------------------------------
 9: 5
10: 2 8 8
11: 6 2 8 8 5 5 0 6 0
12: 3 0 1 5 5 0 0 5 5 2 5
13: 8 5 0 5 0 6 0 8 0 0 1 5 3
14: 0 5 5 5 8 0 5 0 2 0 5
15: 5 0 5 3 7 5 5 0 5 5 0 5 0 5 0 5 0 0 5 0 0 0
16: 0 5 0 0 0 4
17: 0 5 5 0 0 0
18: 0 5 0 0
19: 0 0 5 0 0
20:
21: 5
Figure 2: Preliminary Leaves in the Stem and Leaf Diagram
------------------------------------------------------------

The last step is to order the digits composing the leaves. This provides a visual description of the data including the minimum, the maximum, the modes and the median.

------------------------------------------------------------
 9: 5
10: 2 8 8
11: 0 0 2 5 5 6 6 8 8
12: 0 0 0 1 2 3 5 5 5 5 5
13: 0 0 0 0 0 1 3 5 5 5 6 8 8
14: 0 0 0 0 2 5 5 5 5 5 8
15: 0 0 0 0 0 0 0 0 0 0 3 5 5 5 5 5 5 5 5 5 5 7
16: 0 0 0 0 4 5
17: 0 0 0 0 5 5
18: 0 0 0 5
19: 0 0 0 0 5
20:
21: 5
Figure 3: Stem and Leaf Diagram
------------------------------------------------------------

Of course this back of the envelope technology could be combined with using a computer to sort or order the data.

In all there are 92 observations or data points, so the median lies between the 46th and 47th observations. Both are 145, so the median is 145. Note the data is bimodal, with ten 150's and ten 155's. The students have a reporting bias, tending to round off to zeros and fives.

IV. Dispersion

One measure of dispersion is the interquartile range, IQR. Sort the data and put the points into four groups with equal numbers of observations. There will be two groups above the median and two groups below the median. If the median is a data point, add it to both the upper group and the lower group. In the case of the weight data, we had an even number of observations, and the median fell between two observations, the 46th and the 47th, which were both equal to 145. Next, find the median for the two high groups, i.e. the third quartile, with 25 percent of the observations above it. Also find the median for the two lowest groups, i.e. the first quartile, with 25 percent of the observations below it.
The difference between the median for the highs and the median for the lows is the interquartile range.

Having already done the work for the weight data by constructing the stem and leaf diagram, we can use it to determine the first quartile of 125 pounds, between the 23rd observation of 125 pounds and the 24th observation of 125 pounds. The third quartile is between the 23rd and 24th observations from the top, i.e. between 157 pounds and 155 pounds, so the third quartile is 156 pounds, and the interquartile range is 156 minus 125, or 31 pounds.

John Tukey's box and whiskers plot displays the interquartile range as well as other features of the data such as outliers. The left edge of the box is the first quartile and the right edge of the box is the third quartile. The median is drawn as a
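
The stem and leaf construction and the quartile arithmetic described above can be checked with a short Python sketch. It is an illustration under stated assumptions, not part of the original lecture: the function names (stem_and_leaf, modes, median, quartiles) are hypothetical, and the quartiles follow the lecture's rule of taking the median of the lower and upper halves, which may differ slightly from the interpolation conventions used by statistical software.

```python
from collections import Counter, defaultdict

# Weight data in pounds, transcribed from the lecture's two lists.
MALES = [
    140, 145, 160, 190, 155, 165, 150, 190, 195, 138, 160, 155, 153, 145, 170,
    175, 175, 170, 180, 135, 170, 157, 130, 185, 190, 155, 170, 155, 215, 150,
    145, 155, 155, 150, 155, 150, 180, 160, 135, 160, 130, 155, 150, 148, 155,
    150, 140, 180, 190, 145, 150, 164, 140, 142, 136, 123, 155,
]
FEMALES = [
    140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130, 120, 130, 131,
    120, 118, 125, 135, 125, 118, 122, 115, 102, 115, 150, 110, 116, 108, 95,
    125, 133, 110, 150, 108,
]


def stem_and_leaf(values):
    """Map each stem (value // 10) to its ordered leaves (value % 10)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    # Keep empty stems (such as 20 here) so gaps in the data stay visible.
    stems = range(min(values) // 10, max(values) // 10 + 1)
    return {s: leaves.get(s, []) for s in stems}


def modes(values):
    """Return every value tied for the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def median(sorted_values):
    """Middle value, or the average of the two middle values if n is even."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """First quartile, median, third quartile by the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    # When n is odd the median is a data point and joins both halves.
    lower = data[: half + n % 2]
    upper = data[half:]
    return median(lower), median(data), median(upper)


if __name__ == "__main__":
    pooled = MALES + FEMALES  # pooled without regard to gender
    for stem, leaf in stem_and_leaf(pooled).items():
        print(f"{stem:>2}: {' '.join(map(str, leaf))}")
    q1, med, q3 = quartiles(pooled)
    print(f"n={len(pooled)} modes={modes(pooled)}")
    print(f"median={med} Q1={q1} Q3={q3} IQR={q3 - q1}")
    # Last line printed: median=145.0 Q1=125.0 Q3=156.0 IQR=31.0
```

Run on the pooled data, the sketch prints the ordered leaves of Figure 3 and reproduces the lecture's figures: 92 observations, modes at 150 and 155, a median of 145, and an interquartile range of 31 pounds.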

