UCLA STAT 11 - ch2 - D522852

School name University of California, Los Angeles

Course Stat 11- Introduction to Statistical Methods for Business and Economics

Pages 13

Stats 11 (Fall 2004) Lecture Note
Instructor: Hongquan Xu
Introduction to Statistical Methods for Business and Economics

Chapter 2: Tools for Exploring Univariate Data

Section 2.1: Introduction

What is Data?
• Data are numerical facts.
• A set of data contains information about individuals.
• Information is organized in variables.
• A variable is a property of an individual (e.g., age, gender, ...).

Definitions
• Individuals are the objects described by a set of data. Individuals may be people, but they may also be animals or things.
• A variable is any characteristic of an individual. A variable can take different values for different individuals.

Next we want to distinguish between the different types of variables. Different types of variables provide different kinds of information, and the type will guide what kinds of summaries (graphical or numerical) are appropriate.

Think about it:
• Could I compute the “AVERAGE AGE” for all UCLA students?
• Could I compute the “AVERAGE GENDER” for all UCLA students?

Gender is said to be a ____ variable; age is a ____ variable.

Types of Variables
• Quantitative variables are measurements and counts.
  – Variables with few repeated values are treated as continuous.
  – Variables with many repeated values are treated as discrete.
• Qualitative variables describe group membership.
  – Categorical variables are ____
  – Ordinal variables are ____
(See Figure 2.1.1 on page 42.)

Qualitative or Quantitative?
• Hair Color:
• Salary:
• Exam grade (0–100):
• Letter grade (A, B, C, etc.):
• Weight:
• Color preference (1=Red, 2=Blue, 3=Green):

Section 2.2: Presentation of Data

Some useful rules for presenting data efficiently:
• Make it simple for presentation, e.g., by rounding numbers.
• Maintain complete accuracy in calculation to avoid rounding errors.
• Keep full detail for reference.

Section 2.3: Simple Plots for Continuous Variables
• Dot plots
• Stem-and-leaf plots
• Histograms

A good picture is worth more than a thousand words!

What is a distribution?
• The distribution of a variable tells us what values it
takes and how often it takes these values.

Dot plots plot a batch of numbers on a scale. Good for small to moderate batches of data.

Example: Make a dot plot of the 15 numbers:
54 59 35 41 46 25 47 60 54 46 49 46 41 34 22

Stem-and-leaf plots (also called stem plots) show a quick picture of the shape of a distribution. Good for plotting 15–150 data points.

Example: Make a stem plot of the 15 numbers:
54 59 35 41 46 25 47 60 54 46 49 46 41 34 22

Separate each observation into a stem and a leaf. A leaf is often a single digit.
For example, for the first observation, 54, the stem is ____, and the leaf is ____.

Procedure: (a) Write the stems. (b) Write the leaves. (c) Sort the leaves. (d) Write the units.

Comments on Stemplot:
• It is possible to reconstruct the original data set.
• It works well for small numbers of observations, but not so well for large data sets.
• Software can make complicated stem-and-leaf plots.
• A histogram does a similar job and is preferred for large data sets.

A histogram is a good graph for quantitative variables with a large number of observations.

Example: The data show the money (in dollars) that 50 shoppers spent at a supermarket. The data are sorted.
3.11, 8.88, 9.26, 10.81, 12.69, 13.78, 15.23, 15.62, 17.00, 17.39,
18.36, 18.43, 19.27, 19.50, 19.54, 20.16, 20.59, 22.22, 23.04, 24.47,
24.58, 25.13, 26.24, 26.26, 27.65, 28.06, 28.08, 28.38, 32.03, 34.98,
36.37, 38.64, 39.16, 41.02, 42.97, 44.08, 44.67, 45.40, 46.69, 48.65,
50.39, 52.75, 54.80, 59.07, 61.22, 70.32, 82.70, 85.76, 86.37, 93.34.

To make a histogram of the distribution, proceed as follows:
1. Find the range of the values: min = ____, max = ____
2. Divide the range of the data into classes (i.e., intervals) of equal width. What is reasonable here? classes: ____.
3. Count the number of observations in each class.
These counts are called frequencies.

A Frequency Table

Class        Frequency (Count)   Relative Frequency   Percent
[0, 10)              3           3/50 = .06                6
[10, 20)            12           12/50 = .24              24
[20, 30)            13           .26                      26
[30, 40)             5           .10                      10
[40, 50)             7           .14                      14
[50, 60)             4           .08                       8
[60, 70)             1           .02                       2
[70, 80)             1           .02                       2
[80, 90)
[90, 100)            1           .02                       2
Total               50           1.00                    100

Note: Relative Frequency = Frequency / Total number of observations.

4. Draw the histogram.

[Figure: Stata histogram]

Comments on Histogram:
• The vertical axis can be frequency (count) or relative frequency or percent.
• Each bar represents a class; the base of the bar covers the class.
• No horizontal space between bars.
• Be careful about the boundaries: inclusive or exclusive?
• How many classes? Use your judgment.
• Software has defaults. Stata uses 5 classes by default.

Interpreting stem-and-leaf plots and histograms
• Look for the overall pattern and for striking deviations from that pattern.
• Describe the overall pattern of a distribution by its shape, center, and spread.
• A striking deviation is an outlier, an individual value that falls outside the overall pattern.

We will learn how to describe center and spread numerically in Section 2.4.

Some things to look for in describing shape are:
• Does the distribution have one or several major peaks, called modes? A distribution with one major peak is called unimodal.
• Is it approximately symmetric or is it skewed in one direction? A distribution is positively skewed if the right tail (larger values) is much longer than the left tail (smaller values).

Now, how to interpret the histogram of the money spent by 50 shoppers?
• Look for the overall pattern:
• Look for striking deviations from the overall pattern:

Example: Histogram of Simon Newcomb’s 66 measurements of the passage time of light.
The values are the deviations from 24,800 nanoseconds (a nanosecond = $10^{-9}$ seconds).
• Look for the overall pattern:
• Look for striking deviations from the overall pattern:

Section 2.4: Numerical Summaries for Continuous Variables

In this section we will focus on numerical summaries of the center and the spread of the distribution (appropriate for quantitative data only!).

Two Measures of Center
• sample mean $\bar{x}$: the average value
• sample median M or Med: the middle value

Measuring Center: The Sample Mean

Notation
• $n$ = sample size (i.e., number of observations)
• observations: $x_1, x_2, \ldots, x_n$

The sample mean $\bar{x}$ is the average value:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
or, in more compact notation,
$$\bar{x} = \frac{1}{n} \sum x_i$$

Example: Golf scores of 12 members of a women’s golf team in tournament play.
89 90 87 95 86 81 102 97 83 88 91 79
The mean is ____

Measuring Center: The Sample Median
The sample
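
The frequency table in Section 2.3 and the sample mean formula in Section 2.4 can both be checked with a short script. What follows is a minimal sketch in Python, which is an assumption here since the notes themselves use Stata. The helper names `frequency_table` and `sample_mean` are hypothetical and chosen only for illustration. Classes are taken as [0, 10), [10, 20), ..., [90, 100), matching the table, so each class includes its left endpoint and excludes its right endpoint.

```python
# Illustrative sketch only: Python and these helper names are assumptions,
# not part of the original lecture notes (which use Stata).

def frequency_table(values, width, low, high):
    """Count observations in equal-width classes [low, low + width), ...

    Each class includes its left endpoint and excludes its right endpoint.
    Returns a list of (class_start, class_end, count, relative_frequency).
    """
    n_classes = int((high - low) / width)
    counts = [0] * n_classes
    for v in values:
        if not (low <= v < high):
            raise ValueError(f"value {v} lies outside [{low}, {high})")
        counts[int((v - low) // width)] += 1
    total = len(values)
    return [
        (low + i * width, low + (i + 1) * width, count, count / total)
        for i, count in enumerate(counts)
    ]


def sample_mean(values):
    """Sample mean: the sum of the observations divided by n."""
    return sum(values) / len(values)


# Money (in dollars) spent by the 50 shoppers, copied from Section 2.3.
spending = [
    3.11, 8.88, 9.26, 10.81, 12.69, 13.78, 15.23, 15.62, 17.00, 17.39,
    18.36, 18.43, 19.27, 19.50, 19.54, 20.16, 20.59, 22.22, 23.04, 24.47,
    24.58, 25.13, 26.24, 26.26, 27.65, 28.06, 28.08, 28.38, 32.03, 34.98,
    36.37, 38.64, 39.16, 41.02, 42.97, 44.08, 44.67, 45.40, 46.69, 48.65,
    50.39, 52.75, 54.80, 59.07, 61.22, 70.32, 82.70, 85.76, 86.37, 93.34,
]

# Golf scores of the 12 team members, copied from Section 2.4.
golf_scores = [89, 90, 87, 95, 86, 81, 102, 97, 83, 88, 91, 79]

# Print one row per class: interval, count, relative frequency, percent.
for start, end, count, rel in frequency_table(spending, 10, 0, 100):
    print(f"[{start}, {end})  {count:3d}  {rel:.2f}  {100 * rel:.0f}")

print("mean golf score:", sample_mean(golf_scores))
```

The printed counts for the filled-in rows agree with the frequency table, and the remaining output can be used to check the entries the notes leave blank. Changing `width` shows how the choice of classes affects the shape of the resulting histogram.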
