UW-Madison ECON 310 - EconStats310 - September 5

Econ 310
Professor Wallace
September 5, 2012

Lecture:
1. Data taxonomy
2. Descriptive methods for nominal and ordinal data
   a. Frequency distributions
   b. Bar graphs (discrete histograms)
   c. Pie charts
3. Introduction to STATA
   a. Downloading and opening a STATA data set
   b. Default windows
   c. Basic data commands (describe, edit, browse, summarize, tabulate, generate, replace, save, clear)
   d. *.do files
   e. Getting help
4. Descriptive methods for nominal and ordinal data using STATA
   a. Frequency distributions
   b. Bar graphs (discrete histograms)
   c. Pie charts

By the end of class you should be able to:
1. Differentiate between nominal, ordinal, and interval data;
2. Download and open a STATA data set;
3. Do simple data manipulations in STATA;
4. Implement descriptive methods for nominal and ordinal data using STATA (frequency distributions, discrete histograms, and pie charts).

Some Definitions
- Variable – some characteristic of the population.
  Example: In the last lecture I referenced weekly earnings. Weekly earnings is an example of a variable.
- Values of a variable – the possible values that a variable can take on.
  Example: Weekly earnings can take on any value from $0 to a very large, but undetermined, number.
- Data – the observed values of a variable.
  Example 1: In most survey data sources, weekly earnings are top coded so as to ensure the privacy of respondents. For example, in the wage data from the Current Population Survey Outgoing Rotation Groups (CPS ORG) that we are going to be looking at today, weekly earnings are top coded at a nominal (not inflation-adjusted) $999. Thus, the observed weekly earnings data will fall between $0 and $999.
  Example 2: If we define the weekly earnings variable for the population of workers, we can exclude zero values.
  In this case, the observed weekly earnings data will be greater than $0 and no greater than $999.
- Element – the entities on which the data are collected.

Types of Data
- Interval data (quantitative or numerical data) – data exist as real numbers (e.g., height, weight, earnings, wealth, IQ score).
- Categorical data – each value signifies a category (e.g., race, sex, region, educational attainment).
  o Nominal data – there is no natural ordering of the categories (e.g., sex, race, and region).
  o Ordinal data – the order of the categories has meaning (e.g., educational attainment, self-reported health).

Data set – a collection of related data. Data sets may contain interval, nominal, and ordinal data.

Types of Data Sets
- Cross section data set – a collection of related data gathered across a number of elements at a particular point in time.
- Repeated (or pooled) cross section data set – a collection of related data constructed from multiple cross section data sets (e.g., the CPS ORG data set we will be working with is a repeated cross section data set because it is constructed from cross section data sets from two years).
- Longitudinal data set – a collection of related data constructed from the same elements at different points in time.
  Example: The National Longitudinal Survey of Youth 1979 (NLSY79) consists of various data on the same people (elements) from the time they were in their teens in 1979 through adulthood.
- Time series data set – a collection of related data constructed from one element at multiple points in time.
  Usually, time series data sets contain data aggregated up to some level (e.g., national, state, etc.).

Descriptive Techniques for Categorical Data
- Frequency distribution – a tabular description for categorical (nominal or ordinal) data that lists the number of units associated with each category.
- Relative frequency distribution – a tabular description for categorical data that lists the fraction or percentage of units associated with each category.
- Cumulative frequency distribution (ordinal only) – a tabular description for ordinal data that lists the cumulative (category and below) count, fraction, or percentage of units associated with each category.
  Example: The following table contains the frequency distribution, relative frequency distribution, and cumulative frequency distribution for educational attainment data from the CPS ORG:

      Educational Attainment |
                       Level |      Freq.     Percent        Cum.
-----------------------------+-----------------------------------
         High school dropout |        721       23.29       23.29
                 High school |      1,212       39.15       62.44
                Some college |        564       18.22       80.65
  4 or more years of college |        599       19.35      100.00
-----------------------------+-----------------------------------
                       Total |      3,096      100.00

- Discrete histogram (a type of bar graph) – a graphical representation of a frequency distribution or relative frequency distribution in which each bar is associated with a category and the height of each bar represents the frequency or relative frequency of its corresponding category.
  Example: The same data described using a discrete relative frequency histogram.
  [Figure: Distribution of Educational Attainment (male full-time, full-year workers in 1979). Bar graph; vertical axis: Percent, 0 to 40; one bar each for High school dropout, High school, Some college, and 4 or more years of college.]
- Pie chart – a graphical representation of the relative frequency distribution in which a circle (or pie) is divided into slices, each slice representing a category, and the size of each slice is proportional to the relative frequency of its associated category.
  Example: The same relative frequency distribution of educational attainment displayed in pie chart format.
  [Figure: Distribution of Educational Attainment (male full-time, full-year workers in 1979). Pie chart slices: High school dropout 23.29%, High school 39.15%, Some college 18.22%, 4 or more years of college 19.35%.]
- Deciding between frequency distributions, discrete histograms, and pie charts – there are tradeoffs.
  o Graphs and charts versus tables
    ▪ Graphs and charts take up more room than the same information displayed in tabular format, but they may be easier for some audiences to interpret.
    ▪ Using too many graphs or charts is generally a bad idea.
    ▪ Charts may be better when providing descriptive statistics related to


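The difference between the values a variable can take and the data actually observed can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CPS's own procedure: it assumes top coding is a simple cap at the nominal $999 from the CPS ORG example, the earnings figures are made up, and the helper names `top_code` and `exclude_zeros` are hypothetical.

```python
from __future__ import annotations

# Assumption: top coding modeled as a simple cap, so any reported value
# above the cap is recorded as the cap itself.
TOP_CODE = 999.0  # nominal cap on weekly earnings in the CPS ORG example

def top_code(earnings: list[float], cap: float = TOP_CODE) -> list[float]:
    """Return a copy of `earnings` with every value above `cap` replaced by `cap`."""
    return [min(value, cap) for value in earnings]

def exclude_zeros(earnings: list[float]) -> list[float]:
    """Keep only positive earnings (the 'population of workers' definition)."""
    return [value for value in earnings if value > 0]

# Hypothetical reported weekly earnings: illustrative values, not CPS data.
reported = [0.0, 250.0, 480.5, 999.0, 1500.0, 3200.0]
observed = top_code(reported)          # the data: never above the cap
workers_only = exclude_zeros(observed)  # Example 2: zeros dropped

print(observed)
print(workers_only)
```

Top coding discards information: the last three reported values differ, yet all three are observed as $999. That is why the data never exceed the cap even though the variable's possible values can.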

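The Percent and Cum. columns in the educational attainment table follow mechanically from the four counts. The Python sketch below recomputes them as an illustration; it is not the STATA tabulate command itself. It assumes the categories are supplied in their natural order, which is what makes the cumulative column meaningful for ordinal data, and `frequency_table` is a hypothetical helper name.

```python
from __future__ import annotations

# Counts from the CPS ORG educational attainment table, listed in their
# natural (ordinal) order; dicts preserve insertion order in Python 3.7+.
counts = {
    "High school dropout": 721,
    "High school": 1212,
    "Some college": 564,
    "4 or more years of college": 599,
}

def frequency_table(counts: dict[str, int]) -> list[tuple[str, int, float, float]]:
    """Return (category, freq, percent, cumulative percent) rows in input order."""
    total = sum(counts.values())
    rows = []
    running = 0  # running count: this category and all categories below it
    for category, freq in counts.items():
        running += freq
        percent = 100 * freq / total
        cumulative = 100 * running / total
        rows.append((category, freq, round(percent, 2), round(cumulative, 2)))
    return rows

table = frequency_table(counts)
total = sum(counts.values())

# Print in a layout resembling a one-way tabulation.
print(f"{'Level':>28} | {'Freq.':>7} {'Percent':>8} {'Cum.':>8}")
for category, freq, percent, cumulative in table:
    print(f"{category:>28} | {freq:>7,} {percent:>8.2f} {cumulative:>8.2f}")
print(f"{'Total':>28} | {total:>7,} {100:>8.2f}")
```

The cumulative column is computed from the running count rather than by adding the rounded percentages, so rounding errors do not accumulate and the last row comes out at exactly 100.00.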

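A discrete relative frequency histogram and a pie chart encode the same relative frequency distribution geometrically: a bar's height is its category's percentage, and a pie slice's central angle is that same share of 360 degrees (slice area is proportional to the angle). The Python sketch below computes both from the table's counts and substitutes a crude text rendering for real plotting software; it is not how STATA draws these graphs, and `bar_heights` and `slice_angles` are hypothetical helper names.

```python
from __future__ import annotations

# Same category counts as the educational attainment table.
counts = {
    "High school dropout": 721,
    "High school": 1212,
    "Some college": 564,
    "4 or more years of college": 599,
}

def bar_heights(counts: dict[str, int]) -> dict[str, float]:
    """Percent of units in each category, i.e. the height of each bar."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}

def slice_angles(counts: dict[str, int]) -> dict[str, float]:
    """Central angle of each pie slice in degrees: the category's share of 360."""
    total = sum(counts.values())
    return {category: 360 * n / total for category, n in counts.items()}

heights = bar_heights(counts)
angles = slice_angles(counts)

# Crude text histogram: one '#' per whole percentage point.
for category, pct in heights.items():
    print(f"{category:>28} | {'#' * int(pct)} {pct:.2f}%")

# Pie slice angles; they always sum to 360 degrees.
for category, degrees in angles.items():
    print(f"{category:>28}: {degrees:6.2f} degrees")
```

For example, the high school category's 39.15% share becomes the tallest bar and a slice of about 140.93 degrees.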