UW-Madison SOC 357 - Basics of Quantitative Data Analysis

Class 9
Basics of Quantitative Data Analysis

Class Outline
• Data Files
• Codebooks
• Univariate Analysis
  – Mean
  – Variance
  – Proportions
• Bivariate Analysis
  – Crosstabulation
  – Group Comparison of Means
  – Correlation
• Stata commands
• Graphs

Data
• Social science datasets usually have a rectangular structure.
  – Columns: variables
  – Rows: observations
• Data storage
  – in statistical packages: Stata (*.dta), SPSS (*.sav), SAS, R, S-Plus, etc.
  – in spreadsheets (*.csv, *.xls)
  – as raw text data
• Raw text data require a data dictionary.

Codebooks
• A codebook is a guide for locating variables and interpreting codes in the data file. It often includes:
  – Explanation of variable names
  – Code lists (variable values)
  – Variable frequencies
  – Questionnaire
  – Explanation of sampling and use of weights
• Examples
  – The General Social Survey (web)
  – U.S. Census 2000 codebook (PDF)
  – Codebooks generated by Stata

Quantitative Analysis
• Univariate analysis involves a single variable.
• Bivariate analysis involves two variables simultaneously.
  – Example: gender difference in admission rate.
• Multivariate analysis involves more than two variables simultaneously.
  – Example: gender difference in admission rate by field of study.

Statistics
• A statistic is a numerical quantity calculated from a sample.
• Measures of central tendency
  – Mean, median, and proportion
• Measures of variability
  – Range, interquartile range, variance, and standard deviation
• Measures of association
  – Correlation and chi-square statistic

Categorical and Continuous Variables
• Categorical variables have a limited number of possible values.
• Continuous variables can potentially take an infinite number of values.
• Nominal measures: always treat them as categorical variables.
• Ordinal measures: intrinsically categorical. In practice, we sometimes treat them as continuous variables by assuming equal distances between adjacent categories.
• Interval and ratio measures: always treat them as continuous.
• Use different statistics and graphs when working with categorical and continuous variables.

Levels of Measurement and Choice of Statistics: An Example
• Income can be measured at different levels of detail.
  1. Nominal measure
     • 0: below poverty
     • 1: above poverty
  2. Ordinal measure
     • 1: 0~15k
     • 2: 16~30k
     • 3: 31~50k
     • 4: 51~100k
     • 5: 101k+
  3. Ratio measure: “Round your income to the nearest hundred.”
• To summarize income, we use proportions when it is measured at the nominal or ordinal level, and the mean and variance when it is measured at the ratio level.

Univariate Statistics: Continuous Variables
• Statistics to use with continuous variables
  – Measures of central tendency
     • Mean: arithmetic average
     • Median: 50th percentile
     • Proportion
  – Measures of variability
     • Range: (min, max)
     • Interquartile range: (25th percentile, 75th percentile)
     • Variance: $\mathrm{Var} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
     • Standard deviation: $SD = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

Mean vs. Median
• The mean and median differ when the distribution of the variable is skewed. For example, depression scores are right skewed; in this case, the mean is greater than the median.
  Income is another example of a right-skewed distribution.
• When the distribution is symmetric, the mean and median are the same.

[Figure: histogram of the total CES-D score (x-axis: Total Score of CES-D; y-axis: Fraction). Data source: the Wisconsin Longitudinal Study.]

Stata Commands and Graphs
• Use “summarize varname” to find the mean and standard deviation of a continuous variable; “summarize varname, detail” also reports the median, percentiles, and variance.
• Use a boxplot or histogram to display the distribution of a continuous variable.

Univariate Statistics: Categorical Variables
• Statistics to use with categorical variables:
  – Proportions
  – When a categorical variable has only 2 categories, coded 0 and 1, its mean equals the proportion of cases in category 1.

Stata Commands and Graphs
• Use “tabulate varname” to find the frequencies and proportions (percentages) of a categorical variable.
• Use a pie chart or bar chart to display the distribution of a categorical variable.

Bivariate Analysis
• Analysis involving two variables simultaneously.
• Example:
  – Gender
  – Attitudes toward premarital sex
• Choose the appropriate bivariate analysis:

                     Categorical                  Continuous
    Categorical      Cross tabulation             Group comparison of means
    Continuous       Group comparison of means    Correlation

Crosstabulations
• It is customary to put the independent variable as the row variable and the dependent variable as the column variable.
  If the table is set up like this, calculate the row percentages, not the column percentages.

Key:
  frequency
  row percentage
  column percentage

RESPONDENT |              IS PREMARITAL SEX WRONG?
     S SEX |  ALWAYS WR  ALMOST AL  WRONG ONL  NOT WRONG |     Total
-----------+--------------------------------------------+----------
      MALE |        374        153        329        723 |     1,579
           |      23.69       9.69      20.84      45.79 |    100.00
           |      35.18      35.75      45.82      48.23 |     42.58
-----------+--------------------------------------------+----------
    FEMALE |        689        275        389        776 |     2,129
           |      32.36      12.92      18.27      36.45 |    100.00
           |      64.82      64.25      54.18      51.77 |     57.42
-----------+--------------------------------------------+----------
     Total |      1,063        428        718      1,499 |     3,708
           |      28.67      11.54      19.36      40.43 |    100.00
           |     100.00     100.00     100.00     100.00 |    100.00

Group Comparison of Means
• Example: sex difference in years of schooling
• Sometimes it is useful to “collapse” (i.e., combine) categories.

  Original codes    New codes (“Recode”)
  <= 999            <5000
  1000~2999         <5000
  3000~3999         <5000
  4000~4999         <5000
  5000~5999         5000~9999
  6000~6999         5000~9999
  7000~7999         5000~9999
  8000~8999         5000~9999
  9000~9999         5000~9999
  10000~14999       10000~14999
  15000~19999       15000~19999
  20000~24999       20000~24999
  >= 25000          >= 25000

Group Comparison of Means
• Bar charts
  – graph bar (mean) educ, over(income)
  – graph bar (mean) educ, over(newincome)

[Figure: bar chart of mean of educ (0 to 15) by the original income categories: LT $1000, $1000 TO 2999, $3000 TO 3999, $4000 TO 4999, $5000 TO 5999, $6000 TO 6999, $7000 TO 7999, $8000 TO 9999, $10000 - 14999, $15000 - 19999, $20000 - 24999, $25000 OR MORE, REFUSED.]
[Figure: bar chart of mean of educ (0 to 15) by the recoded income categories: <5k, <10k, <15k, <20k, <25k, >25k, refused.]

Scatter Plots and Correlations
  . graph matrix popgrowth lexp gnppc
  . correlate popgrowth lexp gnppc
  (obs=63)

               | popgro~h     lexp    gnppc
  -------------+---------------------------
     popgrowth |   1.0000
          lexp |  -0.4215   1.0000
         gnppc |  -0.3580   0.7182   1.0000
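
The univariate statistics for a continuous variable, and the proportion of a 0/1 variable, can be sketched in Python with only the standard library. The helper name `describe` and all data values are hypothetical illustrations (not Wisconsin Longitudinal Study data), and quartile conventions differ across packages, so `statistics.quantiles` may not match Stata's percentiles exactly.

```python
import statistics


def describe(values):
    """Univariate summary of a continuous variable (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance: sum of squared deviations from the mean, divided by n - 1
    var = sum((x - mean) ** 2 for x in values) / (n - 1)
    # Quartile cut points; the method may differ from Stata's percentile rule
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),  # 50th percentile
        "variance": var,
        "sd": var ** 0.5,                     # square root of the variance
        "range": (min(values), max(values)),
        "iqr": (q1, q3),                      # (25th, 75th percentile)
    }


# Made-up, right-skewed scores (illustrative only)
scores = [2, 3, 3, 4, 5, 6, 8, 30]
summary = describe(scores)
print(summary["mean"], summary["median"])  # prints 7.625 4.5

# A 0/1 dummy variable: its mean is the proportion of cases coded 1
female = [0, 1, 1, 0, 1]
print(sum(female) / len(female))  # prints 0.6
```

For these made-up scores, the single large value (30) pulls the mean (7.625) above the median (4.5), the same pattern described for right-skewed depression scores and income.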

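To make the row-versus-column percentage distinction concrete, the sketch below recomputes both kinds of percentages from the frequencies in the Stata crosstab of respondent's sex by “Is premarital sex wrong?”. Only the counts come from that table; the function names are illustrative.

```python
# Frequencies from the crosstab: rows = respondent's sex (independent variable),
# columns = attitude toward premarital sex (dependent variable).
CATEGORIES = ("ALWAYS WR", "ALMOST AL", "WRONG ONL", "NOT WRONG")
counts = {
    "MALE": [374, 153, 329, 723],
    "FEMALE": [689, 275, 389, 776],
}


def row_percentages(table):
    """Divide each cell by its row total: the DV distribution within each IV group."""
    return {row: [100 * c / sum(cells) for c in cells] for row, cells in table.items()}


def column_percentages(table):
    """Divide each cell by its column total."""
    col_totals = [sum(col) for col in zip(*table.values())]
    return {
        row: [100 * c / t for c, t in zip(cells, col_totals)]
        for row, cells in table.items()
    }


for row, pcts in row_percentages(counts).items():
    labeled = {cat: round(p, 2) for cat, p in zip(CATEGORIES, pcts)}
    print(row, labeled)
```

The row percentages reproduce the second line of each cell in the Stata table (23.69, 9.69, 20.84, 45.79 for MALE), and `column_percentages(counts)["MALE"][0]` rounds to 35.18, the third line. Comparing row percentages across the sexes answers the substantive question: 45.79% of men versus 36.45% of women answer “not wrong”.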

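Collapsing categories and then comparing group means can be sketched as a lookup table followed by a grouped average. The `RECODE` dictionary follows the recode table in these notes as far as it can be read (the “refused” category is left out); the respondents and the helper name `group_means` are hypothetical.

```python
from collections import defaultdict

# Original income bracket -> collapsed bracket, following the "Recode" table
RECODE = {
    "<=999": "<5000", "1000~2999": "<5000", "3000~3999": "<5000", "4000~4999": "<5000",
    "5000~5999": "5000~9999", "6000~6999": "5000~9999", "7000~7999": "5000~9999",
    "8000~8999": "5000~9999", "9000~9999": "5000~9999",
    "10000~14999": "10000~14999", "15000~19999": "15000~19999",
    "20000~24999": "20000~24999", ">=25000": ">=25000",
}


def group_means(records, group_key, value_key):
    """Mean of value_key within each level of group_key (hypothetical helper)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in records:
        g = r[group_key]
        sums[g] += r[value_key]
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


# Hypothetical respondents: original income bracket and years of schooling (educ)
people = [
    {"income": "1000~2999", "educ": 10},
    {"income": "4000~4999", "educ": 12},
    {"income": "6000~6999", "educ": 12},
    {"income": "9000~9999", "educ": 14},
    {"income": ">=25000", "educ": 16},
]
for p in people:
    p["newincome"] = RECODE[p["income"]]  # collapse the categories

print(group_means(people, "newincome", "educ"))
# prints {'<5000': 11.0, '5000~9999': 13.0, '>=25000': 16.0}
```

The same `group_means` helper would give the sex difference in years of schooling if the records were grouped on a sex variable instead of the collapsed income brackets.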

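A Pearson correlation matrix like the `correlate` output above can be sketched from the definition: the co-variation of two variables divided by the product of their spreads. The function names `pearson_r` and `correlate` and all data values are hypothetical; the sketch assumes equal-length lists with no missing values, whereas Stata's `correlate` drops any observation that is missing on one of the listed variables.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by both spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlate(data):
    """Print a lower-triangular correlation matrix, roughly in Stata's layout."""
    names = list(data)
    print(" " * 12 + " |" + "".join(f"{name[:8]:>9}" for name in names))
    for i, row in enumerate(names):
        cells = "".join(f"{pearson_r(data[row], data[col]):9.4f}" for col in names[: i + 1])
        print(f"{row:>12} |{cells}")


# Made-up values for five hypothetical countries (NOT the 63 observations above)
data = {
    "popgrowth": [2.9, 1.2, 0.4, 2.1, 0.8],
    "lexp": [58, 71, 78, 64, 75],
    "gnppc": [500, 4200, 21000, 1500, 12000],
}
correlate(data)
```

Only the lower triangle is printed because the matrix is symmetric and every variable correlates perfectly (1.0000) with itself.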