Introduction to Descriptive Statistics
17.871

Types of Variables
  Nominal (qualitative), “categorical”
  ~Nominal (quantitative)
  Ordinal
  Interval or ratio

Key measures

  Describing data   Moment                          Non-mean based measure
  ---------------   -----------------------------   --------------------------
  Center            Mean                            Mode, median
  Spread            Variance (standard deviation)   Range, interquartile range
  Skew              Skewness                        --
  Peaked            Kurtosis                        --

Key distinction: Population vs. Sample

  Notation
  Population   Greeks   μ, σ, β
  Sample       Romans   s, b

Mean

\[ \mu \equiv \bar{X} \equiv \frac{\sum_{i=1}^{n} x_i}{n} \]

Variance, Standard Deviation

\[ \sigma^2 \equiv \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}, \qquad \sigma \equiv \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}} \]

Variance, S.D. of a Sample

\[ s^2 \equiv \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n - 1}, \qquad s \equiv \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n - 1}} \]

  The denominator n - 1 is the degrees of freedom.

Binary data

\[ \bar{X} = \mathrm{prob}(X = 1) = \text{proportion of time } x = 1 \]
\[ s_x^2 = \bar{x}(1 - \bar{x}) \;\Rightarrow\; s_x = \sqrt{\bar{x}(1 - \bar{x})} \]

Normal distribution example
  IQ
  SAT
  Height
  “No skew”, “Zero skew”
  Symmetrical
  Mean = median = mode
  [Figure: frequency plotted against value]

\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x - \mu)^2 / (2\sigma^2)} \]

Skewness: Asymmetrical distribution
  GPA of MIT students
  “Negative skew”, “Left skew”
  [Figure: frequency plotted against value]

Skewness (Asymmetrical distribution)
  Income
  Contribution to candidates
  Populations of countries
  “Residual vote” rates
  “Positive skew”, “Right skew”
  [Figure: frequency plotted against value]

Skewness
  [Figure: frequency plotted against value]

Kurtosis
  k > 3   leptokurtic
  k = 3   mesokurtic
  k < 3   platykurtic
  [Figure: frequency plotted against value]

Normal distribution
  Skewness = 0
  Kurtosis = 3

\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x - \mu)^2 / (2\sigma^2)} \]

More words about the normal curve

The z-score, or the “standardized score”

\[ z = \frac{x - \bar{x}}{\sigma_x} \]

Commands in STATA for getting univariate statistics

    summarize varname
    summarize varname, detail
    histogram varname, bin() start() width() density/fraction/frequency normal
    graph box varnames
    tabulate [NB: compare to table]

Example of Sophomore Test Scores
  High School and Beyond, 1980: A Longitudinal Survey of Students in the United States (ICPSR Study 7896)
  totalscore = % of questions answered correctly minus penalty for guessing
  recodedtype = (1 = public school, 2 = religious private, 3 = non-sectarian private)

Explore totalscore some more

    . table recodedtype, c(mean totalscore)

    --------------------------
    recodedty |
    pe        | mean(totals~e)
    ----------+---------------
            1 |       .3729735
            2 |       .4475548
            3 |        .589883
    --------------------------

Graph totalscore.

    hist totalscore

  [Figure: histogram of totalscore; y-axis Density (0 to 2), x-axis totalscore (-.5 to 1)]

Divide into “bins” so that each bar represents 1% correct

    hist totalscore, width(.01)
    (bin=124, start=-.24209334, width=.01)

  [Figure: histogram of totalscore; y-axis Density (0 to 2), x-axis totalscore (-.5 to 1)]

Add ticks at each 10% mark

    histogram totalscore, width(.01) xlabel(-.2 (.1) 1)
    (bin=124, start=-.24209334, width=.01)

  [Figure: histogram of totalscore; y-axis Density (0 to 2), x-axis totalscore labeled from -.2 to 1 in steps of .1]

Superimpose the normal curve (with the same mean and s.d. as the empirical distribution).

    histogram totalscore, width(.01) xlabel(-.2 (.1) 1) normal
    (bin=124, start=-.24209334, width=.01)

  [Figure: histogram of totalscore with normal curve overlaid; y-axis Density (0 to 2), x-axis totalscore labeled from -.2 to 1 in steps of .1]

Histograms by category.

    histogram totalscore, width(.01) xlabel(-.2 (.1) 1) by(recodedtype)
    (bin=124, start=-.24209334, width=.01)

  [Figure: "Graphs by recodedtype"; three histogram panels (1 = Public, 2 = Religious private, 3 = Nonsectarian private); y-axis Density (0 to 3), x-axis totalscore labeled from -.2 to 1 in steps of .1]

Main issues with histograms
  Proper level of aggregation
  Non-regular data categories

A note about histograms with unnatural categories

From the Current Population Survey (2000), Voter and Registration Survey

How long (have you/has name) lived at this address?
  -9  No Response
  -3  Refused
  -2  Don't know
  -1  Not in universe
   1  Less than 1 month
   2  1-6 months
   3  7-11 months
   4  1-2 years
   5  3-4 years
   6  5 years or longer

Solution, Step 1
Map each artificial category onto its “natural” midpoint (in years)

  -9  No Response         ->  missing
  -3  Refused             ->  missing
  -2  Don't know          ->  missing
  -1  Not in universe     ->  missing
   1  Less than 1 month   ->  1/24 = 0.042
   2  1-6 months          ->  3.5/12 = 0.29
   3  7-11 months         ->  9/12 = 0.75
   4  1-2 years           ->  1.5
   5  3-4 years           ->  3.5
   6  5 years or longer   ->  10 (arbitrary)

Graph of recoded data

    histogram longevity, fraction

  [Figure: histogram of longevity; y-axis Fraction (0 to .557134), x-axis longevity (0 to 10)]

Density plot of data
  Total area of last bar = .557
  Width of bar = 11 (arbitrary)
  Solve for the height h from area = width x height (a = w h):
  .557 = 11h  =>  h = .051

Density plot template

  Category   Fraction   X-min   X-max   X-length   Height (density)
  --------   --------   -----   -----   --------   ----------------
  < 1 mo.    .0156      0       1/12    .082       .19*
  1-6 mo.    .0909      1/12    1/2     .417       .22
  7-11 mo.   .0430      1/2     1       .500       .09
  1-2 yr.    .1529      1       2       1          .15
  3-4 yr.    .1404      2       4       2          .07
  5+ yr.     .5571      4       15      11         .051

  * = .0156/.082

Draw the previous graph with a box plot.

    graph box totalscore

  [Figure: box plot of totalscore (axis -.5 to 1), annotated with upper quartile, median, lower quartile, inter-quartile range (IQR), and 1.5 x IQR]

Draw the box plots for the different types of schools.

    graph box totalscore, by(recodedtype)

  [Figure: "Graphs by recodedtype"; box plots of totalscore (axis -.5 to 1) in separate panels for types 1, 2, and 3]

Draw the box plots for the different types of schools using the “over” option.

    graph box totalscore, over(recodedtype)

  [Figure: box plots of totalscore (axis -.5 to 1) for types 1, 2, and 3 on a single axis]

Three words about pie charts: don’t use them

So, what’s wrong with them?
  For non-time-series data, it is hard to compare groups; the eye is very bad at judging the relative size of circle slices.
  For time-series data, it is hard to grasp cross-time comparisons.

Some words about graphical presentation

Aspects of graphical integrity (following Edward Tufte, Visual Display of Quantitative Information)
  Represent numbers in direct proportion to the numerical quantities presented
  Write clear labels on the graph
  Show data variation, not design variation
  Deflate and standardize monetary values in time series
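
The density plot template earlier in these notes rests on one piece of arithmetic: each bar's height is its fraction divided by its width, so that area = width x height. A minimal sketch of that calculation follows. It is written in Python rather than Stata, and the names `BINS`, `density_height`, `heights`, and `total_area` are assumptions introduced for illustration. The fractions and interval endpoints are copied from the template table, including the arbitrary 15-year upper bound for the last category.

```python
# (category, fraction of respondents, x_min, x_max), endpoints in years.
# Values are copied from the density plot template; the upper bound of 15
# for "5+ yr." is arbitrary, as the notes themselves say.
BINS = [
    ("< 1 mo.", 0.0156, 0.0, 1 / 12),
    ("1-6 mo.", 0.0909, 1 / 12, 0.5),
    ("7-11 mo.", 0.0430, 0.5, 1.0),
    ("1-2 yr.", 0.1529, 1.0, 2.0),
    ("3-4 yr.", 0.1404, 2.0, 4.0),
    ("5+ yr.", 0.5571, 4.0, 15.0),
]


def density_height(fraction, x_min, x_max):
    """Height of a bar whose area equals `fraction` (area = width * height)."""
    return fraction / (x_max - x_min)


# Density height for every category.
heights = {name: density_height(frac, lo, hi) for name, frac, lo, hi in BINS}

for name, frac, lo, hi in BINS:
    print(f"{name:9s} width={hi - lo:6.3f} height={heights[name]:.3f}")

# Sanity check: the bar areas (width * height) add back up to the
# sum of the fractions, which is what makes this a density plot.
total_area = sum((hi - lo) * heights[name] for name, _, lo, hi in BINS)
```

The sketch uses the exact width 1/12 for the first category, where the template uses the rounded width .082. Both give a height of .19 to two decimals.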



MIT 17 871 - Study Notes
