Introduction to Descriptive Statistics
2/25/03

Population vs. Sample Notation
• Population (Greek letters): μ, σ, β
• Sample (Roman letters): x̄, s, b

Types of Variables
• Nominal (qualitative)
• ~Nominal (quantitative)
• Ordinal
• Interval or ratio

Describing data
| | Moment | Non-mean based measure |
|---|---|---|
| Center | Mean | Mode, median |
| Spread | Variance (standard deviation) | Range, interquartile range |
| Skew | Skewness | -- |
| Peaked | Kurtosis | -- |

Mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Variance, Standard Deviation
$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}, \qquad \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$

Variance, S.D. of a Sample
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Coefficient of variation
$$c.v. = 100 \times \frac{s}{\bar{x}}$$

Skewness: Symmetrical distribution
• IQ
• SAT
[Figure: symmetrical frequency distribution; Frequency vs. Value]

Skewness: Asymmetrical distribution
• GPA of MIT students
[Figure: asymmetrical frequency distribution; Frequency vs. Value]

Skewness (Asymmetrical distribution)
• Income
• Contributions to candidates
• Populations of countries
• “Residual vote” rates
[Figure: asymmetrical frequency distribution; Frequency vs. Value]

Skewness
$$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3 / n}{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2 / n\right]^{3/2}}$$
Alternative measures:
$$\frac{\text{mean} - \text{mode}}{s}, \qquad \frac{3(\text{mean} - \text{median})}{s}$$

Skewness
[Figure: Frequency vs. Value]

Kurtosis
[Figure: three distributions labeled k > 3, k = 3, and k < 3; Frequency vs. Value]

A few words about the normal curve
• Skewness = 0
• Kurtosis = 3
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}$$
[Figure: normal curve; Frequency vs. Value]

More words about the normal curve
[Figure: normal curve centered on the mean; about 34% of the area lies between the mean and one s.d. on either side, and about 47.7% between the mean and two s.d. on either side]

SEG example
The instructor and/or section leader:
| Item | Mean | s.d. | Skew | Kurt |
|---|---|---|---|---|
| Gives well-prepared, relevant presentations | 6.0 | 0.69 | -1.7 | 8.5 |
| Explains clearly and answers questions well | 5.9 | 0.68 | -1.0 | 4.8 |
| Uses visual aids well | 5.6 | 0.85 | -1.8 | 8.9 |
| Uses information technology effectively | 5.5 | 0.91 | -1.1 | 5.0 |
| Speaks well | 6.1 | 0.69 | -1.5 | 6.8 |
| Encourages questions & class participation | 6.1 | 0.66 | -0.88 | 3.7 |
| Stimulates interest in the subject | 5.9 | 0.76 | -1.1 | 4.7 |
| Is available outside of class for questions | 5.9 | 0.68 | -1.3 | 6.3 |
| Overall rating of teaching | 5.9 | 0.67 | -1.2 | 5.5 |

Graph some SEG variables
The instructor and/or section leader:
| Item | Mean | s.d. | Skew | Kurt |
|---|---|---|---|---|
| Uses visual aids well | 5.6 | 0.85 | -1.8 | 8.9 |
| Encourages questions & class participation | 6.1 | 0.66 | -0.88 | 3.7 |
[Figures: histograms of (mean) q3 and (mean) q6; Fraction vs. score, 1 to 7]

Binary data
For a variable X coded 0/1:
$$\bar{x} = \text{prob}(X = 1) \equiv \text{proportion of the time } x = 1$$
$$s^2 = \bar{x}(1 - \bar{x}), \qquad s = \sqrt{\bar{x}(1 - \bar{x})}$$

Commands in Stata for getting univariate statistics
• summarize
• summarize, detail
• graph, bin() normal
• graph, box
• tabulate [NB: compare to table]

Explore Q9: Overall teaching evaluation
| subject | q9 | n |
|---|---|---|
| 3.371 | 6.4375 | 16 |
| 3.982 | 6.73333 | 15 |
| 3.14 | 6.46154 | 13 |
| 14.02D | 5.66667 | 3 |
| 21W.803 | 5.66667 | 12 |
| 21M.480 | 5.69231 | 13 |
| 17.906 | 5.28571 | 14 |
| 2.51 | 5.88235 | 17 |

Graph Q9
. graph q9
[Figure: histogram of (mean) q9; y-axis Fraction, x-axis from 2.33333 to 7]

Divide into 7 “bins” and have them span 0..1, 1..2, 2..3, … 6..7
. graph q9, bin(7) xscale(0,7)
[Figure: 7-bin histogram of (mean) q9; y-axis Fraction]

Add ticks at each integer score
. graph q9, bin(7) xscale(0,7) xlabel(0,1,2,3,4,5,6,7)
[Figure: 7-bin histogram of (mean) q9 with x-axis ticks at 0 through 7]

Add a finer grain to the bars
. graph q9, bin(14) xscale(0,7) xlabel(0,1,2,3,4,5,6,7)
[Figure: 14-bin histogram of (mean) q9; y-axis Fraction]

Even finer grain
. graph q9, bin(28) xscale(0,7) xlabel(0,1,2,3,4,5,6,7)
[Figure: 28-bin histogram of (mean) q9; y-axis Fraction]

Superimpose the normal curve (with the same mean and s.d. as the empirical distribution)
. graph q9, bin(28) xscale(0,7) xlabel(0,1,2,3,4,5,6,7) norm
[Figure: 28-bin histogram of (mean) q9 with the normal curve superimposed]

Redraw the previous graph using only larger classes (n > 20)
. graph q9 if n>20, bin(28) xscale(0,7) xlabel(0,1,2,3,4,5,6,7)
[Figure: 28-bin histogram of (mean) q9 for classes with n > 20]

Draw the previous graph as a box plot
. graph q9 if n>20, box ylabel
[Figure: box plot of (mean) q9 for classes with n > 20]

Draw box plots for small (0..20), medium (21..100), and large (101+) classes
. gen size = 0 if n <= 20
(237 missing values generated)
. replace size = 1 if n > 20 & n <= 100
(196 real changes made)
. replace size = 2 if n > 100
(41 real changes made)
. sort size
. graph q9, box ylabel by(size)
[Figure: box plots of (mean) q9 for size = 0, 1, 2]

A note about histograms with unnatural categories
From the Current Population Survey (2000), Voter and Registration Survey:
How long (have you/has name) lived at this address?
| Code | Response |
|---|---|
| -9 | No Response |
| -3 | Refused |
| -2 | Don't know |
| -1 | Not in universe |
| 1 | Less than 1 month |
| 2 | 1-6 months |
| 3 | 7-11 months |
| 4 | 1-2 years |
| 5 | 3-4 years |
| 6 | 5 years or longer |

Simple graph
[Figure: histogram of the raw codes of PES8 (1 to 6); y-axis Fraction]

Solution, Step 1: Map artificial category onto “natural” midpoint
| Code | Response | Recoded value (years) |
|---|---|---|
| -9 | No Response | missing |
| -3 | Refused | missing |
| -2 | Don't know | missing |
| -1 | Not in universe | missing |
| 1 | Less than 1 month | 1/24 = 0.042 |
| 2 | 1-6 months | 3.5/12 = 0.29 |
| 3 | 7-11 months | 9/12 = 0.75 |
| 4 | 1-2 years | 1.5 |
| 5 | 3-4 years | 3.5 |
| 6 | 5 years or longer | 10 (arbitrary) |

Graph of recoded data
[Figure: histogram of longevity (years, 0 to 10); y-axis Fraction]

Density plot of data
[Figure: density plot of longevity (years)]
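The midpoint recode from “Solution, Step 1” and the moment formulas from the earlier slides (mean, sample s.d., c.v., skewness, kurtosis, binary data) can be sketched together in Python; the deck itself works in Stata. This is a minimal sketch, assuming the survey codes arrive as a plain list of integers. The names `MIDPOINT_YEARS`, `recode_longevity`, `describe`, and `binary_summary` are hypothetical, and the codes in the usage block are made up for illustration, not CPS data.

```python
import math
import statistics

# Midpoints (in years) from the "Solution, Step 1" slide for the CPS question
# "How long (have you/has name) lived at this address?".
# Codes not listed here (-9, -3, -2, -1) are treated as missing and dropped.
MIDPOINT_YEARS = {
    1: 1 / 24,    # less than 1 month
    2: 3.5 / 12,  # 1-6 months
    3: 9 / 12,    # 7-11 months
    4: 1.5,       # 1-2 years
    5: 3.5,       # 3-4 years
    6: 10.0,      # 5 years or longer (arbitrary, as on the slide)
}


def recode_longevity(codes):
    """Map survey codes to midpoint years; missing codes are skipped."""
    return [MIDPOINT_YEARS[c] for c in codes if c in MIDPOINT_YEARS]


def describe(xs):
    """Moment-based summaries using the formulas on the slides.

    Needs at least two observations that are not all equal.
    """
    n = len(xs)
    mean = sum(xs) / n
    s = statistics.stdev(xs)  # sample s.d.: divides by n - 1
    # Central moments divide by n, as in the skewness formula.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return {
        "n": n,
        "mean": mean,
        "sd": s,
        "cv": 100 * s / mean,        # coefficient of variation
        "skewness": m3 / m2 ** 1.5,  # 0 for a symmetrical distribution
        "kurtosis": m4 / m2 ** 2,    # 3 for the normal curve
        "pearson_skewness": 3 * (mean - statistics.median(xs)) / s,
    }


def binary_summary(xs):
    """For 0/1 data: mean = proportion of 1s, s = sqrt(mean * (1 - mean))."""
    p = sum(xs) / len(xs)
    return p, math.sqrt(p * (1 - p))


if __name__ == "__main__":
    # Hypothetical responses for illustration only; these are not CPS data.
    codes = [6, 6, 4, -2, 5, 2, 6, 1, -9, 3, 6, 4]
    for name, value in describe(recode_longevity(codes)).items():
        print(f"{name:>16}: {value:.3f}")
```

The central moments divide by n while `s` divides by n − 1, mirroring the separate population and sample formulas above; kurtosis is reported on the scale where the normal curve gives 3, matching the Kurtosis slide.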