MIT 9.07 - Graphs, and measures of central tendency and spread

Graphs, and measures of central tendency and spread
9.07
9/13/2004

Histogram
• Sum of bin heights = N
• If discrete or categorical, bars don't touch. If continuous, bars can touch, and should if there are lots of bins.
• Alternative: density (or "relative frequency") plot. Bar height = fraction of total samples in each bin. Sum of bin heights = 1
• Alt: for lots of bins, continuous variable, draw the histogram as a "frequency polygon". Don't do this for categorical variables.

Stem-and-Leaf Plots
• A quick way of examining the distribution of data
• Like histograms on their sides, but with more information.
• Data: 7 5 5 10 11 10 10 15 15 14 20 25 20 35 35 40
• Stem-and-Leaf Plot:

  0 | 7 5 5                     0 | 5 5 7
  1 | 0 1 0 0 5 5 4             1 | 0 0 0 1 4 5 5
  2 | 0 5 0            --->     2 | 0 0 5
  3 | 5 5                       3 | 5 5
  4 | 0                         4 | 0

We can plot two histograms in the same plot, to compare them
• E.G. distribution of heights for women (solid) vs. men (dashed)

Summarizing the distribution
• You might not want to look at the whole distribution in that last case.
• You get lots of information about the shape of the distribution, but it's kind of visually noisy.
• Summarize.
• Central tendency intuition: capture the impression that the heights for the men tend to be higher than the heights for the women.

Measures of central tendency
• Mode
  – Where's the peak of the histogram?
  – Can be used for any data, but typically only used for categorical data, where the other measures we'll discuss don't apply.

Mode
• Data can have one or more modes
• Unimodal vs. bimodal vs. trimodal
• Here a mode is the highest point locally

Average
• Notation: observations x = {x1, x2, x3, …, xN}
• $\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$
• Also known as the mean. But note this is the sample mean. Soon we will talk about the population mean. Don't get confused!
• Only makes sense for interval or ratio data, but will often see it used for rank-order or rating data, too

Highly skewed data
• Negative skew. Average not at the mode.
• [Figure labels: "mode"; "average somewhere around here".]
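As a rough numerical check of the negative-skew picture, the sketch below computes the sample mean and the mode(s) of a small data set with a long left tail. The data values and the helper name `sample_mean` are made up for illustration and do not come from the lecture.

```python
from statistics import multimode  # available in Python 3.8+


def sample_mean(xs):
    """Sample mean: (x1 + x2 + ... + xN) / N."""
    return sum(xs) / len(xs)


# Hypothetical negatively skewed data: most values are high,
# with one value far out in the left tail.
data = [2, 7, 8, 8, 9, 9, 9, 10]

mean = sample_mean(data)
modes = multimode(data)  # list of the most common values (several if tied)
frac_below_mean = sum(x < mean for x in data) / len(data)

print("mean:", mean)
print("mode(s):", modes)
print("fraction of data below the mean:", frac_below_mean)
```

For these values the mean falls below the mode, and only a quarter of the points lie below the mean.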
• Note: fewer than 50% of the data is below the average.

Why does the average shift to the right?

Average as center of mass
• The average gives the point where a histogram (if made of blocks) would balance.

[Figure: "Positive or Negative Skew?" Vertical axis: % chance (0 to 35). Horizontal axis: week beginning (Earlier, Nov 2, Nov 9, Nov 16, Nov 23, Nov 30, Later). Figure by MIT OCW.]

[Figure: Mean number of students studying Brain and Cognitive Sciences, per university (axis 100 to 220), by year (1940 to 1990). Figure by MIT OCW.]

What an outlier does to the mean
• Outlier shifts balance to right, average to right.

Another measure: the median
• Median = value with ½ of the data points to the left, ½ to the right.
• E.G. 1, 2, 4.7, 6, 8, 9.2, 10 → median = 6
• E.G. 8, 15.2, 18, 19.2, 21.3, 25 → median = (18+19.2)/2 = 18.6

The median is robust to outliers
• # of hours of TV watched per week: 3, 5, 7, 7, 140
• Mean = 32.4! Median = 7.

When to use which measure?
• Mode
  – Variables are categorical.
  – You want a quick and easy measure for ordinal/quantitative data.
  – You want to report the most common score
• Median
  – Variables are measured at the ordinal level.
  – Variables measured at the interval-ratio level have highly skewed distributions or lots of outliers
  – You want to report the central score. The median lies at the exact center of the distribution

When to use which measure?
• Mean
  – Quantitative variables (except for highly skewed distributions).
  – Income distributions are highly skewed: be wary of anyone talking about the "average tax cut". They should be using the median.
  – You want to report the typical score (except for highly skewed distributions). The mean is the "fulcrum that exactly balances all of the scores."
  – You anticipate additional statistical analysis

The notion of "spread"
• Plots can look pretty different even when they have the same central tendencies.
• Also need to capture a notion of spread.
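The median slides above can be tried out with Python's standard `statistics` module. This is only a sketch: the list `tv_typical`, in which the outlier is replaced by 9, is a hypothetical comparison set and is not part of the lecture.

```python
from statistics import mean, median

# Odd number of points: the median is the middle value after sorting.
odd = [1, 2, 4.7, 6, 8, 9.2, 10]
# Even number of points: the median is the average of the two middle values.
even = [8, 15.2, 18, 19.2, 21.3, 25]

print("median (odd N): ", median(odd))
print("median (even N):", median(even))

# Hours of TV watched per week, including one extreme value.
tv = [3, 5, 7, 7, 140]
# Hypothetical comparison: the same data with the outlier replaced by 9.
tv_typical = [3, 5, 7, 7, 9]

print("with outlier:    mean =", mean(tv), " median =", median(tv))
print("without outlier: mean =", mean(tv_typical), " median =", median(tv_typical))
```

The median is the same for both TV lists, while the mean moves a long way when the single large value is present.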
Measures of spread
• Range
• Interquartile range
• Standard deviation

Range
• Biggest value observed - smallest value observed.
• Pretty sensitive to outliers in the tails of the distribution.

Interquartile range
• Divide data into 4 groups, see how far apart the extreme groups are.
• Q3-Q1 = IQR.
• [Figure labels: median; median = Q1; median = Q3.]

Box-and-whiskers plot
• Draw box ends = Q1 & Q3
• Draw median through the box.
• Plot as points the outliers = any points > 1.5 IQR from an end of the box.
• Plot "whiskers" out to the farthest points that are not outliers.

[Figure: Car Prices ($K), axis 5 to 40. Labels: Luxury, Sedan, Truck, MiniVan, Compact, Sub-Compact; Economy, Standard, Deluxe, Ultra. Figure by MIT OCW.]

Standard deviation
• Measures spread about average.
• Desired: How far are data points from the average, on average?
• But, $\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x}) = \frac{\sum x_i}{N} - \frac{\sum \bar{x}}{N} = \bar{x} - \frac{N\bar{x}}{N} = 0$
• Clearly, this is not going to be useful…
• Try squaring the difference, before taking the average
• Sample variance = $s^2 = \frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N} = \frac{\sum x_i^2}{N} - \bar{x}^2$

But, this doesn't quite do what we want
• Wrong units: if we were talking about height in inches, spread is now in inches².
• If we double the distance from all points to the mean, we'd like the spread to double. Variance will go up by a factor of 4.

Sample standard deviation
• $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}}$
• Sometimes you'll see "N-1" in place of N. We'll talk about this later.
• This has the right units and right behavior.

Example
• Data: 3, 5, 7, 7, 13
• Average = 7
• Deviations: -4, -2, 0, 0, 6
• Root mean square deviation: s = sqrt((16+4+0+0+36)/5) = 3.35
• Or: s = sqrt(mean(9, 25, 49, 49, 169) - 7²) = sqrt(60.2-49) = 3.35

Effect of data transformations on mean and standard deviation
• Effect of adding a constant to each datum:
  3, 5, 5, 7, 9 → (+3) → 6, 8, 8, 10, 12
  mean = 5.8 → (+3) → mean = 8.8
  s = 2.04 → (same) → s = 2.04
• Effect of multiplying by a constant:
  18, 24, 12, 6 → (÷12) → 1.5, 2, 1, 0.5
  mean = 15 → (÷12) → mean = 1.25
  s = 6.71 → (÷12) → s = .56
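The worked standard deviation example and the two transformation examples can be reproduced with a few lines of Python. This is a minimal sketch that divides by N, as the slides do; the function names `mean`, `sd_n`, and `sd_shortcut` are illustrative choices, not names from the lecture.

```python
import math


def mean(xs):
    """Sample mean: sum of the values divided by N."""
    return sum(xs) / len(xs)


def sd_n(xs):
    """Root mean square deviation from the mean, dividing by N (not N-1)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def sd_shortcut(xs):
    """Same quantity computed as sqrt(mean of squares - squared mean)."""
    return math.sqrt(mean([x * x for x in xs]) - mean(xs) ** 2)


# Worked example: both formulas give the same s.
data = [3, 5, 7, 7, 13]
print(round(sd_n(data), 2), round(sd_shortcut(data), 2))

# Adding a constant shifts the mean and leaves s unchanged.
a = [3, 5, 5, 7, 9]
a_plus = [x + 3 for x in a]
print(mean(a), mean(a_plus), round(sd_n(a), 2), round(sd_n(a_plus), 2))

# Dividing by 12 divides both the mean and s by 12.
b = [18, 24, 12, 6]
b_div = [x / 12 for x in b]
print(mean(b), mean(b_div), round(sd_n(b), 2), round(sd_n(b_div), 2))
```

Note that `statistics.pstdev` in the standard library computes the same divide-by-N quantity, while `statistics.stdev` uses N-1.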
Effect of transformation on mean and standard deviation
• Adding a constant shifts the mean by that amount, and leaves the standard deviation unchanged.
• Multiplying by a constant multiplies both the mean and standard deviation by that constant.
• How many standard deviations above the mean is score xi?
  $z_i = \frac{x_i - \bar{x}}{s}$
• The z-score is robust to these changes
• x = 18, 24, 12, 6 → x = 1.5, 2, 1, 0.5
• z = .4, 1.3, -.4, -1.3 → z = .4, 1.3, -.4, -1.3
• … class…
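The claim that z-scores are unaffected by these transformations can be checked directly. The sketch below assumes s is computed by dividing by N, as in the slides; the function name `z_scores` and the shifted list `x_shifted` are illustrative additions.

```python
import math


def z_scores(xs):
    """z_i = (x_i - mean) / s, with s computed by dividing by N."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / s for x in xs]


x = [18, 24, 12, 6]
x_scaled = [v / 12 for v in x]   # 1.5, 2, 1, 0.5, as in the slide
x_shifted = [v + 3 for v in x]   # hypothetical shift, not from the slide

print([round(z, 2) for z in z_scores(x)])
print([round(z, 2) for z in z_scores(x_scaled)])
print([round(z, 2) for z in z_scores(x_shifted)])
```

All three lists of z-scores agree up to floating-point rounding, at roughly ±0.45 and ±1.34, which matches the slide's .4 and 1.3 when only one decimal place is kept.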

