UCLA STATS 10 - Ch04


Stat 10, UCLA, Ivo Dinov

UCLA STAT 10: Introduction to Statistical Reasoning
Instructor: Ivo Dinov, Asst. Prof. in Statistics and Neurology
Teaching Assistants: Yan Xiong and Will Anderson
UCLA Statistics, University of California, Los Angeles, Winter 2002
http://www.stat.ucla.edu/~dinov/

Chapter 4
Numerical Summaries – Mean and Standard Deviation

Data representations
The histogram of observed data summarizes a large amount of information about the process we have observed. Often, more concise representations are needed.
Measures of central tendency: average, median, mode.
Measures of variability: standard deviation (and the related standard error and root-mean-square), range, quartiles and inter-quartile range, energy of the data (sum of squares), etc.

The average
If we had to summarize a histogram, or any bar plot for that matter, in only a few words, what would these be?

The average of a list of numbers is their sum divided by how many there are.
Example: for {9, 1, 2, 2, 0},
– Average = (9+1+2+2+0)/5 = 14/5 = 2.8
In general, for {a1, a2, a3, …, aN},
– Average = (a1+a2+a3+…+aN)/N.

Cross-sectional vs. Longitudinal Studies
The avg. height of men appears to decrease with age. Should we conclude that the avg. person is getting shorter with time?
No, because this is a cross-sectional study: different subjects are compared to each other at one point in time.
In longitudinal studies, subjects/units are followed over time and compared with themselves.
Note that the people in the 20-30 yrs range are completely different from the people in the 60-70 yrs range.
There is evidence that, over time, men may be getting taller, an effect which is heavily confounded with the effects of aging.
[Figure: average height (vertical axis, ticks 59 to 69) by age (20 to 70 yrs), for men and women.]

Average vs. Median
The avg. weight for women is 146 lb.
Should we expect 50% below and 50% above the average? No; in fact 41% are above and 59% are below the avg.
The histogram balances when supported at the average.
The median of a histogram is the value in the middle, with 50% of the observations above it and 50% below it.
[Figure: average weight (vertical axis, ticks 0 to 180) by age (20 to 70 yrs), for men and women.]
[Figure: position of the median (Med) and the mean. (a) Data symmetric about P. (b) Two largest points moved to the right.]

Root Mean Square (R.M.S.)
Consider {0, 5, -8, 7, -3}: the mean is 0.2. But 0.2 is also the mean of {0.1, 0.3, 0, 0.4, 0.2}.
Of course, the 2 sequences of 5 numbers are very different (e.g., size, sign, integer vs. non-integer values, etc.).
So the mean does not represent all the information about the data!
R.M.S.({a1, a2, a3, …, aN}) is:
$$\mathrm{R.M.S.} = \sqrt{\frac{1}{N}\sum_{k=1}^{N} a_k^2}$$
Example: R.M.S.{0, 5, -8, 7, -3} ≈ 5.4, whereas R.M.S.{0.1, 0.3, 0, 0.4, 0.2} ≈ 0.24494897.

Standard Deviation (SD)
The standard deviation is a measure of the spread of the data around its average. Most numbers in the data will be within 1 SD of the average, and very few will be 2 SDs or more away from the average.
In the women's height example we saw, 6,566 women ages 18-74 were surveyed; the avg. height was 63.5 in and the SD was 2.5 in.
Rule of thumb for data spread: for many lists, roughly 68% of the numbers are within 1 SD of the average, and the other ~32% are farther away. About 95% of the values are within 2 SDs of the average.
Demos: Normal Generation Movie, Quincunx.

Calculating the Standard Deviation
SD = (almost) R.M.S. deviation from the average. Let {a1, a2, a3, …, aN} be the observed values; then:
$$SD(\{a_1, a_2, \ldots, a_N\}) = \sqrt{\frac{1}{N-1}\sum_{k=1}^{N} (a_k - \mu)^2}$$
where the average (mean) is
$$\mu = \frac{1}{N}\sum_{k=1}^{N} a_k$$
Example, {20, 10, 15, 15}:
$$\mu = \frac{1}{4}(20 + 10 + 15 + 15) = 15$$
$$SD = \sqrt{\frac{1}{4-1}\left[(20-15)^2 + (10-15)^2 + (15-15)^2 + (15-15)^2\right]} = \sqrt{\frac{1}{3}(25 + 25)} = \sqrt{\frac{50}{3}} \approx 4.1$$
Note the difference between our definition of SD and the textbook's (see Ch. 26). The textbook divides by N instead of N-1:
$$\widehat{SD}(\{a_1, a_2, \ldots, a_N\}) = \sqrt{\frac{1}{N}\sum_{k=1}^{N} (a_k - \mu)^2}$$

Be careful in computing various data descriptors
[Figure: road sign reading "Welcome to MEANSTOWN. Founded 1867, Area 20, Altitude 584, Population 372, Average 711".]
Beware of inappropriate averaging: the sign's "average" is (1867 + 20 + 584 + 372)/4 ≈ 711, a mean of unrelated quantities that describes nothing.

Inter-quartile Range (IQR)
We talked about this earlier, at the end of Ch. 01, Chapter 5.

Quartiles
The first quartile (Q1) is the median of all the observations whose position is strictly below the position of the median, and the third quartile (Q3) is the median of those above.
[Figure: the quartiles and the median split the data into groups of 25% each.]

Five number summary
The five-number summary = (Min, Q1, Med, Q3, Max)

Inter-quartile Range
IQR = Q3 - Q1

Box plot compared to dot plot
[Figure 2.4.3: Box plot for SYSVOL. A box plot (Q1, Median, Q3 marked) above a dot plot of the same data; axis from 50 to 200. From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.]

Construction of a box plot
[Figure 2.4.4: Construction of a box plot. The box spans Q1 to Q3 with the median (Med) marked; each whisker extends up to 1.5 IQR beyond the box and is then pulled back until it hits an observation. From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.]

Comparing 3 plots of the same data
Stem-and-leaf of strength   N = 33
Leaf Unit = 10

   1   19  8
   5   20  0334
   5   20
  10   21  00233
  (8)  21  55668899
  15   22  000111112
   6   22  5
   5   23  014
   2   23
   2   24
   2   24
   2   25  2
   1   25  9

[Figure 2.4.5: Three graphs of the breaking-strength data for gear-teeth in positions 4 & 10 (Minitab output). The stem-and-leaf plot above plus two plots on a "strength" axis from 2000 to 2600.]

Frequency Table
TABLE 2.5.1 Word Lengths for the First 100 Words on a Randomly Chosen Page

3 2 2 4 4 4 3 9 9 3 6 2 3 2 3 4 6 5 3 4
2 3 4 5 2 9 5 8 3 2 4 5 2 4 1 4 2 5 2 5
3 6 9 6 3 2 3 4 4 4 2 2 4 2 3 7 4 2 6 4
2 5 9 2 3 7 11 2 3 6 4 4 7 6 6 10 4 3 5 7
7 7 5 10 3 2 3 9 4 5 5 4 4 3 5 2 5 2 4 2

Value u       1   2   3   4   5   6   7   8   9  10  11
Frequency f   1  22  18  22  13   8   6   1   6   2   1

Mean from a frequency table
$$\bar{x} = \frac{1}{n}\,(\text{sum of all observations}) = \frac{1}{n}\sum (\text{value} \times \text{frequency of occurrence})$$

TABLE 2.5.2 Frequency Table for the Occurrence of Fish Species in Ocean Strata
No. of strata   Frequency   Percentage in
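
The numerical summaries defined in this chapter can be checked with a short Python sketch. It covers the average, the R.M.S., the SD under both the course definition (divide by N-1) and the textbook definition (divide by N), the five-number summary using the quartile rule from the Quartiles slide, and the mean from a frequency table. The function names and the `textbook` flag are illustrative choices, not part of the course materials. The inputs are the worked examples from the slides and the frequencies of Table 2.5.1.

```python
import math


def mean(values):
    """Average: the sum divided by how many values there are."""
    return sum(values) / len(values)


def rms(values):
    """Root mean square: square root of the mean of the squares."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def sd(values, textbook=False):
    """Standard deviation around the mean.

    textbook=False divides by N-1 (the course definition);
    textbook=True divides by N (the textbook definition).
    The flag name is a hypothetical choice for this sketch.
    """
    mu = mean(values)
    n = len(values)
    divisor = n if textbook else n - 1
    return math.sqrt(sum((v - mu) ** 2 for v in values) / divisor)


def median(values):
    """Middle value of the sorted list (mean of the two middle values if N is even)."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def five_number_summary(values):
    """(Min, Q1, Med, Q3, Max); needs at least 2 values.

    Q1 is the median of the observations positioned strictly below the
    median, Q3 the median of those above, so for odd N the middle
    observation is left out of both halves.
    """
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return (s[0], median(lower), median(s), median(upper), s[-1])


def mean_from_frequency_table(table):
    """Mean = sum of (value x frequency) divided by the number of observations."""
    n = sum(table.values())
    return sum(u * f for u, f in table.items()) / n


# Frequencies from Table 2.5.1 (word length u -> frequency f).
WORD_LENGTH_FREQ = {1: 1, 2: 22, 3: 18, 4: 22, 5: 13, 6: 8,
                    7: 6, 8: 1, 9: 6, 10: 2, 11: 1}

print(mean([9, 1, 2, 2, 0]))                        # 2.8
print(round(rms([0, 5, -8, 7, -3]), 1))             # 5.4
print(round(rms([0.1, 0.3, 0, 0.4, 0.2]), 8))       # 0.24494897
print(round(sd([20, 10, 15, 15]), 1))               # 4.1 (divide by N-1)
print(round(sd([20, 10, 15, 15], textbook=True), 2))  # 3.54 (divide by N)
print(five_number_summary([20, 10, 15, 15]))        # (10, 12.5, 15.0, 17.5, 20)
print(mean_from_frequency_table(WORD_LENGTH_FREQ))  # 4.35
```

For the example {20, 10, 15, 15}, the two SD definitions give about 4.1 and 3.54. The gap shrinks as N grows, since the divisors N-1 and N become nearly equal. The inter-quartile range follows from the five-number summary as Q3 - Q1, here 17.5 - 12.5 = 5.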

