UF STA 6166 - SUMMARIZING DATA – SPREAD OR VARIABILITY

This preview shows page 1-2-3-4 out of 11 pages.


Topic (5) SUMMARIZING DATA – SPREAD OR VARIABILITY

How do we capture variability in a single summary statistic?

[Figure: dot plots of three datasets X, Y, and Z, each drawn on an axis from 0 to 7]

Note how each of these datasets varies in its minimum and maximum values, and how the values vary within each distribution as well.

a) Range of a Variable

Defn: Range = Maximum value - Minimum value

e.g. fish lengths: range = 51 - 25 = 26 cm
     fish weights: range = 1763 - 441 = 1322 g

Question: is the range a robust measure of variability?

b) Standard Deviation of a set of data

The distance $x_i - \bar{x}$ is called the deviation of the ith value from the sample mean.

EXAMPLE: fish lengths ($\bar{x} = 40.08$, rounded to 40.1 in the deviations listed here)

[Figure: dot plot of the 12 fish lengths on an axis from 25 to 50 cm]

deviation = $x_i - \bar{x}$
25   - 40.1 = -15.1
25.5 - 40.1 = -14.6
26   - 40.1 = -14.1
28.5 - 40.1 = -11.6
44   - 40.1 =   3.9
44   - 40.1 =   3.9
45   - 40.1 =   4.9
46   - 40.1 =   5.9
48   - 40.1 =   7.9
49   - 40.1 =   8.9
49   - 40.1 =   8.9
51   - 40.1 =  10.9

Question: Might these deviations be useful information to describe the variability in a set of data?

The standard deviation is a measure of the average deviation of values in a set of data.

FACT: for any set of data, the deviations always sum to 0!

So to be useful, we do the following:
1) calculate the deviations, $x_i - \bar{x}$, $i = 1, \ldots, n$
2) square each deviation, $(x_i - \bar{x})^2$, $i = 1, \ldots, n$
3) sum up the squares, $\sum_{i=1}^{n} (x_i - \bar{x})^2$
4) divide by (n-1) {NOT n}: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
5) take the square root: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$

s, the sample standard deviation, can be thought of as the typical or average deviation of an observation from the sample mean.
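The five steps listed above can be sketched in Python using the 12 fish lengths from the example. This is only an illustration, not part of the original notes: the function name `sample_sd` is hypothetical, and the sketch uses the unrounded mean (about 40.08) rather than 40.1, so intermediate values can differ from a hand calculation in the last decimal place.

```python
import math

# Fish lengths (cm) from the example above.
lengths = [25, 25.5, 26, 28.5, 44, 44, 45, 46, 48, 49, 49, 51]

def sample_sd(values):
    """Return (mean, deviations, variance, sd) using the n-1 divisor."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # step 1: deviations from the mean
    squared = [d ** 2 for d in deviations]    # step 2: square each deviation
    sum_sq = sum(squared)                     # step 3: sum the squares
    variance = sum_sq / (n - 1)               # step 4: divide by n-1, NOT n
    sd = math.sqrt(variance)                  # step 5: take the square root
    return mean, deviations, variance, sd

mean, deviations, variance, sd = sample_sd(lengths)
print(f"mean = {mean:.2f} cm")
print(f"s^2 = {variance:.2f} cm^2")
print(f"s = {sd:.2f} cm")
```

With the unrounded mean, the deviations sum to zero up to floating-point error, which is the FACT stated above and the reason they are squared before averaging.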
EXAMPLE: fish lengths

[Figure: dot plot of the 12 fish lengths on an axis from 25 to 50 cm]

Deviations   (Deviations)^2
  -15.1          228.01
  -14.6          213.16
  -14.1          198.81
  -11.6          134.56
    3.9           15.21
    3.9           15.21
    4.9           24.01
    5.9           34.81
    7.9           62.41
    8.9           79.21
    8.9           79.21
   10.9          118.81
 ------      ----------
Σ ≈ 0.0         1203.42

(The listed deviations actually add up to -0.2 because they use the rounded mean 40.1; with the unrounded mean 481/12 ≈ 40.083 the sum is exactly 0.)

Divide by (n-1): $s^2 = \frac{1203.42}{12 - 1} = 109.40\ \text{cm}^2$

Take the square root: $s = \sqrt{109.40\ \text{cm}^2} = 10.46\ \text{cm}$

Interpretation?

Defn: the SAMPLE STANDARD DEVIATION is defined by the equation
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}.$$
The SAMPLE VARIANCE is $s^2$. The POPULATION VARIANCE is denoted $\sigma^2$. The POPULATION STANDARD DEVIATION is denoted by $\sigma$.

Question: How is it used?

1. s is the sample estimate of the population standard deviation $\sigma$ (note that $\sigma$ is almost always unknown!).

2. Large values of s (or $\sigma$) imply large variability in a data set (but it depends on the scale as well).
   a) good for comparing two or more datasets when the data have the same units of measurement

EXAMPLE: Based on a sample of 50 acres on randomly selected farms in Maryland, the 1998 corn yield averaged 125 bushels per acre with a standard deviation (s.d.) of 40 bushels. The next year, a drought year, had an average yield of $\bar{x} = 83$ bushels per acre and $s = 25$. Let's assume that the frequency distributions of the number of bushels per acre for fields in each of these 2 years look unimodal and symmetric (i.e. "normal").

Important Point: the range and the s.d. of a set of data that are approximately normally distributed are related as follows: range = max - min ≈ 6s. So knowing $\bar{x}$ and s, and that the data are "normal" in shape, we can graph and compare the two years' yields:

[Figure: axis from 0 to 250 on which to sketch the two yield distributions]

3. Coefficient of Variation: $CV = \frac{s}{\bar{x}} \times 100\%$. Note that CV is unitless and is often used to compare different variables measured on different scales.
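As a rough sketch, the range ≈ 6s rule and the CV formula can both be applied to the corn-yield summary statistics quoted above. The helper names `approx_limits` and `coefficient_of_variation` are hypothetical. The sketch also assumes the range is centered on the mean, so the limits are taken as the mean plus or minus 3s; that centering is an assumption consistent with a symmetric distribution, not something the notes state explicitly.

```python
# Summary statistics quoted in the corn-yield example (bushels per acre).
years = {
    "1998": {"mean": 125.0, "sd": 40.0},
    "1999 (drought)": {"mean": 83.0, "sd": 25.0},
}

def approx_limits(mean, sd):
    """Approximate min and max of roughly normal data via range ~ 6s.

    Assumes the range is centered on the mean, i.e. mean +/- 3s.
    """
    return mean - 3 * sd, mean + 3 * sd

def coefficient_of_variation(mean, sd):
    """CV = s / x-bar * 100%, a unitless measure of relative spread."""
    return sd / mean * 100

for label, stats in years.items():
    low, high = approx_limits(stats["mean"], stats["sd"])
    cv = coefficient_of_variation(stats["mean"], stats["sd"])
    print(f"{label}: approx. {low:.0f} to {high:.0f} bushels/acre, CV = {cv:.1f}%")
```

Under these assumptions both sets of limits fall inside the 0 to 250 axis, so the two years can be sketched on the same scale.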
EXAMPLE: Tennessee River fish study

Fish lengths: $CV = \frac{10.46}{40.08} \times 100\% = 26.09\%$
Fish weights: $CV = \frac{407.76}{1000.33} \times 100\% = 40.76\%$
DDT concentration: $CV = \frac{6.98}{7.21} \times 100\% = 96.87\%$

Question: which random variable (Length, Weight, DDT conc.) is the most variable?

Question: Suppose I had measured the fish lengths in inches. Would the CV be the same?

c) The Interquartile Range

Defn: The LOWER QUARTILE (Q1) of a dataset is the 25th percentile of the observations.
      Q1 = median of the lower ½ of the sample
The UPPER QUARTILE (Q3) is the 75th percentile of the observations.
      Q3 = median of the upper ½ of the sample
(If n is odd, the median of the dataset is excluded from the calculations.)
The INTERQUARTILE RANGE (IQR) is the range of the middle 50% of the dataset: IQR = Q3 - Q1.

EXAMPLE (n = 12): Fish weights
441, 532, 544, 778, 897, 917, 986, 1023, 1266, 1398, 1459, 1763

Median: $m = \frac{917 + 986}{2} = 951.5$
Q1: $\frac{544 + 778}{2} = 661$
Q3: $\frac{1266 + 1398}{2} = 1332$
IQR: 1332 - 661 = 671

IQR is resistant to outliers and, along with the other information, is used to describe the variability in the dataset.

d) Boxplots (5-number summary)

5-number summary: minimum, Q1, median, Q3, maximum

Graphically (fish weights): a rectangle is drawn with a length of IQR, where the left edge is located at Q1 and the right edge at Q3. "Whiskers" are then drawn from the sides of the box to the minimum value in the data on the left and the maximum value of the data on the right.

NOTE: If the minimum and/or maximum values are unusually small or large, the whisker in the direction of the unusual value is modified to be no longer than 1.5 times the IQR.

[Figure: boxplot of fish weights on an axis from 500 to 2000]

Quantiles
  100.0%   maximum    1763.0
   99.5%              1763.0
   97.5%              1763.0
   90.0%              1671.8
   75.0%   quartile   1365.0
   50.0%   median      951.5
   25.0%   quartile    602.5
   10.0%               468.3
    2.5%               441.0
    0.5%               441.0
    0.0%   minimum     441.0

The line in the middle of the box is the location of

