UF STA 6166 - SUMMARIZING DATA – CENTER OR CENTRAL TENDENCY

Topic (4) SUMMARIZING DATA – CENTER OR CENTRAL TENDENCY

I) QUANTITATIVE DATA

a) Median (50th percentile)

Defn: The MEDIAN of a data set is the middle value. That is, when the data values are arranged from low to high, it is the value in the list such that half of the data points are smaller and the other half are larger.

For an even number of observations:
The fish weights for the Tennessee River study (n = 12) are:
986, 1023, 1266, 1398, 917, 1763, 1459, 778, 532, 441, 544, 897
1) First order them from low to high.
2) median = average of the 2 middle values.
441, 532, 544, 778, 897, 917, 986, 1023, 1266, 1398, 1459, 1763
$$m = \frac{917 + 986}{2} = 951.5$$
6 of the observed values fall below m and the other 6 are larger than m.

For an odd number of observations:
Thirteen fish weights are:
986, 1023, 1266, 1398, 917, 1763, 1459, 778, 532, 441, 544, 897, 1129
1) First order them from low to high.
2) median = the middle value.
441, 532, 544, 778, 897, 917, 986, 1023, 1129, 1266, 1398, 1459, 1763
m = 986

Important Point #1: The median is said to be robust because it is resistant to outliers.

Important Point #2: The sample median divides the total area under the bars in a histogram in half.

Important Point #3: Populations also have medians, called the population median (M). This number divides the area under the curve describing the population frequency distribution in half.

b) Arithmetic Mean

Defn: The MEAN of a data set is the average value. That is, it is the value obtained by adding all of the numbers together and dividing the result by the number of values in the sum (see symbols later).
The SAMPLE MEAN is denoted x̄ (pronounced “x-bar”).
The POPULATION MEAN is denoted µ (pronounced “mu”).
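The median steps for the fish weights above (order the values, then take the middle value or the average of the two middle values) can be checked with a short Python sketch. The helper name `median_by_hand` and the outlier value 17630 are illustrative assumptions, not part of the course notes or the study data; the outlier is included only to show what "robust" means in practice.

```python
from statistics import mean, median

# Fish weights from the Tennessee River study (n = 12), as listed above.
weights_12 = [986, 1023, 1266, 1398, 917, 1763, 1459, 778, 532, 441, 544, 897]
# The same weights plus the thirteenth observation.
weights_13 = weights_12 + [1129]


def median_by_hand(values):
    """Order the values, then take the middle one (odd n)
    or the average of the two middle ones (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


m_even = median_by_hand(weights_12)  # (917 + 986) / 2 = 951.5
m_odd = median_by_hand(weights_13)   # 7th ordered value = 986

# Hypothetical outlier (an assumption, not study data): suppose the
# heaviest fish had been recorded as 17630 instead of 1763.
weights_outlier = [17630 if w == 1763 else w for w in weights_12]

print("median, n = 12:", m_even)
print("median, n = 13:", m_odd)
print("median with outlier:", median(weights_outlier))
print("mean without / with outlier:", mean(weights_12), mean(weights_outlier))
```

In this sketch the median stays the same when the largest value is inflated, while the mean moves upward, which is the sense in which the median is resistant to outliers.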
EXAMPLE
The fish lengths for the Tennessee River study are:
48, 45, 49, 51, 44, 49, 46, 28.5, 26, 25.5, 25, 44

The dot plot of these data is
[Dot plot of the 12 fish lengths; horizontal axis: Length (cm), from 25 to 50]

If each point has the same weight, where should the pivot point be to balance the x-axis (i.e. keep it horizontal)?
Ans: the pivot point is the arithmetic mean.

To calculate the sample mean for these data, sum the data values and divide the result by n:
$$\bar{x} = \frac{48+45+49+51+44+49+46+28.5+26+25.5+25+44}{12} = \frac{481}{12} \approx 40.08$$
We say that the fish caught in the study averaged 40.08 cm in length.

Important Point #1: If one were able to observe the value of every single element in a population (say, every single fish in the Tennessee River in 1978), then it would be possible to calculate the population mean µ. Since we can’t do that, we say that an estimate of the population mean µ is the sample mean x̄.

Important Point #2: Is the mean robust? Ans: NO! Its value depends directly on every value in the dataset, so a single extreme value can shift it.

Important Point #3: The mean of a set of data is the fulcrum or balance point for the data.

NOTATION:
X denotes the NAME of the variable, e.g. LENGTH
x denotes a value for the named variable, e.g. 48 cm
i is a subscript which denotes the index number for the observation, e.g. fish IDs run from 1 to 12
$x_i$ denotes the value for the ith observation (that is, the ith observed value), e.g. $x_1 = 48$, $x_2 = 45$, etc.
Σ denotes the operation “SUM”

So, we can write
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

For frequency distributions, the relationship of the mean to the median depends on the shape of the distribution:
Skewed to the right: mean > median
Skewed to the left: mean < median
Symmetric and unimodal: mean = median
Uniform: mean = median
Bimodal: the relationship depends on the relative sizes of the two peaks

Question: So, which measure of center do you use when?

II) CATEGORICAL DATA – BINARY DATA

In general, the summary statistics for categorical data are the relative frequencies of each category in the dataset. There is no such thing as a mean or average category, only the most common category, the least common category, or some similar description.

For the special case of binary data (only two categories), the measure of central tendency is the “Proportion of Successes”.

Defn: When there are only two possible outcomes, define one category to be the “Success” (it’s the category you are studying). The PROPORTION OF SUCCESSES, then, is the fraction of observations that are successes. When the dataset is a sample, the SAMPLE PROPORTION is denoted p; when the dataset is the entire population, the POPULATION PROPORTION is denoted π. So we can write
$$p = \frac{\#\,\text{successes in sample}}{\text{sample size}} = \frac{\#\,\text{successes}}{n}$$
$$\pi = \frac{\#\,\text{successes in population}}{\text{population size}} = \frac{\#\,\text{successes}}{N}$$

Example: Suppose a researcher is interested in the recovery of submerged aquatic vegetation (SAV) in the Chesapeake Bay. At each of 30 locations at which SAV was historically found, the scientist categorizes the spot as having either medium to high amounts of SAV or low to no SAV. 11 locations had medium to high SAV levels.
2 categories:
Success = “medium or high SAV level”
Failure = “none or low SAV level”
The proportion of sample locations with medium to high SAV levels is p = 11/30 = 0.3667.
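The two summary formulas above, the sample mean and the sample proportion, can be sketched in Python as follows. This is a minimal illustration rather than anything from the course notes: the function names `sample_mean` and `sample_proportion` are made up here, and because the notes give only the SAV counts (11 successes out of 30), the ordering of the individual location outcomes in `sav_locations` is an assumption.

```python
def sample_mean(values):
    """x-bar = (x_1 + x_2 + ... + x_n) / n."""
    total = 0.0
    for x_i in values:
        total += x_i
    return total / len(values)


def sample_proportion(outcomes):
    """p = (# successes in sample) / (sample size).

    `outcomes` is a list of booleans, True meaning "Success".
    """
    successes = sum(1 for is_success in outcomes if is_success)
    return successes / len(outcomes)


# Fish lengths (cm) from the Tennessee River example earlier in these notes.
lengths = [48, 45, 49, 51, 44, 49, 46, 28.5, 26, 25.5, 25, 44]
x_bar = sample_mean(lengths)

# SAV example: 11 of 30 locations had medium to high SAV levels.
# Only the counts are given, so the order below is arbitrary (an assumption).
sav_locations = [True] * 11 + [False] * 19
p = sample_proportion(sav_locations)

print(f"sample mean length: {x_bar:.2f} cm")
print(f"sample proportion of successes: {p:.4f}")
```

Coding successes as True and failures as False keeps the proportion calculation a direct count divided by the sample size, matching the definition above.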
Example: Suppose the rate of the birth defect spina bifida is 1 baby in every 100,000 live births.
Success = “has spina bifida”
π = 0.00001 =

