Introduction to Statistics
Ramon C. Littell
Professor and Associate Chair, Department of Statistics

What is Statistics?
The purpose of statistics: to make inferences about unknown quantities from samples of data.
For example: You want to know something about the age distribution of graduate students at the University of Florida; that is, how many ages are <22, <23, <24, etc. Or, you might want to know the average age. In either case you want information about the set of ages of all UF graduate students. These ages would be the population of interest. (Note: the ages are the population, not the people.)
It is infeasible to get the ages of all UF graduate students. You cannot observe the entire population. Instead, you get the ages of a subset of the population. The subset is called a sample. Then, you use the data in the sample to estimate what you want to know about the population.

Getting a Sample of Data from a Population
There are several ways to get a sample of data from a population. In the case of the population of ages of UF graduate students, here are some examples:
1. Draw 100 names of graduate students at random from the UF Student Directory. Contact them and ask their ages.
2. Get the ages of the students in STA 6166 during a particular semester.
3. Go to a bar during finals week and ask the ages of all the patrons.
Each of these approaches has its own drawbacks. Probably the first approach is best and the third is worst. The second approach might be acceptable, to the extent that students who take STA 6166 represent all UF graduate students.

Types of research studies and sources of data
1. Designed experiments: Treatments are applied to experimental units according to a prescribed plan
2. Surveys: Data are collected on existing units selected from a population according to a prescribed plan
3. Observational studies: Data are gathered on units that are available
Questions:
What would you call the first approach of getting a sample?
What would you call the second approach of getting a sample?

Other Examples of Populations and Samples
Populations:
1. Amounts of grapefruit on all trees in Florida
2. Serum zinc levels in dogs in the Gainesville area
3. Strengths of concrete from a given mix of sand, cement and gravel
Samples:
1. Amounts of grapefruit on trees in plots drawn from the state of Florida
2. Serum zinc levels in dogs entering the UF College of Veterinary Medicine Small Animal Clinic
3. Measurements from samples of concrete with known ingredients in the concrete mix

Data Summarization
It is usually difficult to learn much about a set of measurements from a list. If you wanted to report information about the ages of UF graduate students, you would probably employ some method of data summarization. Here are some possible ways to summarize the data:
1. Report the mean or the range of the data
2. Report how many values are in various age categories
3. Construct a graph to display the data

Example of Data Summarization
Sixty-three pregnant women participated in a nutritional intake study. As a baseline indicator, their bodyweights (in kg) were recorded at the end of the first trimester.
Here are the data, one line per column of the original table (column counts in parentheses):
40s (5): 42.3 44.8 47.3 48.9 49.5
50s (15): 51.8 52.7 53.6 53.9 55.0 55.5 55.9 56.4 57.0 57.0 57.0 57.5 57.5 59.1 59.3
60s (17): 61.4 61.8 62.3 62.3 63.0 63.2 63.4 64.1 64.3 64.8 66.6 66.8 67.3 68.2 68.2 68.9 69.8
70s (15): 70.2 70.5 70.5 70.7 71.4 72.0 72.7 73.9 74.5 74.8 75.0 75.5 75.7 75.9 75.9
80s (8): 80.5 81.8 84.8 84.8 86.4 86.4 88.2 89.8
100 and above (3): 104.5 112.0 131.8
Summary Statistics:
Min: 42.3
Max: 131.8
Range: 89.5
Standard deviation: 15.62

Frequency Histogram of Bodyweight Data
[Figure: "Body Weights of Pregnant Women in First Trimester". Horizontal axis: Body Weights in kg (category midpoints 45, 55, ..., 135). Vertical axis: Frequency (0 to 20).]

Relative Frequency Histogram of Bodyweights
[Figure: "Body Weights of Pregnant Women in First Trimester". Horizontal axis: Body Weights in kg (category midpoints 45, 55, ..., 135). Vertical axis: Relative Frequency (0 to 0.3).]

Guideline for Histogram Construction
Divide the range of the data into 5 to 20 intervals.
Count the number of data values in each interval.
Draw bars whose heights reflect the counts.

Another Example of Data Description
Egg weights on a particular date from 54 hens: 53.4, 55.2, …, 80.8, 83.1
range = 83.1 - 53.4 = 29.7
Interval   50-55   55-60   60-65   65-70   70-75   75-80   80-85
Freq           1       3      22      22       4       0       2
Rel freq   .0185   .0556   .4074   .4074   .0741   .0000   .0370
[Figure: "Histogram of Egg Weight Data". Horizontal axis: Egg Weights (50 to 80). Vertical axis: Frequency (0 to 25).]

Sample descriptive statistics
• Data ($y_i$): $y_1 = 67.0,\ y_2 = 71.2,\ \ldots,\ y_{53} = 83.1,\ y_{54} = 69.7$
• Sample size: $n = 54$
• Sum: $\sum_{i=1}^{54} y_i = y_1 + y_2 + \cdots + y_{53} + y_{54} = 3531.3$
• Mean: $\bar{y} = \sum y_i / n = 3531.3 / 54 = 65.39$
• Ordered data: $y_{(1)} = 53.4,\ y_{(2)} = 55.2,\ \ldots,\ y_{(53)} = 80.8,\ y_{(54)} = 83.1$
• Median: $(y_{(27)} + y_{(28)}) / 2 = (65.2 + 65.3) / 2 = 65.25$
• 75th percentile: $(.75)54 = 40.5$, so take $y_{(41)} = 68.1$
• 25th percentile: $(.25)54 = 13.5$, so take $y_{(14)} = 62.1$
Interpretation:
No more than 25% of the data are below 62.1 and no more than 75% are above it.
No more than 75% of the data are below 68.1 and no more than 25% are above it.

Measures of Central Tendency
• Mean: $\bar{y} = 65.39$
• Median: 65.25 = 50th percentile
• Mode: 62.7 = most frequently occurring observation

Measures of Dispersion
• Range: $y_{\max} - y_{\min} = 83.1 - 53.4 = 29.7$
• Inter-quartile range: $q_3 - q_1 = 68.1 - 62.1 = 6.0$
• Variance: $s^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1} = \frac{\sum y_i^2 - (\sum y_i)^2 / n}{n - 1} = 26.747$
• Standard deviation: $s = \sqrt{s^2} = \sqrt{26.747} = 5.172$

Empirical Rule
The Empirical Rule provides a practical use of the standard deviation.
• If the distribution is “mound-shaped” then:
Approx. 68% of the data are between $\bar{y} - s$ and $\bar{y} + s$
Approx. 95% of the data are between $\bar{y} - 2s$ and $\bar{y} + 2s$
Approx. 99.7% of the data are between $\bar{y} - 3s$ and $\bar{y} + 3s$

Empirical Rule for Egg Weight Data
Count   Egg Weights
1       53.4
1       55.2
2       58.3 59.2
7       60.2 60.3 61.0 61.4 61.5 61.5 61.8
12      62.0 62.0 62.1 62.2 62.6 62.7 62.7 62.7 63.0 63.0 63.5 63.6
8       64.3 64.5 64.7 65.2 65.3 65.4 65.4 65.9
9       66.0 66.0 66.0 66.3 67.0 67.0 67.4 67.5 67.6
8       68.1 68.2 68.8 69.0 69.1 69.2 69.7 69.8
2       71.2 71.8
2       72.0 73.1
1       80.8
1       83.1
$\bar{y} = 65.39$, $s = 5.17$
Interval             Lower   Upper   Count   %
$\bar{y} \pm s$      60.22   70.56   43      79.6
$\bar{y} \pm 2s$     55.05   75.73   51      94.4
$\bar{y} \pm 3s$     49.88   80.90   53      98.1

Populations and Samples
Parameters and Statistics
There are means, standard deviations, etc. for
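
The egg-weight summaries reported earlier (sum, mean, median, quartiles, variance, standard deviation, and the Empirical Rule counts) can be recomputed from the 54 listed values. The following is a minimal Python sketch, not part of the original slides. The helper names `slide_percentile` and `within_k_sd` are hypothetical, and the percentile rule is an assumption inferred from the worked positions in the slides: when p·n is a whole number k, average y(k) and y(k+1); otherwise take the next ordered value.

```python
import math
import statistics

# Egg weights for the 54 hens, copied row by row from the slide listing.
egg_weights = [
    53.4,
    55.2,
    58.3, 59.2,
    60.2, 60.3, 61.0, 61.4, 61.5, 61.5, 61.8,
    62.0, 62.0, 62.1, 62.2, 62.6, 62.7, 62.7, 62.7, 63.0, 63.0, 63.5, 63.6,
    64.3, 64.5, 64.7, 65.2, 65.3, 65.4, 65.4, 65.9,
    66.0, 66.0, 66.0, 66.3, 67.0, 67.0, 67.4, 67.5, 67.6,
    68.1, 68.2, 68.8, 69.0, 69.1, 69.2, 69.7, 69.8,
    71.2, 71.8,
    72.0, 73.1,
    80.8,
    83.1,
]


def slide_percentile(values, p):
    """Percentile by ordered position (rule assumed from the slides).

    If p*n is a whole number k, average y(k) and y(k+1);
    otherwise return y(ceil(p*n)). Positions are 1-based.
    """
    ordered = sorted(values)
    pos = p * len(ordered)
    k = math.ceil(pos)
    if pos == int(pos):
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[k - 1]


def within_k_sd(values, k):
    """Count the values lying within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    lower, upper = mean - k * sd, mean + k * sd
    return sum(lower <= v <= upper for v in values)


n = len(egg_weights)
total = sum(egg_weights)
mean = statistics.mean(egg_weights)
variance = statistics.variance(egg_weights)  # divides by n - 1
sd = statistics.stdev(egg_weights)
median = slide_percentile(egg_weights, 0.50)
q1 = slide_percentile(egg_weights, 0.25)
q3 = slide_percentile(egg_weights, 0.75)
counts = {k: within_k_sd(egg_weights, k) for k in (1, 2, 3)}

print(f"n = {n}, sum = {total:.1f}, mean = {mean:.2f}")
print(f"median = {median:.2f}, q1 = {q1}, q3 = {q3}, IQR = {q3 - q1:.1f}")
print(f"variance = {variance:.3f}, sd = {sd:.3f}")
for k, count in counts.items():
    print(f"within {k} sd: {count} of {n} ({100 * count / n:.1f}%)")
```

The counts use the unrounded mean and standard deviation, so the interval endpoints can differ from the slide values in the second decimal place without changing which observations fall inside each interval.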


UF STA 6126 - Introduction to Statistics
