Sampling Distributions

Bret Larget
Department of Statistics
University of Wisconsin - Madison
October 4, 2004

Statistics 371, Fall 2003

Sampling Distributions

• A population may be modeled as a box with numbered or colored balls.
• We think of a sample of data as having been selected at random from this population.
• From each sample, we can calculate a sample statistic such as a sample mean.
• The sampling distribution of the sample mean is the collection of all possible sample means that could occur by random sampling (of a given sample size n).
• The textbook refers to the thought exercise of considering all the ways a sample could have turned out as a meta-experiment.

Statistics 371, Fall 2004

Dichotomous Observations

Consider a cross of two heterozygotes, Aa × Aa.

The probability distribution of the genotypes of the offspring is as follows.

             Offspring Genotype
cross        AA      Aa      aa
Aa × Aa      0.25    0.50    0.25

If Y is the number of dominant offspring (AA or Aa) in a sample of size n = 2, and if $\hat{p} = Y/n$ is the sample proportion, then the sampling distribution of $\hat{p}$ is tabulated below.
Probabilities are from the binomial distribution with n = 2 and p = 0.25 + 0.50 = 0.75, the probability that a single offspring is dominant.

Y    $\hat{p}$    Prob.
0    0.0          0.0625
1    0.5          0.3750
2    1.0          0.5625

Larger Example

For the previous cross, what is the probability that exactly 15 of 20 offspring are dominant?

The number of dominant offspring will have a binomial distribution with n = 20 and p = 0.75.

> dbinom(15, 20, 0.75)
[1] 0.2023312

What is the probability that $\hat{p}$ is within 0.05 of p?

Translate the probability to a binomial question.

$\Pr\{0.70 \le \hat{p} \le 0.80\} = \Pr\{0.70 \le Y/20 \le 0.80\} = \Pr\{14 \le Y \le 16\}$

> sum(dbinom(14:16, 20, 0.75))
[1] 0.5606259

A fancy R trick

Here is R code to do the previous calculation for a variety of sample sizes.

> # sample sizes n = 20, 40, 80, 160, 320, 640
> N = 10 * c(2, 4, 8, 16, 32, 64)
> for (n in N) {
+     # sum the binomial probabilities for Y from 0.7n to 0.8n
+     print(sum(dbinom(seq(0.7 * n, 0.8 * n, by = 1), n, 0.75)))
+ }
[1] 0.5606259
[1] 0.6389116
[1] 0.7553899
[1] 0.8799318
[1] 0.9670862
[1] 0.9970046

Quantitative Observations

• Now consider a population where each individual is associated with a quantitative variable.
• We can compute the sample mean from each sample.
• The sampling distribution of $\bar{Y}$ is the collection of sample means from the meta-experiment of all possible samples of size n.

Sampling Distribution of $\bar{Y}$

• The mean of the sampling distribution of $\bar{Y}$, $\mu_{\bar{Y}}$, is the same as the population mean. In symbols, $\mu_{\bar{Y}} = \mu$.
• The standard deviation of the sampling distribution of $\bar{Y}$, $\sigma_{\bar{Y}}$, is smaller than the population standard deviation by a factor of $\sqrt{n}$. In symbols, $\sigma_{\bar{Y}} = \sigma/\sqrt{n}$.
• If the sample size n is sufficiently large, the shape of the sampling distribution of $\bar{Y}$ will be approximately normal.
This is the Central Limit Theorem.
• If the population is normal, a sample size of 1 suffices.
• If the population is not normal, the sample size needed for the normal approximation to be reasonably accurate depends on how the population differs from normality.

Example calculation

Suppose that the weights of seeds are approximately normal with a mean of 500 mg and a standard deviation of 150 mg. Find the probability that the sample mean is between 450 and 550 mg for a variety of sample sizes.

For n = 4, we have

$$\Pr\{450 \le \bar{Y} \le 550\} = \Pr\left\{ \frac{450 - 500}{150/\sqrt{4}} \le \frac{\bar{Y} - 500}{150/\sqrt{4}} \le \frac{550 - 500}{150/\sqrt{4}} \right\} = \Pr\{-0.67 \le Z \le 0.67\} = 0.4972$$

from a normal table calculation (0.7486 − 0.2514).

Fancy R example

Here is sample R code to do this calculation for several n, written differently from the previous example.

> N = c(4, 8, 16, 32, 64)
> len = length(N)
> p = rep(0, len)
> for (i in 1:len) {
+     p[i] = pnorm(550, 500, 150/sqrt(N[i])) - pnorm(450, 500,
+         150/sqrt(N[i]))
+ }
> cbind(N, p)
      N         p
[1,]  4 0.4950149
[2,]  8 0.6542214
[3,] 16 0.8175776
[4,] 32 0.9406536
[5,] 64 0.9923392

The first number in this table disagrees slightly with the previous calculation because R did not round off the z score to the nearest hundredth.

Exercise 5.18

Assume that the heights of corn plants are normally distributed with a mean of 145 cm and a standard deviation of 22 cm.

What proportion of plants are between 135 and 155 cm?

> pnorm(155, 145, 22) - pnorm(135, 145, 22)
[1] 0.3505637

Find $\Pr\{135 \le \bar{Y} \le 155\}$ when n = 16.

> pnorm(155, 145, 22/sqrt(16)) - pnorm(135, 145, 22/sqrt(16))
[1] 0.9309637

Find $\Pr\{135 \le \bar{Y} \le 155\}$ when n = 36.

> pnorm(155, 145, 22/sqrt(36)) - pnorm(135, 145, 22/sqrt(36))
[1] 0.993614
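
The three calculations in Exercise 5.18 can also be done in a single vectorized call, and the n = 16 answer can be checked by simulating the meta-experiment directly. The following is a minimal sketch, not part of the original slides: the variable names (mu, sigma, se, p_sim), the seed, and the number of replicates are illustrative assumptions.

```r
# Parameters from Exercise 5.18 (corn plant heights, in cm).
mu <- 145
sigma <- 22
n <- c(1, 16, 36)   # n = 1 corresponds to a single plant

# Standard deviation of the sampling distribution of the sample mean.
se <- sigma / sqrt(n)

# pnorm is vectorized over its sd argument, so no loop is needed.
p <- pnorm(155, mu, se) - pnorm(135, mu, se)
print(cbind(n, se, p))

# Simulated meta-experiment for n = 16: draw many samples, keep each mean.
# The seed and the replicate count are arbitrary choices.
set.seed(371)
reps <- 10000
ybar <- replicate(reps, mean(rnorm(16, mu, sigma)))

# Proportion of simulated sample means that fall in [135, 155].
p_sim <- mean(ybar >= 135 & ybar <= 155)
print(p_sim)
```

The column p should reproduce the three answers above, and p_sim should land close to the n = 16 value, differing only by simulation error.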