MIT 18.443 - Problem Set #8

This preview shows page 1 out of 3 pages.


MIT OpenCourseWare
http://ocw.mit.edu

18.443 Statistics for Applications
Spring 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

18.443 Problem Set 8

1. Here are the estimated “genome sizes” (amount of DNA in a haploid cell, or half the amount in a diploid cell) for various non-primate mammals, given in T. R. Gregory (2005), “Animal Genome Size Database,” http://www.genomesize.com. The units are in picograms. In parts (a)-(h), find the sample median of the numbers for the given species or genus. If you notice any numbers that would tend to make the sample mean different from the sample median, or make the sample standard deviation large, do comment on that, but you need not compute the sample mean or standard deviation until part (i). Note that within each species, the genome sizes are already arranged in order, but starting over with a new species.

(a) Bos taurus (domestic cattle): 3.15, 3.43, 3.57, 3.60, 3.65, 3.70, 3.73, 3.79, 3.93.

(b) Ovis aries (sheep): 2.34, 2.57, 3.30, 3.33, 3.50.

(c) Canis familiaris (domestic dog): 2.80, 2.85, 2.88, 2.96, 3.09, 3.19, 3.26, 3.43, 3.54.

(d) Genus Felis (domestic, then some wild cats): 2.86, 2.91, 3.10, 3.22, 3.45, 2.92, 2.92, 3.00.

(e) Oryctolagus cuniculus (common rabbit): 2.50, 2.88, 3.09, 3.26, 3.42, 3.52, 3.57.

(f) Equus caballus (horse): 2.95, 3.15, 3.15, 3.21, 3.38, 3.48.

(g) Mus musculus (house mouse): 2.45, 2.92, 3.25, 3.26, 3.26, 3.28, 3.31, 3.35, 3.38, 3.45, 3.52, 4.03.

(h) Rattus norvegicus (Norway rat, brown rat): 2.98, 3.05, 3.14, 3.27, 3.36, 3.82, 3.90.

(i) For the eight sample medians you’ve found so far, find their sample mean, sample median, and sample standard deviation. Does it seem that the differences in measurements are more due to variation within species (measurement errors) or to actual differences between mammal species?

(j) There are a great many species of bats and measurements for them, so let’s just consider fruit bats of three genera, Artibeus, Carollia and Dermanura. Find the sample median of the following genome sizes: 2.56, 2.74, 2.70, 2.93, 2.67, 3.06, 2.85, 2.71, 2.73. For (fruit) bats, does it seem there is an actual difference with other mammal species?

2. Would the sample median, or mean, of observations be more useful as a summary for the following kinds of data (give short explanations for your answers):

(a) daily precipitation (rain, snow, etc.), say for Boston on each day of 2008?

(b) family income in a region, for purposes of a large bakery distributing loaves of bread through supermarkets;

(c) family income in a region, but for purposes of a manufacturer of private aircraft?

3. Rice, §12.5, Problem 26, but only for dogs 3 through 8, and use only a nonparametric method.

4. Let $F$ be a continuous, strictly increasing distribution function, so there is a unique median $x_0$ for which $F(x_0) = 1/2$. Suppose $x_0 = 0$. Then for any real $m$ we can form another distribution function $F_m$ such that $F_m(x) \equiv F(x - m)$. Then $F_m$ will have median $m$.

It’s known (and will be seen during the week) that if $V_1, \ldots, V_n$ are a sample i.i.d. $U[0, 1]$, the $j$th order statistic $V_{(j)}$ will have a beta$(j, n - j + 1)$ distribution. In particular for $n = 2k + 1$ odd, the sample median $V_{(k+1)}$ has a beta$(k + 1, k + 1)$ distribution. The variables $F^{-1}(V_{(j)})$ will have the joint distribution of the order statistics $X_{(1)} < X_{(2)} < \cdots < X_{(n)}$ from a sample $X_1, \ldots, X_n$ i.i.d. $(F)$. Thus $F^{-1}(V_{(k+1)})$ will have the distribution of the sample median $X_{(k+1)}$.

Let’s say that an estimator $T = T_n$ of a parameter $\theta$ is asymptotically normal with asymptotic variance $\sigma^2/n$ if the distribution of $\sqrt{n}(T_n - \theta)$ converges as $n \to \infty$ to $N(0, \sigma^2)$.

(a) If $X_j$ are i.i.d. $N(\mu, \sigma^2)$ then what is the asymptotic variance of the sample mean $\bar{X}$ as an estimator of $\mu$? (This is easy and involves no approximation.)

(b) As $n \to \infty$ through odd values only, find the asymptotic variance of the sample median as an estimator of $\mu$ for normal distributions. (Use the above considerations and the delta-method.) How does it compare with that of the sample mean?

(c) Consider the standard Cauchy density $f(x) = \frac{1}{\pi(1 + x^2)}$ for all real $x$ and for any real $m$ and $0 < \sigma < \infty$ the Cauchy $(m, \sigma)$ density $f_{m,\sigma}(x) = \sigma^{-1} f((x - m)/\sigma)$. A variable with this density has median $m$. If $X_j$ are i.i.d. with a Cauchy density, then the sample mean $\bar{X}$ actually has the same distribution as $X_j$ for each $n$, so one may say it has infinite asymptotic variance as an estimator of the true median $m$. Find the asymptotic variance of the sample median by the same method as in part (b).

5. Consider the data on heat of sublimation of iridium in Rice, §10.9, Problem 26, but don’t do any part of the problem stated in Rice. Instead consider each row of 9 observations as a separate sample. Here are the rows, rearranged in order:

First row ordered: 136.6 145.2 151.5 159.1 159.8 160.1 160.8 162.7 173.9
Second row ordered: 159.2 159.3 159.5 159.6 160.2 160.3 160.4 160.6 161.1
Third row ordered: 159.5 159.5 159.5 159.6 159.7 160.0 160.0 160.1 160.2

(a) Find the sample mean, median and variance for each of the three rows of 9 observations each.

(b) Each row was tested for whether the 9 observations in it are i.i.d. normal by the Shapiro-Wilk test and none was rejected at the 0.05 level. But for the whole set of 27 observations, the hypothesis that they are i.i.d. normal was strongly rejected, with a p-value less than $4 \cdot 10^{-7}$. This suggests that either the means or the variances differ significantly between samples. Which seem to differ more strongly? Do a test for that. Hint: the answer may not be the same as if one were asked in advance to test for whether the means are different, or the variances are different; the choice affects the Bonferroni correction.

(c) For each row, find the MAD
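
The summary statistics asked for in Problem 5 could be computed along the following lines. This is only a sketch: the names `ROWS`, `summarize` and `median_abs_deviation` are made up for illustration, the variance uses the usual $n - 1$ denominator, and "MAD" is assumed here to mean the unscaled median absolute deviation from the median, which the preview text cuts off before defining.

```python
import statistics

# The three ordered rows of iridium data as listed in Problem 5.
ROWS = {
    "first": [136.6, 145.2, 151.5, 159.1, 159.8, 160.1, 160.8, 162.7, 173.9],
    "second": [159.2, 159.3, 159.5, 159.6, 160.2, 160.3, 160.4, 160.6, 161.1],
    "third": [159.5, 159.5, 159.5, 159.6, 159.7, 160.0, 160.0, 160.1, 160.2],
}


def summarize(sample):
    """Return the sample mean, median and variance (n - 1 denominator)."""
    return {
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "variance": statistics.variance(sample),
    }


def median_abs_deviation(sample):
    """Median of |x - median(x)|.

    Assumption: this is what "MAD" refers to in part (c); no scaling
    constant is applied.
    """
    center = statistics.median(sample)
    return statistics.median([abs(x - center) for x in sample])


if __name__ == "__main__":
    for name, row in ROWS.items():
        stats = summarize(row)
        mad = median_abs_deviation(row)
        print(name, stats, "MAD:", mad)
```

Comparing the medians with the means row by row shows how much the extreme values in the first row pull the mean and inflate the variance, while the median and the MAD are less affected.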

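For Problem 4, the delta-method step can be sketched as follows. This is an outline under assumptions the problem does not spell out: $F_m$ is taken to have a density $f_m = F_m'$ that is continuous and positive at the median $m$, and the asymptotic normality of the beta$(k+1, k+1)$ distribution is taken as given. Substituting the normal or Cauchy density at its median is left as in parts (b) and (c).

```latex
% Sketch only. Assumptions: f_m = F_m' exists, is continuous and
% positive at m; n = 2k + 1 is odd; convergence is in distribution.
\begin{align*}
\operatorname{Var}\bigl(V_{(k+1)}\bigr)
  &= \frac{(k+1)^2}{(2k+2)^2 (2k+3)} = \frac{1}{4(n+2)}, \\
\sqrt{n}\,\bigl(V_{(k+1)} - \tfrac12\bigr)
  &\to N\bigl(0, \tfrac14\bigr), \\
\bigl(F_m^{-1}\bigr)'\bigl(\tfrac12\bigr)
  &= \frac{1}{f_m(m)}, \\
\sqrt{n}\,\bigl(X_{(k+1)} - m\bigr)
  &\to N\Bigl(0, \frac{1}{4 f_m(m)^2}\Bigr).
\end{align*}
```

The first line is the variance of a beta$(a, b)$ variable, $ab / ((a+b)^2 (a+b+1))$, with $a = b = k + 1$; the last line applies the delta-method with $g = F_m^{-1}$.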
