GEOS 36501/EVOL 33001 (University of Chicago), 13 January 2012

III. Sampling

1 Overview of Sampling, Error, Bias
1.1 Biased vs. random sampling
1.2 Biased vs. unbiased statistic (or estimator)
1.3 Precision vs. accuracy

2 Error Estimates With Assumed Sampling Distribution
2.1 Standard error: the standard deviation of the distribution of sample statistics that would result from an infinite number of trials of drawing a sample from the underlying probability distribution and calculating the sample statistic.
2.2 In practice we generally do not estimate error by repeated sampling from the underlying distribution (it is expensive and time-consuming), although there are exceptions.
2.3 Approximations based on the sample distribution (from Sokal and Rohlf)
2.4 Limitations
2.4.1 Many approximation formulae make assumptions about the shape of the distribution and about sample size.
2.4.2 We may be interested in a novel statistic, or in one whose sampling distribution is not well characterized.

3 Bootstrap Error Estimates
3.1 Estimate the standard error by resampling from the single sample we have.
3.2 This approach uses sampling with replacement from the observed sample to simulate sampling without replacement from the underlying distribution.
3.3 Procedure
3.3.1 Start with the observed sample of size n and the observed sample statistic; call it Z.
3.3.2 Randomly pick a sample of size n, with replacement, from the observed sample.
3.3.3 Calculate the sample statistic of interest on this random sample; call it Zboot.
3.3.4 Repeat many times (generally hundreds to thousands; ideally until the estimate of SE stabilizes).
3.3.5 Calculate the standard deviation of the Zboot values. This is an estimate of the standard error of the observed sample statistic Z: SD(Zboot) ≈ SE(Z).
3.4 Simple (but not necessarily most useful) example: trimmed mean
• Define the p-% trimmed mean as the mean of the sample with the p% lowest and p% highest observations discarded.
  (The idea is to reduce the effect of outliers.)
• Suppose the data consist of 10 (ordered) observations: 1, 2, 3, 4, 8, 10, 12, 15, 20, 30. Let the trimmed mean be denoted Z. Discarding the two lowest and two highest values (a 20% trimmed mean) gives Z = (3 + 4 + 8 + 10 + 12 + 15)/6 = 8.67.
• R code to estimate SE(Z):

```r
# define function: mean after dropping the ntrim smallest and ntrim largest values
trim.mean <- function(x, ntrim) {
  n <- length(x)   # use the length of x itself, not a global n
  xtmp <- sort(x)  # same as x[order(x)]
  return(mean(xtmp[(ntrim + 1):(n - ntrim)]))
}

data <- c(1, 2, 3, 4, 8, 10, 12, 15, 20, 30)  # specify data
n <- length(data)
ntrim <- 2                                     # specify number to trim from each side
Zobs <- trim.mean(data, ntrim)                 # get observed value
nrep <- 10000                                  # specify number of bootstrap replicates
Zboot <- rep(NA, nrep)                         # assign memory
for (i in 1:nrep) {                            # get bootstrap replicates
  Zboot[i] <- trim.mean(sample(data, n, replace = TRUE), ntrim)
}
SE <- sd(Zboot)                                # calculate bootstrap std. error
hist(Zboot, breaks = 50)                       # plot histogram of results

# alternative code, without loops
DATA <- matrix(sample(data, nrep * n, replace = TRUE), n, nrep)  # each column is a bootstrap replicate
Zboot <- apply(DATA, 2, trim.mean, ntrim)
SE <- sd(Zboot)
```

• This yields Zobs = 8.67 and SE(Z) ≈ 3.1.
[Figure: histogram of Zboot; x-axis Zboot, y-axis Frequency.]

3.5 Useful R function: sample(x, n, replace=TRUE [or FALSE]) returns a random sample of size n from the vector x, with or without replacement.
3.6 To sample from an array X so that the variables (columns) stay together:
• nr <- dim(X)[1]  # get number of rows
• i <- sample(1:nr, n, replace=TRUE [or FALSE])  # returns a vector of integers sampled from 1..nr
• XSAMP <- X[i,]

4 Parametric bootstrap
4.1 Take the observed sample and estimate the relevant parameter(s) from it.
4.2 Resample from the parametric distribution with parameter(s) equal to the sample estimate(s), rather than resampling from the observed distribution.
4.3 This approach can also be applied to more complicated situations: for example, simulating a process with parameters estimated from data.
4.3.1 We'll do lots of this later.

5 Examples of Finite-sample Bias (sample-size bias)
5.1 Sample variance
5.1.1 $\sum (x - \bar{x})^2 / n$ is biased. It is systematically too low. This makes sense because it is based on squared deviations from the sample mean rather than the true mean, and the sample mean is the value that minimizes the sum of squared deviations.
5.1.2 $\sum (x - \bar{x})^2 / (n - 1)$ is unbiased.
5.2 Number of taxa
5.2.1 Rarefaction method (from Raup 1975)
• The abundance of species i is $N_i$; $N = \sum_i N_i$.
• Consider a particular species, i.
• $\binom{N - N_i}{n}$ is the number of ways of drawing a sample of n individuals entirely from the non-i individuals.
• $\binom{N}{n}$ is the number of ways of drawing a sample of n from all individuals.
• Therefore, the ratio of these two is the probability of not drawing any individuals of species i.
• Therefore, 1 minus this ratio is the probability of drawing at least one individual of species i.
• So the expected number of species is just the sum of this probability, calculated for each species in turn: $E(S_n) = \sum_i \left[ 1 - \binom{N - N_i}{n} \Big/ \binom{N}{n} \right]$.
5.2.2 Caveats
• Rarefaction is for interpolation rather than extrapolation.
• Collecting curves vs. rarefaction curves.
• An apparent "leveling off" of curves does not imply that nearly everything has been found (only that you are unlikely to find what remains with modest effort).
• Curves are affected by factors other than sample size (sampling method, taxonomic treatment, size of geographic area, etc.).
• Crossing of rarefaction curves can make interpretation difficult.
5.2.3 Examples of application of taxonomic rarefaction (Raup 1975; Raup and Schopf 1978)
This example suggests that the increase in observed family diversity in post-Paleozoic echinoids cannot be accounted for by an increase in the number of species sampled.
This example suggests that much of the variation in the number of observed echinoid orders is consistent with differences in the number of sampled species.
(But does this mean that's really all that is going on?!)
5.2.4 Interpretation of taxonomic rarefaction curves is not entirely straightforward. Sampling standardization will be treated in more detail later.
5.3 Range
5.3.1 Example: Range of samples from a normal distribution
5.3.2 Example: Test for nonrandomness of sampling with respect to morphology (Foote 1997, Paleobiology 23:181)
5.3.3 Correction in general
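Section 5.3.1 concerns the range of samples from a normal distribution, but its figures are not reproduced here. The R sketch below is an illustrative simulation, not the original analysis. It draws many samples of several sizes from a standard normal distribution and averages their ranges (max minus min). The normal distribution is unbounded, so the expected sample range keeps growing with n. As a result, ranges from samples of different sizes are not directly comparable. The sample sizes and the number of replicates are arbitrary choices.

```r
# Sketch for 5.3.1: mean range of samples from N(0, 1) for several sample sizes.
set.seed(7)
sizes <- c(2, 5, 10, 50, 100)  # illustrative sample sizes
nrep <- 10000                  # replicate samples per size

mean.range <- sapply(sizes, function(n) {
  X <- matrix(rnorm(n * nrep), nrow = n)             # one sample per column
  mean(apply(X, 2, function(col) diff(range(col))))  # average of max - min
})
names(mean.range) <- sizes
print(round(mean.range, 3))
```

For reference, the expected range for n = 2 is $2/\sqrt{\pi} \approx 1.13$ standard deviations, which the first simulated value should approximate.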
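The definition of standard error in 2.1 can be made concrete when the underlying distribution is known, which, as 2.2 notes, is rarely the case in practice. This R sketch makes two arbitrary assumptions: a standard normal distribution and a sample size of n = 25. It draws many samples and takes the standard deviation of their means. It then compares that value with the exact $\sigma/\sqrt{n}$ and with the familiar single-sample approximation $s/\sqrt{n}$ for the standard error of the mean.

```r
# Sketch for 2.1: SE as the SD of a statistic over many repeated samples.
# Assumes the underlying distribution is known: N(0, 1) here (illustrative).
set.seed(1)
n <- 25           # sample size (arbitrary choice)
ntrial <- 10000   # number of repeated samples from the underlying distribution

# Each column of X is one sample of size n
X <- matrix(rnorm(n * ntrial, mean = 0, sd = 1), nrow = n, ncol = ntrial)
sample.means <- colMeans(X)

se.repeated <- sd(sample.means)  # SD of the sampling distribution of the mean
se.true <- 1 / sqrt(n)           # exact value sigma / sqrt(n) with sigma = 1

# What we usually do instead: approximate from a single sample
one.sample <- X[, 1]
se.approx <- sd(one.sample) / sqrt(length(one.sample))

print(c(repeated = se.repeated, exact = se.true, single.sample = se.approx))
```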
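The bootstrap procedure in 3.3 works for any statistic, so it can be wrapped in a small function. `boot.se` is a hypothetical helper name, not part of base R, and the exponential data are simulated for illustration. The loop shows the SE estimate for the median settling as the number of replicates grows (3.3.4). For the mean, the bootstrap SE can be checked against $s/\sqrt{n}$; it should come out close to that value, slightly smaller by a factor of about $\sqrt{(n-1)/n}$.

```r
# Hypothetical helper (not a base R function): bootstrap SE of any statistic,
# following steps 3.3.1-3.3.5.
boot.se <- function(x, stat, nrep = 2000) {
  n <- length(x)
  # 3.3.2-3.3.4: resample n values with replacement and recompute the statistic.
  # Indexing via sample.int avoids sample()'s special case for length-1 numeric x.
  Zboot <- replicate(nrep, stat(x[sample.int(n, n, replace = TRUE)]))
  sd(Zboot)  # 3.3.5: SD(Zboot) approximates SE(Z)
}

set.seed(2)
x <- rexp(30, rate = 1)  # simulated, illustrative data

# 3.3.4: the estimate should settle down as the number of replicates grows
for (nrep in c(100, 1000, 10000)) {
  cat(nrep, "replicates: SE(median) approx", round(boot.se(x, median, nrep), 3), "\n")
}

# Check against the textbook formula for the mean
se.boot.mean <- boot.se(x, mean, 10000)
se.formula.mean <- sd(x) / sqrt(length(x))
print(c(bootstrap = se.boot.mean, formula = se.formula.mean))
```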
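Following 3.6, bootstrapping a statistic that involves two variables, such as a correlation coefficient, requires resampling whole rows. That keeps each observation's values paired. Below is a minimal sketch with simulated data. The column names `length` and `width` and the way the correlation is induced are illustrative assumptions.

```r
# Sketch for 3.6: resample rows so paired measurements stay together.
# Simulated data; the variables 'length' and 'width' are illustrative only.
set.seed(3)
n.obs <- 40
u <- rnorm(n.obs)  # shared component that makes the two columns correlated
X <- cbind(length = u + rnorm(n.obs, sd = 0.5),
           width  = u + rnorm(n.obs, sd = 0.5))

nr <- dim(X)[1]    # number of rows (observations)
nrep <- 5000
r.boot <- numeric(nrep)
for (k in seq_len(nrep)) {
  i <- sample(1:nr, nr, replace = TRUE)  # row indices drawn from 1..nr
  XSAMP <- X[i, ]                        # whole rows: pairs are not broken up
  r.boot[k] <- cor(XSAMP[, 1], XSAMP[, 2])
}

r.obs <- cor(X[, 1], X[, 2])
se.r <- sd(r.boot)  # bootstrap SE of the correlation coefficient
print(c(r = r.obs, SE = se.r))
```

Sampling the two columns independently would destroy the pairing, and the bootstrap replicates would then describe uncorrelated data.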
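This is a minimal sketch of the parametric bootstrap in 4.1 and 4.2. For illustration only, it assumes that the trimmed-mean data of 3.4 come from a normal distribution; those data look right-skewed, so a real analysis would need to check that assumption. The statistic here is the mean, for which the parametric-bootstrap SE should agree with $\hat\sigma/\sqrt{n}$. That agreement makes it a convenient check of the mechanics. In practice the method is most useful for statistics that lack such a formula.

```r
# Sketch for 4.1-4.2: parametric bootstrap under an assumed normal model.
set.seed(4)
x <- c(1, 2, 3, 4, 8, 10, 12, 15, 20, 30)  # data from the trimmed-mean example
n <- length(x)

# 4.1: estimate the parameters of the assumed distribution from the sample
mu.hat <- mean(x)
sigma.hat <- sd(x)

# 4.2: draw new samples from the fitted normal, not from x itself
nrep <- 10000
Zboot <- replicate(nrep, mean(rnorm(n, mean = mu.hat, sd = sigma.hat)))
se.param <- sd(Zboot)

# For the mean, theory gives sigma.hat / sqrt(n), which serves as a check
print(c(parametric.bootstrap = se.param, formula = sigma.hat / sqrt(n)))
```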
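The bias described in 5.1 is easy to see by simulation. This sketch draws many samples of size n = 5 from a normal distribution with known variance $\sigma^2 = 4$ (both arbitrary choices) and averages the two estimators. The divisor-n estimator should average about $\sigma^2 (n-1)/n$. The divisor-(n − 1) version, which is what R's `var()` computes, should average about $\sigma^2$.

```r
# Sketch for 5.1: compare divisors n and n - 1 over many simulated samples.
set.seed(5)
n <- 5          # small samples make the bias obvious (arbitrary choice)
nrep <- 50000
sigma2 <- 4     # true variance of the underlying N(0, 2^2) distribution

X <- matrix(rnorm(n * nrep, sd = sqrt(sigma2)), nrow = n)  # one sample per column
ss <- colSums(sweep(X, 2, colMeans(X))^2)  # sum of squared deviations from each sample mean

var.n  <- ss / n        # divisor n: biased low
var.n1 <- ss / (n - 1)  # divisor n - 1: unbiased; matches R's var()

print(c(mean.div.n = mean(var.n), expected.div.n = sigma2 * (n - 1) / n,
        mean.div.n1 = mean(var.n1), true.variance = sigma2))
```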
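The rarefaction formula of 5.2.1 translates directly into a few lines of R. `rarefy.expected` is a hypothetical helper name, and the abundance vector is invented for illustration. `lchoose` works on the log scale, so the binomial coefficients do not overflow for realistic N. In keeping with the caveat in 5.2.2 that rarefaction is for interpolation, the function refuses n > N.

```r
# Hypothetical helper: expected number of species in a random subsample of
# n individuals drawn without replacement, following the logic of 5.2.1.
rarefy.expected <- function(abund, n) {
  N <- sum(abund)
  stopifnot(n >= 0, n <= N)  # interpolation only, no extrapolation
  # P(species i absent) = C(N - N_i, n) / C(N, n), computed on the log scale.
  # If N - N_i < n, species i cannot be missed, so P(absent) = 0.
  p.absent <- ifelse(N - abund >= n,
                     exp(lchoose(N - abund, n) - lchoose(N, n)),
                     0)
  sum(1 - p.absent)  # sum over species of P(at least one individual drawn)
}

abund <- c(50, 20, 10, 5, 5, 3, 2, 2, 1, 1, 1)  # illustrative counts, N = 100
curve <- sapply(1:sum(abund), function(n) rarefy.expected(abund, n))
print(round(curve[c(1, 10, 50, 100)], 2))
```

Plotting `curve` against `1:sum(abund)` gives the rarefaction curve. `curve[1]` is always 1, and `curve[sum(abund)]` equals the number of species observed.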

