UW-Madison STAT 411 - Stratified

This preview shows page 1-2-3-4-5 out of 14 pages.


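Stratified Random Sampling (Chapter 11)

This handout introduces the basic ideas and theory behind stratified random sampling estimators, the stratification principle, and allocation in stratified random sampling; gives a number of examples illustrating the method; compares simple and stratified random sampling; and introduces the ideas behind post-stratification.

• Suppose we divide the population into $L$ strata, where the variation within strata is small relative to the variation between strata, in terms of some underlying response variable. We discussed and saw in an earlier handout that this situation minimizes the variability in the stratified random sampling estimator.
• Examples: Landscapes, stratified by habitat characteristics; people, stratified by characteristics (such as sex, income, etc.).

Notation:

$N_h$ = the population size in stratum $h$, $h = 1, 2, \ldots, L$,
$N = \sum_{h=1}^{L} N_h$ = the total population size,
$n_h$ = the sample size in stratum $h$, $h = 1, 2, \ldots, L$,
$n = \sum_{h=1}^{L} n_h$ = the total sample size,
$y_{hi}$ = the $i$th observation in the $h$th stratum,
$\tau_h = \sum_{i=1}^{N_h} y_{hi}$ = the total of the observations in stratum $h$,
$\tau = \sum_{h=1}^{L} \tau_h$ = the overall total,
$\mu_h = \tau_h / N_h$ = the mean response in stratum $h$,
$\mu = \tau / N$ = the overall mean response.

Estimating $\tau$ and $\tau_h$: Within each stratum, we estimate $\tau_h$ by $\hat{\tau}_h = N_h \bar{y}_h$. Then $\hat{\tau} = \sum_{h=1}^{L} \hat{\tau}_h$.

• If $\hat{\tau}_h$ is an unbiased estimator of $\tau_h$, $h = 1, \ldots, L$, then $\hat{\tau} = \sum_{h=1}^{L} \hat{\tau}_h$ is unbiased for $\tau$. Note that we could have a different sampling plan (other than an SRS) in each stratum.
• Also, if the stratum samples are independently selected, then:
$$\mathrm{Var}(\hat{\tau}) = \mathrm{Var}\left(\sum_{h=1}^{L} \hat{\tau}_h\right) = \sum_{h=1}^{L} \mathrm{Var}(\hat{\tau}_h) \quad \text{(due to the independence of the } \hat{\tau}_h\text{'s)}.$$
• If $\widehat{\mathrm{Var}}(\hat{\tau}_h)$ is unbiased for $\mathrm{Var}(\hat{\tau}_h)$, then $\widehat{\mathrm{Var}}(\hat{\tau}) = \sum_{h=1}^{L} \widehat{\mathrm{Var}}(\hat{\tau}_h)$ is unbiased for $\mathrm{Var}(\hat{\tau})$.

Estimating $\mu$ and $\mu_h$: $\hat{\mu} = \hat{\tau}/N$ is an unbiased estimator of $\mu$ if $\hat{\tau}$ is unbiased for $\tau$, and $\mathrm{Var}(\hat{\mu}) = \frac{1}{N^2}\mathrm{Var}(\hat{\tau})$, so that $\widehat{\mathrm{Var}}(\hat{\mu}) = \frac{1}{N^2}\widehat{\mathrm{Var}}(\hat{\tau})$.

• An alternative form for the estimator $\hat{\mu}$ is given by:
$$\hat{\mu} = \frac{1}{N}\hat{\tau} = \frac{1}{N}\sum_{h=1}^{L} \hat{\tau}_h = \frac{1}{N}\sum_{h=1}^{L} N_h \hat{\mu}_h = \sum_{h=1}^{L} \underbrace{\left(\frac{N_h}{N}\right)}_{\text{weights}} \hat{\mu}_h,$$
a weighted average of the stratum means (weighted by the proportional stratum size). This indicates that we only need to know the relative stratum sizes, not the actual sizes, to estimate the population mean.
• The variance of $\hat{\mu}$ may then be expressed as:
$$\mathrm{Var}(\hat{\mu}) = \mathrm{Var}\left(\sum_{h=1}^{L} \left(\frac{N_h}{N}\right)\hat{\mu}_h\right) = \sum_{h=1}^{L} \left(\frac{N_h}{N}\right)^2 \mathrm{Var}(\hat{\mu}_h) \quad \text{under independence}.$$
• The results derived above are true for any sampling plans within each stratum, not just simple random sampling. These general results fall under the heading of "stratified sampling."

Note: "Stratified random sampling" means independent simple random samples (SRS's) taken within each stratum. Under this setting, the stratified estimators of the population mean and total can be derived as follows.

Within stratum $h$: $\hat{\tau}_h = N_h \bar{y}_h$ ($\hat{\mu}_h = \bar{y}_h$), where $\bar{y}_h$ is the sample mean in stratum $h$.

$$\hat{\tau}_{st} = \sum_{h=1}^{L} \hat{\tau}_h = \sum_{h=1}^{L} N_h \bar{y}_h \quad \text{(the estimated total from stratified random sampling)}$$
$$\mathrm{Var}(\hat{\tau}_{st}) = \sum_{h=1}^{L} \mathrm{Var}(\hat{\tau}_h) = \sum_{h=1}^{L} N_h^2 \left(\frac{N_h - n_h}{N_h}\right)\frac{\sigma_h^2}{n_h} = \sum_{h=1}^{L} N_h (N_h - n_h)\frac{\sigma_h^2}{n_h} \;\Rightarrow\; \widehat{\mathrm{Var}}(\hat{\tau}_{st}) = \sum_{h=1}^{L} N_h (N_h - n_h)\frac{s_h^2}{n_h}.$$
$$\hat{\mu}_{st} = \bar{y}_{st} = \frac{\hat{\tau}_{st}}{N} = \sum_{h=1}^{L} \left(\frac{N_h}{N}\right)\bar{y}_h$$
$$\mathrm{Var}(\hat{\mu}_{st}) = \frac{1}{N^2}\mathrm{Var}(\hat{\tau}_{st}) = \sum_{h=1}^{L} \left(\frac{N_h}{N}\right)^2 \left(\frac{N_h - n_h}{N_h}\right)\frac{\sigma_h^2}{n_h},$$
where $\widehat{\mathrm{Var}}(\hat{\mu}_{st})$ replaces $\sigma_h^2$ with $s_h^2$.

Example: Suppose we want to estimate the average number of hours of TV watched in the previous week for all adults in some county.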
Suppose also that the populace of this county can be grouped naturally into 3 strata (town A, town B, rural) as summarized in the table below. Why might we stratify the population in this way?

Statistic         Town A    Town B    Rural
$h$               1         2         3
$N_h$             155       62        93
$n_h$             20        8         12       (SRS's)
$\bar{y}_h$       33.90     25.12     19.00
$s_h$             5.95      15.24     9.36
$\hat{\tau}_h$    5254.5    1557.4    1767.0   ($N_h \bar{y}_h$)

$$\hat{\tau}_{st} = \hat{\tau}_1 + \hat{\tau}_2 + \hat{\tau}_3 = 8578.9, \qquad \bar{y}_{st} = \frac{\hat{\tau}_{st}}{N} = \frac{8578.9}{310} = 27.7$$

Other way:
$$\bar{y}_{st} = \sum_{h=1}^{L} \left(\frac{N_h}{N}\right)\bar{y}_h = \frac{155}{310}(33.90) + \frac{62}{310}(25.12) + \frac{93}{310}(19) = 16.95 + 5.024 + 5.7 = 27.7.$$

$$\widehat{\mathrm{Var}}(\bar{y}_{st}) = \sum_{h=1}^{3} \left(\frac{N_h}{N}\right)^2 \left(\frac{N_h - n_h}{N_h}\right)\frac{s_h^2}{n_h} = \left(\frac{155}{310}\right)^2 \left(\frac{155 - 20}{155}\right)\frac{5.95^2}{20} + \left(\frac{62}{310}\right)^2 \left(\frac{62 - 8}{62}\right)\frac{15.24^2}{8} + \left(\frac{93}{310}\right)^2 \left(\frac{93 - 12}{93}\right)\frac{9.36^2}{12}$$
$$= 0.385 + 1.011 + 0.572 = 1.97 \;\Rightarrow\; SE(\bar{y}_{st}) = 1.40.$$

A 95% confidence interval for $\mu$ is given by:
$$\bar{y}_{st} \pm t^*\,(SE(\bar{y}_{st})) = 27.7 \pm (2.079)(1.40) = 27.7 \pm 2.91 = (24.79, 30.61).$$

• How many degrees of freedom are associated with this t-based critical value? How do we determine these degrees of freedom?
• We generally do not assume that all the $\sigma_h$'s are equal, so a Satterthwaite approximation should be used to get the degrees of freedom associated with $t^*$. Here, using equation (4) on page 121 of the text, the approximate degrees of freedom are:
$$\text{d.f.} = \frac{\left(\sum_{h=1}^{L} a_h s_h^2\right)^2}{\sum_{h=1}^{L} (a_h s_h^2)^2 / (n_h - 1)} = 21.1, \quad \text{where } a_h = \frac{N_h (N_h - n_h)}{n_h}.$$
• An ultra-conservative choice for the degrees of freedom is to set:
$$\text{d.f.} = \min(n_1 - 1, n_2 - 1, \ldots, n_L - 1) = 7.$$
• If all of the stratum sample sizes $n_h \geq 30$, then a z-based critical value can be used.

Stratification Principle: Recall that any strata which make the units homogeneous within and heterogeneous between are considered a "good" choice of strata.

• Stratification can often be very effective with just a few strata; more strata lead to diminishing returns with greater effort. Too many strata will usually require more effort to sample and lead to less heterogeneity between strata.
• Stratified random sampling is really nothing more than using a categorical auxiliary variable in the design phase of a study.
In the TV example, we assume that where a person lives is associated with the number of hours of TV watched. Here, the auxiliary variable is the stratum (where a person lives). Ratio and regression estimation are examples of using a continuous auxiliary variable in the estimation phase of a study, after we have collected the data. Using a categorical variable in the estimation (rather than the design) phase of a study can be done with post-stratification, discussed later in these notes. Note that a continuous variable can be used as an auxiliary variable in the design phase by dividing the range of values into categories. Note also that a continuous auxiliary variable could be used as a categorical variable in the design phase of a study by stratification and as a continuous variable in the estimation phase with ratio or regression estimation. The stratification would be to ensure that the sample includes values across the range of the auxiliary variable $x$, which will aid us in determining the appropriate relationship between $x$ and $y$ in ratio or regression estimation.

Allocation in Stratified Random Sampling

In planning a study requiring stratification of the population, an important consideration is how to allocate a total
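
The arithmetic in the TV example earlier in these notes can be checked numerically. The following is a minimal Python sketch, not part of the original handout: the variable names are illustrative choices, the stratum summaries are copied from the example's table, and the critical value 2.079 is taken from the handout rather than computed (computing it would need a t-distribution routine from a statistics library).

```python
import math

# Stratum summaries copied from the TV example (Town A, Town B, Rural).
N_h = [155, 62, 93]              # stratum population sizes
n_h = [20, 8, 12]                # stratum sample sizes (SRS within each stratum)
ybar_h = [33.90, 25.12, 19.00]   # stratum sample means
s_h = [5.95, 15.24, 9.36]        # stratum sample standard deviations

N = sum(N_h)                     # total population size

# Estimated total and mean: tau_hat = sum of N_h * ybar_h, ybar_st = tau_hat / N.
tau_hat_st = sum(Nh * yb for Nh, yb in zip(N_h, ybar_h))
ybar_st = tau_hat_st / N

# Estimated variance of ybar_st:
# sum of (N_h/N)^2 * ((N_h - n_h)/N_h) * s_h^2 / n_h over the strata.
var_hat = sum(
    (Nh / N) ** 2 * ((Nh - nh) / Nh) * s ** 2 / nh
    for Nh, nh, s in zip(N_h, n_h, s_h)
)
se = math.sqrt(var_hat)

# Satterthwaite approximate degrees of freedom, with a_h = N_h (N_h - n_h) / n_h.
a_h = [Nh * (Nh - nh) / nh for Nh, nh in zip(N_h, n_h)]
numerator = sum(a * s ** 2 for a, s in zip(a_h, s_h)) ** 2
denominator = sum((a * s ** 2) ** 2 / (nh - 1) for a, s, nh in zip(a_h, s_h, n_h))
df_satterthwaite = numerator / denominator

# Ultra-conservative alternative: the smallest stratum sample size minus one.
df_conservative = min(nh - 1 for nh in n_h)

# 95% confidence interval using the handout's critical value (assumed, not computed).
t_star = 2.079
ci = (ybar_st - t_star * se, ybar_st + t_star * se)

print(f"tau_hat_st = {tau_hat_st:.1f}, ybar_st = {ybar_st:.2f}")
print(f"var_hat = {var_hat:.3f}, SE = {se:.2f}")
print(f"Satterthwaite d.f. = {df_satterthwaite:.1f}, conservative d.f. = {df_conservative}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because this sketch carries unrounded intermediate values, the interval endpoints it prints can differ in the second decimal place from the handout's (24.79, 30.61), which was computed from the rounded values 27.7 and 1.40.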

