Nordheim Statistics 411 Spring 2015

Example for Two-Stage Clustering

Consider the company “Slurpy Cola” (SC). It has 500 plants in North America, and each plant has from 11 to 35 “bottling lines”. Lines sometimes need to be shut down for maintenance. SC wishes to estimate the mean number of hours that a line is down during a given month. (Doing so requires examining the operation log for each line, and each plant maintains the logs for its own lines.)

A random sample of n = 10 plants is selected. Within each sampled plant, the plan is to randomly select a quarter of its lines, rounding up to the next whole line when necessary (so the sampled proportion is never less than 0.25). For each sampled line, the number of hours the line was down during the given month is recorded.

Plant    Total # lines    # lines         # of “down” hours for each sampled line
number   in plant (Mi)    sampled (mi)
  1          14               4           5.2  1.4  7.3  2.5
  2          26               7           1.3  2.9  6.4  1.7  3.8  4.0  2.1
  3          18               5           0.8  6.3  2.4  9.2  4.6
  4          17               5           4.5  3.0  1.6  2.9  3.2
  5          31               8           1.1  2.6  5.3  3.7  1.4  4.0  2.6  2.2
  6          23               6           3.4  2.9 13.3  6.7  1.9  5.5
  7          27               7           2.4  0.9  1.7  3.6  0.8  3.0  2.7
  8          18               5           4.1  1.3  2.8  6.5  1.9
  9          15               4           3.6  1.8  3.4  4.1
 10          24               6           2.9  0.7  9.3  5.2  1.7  3.6
Total       213              57

> N=500
> n=10
> hrsplt1=c(5.2,1.4,7.3,2.5)
> hrsplt2=c(1.3,2.9,6.4,1.7,3.8,4.0,2.1)
………….
> hrsplt10=c(.7,9.3,5.2,1.7,3.6)
>
> ybar1=mean(hrsplt1)
> ybar2=mean(hrsplt2)
……….
> ybar10=mean(hrsplt10)
> ybar1
[1] 4.1
> ybar2
[1] 3.171429
> ybar10
[1] 4.1
> v1=var(hrsplt1)
> v2=var(hrsplt2)
……….
> v10=var(hrsplt10)
> v1
[1] 7.1
> v2
[1] 3.065714
> v10
[1] 11.455

(Note: as typed, hrsplt10 omits the first recorded value for plant 10, 2.9, so ybar10 and v10, and the results below that use them, are based on five lines rather than the m10 = 6 listed in the table. With all six values, ybar10 = 3.9 and v10 = 9.404.)

> M1=14
> M2=26
……….
> M10=24
> m1=4
> m2=7
………..
> m10=6
> Mi=c(M1,M2,M3,M4,M5,M6,M7,M8,M9,M10)
> ybari=c(ybar1,ybar2,ybar3,ybar4,ybar5,ybar6,ybar7,ybar8,ybar9,ybar10)
> hrsi=Mi*ybari
>
> xlim=range(0,Mi)
> ylim=range(0,hrsi)
> plot(Mi,hrsi,xlim=xlim,ylim=ylim)

[Figure: scatterplot of hrsi (estimated total down hours for each sampled plant) against Mi (total number of lines in the plant).]

The linear relationship is not very strong, but there is no strong evidence that a line fitted to these points does not pass through the origin. (Thus, the ratio estimation approach should give some modest reduction in variance.)
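The ten nearly identical assignments in the session (hrsplt1, …, hrsplt10, ybar1, …, M1, …) can be collapsed by storing each plant's sampled lines as one element of a list. The sketch below is one possible way to do this in R; the names hrs, mi and vi are illustrative and not from the original session. It enters all six plant-10 values from the table, so its ybari[10] is 3.9 and vi[10] is 9.404, rather than the 4.1 and 11.455 produced by the five-value hrsplt10 in the session.

```r
# Down hours for each sampled line, one list element per plant, as in the table
# (all six values are entered for plant 10).
hrs <- list(
  c(5.2, 1.4, 7.3, 2.5),
  c(1.3, 2.9, 6.4, 1.7, 3.8, 4.0, 2.1),
  c(0.8, 6.3, 2.4, 9.2, 4.6),
  c(4.5, 3.0, 1.6, 2.9, 3.2),
  c(1.1, 2.6, 5.3, 3.7, 1.4, 4.0, 2.6, 2.2),
  c(3.4, 2.9, 13.3, 6.7, 1.9, 5.5),
  c(2.4, 0.9, 1.7, 3.6, 0.8, 3.0, 2.7),
  c(4.1, 1.3, 2.8, 6.5, 1.9),
  c(3.6, 1.8, 3.4, 4.1),
  c(2.9, 0.7, 9.3, 5.2, 1.7, 3.6)
)
Mi <- c(14, 26, 18, 17, 31, 23, 27, 18, 15, 24)  # total lines in each plant
mi <- lengths(hrs)                                 # lines sampled in each plant

# Check the sampling plan: a quarter of the lines, rounded up.
stopifnot(all(mi == ceiling(0.25 * Mi)))

ybari <- sapply(hrs, mean)  # mean down hours per sampled line, by plant
vi    <- sapply(hrs, var)   # sample variance (divisor mi - 1), by plant
hrsi  <- Mi * ybari         # estimated total down hours, by plant

print(round(cbind(Mi, mi, ybari, vi, hrsi), 3))
```

Each column of the printed matrix corresponds to one of the Mi, mi, ybari, vi and hrsi quantities built element by element in the session.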
> estnum=M1*ybar1+M2*ybar2+M3*ybar3+M4*ybar4+M5*ybar5+M6*ybar6+M7*ybar7+M8*ybar8+M9*ybar9+M10*ybar10
> estden=M1+M2+M3+M4+M5+M6+M7+M8+M9+M10
> est=estnum/estden
>
> est
[1] 3.583922

Note that this is very close to the slope of a line drawn “by eye” through the origin in the plot above.

-----------

Now let us look at the first (between-plant) term in the variance:

> t1t1=(M1*(ybar1-est))**2
> t1t2=(M2*(ybar2-est))**2
………..
> t1t10=(M10*(ybar10-est))**2
> t1t1
[1] 52.20198
> t1t2
[1] 115.0219
> t1t10
[1] 153.4099
> ssq1=(t1t1+t1t2+t1t3+t1t4+t1t5+t1t6+t1t7+t1t8+t1t9+t1t10)/(n-1)
> Mbar=(M1+M2+M3+M4+M5+M6+M7+M8+M9+M10)/n
> vt1=ssq1*(1-(n/N))/(n*(Mbar**2))
>
> vt1
[1] 0.1308714

Now consider the second (within-plant) term in the variance:

> t2t1=(M1**2)*(1-(m1/M1))*v1/m1
> t2t2=(M2**2)*(1-(m2/M2))*v2/m2
……….
> t2t10=(M10**2)*(1-(m10/M10))*v10/m10
> t2t1
[1] 248.5
> t2t2
[1] 216.3518
> t2t10
[1] 824.76
> vt2=(t2t1+t2t2+t2t3+t2t4+t2t5+t2t6+t2t7+t2t8+t2t9+t2t10)/((Mbar**2)*N*n)
> vt2
[1] 0.001459588
> estvarest=vt1+vt2
> seest=sqrt(estvarest)
> seest
[1] 0.3637733

So the estimated mean down time is about 3.58 hours per line for the month, with an estimated standard error of about 0.36 hours. Almost all of the estimated variance comes from the first (between-plant) term.
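The estimate and both variance terms can also be wrapped in a single function. This is a minimal sketch, assuming every plant contributes at least two sampled lines (so var() is defined) and, as in the session, using the sample mean of the Mi as Mbar; the function name two_stage_ratio and its arguments are hypothetical. It uses the data exactly as listed in the table (all six plant-10 values), so its output is not expected to match the session above digit for digit.

```r
# Ratio estimate of the mean per line from a two-stage cluster sample, with the
# same two variance terms (vt1, vt2) computed in the session.
#   hrs : list; element i holds the sampled observations from plant i
#   Mi  : total number of lines in each sampled plant
#   N   : number of plants in the population
two_stage_ratio <- function(hrs, Mi, N) {
  n     <- length(hrs)
  mi    <- lengths(hrs)
  ybari <- sapply(hrs, mean)
  vi    <- sapply(hrs, var)      # needs at least two values per plant
  Mbar  <- mean(Mi)              # sample mean of the Mi, as in the session

  est <- sum(Mi * ybari) / sum(Mi)

  # First (between-plant) term, as vt1 in the session
  sr2 <- sum((Mi * (ybari - est))^2) / (n - 1)
  vt1 <- (1 - n / N) * sr2 / (n * Mbar^2)

  # Second (within-plant) term, as vt2 in the session
  vt2 <- sum(Mi^2 * (1 - mi / Mi) * vi / mi) / (Mbar^2 * N * n)

  c(est = est, vt1 = vt1, vt2 = vt2, se = sqrt(vt1 + vt2))
}

hrs <- list(
  c(5.2, 1.4, 7.3, 2.5),
  c(1.3, 2.9, 6.4, 1.7, 3.8, 4.0, 2.1),
  c(0.8, 6.3, 2.4, 9.2, 4.6),
  c(4.5, 3.0, 1.6, 2.9, 3.2),
  c(1.1, 2.6, 5.3, 3.7, 1.4, 4.0, 2.6, 2.2),
  c(3.4, 2.9, 13.3, 6.7, 1.9, 5.5),
  c(2.4, 0.9, 1.7, 3.6, 0.8, 3.0, 2.7),
  c(4.1, 1.3, 2.8, 6.5, 1.9),
  c(3.6, 1.8, 3.4, 4.1),
  c(2.9, 0.7, 9.3, 5.2, 1.7, 3.6)
)
Mi <- c(14, 26, 18, 17, 31, 23, 27, 18, 15, 24)

res <- two_stage_ratio(hrs, Mi, N = 500)
print(round(res, 6))
```

A quick sanity check on the two terms: setting N equal to the number of sampled plants makes vt1 zero (no first-stage finite-population variance), and sampling every line in each plant (mi = Mi) makes vt2 zero.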