PSU STAT 401 - New Nonparametric Tolerance Regions - D1739069

School name Penn State University

Course Stat 401- Experimental Methods

Pages 22

This preview shows page 1-2-21-22 out of 22 pages.

New Nonparametric Tolerance Regions

Jesse Frey∗

∗ Jesse Frey is Assistant Professor, Department of Mathematical Sciences, Villanova University, Villanova, PA 19085. E-mail: [email protected]

Abstract

We develop two new nonstandard methods for obtaining nonparametric tolerance regions from a simple random sample. The first method consists of taking the union of a certain number of the intervals between the order statistics from the sample. The second method, which generalizes the first, consists of taking the union of a certain number of the intervals between a prespecified subset of the order statistics from the sample. Both methods allow the choice of intervals to be made arbitrarily and after seeing the data, but minimal length may be used as a choice criterion. We show how to find the exact coverage probability for regions obtained using either method, and we explore some properties of regions obtained using the two methods. We use an ecological data set and a simulation study to show that the small-sample performance of regions obtained using the two methods compares favorably to that of other nonparametric tolerance regions in the literature.

KEY WORDS: Beta distribution; Dirichlet distribution; Judgment; Order statistics

1 Introduction

Several important classes of statistical inference techniques yield answers in the form of sets computed from the observed data. If one is interested in estimating a particular parameter of a distribution, then a confidence region is the appropriate inference technique. If one is interested in predicting a future observation from a certain distribution, then a prediction region is the appropriate inference technique. However, if one is interested in obtaining a region that contains at least some specified proportion of the probability for a particular distribution, then a tolerance region is the appropriate inference technique.

Suppose that $X_1, \ldots, X_n$ is a simple random sample from an unknown continuous distribution $F$.
Let $p \in (0, 1)$ be the desired probability content for the tolerance region, and let $S(X_1, \ldots, X_n)$ be a random subset of the real line. Define $C_F(S(X_1, \ldots, X_n))$ to be the probability content, under $F$, of the set $S(X_1, \ldots, X_n)$, and let $\mathcal{F}$ be the set of distribution functions under consideration. If

$$\inf_{F \in \mathcal{F}} P\bigl(C_F(S(X_1, \ldots, X_n)) \ge p\bigr) = \gamma,$$

then $S(X_1, \ldots, X_n)$ is a $100\gamma\%$ tolerance region for a proportion $p$ of the probability for the underlying distribution. Thus, a tolerance region is a region that, with specified high probability, contains at least a specified proportion of the probability for the underlying distribution $F$. If $\mathcal{F}$ includes only distributions from a particular parametric family, then $S(X_1, \ldots, X_n)$ is a parametric tolerance region, while if $\mathcal{F}$ is the set of all continuous distribution functions, then $S(X_1, \ldots, X_n)$ is a nonparametric tolerance region. Common examples of both parametric and nonparametric tolerance regions are discussed both by Hahn and Meeker (1991) and by Vardeman (1992).

The earliest nonparametric tolerance regions were proposed by Wilks (1941), who suggested using intervals of the form $(X_{(r)}, X_{(n+1-r)})$, where $X_{(1)} < \cdots < X_{(n)}$ are the order statistics from the sample $X_1, \ldots, X_n$. Extensions and generalizations of Wilks’s idea were soon proposed, and in a series of papers over the next two decades, very general methods for obtaining nonparametric tolerance regions were developed (Tukey 1947, Murphy 1948, Fraser 1951, Fraser and Wormleighton 1951, Kemperman 1956, Walsh 1962). Several of these papers built on the idea of statistically equivalent blocks developed by Tukey (1947), and many of them developed methods that apply not just to univariate data, but also to data in higher dimensions. What these nonparametric tolerance regions lack, however, is any sort of formal optimality (Chatterjee and Patra 1980).
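As a point of reference for the Wilks intervals just described, the sketch below computes the confidence level $\gamma$ attached to $(X_{(r)}, X_{(n+1-r)})$. It relies on the standard result that the probability content of this interval follows a Beta$(n+1-2r,\, 2r)$ distribution for every continuous $F$, so that $\gamma$ can be written as a binomial tail sum. The function name `wilks_confidence` is hypothetical and is not taken from the paper.

```python
from math import comb


def wilks_confidence(n: int, r: int, p: float) -> float:
    """Confidence that (X_(r), X_(n+1-r)) holds at least a proportion p.

    Assumption: the interval's content is Beta(n + 1 - 2r, 2r) for any
    continuous F, so P(content >= p) = P(Binomial(n, p) <= n - 2r).
    """
    if r < 1 or 2 * r > n:
        raise ValueError("need 1 <= r and 2r <= n")
    top = n - 2 * r
    # Sum the binomial probabilities for j = 0, ..., n - 2r.
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(top + 1))
```

Calling `wilks_confidence(n, 1, 0.95)` for increasing `n` gives the smallest sample size at which the sample range serves as a 95% tolerance interval for a proportion 0.95.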
Indeed, even when the sample size is very large, these tolerance regions may be much larger than the minimum region needed to contain a proportion $p$ of the probability for the underlying distribution.

To obtain nonparametric tolerance regions that outperform the standard regions, we must allow the shape of the region to be determined by the data. Certain nonparametric tolerance regions that provide this sort of flexibility were explored by Di Bucchianico, Einmahl, and Mushkudiani (2001). They considered the tolerance region consisting of the shortest interval that contains a certain number of data points, and their paper also contains the more general idea of using the tolerance region consisting of the shortest union of $m$ intervals that contains a certain number of data points. However, there are computational challenges associated with applying this idea for $m > 1$, and the coverage probabilities assigned by Di Bucchianico, Einmahl, and Mushkudiani (2001) are valid only asymptotically.

In this paper, we develop two new methods that also allow the shape of the nonparametric tolerance region to be determined by the data. We describe these two methods in Section 2, and we also provide a method for obtaining the exact coverage probability for tolerance regions obtained using each method. In addition, we apply the methods to a data set consisting of diameters at chest height for 584 longleaf pines. In Section 3, we explore some properties of the new nonparametric tolerance regions, and in Section 4, we compare the performance of the new tolerance regions to that of standard nonparametric tolerance intervals via a simulation study. We give our conclusions and mention some possible generalizations in Section 5.

2 The New Methods

In this section, we describe the two new nonstandard methods for obtaining nonparametric tolerance regions from a simple random sample.
We prove that the exact coverage probabilities for regions obtained using either of the two methods are those obtained when the underlying distribution $F$ is uniform and when the choice of intervals is made on the basis of length. We then apply both the new methods and some existing methods from the literature to a data set consisting of diameters at chest height for 584 longleaf pines.

The first method that we consider consists of taking the union of $k$ of the intervals between consecutive order statistics from the sample. The choice of which $k$ intervals to include can be made arbitrarily and after seeing the data, but minimal length may be used as the choice criterion. If choosing intervals on the basis of minimum length, we would compute each of the $n - 1$ differences $X_{(2)} - X_{(1)}, \ldots, X_{(n)} - X_{(n-1)}$, rank
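The minimum-length selection for the first method can be sketched as follows. This is an illustration only: the function names `shortest_gap_region` and `simulated_coverage` are hypothetical, and the second function is a Monte Carlo approximation under a Uniform(0, 1) sample (the case identified above as giving the coverage probability), not the exact calculation developed in the paper. Adjacent chosen intervals are returned separately rather than merged.

```python
import random


def shortest_gap_region(sample, k):
    """Return the k shortest intervals between consecutive order statistics."""
    xs = sorted(sample)
    # One (length, left endpoint, right endpoint) triple per gap: n - 1 in total.
    gaps = [(xs[i + 1] - xs[i], xs[i], xs[i + 1]) for i in range(len(xs) - 1)]
    if not 1 <= k <= len(gaps):
        raise ValueError("k must be between 1 and n - 1")
    chosen = sorted(gaps)[:k]  # rank the gaps by length and keep the k shortest
    return sorted((lo, hi) for _, lo, hi in chosen)


def simulated_coverage(n, k, p, reps=20000, seed=0):
    """Monte Carlo estimate of P(content >= p) for Uniform(0, 1) samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        region = shortest_gap_region([rng.random() for _ in range(n)], k)
        # Under the uniform distribution, content equals total length.
        content = sum(hi - lo for lo, hi in region)
        hits += content >= p
    return hits / reps
```

For a given $n$ and $p$, one could scan `k` upward and keep the smallest value whose estimated coverage reaches the desired $\gamma$, bearing in mind that the estimate carries simulation error.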
