New Nonparametric Tolerance Regions

Jesse Frey

Jesse Frey is Assistant Professor, Department of Mathematical Sciences, Villanova University, Villanova, PA 19085 (E-mail: jesse.frey@villanova.edu).

Abstract

We develop two new nonstandard methods for obtaining nonparametric tolerance regions from a simple random sample. The first method consists of taking the union of a certain number of the intervals between the order statistics from the sample. The second method, which generalizes the first, consists of taking the union of a certain number of the intervals between a prespecified subset of the order statistics from the sample. Both methods allow the choice of intervals to be made arbitrarily and after seeing the data, but minimal length may be used as a choice criterion. We show how to find the exact coverage probability for regions obtained using either method, and we explore some properties of regions obtained using the two methods. We use an ecological data set and a simulation study to show that the small-sample performance of regions obtained using the two methods compares favorably to that of other nonparametric tolerance regions in the literature.

KEY WORDS: Beta distribution; Dirichlet distribution; Judgment; Order statistics.

1 Introduction

Several important classes of statistical inference techniques yield answers in the form of sets computed from the observed data. If one is interested in estimating a particular parameter of a distribution, then a confidence region is the appropriate inference technique. If one is interested in predicting a future observation from a certain distribution, then a prediction region is the appropriate inference technique. However, if one is interested in obtaining a region that contains at least some specified proportion of the probability for a particular distribution, then a tolerance region is the appropriate inference technique.

Suppose that $X_1, \ldots, X_n$ is a simple random sample from an unknown continuous distribution $F$. Let $p \in (0, 1)$ be the desired probability content for the tolerance region, and let $S(X_1, \ldots, X_n)$ be a random subset of the real line. Define $C_F(S(X_1, \ldots, X_n))$ to be the probability content under $F$ of the set $S(X_1, \ldots, X_n)$, and let $\mathcal{F}$ be the set of distribution functions under consideration. If

$$\inf_{F \in \mathcal{F}} P\bigl(C_F(S(X_1, \ldots, X_n)) \ge p\bigr) \ge \gamma,$$

then $S(X_1, \ldots, X_n)$ is a $100\gamma\%$ tolerance region for a proportion $p$ of the probability for the underlying distribution. Thus, a tolerance region is a region that, with specified high probability, contains at least a specified proportion of the probability for the underlying distribution $F$. If $\mathcal{F}$ includes only distributions from a particular parametric family, then $S(X_1, \ldots, X_n)$ is a parametric tolerance region, while if $\mathcal{F}$ is the set of all continuous distribution functions, then $S(X_1, \ldots, X_n)$ is a nonparametric tolerance region. Common examples of both parametric and nonparametric tolerance regions are discussed both by Hahn and Meeker (1991) and by Vardeman (1992).

The earliest nonparametric tolerance regions were proposed by Wilks (1941), who suggested using intervals of the form $(X_{(r)}, X_{(n+1-r)})$, where $X_{(1)} \le \cdots \le X_{(n)}$ are the order statistics from the sample $X_1, \ldots, X_n$. Extensions and generalizations of Wilks's idea were soon proposed, and in a series of papers over the next two decades, very general methods for obtaining nonparametric tolerance regions were developed (Tukey 1947; Murphy 1948; Fraser 1951; Fraser and Wormleighton 1951; Kemperman 1956; Walsh 1962). Several of these papers built on the idea of statistically equivalent blocks developed by Tukey (1947), and many of them developed methods that apply not just to univariate data, but also to data in higher dimensions. What these nonparametric tolerance regions lack, however, is any sort of formal optimality (Chatterjee and Patra 1980). Indeed, even when the sample size is very large, these tolerance regions may be much larger than the minimum region needed to contain a proportion $p$ of the probability for the underlying distribution.

To obtain nonparametric tolerance regions that outperform the standard regions, we must allow the shape of the region to be determined by the data. Certain nonparametric tolerance regions that provide this sort of flexibility were explored by Di Bucchianico, Einmahl, and Mushkudiani (2001). They considered the tolerance region consisting of the shortest interval that contains a certain number of data points, and their paper also contains the more general idea of using the tolerance region consisting of the shortest union of $m$ intervals that contains a certain number of data points. However, there are computational challenges associated with applying this idea for $m > 1$, and the coverage probabilities assigned by Di Bucchianico, Einmahl, and Mushkudiani (2001) are valid only asymptotically.

In this paper, we develop two new methods that also allow the shape of the nonparametric tolerance region to be determined by the data. We describe these two methods in Section 2, and we also provide a method for obtaining the exact coverage probability for tolerance regions obtained using each method. In addition, we apply the methods to a data set consisting of diameters at chest height for 584 longleaf pines. In Section 3, we explore some properties of the new nonparametric tolerance regions, and in Section 4, we compare the performance of the new tolerance regions to that of standard nonparametric tolerance intervals via a simulation study. We give our conclusions and mention some possible generalizations in Section 5.

2 The New Methods

In this section, we describe the two new nonstandard methods for obtaining nonparametric tolerance regions from a simple random sample. We prove that the exact coverage probabilities for regions obtained using either of the two methods are those obtained when the underlying distribution $F$ is uniform and when the choice of intervals is made on the basis of length. We then apply both the new methods and some existing methods from the literature to a data set consisting of diameters at chest height for 584 longleaf pines.

The first method that we consider consists of taking the union of $k$ of the intervals between consecutive order statistics from the sample. The choice of which $k$ intervals to include can be made arbitrarily and after seeing the data, but minimal length may be used as the choice criterion. If choosing intervals on the basis of minimum length, we would compute each of the $n - 1$ differences $X_{(2)} - X_{(1)}, \ldots, X_{(n)} - X_{(n-1)}$, rank these differences from smallest to largest, and then include in the tolerance region only those intervals $(X_{(i)}, X_{(i+1)})$ whose length ranks them among the
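
The minimum-length selection step described above can be sketched in Python. This is an illustrative sketch only, not the author's implementation. The function name `shortest_gap_region` and its arguments are hypothetical, and the rule that the region keeps the $k$ shortest gaps is an assumption, since the text above is cut off before the exact rule is stated. The sketch only selects intervals; it does not compute the coverage probability of the resulting region.

```python
from typing import List, Tuple


def shortest_gap_region(sample: List[float], k: int) -> List[Tuple[float, float]]:
    """Return k intervals between consecutive order statistics, chosen by length.

    Assumption (not confirmed by the source text): the k shortest gaps are kept.
    """
    n = len(sample)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")

    # Order statistics X_(1) <= ... <= X_(n).
    xs = sorted(sample)

    # The n - 1 intervals (X_(i), X_(i+1)) between consecutive order statistics.
    gaps = [(xs[i], xs[i + 1]) for i in range(n - 1)]

    # Rank the intervals by length, smallest first. Python's sort is stable,
    # so ties are broken in favor of the leftmost interval.
    ranked = sorted(gaps, key=lambda ab: ab[1] - ab[0])

    # Keep the k shortest and report them in left-to-right order.
    return sorted(ranked[:k])


if __name__ == "__main__":
    data = [2.0, 9.0, 4.0, 4.5, 7.0, 1.0]
    print(shortest_gap_region(data, 2))
```

For the small made-up sample in the usage lines, the order statistics are 1, 2, 4, 4.5, 7, 9, so the two shortest gaps are (4, 4.5) and (1, 2). The region is the union of the returned intervals and need not be a single interval.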


PSU STAT 401 - New Nonparametric Tolerance Regions
