PSU STAT 401 - Estimation - D2747443


Pages 36


Ch. 7: Estimation

1 Introduction

The objective of data collection is to learn about the distribution, or aspects of the distribution, of some characteristic of the units of a population of interest. In Chapter 6, we saw how to estimate certain key descriptive parameters of a population distribution, such as the mean, the variance, percentiles, and probabilities (or proportions), from the corresponding sample quantities. In this chapter we will learn another approach to the estimation of such population parameters. This approach is based on the assumption that the population distribution belongs in a certain family of distribution models, and on methods for fitting a particular family of distribution models to data. Different fitting methods will be presented and discussed.

The estimators obtained from this other approach will occasionally differ from the estimators we saw in Chapter 6. For example, under the assumption that the population distribution is normal, estimators of population percentiles and proportions depend only on the sample mean and sample variance, and thus differ from the sample percentiles and proportions; the assumption of a uniform distribution yields an estimator of the population mean value which is different from the sample mean; the assumption of a Poisson distribution yields an estimator of the population variance which is different from the sample variance.

Another learning objective of this chapter is to develop criteria for selecting the best among different estimators of the same quantity, or parameter. For example, should the alternative estimators, which were mentioned in the preceding paragraph, be preferred over those which were discussed in Chapter 6? The same criteria can also help us decide whether a stratified sample is preferable to a simple random sample for estimating the population mean or a population proportion.
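
To make the contrast between the two kinds of estimators concrete, here is a minimal sketch, assuming a normal population. The data values are made up for illustration and do not come from these notes. The sketch estimates the 90th percentile and the proportion P(X ≤ 9) twice: once from the fitted normal model (using only the sample mean and sample standard deviation), and once directly from the sample.

```python
from statistics import NormalDist, mean, stdev, quantiles

# Hypothetical measurements, made up for illustration (not from the notes).
sample = [8.1, 9.4, 10.2, 7.6, 11.3, 9.9, 8.8, 10.7, 9.1, 12.0]

x_bar = mean(sample)   # sample mean
s = stdev(sample)      # sample standard deviation (n - 1 divisor)

# Fitted normal model N(x_bar, s^2); it depends on the data only
# through the sample mean and sample variance.
fitted = NormalDist(mu=x_bar, sigma=s)

# Model-based estimate of the 90th percentile.
model_p90 = fitted.inv_cdf(0.90)

# Sample 90th percentile: the last of the nine decile cut points.
sample_p90 = quantiles(sample, n=10)[-1]

# The same contrast for a proportion, P(X <= 9).
model_prop = fitted.cdf(9.0)
sample_prop = sum(x <= 9.0 for x in sample) / len(sample)

print("90th percentile: model-based", round(model_p90, 3),
      "vs sample", round(sample_p90, 3))
print("P(X <= 9): model-based", round(model_prop, 3),
      "vs sample", round(sample_prop, 3))
```

The two estimates of each quantity generally disagree. Which one should be preferred is exactly the kind of question the selection criteria of this chapter are meant to answer.
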
Finally, in this chapter we will learn how to report the uncertainty of estimators through their standard error and how that leads to confidence intervals for estimators which have (or have approximately) a normal distribution. The above estimation concepts will be developed here in the context of a single sample, but will be applied in later chapters to samples from several populations.

2 Overview, Notation and Terminology

Many families of distribution models, including all we have discussed, depend on a small number of parameters; for example, a Poisson distribution model is identified by the single parameter λ, and normal models are identified by two parameters, µ and σ². Such families of distribution models are called parametric. An approach to extrapolating sample information to the population is to assume that the population distribution is a member of (or belongs in) a specific parametric family of distribution models, and then fit the assumed family to the data, i.e. identify the member of the parametric family that best fits the data.

There are several methods/criteria for fitting a parametric family of distribution models to data. They all amount to estimating the model parameters, and taking as the fitted model the one that corresponds to the estimated parameters.

Example 2.1. Car manufacturers often advertise damage results from low impact crash experiments. In an experiment crashing n = 20 randomly selected cars of a certain type against a wall at 5 mph, let X denote the number of cars that sustain no visible damage. Here it is reasonable to assume that the distribution of X, which is the population distribution in this case, is a member of the family of binomial probability models. A binomial distribution is identified by the sample size used (here the sample size is n = 20), and the parameter p, which is the probability that a randomly selected car will sustain no visible damage when crashed at 5 mph.
The best fitting model is the binomial distribution that corresponds to the estimated value of p. For example, if X = 12 of the 20 cars in the experiment sustain no visible damage, the estimate of p is p̂ = 12/20, and the best fitting model is Bin(20, 0.6).

Example 2.2. The response time, X, of a robot to a certain malfunction of a certain production process (e.g. car manufacturing) is often the variable of interest. Let X₁, ..., X₃₆ denote 36 response times that are to be measured. Here it is not clear what the population distribution (i.e. the distribution of each Xᵢ) might be, but it might be assumed (at least tentatively) that this distribution is a member of the normal family of distributions. The model parameters that identify a normal distribution are its mean and variance. Thus, the best fitting model is the normal distribution with mean and variance equal to the sample mean and sample variance, respectively, obtained from the data (i.e. the 36 measured response times). For example, if the sample mean of the 36 response times is X̄ = 9.3, and the sample variance is S² = 4.9, the best fitting model is N(9.3, 4.9).

Example 2.3. The lifetime of electric components is often the variable of interest in reliability studies. Let T₁, ..., T₂₅ denote the lifetimes, in hours, of a random sample of 25 components. Here it is not clear what the population distribution might be, but it might be assumed (at least tentatively) that it is a member of the exponential family of models, which was introduced in Example 3.7, page 14 of Chapter 3. Thus, each Tᵢ has pdf f_λ(t) = λ exp(−λt) for t ≥ 0, and f_λ(t) = 0 for t < 0, for some λ > 0. Here the single model parameter λ identifies the exponential distribution. Since the model mean value (i.e. the mean value of a population having the exponential distribution) is λ⁻¹, and since the population mean value can be estimated by the sample mean, X̄, of the 25 lifetimes, the model parameter λ can be estimated by λ̂ = 1/X̄.
Thus, the best fitting exponential model is the exponential distribution with model parameter equal to λ̂. For example, if the average of the 25 lifetimes is 113.5 hours, the best fitting model is the exponential distribution with λ = 113.5⁻¹ = 1/113.5.

Example 2.4. Suppose, as in the previous example, that interest lies in the distribution of the lifetime of some type of electric component, and let T₁, ..., T₂₅ denote the lifetimes, in hours, of a random sample of 25 such components. If the assumption that the population distribution belongs in the exponential family does not appear credible, it might be assumed that it is a member of the gamma family of distribution models. This is a richer family of models and it
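
The fitting step in Examples 2.1 to 2.3 can be sketched in a few lines of Python, using only the summary values quoted in those examples. The helper names (binom_pmf, exp_cdf) and the three probabilities printed at the end are illustrative choices, not part of the notes; they only show that once the parameters are estimated, the fitted model can answer questions about the population.

```python
from math import comb, exp, sqrt
from statistics import NormalDist

# Example 2.1: binomial model, n = 20 cars, X = 12 with no visible damage.
n, x = 20, 12
p_hat = x / n  # estimate of p; the fitted model is Bin(n, p_hat)

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example 2.2: normal model, sample mean 9.3 and sample variance 4.9.
x_bar, s2 = 9.3, 4.9
fitted_normal = NormalDist(mu=x_bar, sigma=sqrt(s2))  # N(9.3, 4.9)

# Example 2.3: exponential model, average lifetime 113.5 hours.
t_bar = 113.5
lam_hat = 1 / t_bar  # the model mean is 1/lambda, so lambda-hat = 1/t_bar

def exp_cdf(t, lam):
    """P(T <= t) for T exponential with rate lam."""
    return 1 - exp(-lam * t) if t >= 0 else 0.0

# Illustrative questions answered from the fitted models
# (hypothetical; these are not asked in the notes).
print("P(exactly 12 of 20 undamaged):", binom_pmf(12, n, p_hat))
print("P(response time <= 12):", fitted_normal.cdf(12.0))
print("P(lifetime > 200 hours):", 1 - exp_cdf(200, lam_hat))
```

In each case the data enter the fitted model only through the estimated parameters: p̂ for the binomial, X̄ and S² for the normal, and λ̂ = 1/X̄ for the exponential.
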
