Ch. 7 Estimation

1 Introduction

The objective of data collection is to learn about the distribution, or aspects of the distribution, of some characteristic of the units of a population of interest. In Chapter 6 we saw how to estimate certain key descriptive parameters of a population distribution, such as the mean, the variance, percentiles, and probabilities (or proportions), from the corresponding sample quantities. In this chapter we will learn another approach to estimating such population parameters. This approach rests on two things: the assumption that the population distribution belongs in a certain family of distribution models, and methods for fitting a particular family of distribution models to data. Different fitting methods will be presented and discussed.

The estimators obtained from this other approach will occasionally differ from the estimators we saw in Chapter 6. For example, under the assumption that the population distribution is normal, estimators of population percentiles and proportions depend only on the sample mean and sample variance, and thus differ from the sample percentiles and proportions. Similarly, the assumption of a uniform distribution yields an estimator of the population mean that is different from the sample mean, and the assumption of a Poisson distribution yields an estimator of the population variance that is different from the sample variance.

Another learning objective of this chapter is to develop criteria for selecting the best among different estimators of the same quantity or parameter. For example, should the alternative estimators mentioned in the preceding paragraph be preferred over those discussed in Chapter 6? The same criteria can also help us decide whether a stratified sample is preferable to a simple random sample for estimating the population mean or a population proportion. Finally, in this chapter we will learn how to report the uncertainty of estimators through their standard error, and how that leads to confidence
intervals for estimators that have, exactly or approximately, a normal distribution. The above estimation concepts will be developed here in the context of a single sample, but will be applied in later chapters to samples from several populations.

2 Overview, Notation and Terminology

Many families of distribution models, including all we have discussed, depend on a small number of parameters. For example, a Poisson distribution model is identified by the single parameter $\lambda$, and normal models are identified by two parameters, $\mu$ and $\sigma^2$. Such families of distribution models are called parametric. One approach to extrapolating sample information to the population is to assume that the population distribution is a member of (belongs in) a specific parametric family of distribution models, and then to fit the assumed family to the data, i.e., to identify the member of the parametric family that best fits the data. There are several methods (criteria) for fitting a parametric family of distribution models to data. They all amount to estimating the model parameters and taking as the fitted model the one that corresponds to the estimated parameters.

Example 2.1. Car manufacturers often advertise damage results from low-impact crash experiments. In an experiment crashing $n = 20$ randomly selected cars of a certain type against a wall at 5 mph, let $X$ denote the number of cars that sustain no visible damage. Here it is reasonable to assume that the distribution of $X$ (which is the population distribution in this case) is a member of the family of binomial probability models. A binomial distribution is identified by the sample size used (here $n = 20$) and the parameter $p$, which is the probability that a randomly selected car will sustain no visible damage when crashed at 5 mph. The best fitting model is the binomial distribution that corresponds to the estimated value of $p$. For example, if $X = 12$ of the 20 cars in the experiment sustain no visible damage, the estimate of $p$ is $\hat{p} = 12/20 = 0.6$, and the best fitting model is
$\mathrm{Bin}(20, 0.6)$.

Example 2.2. The response time $X$ of a robot to a certain malfunction of a certain production process (e.g., car manufacturing) is often the variable of interest. Let $X_1, \ldots, X_{36}$ denote 36 response times that are to be measured. Here it is not clear what the population distribution (i.e., the distribution of each $X_i$) might be, but it might be assumed, at least tentatively, that this distribution is a member of the normal family of distributions. The model parameters that identify a normal distribution are its mean and variance. Thus, the best fitting model is the normal distribution with mean and variance equal to the sample mean and sample variance, respectively, obtained from the data (i.e., the 36 measured response times). For example, if the sample mean of the 36 response times is $\bar{X} = 9.3$ and the sample variance is $S^2 = 4.9$, the best fitting model is $N(9.3, 4.9)$.

Example 2.3. The lifetime of electric components is often the variable of interest in reliability studies. Let $T_1, \ldots, T_{25}$ denote the lifetimes, in hours, of a random sample of 25 components. Here it is not clear what the population distribution might be, but it might be assumed, at least tentatively, that it is a member of the exponential family of models, which was introduced in Example 3.7, page 14, of Chapter 3. Thus each $T_i$ has pdf $f(t) = \lambda e^{-\lambda t}$ for $t \ge 0$, and $f(t) = 0$ for $t < 0$, for some $\lambda > 0$. Here the single model parameter $\lambda$ identifies the exponential distribution. Since the model mean value (i.e., the mean value of a population having the exponential distribution) is $1/\lambda$, and since the population mean value can be estimated by the sample mean $\bar{T}$ of the 25 lifetimes, the model parameter $\lambda$ can be estimated by $\hat{\lambda} = 1/\bar{T}$. Thus, the best fitting exponential model is the exponential distribution with model parameter equal to $\hat{\lambda}$. For example, if the average of the 25 lifetimes is 113.5 hours, the best fitting model is the exponential distribution with $\hat{\lambda} = 113.5^{-1} \approx 0.0088$.

Example 2.4. Suppose, as in the previous example, that interest lies in the distribution of the lifetime of some type of electric
component, and let $T_1, \ldots, T_{25}$ denote the lifetimes, in hours, of a random sample of 25 such components. If the assumption that the population distribution belongs in the exponential family does not appear credible, it might be assumed that it is a member of the gamma family of distribution models. This is a richer family of models, and it includes the exponential distribution models. The gamma distribution is identified by two parameters, $\alpha$ and $\beta$, and its pdf is of the form

$$f(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\beta}, \quad x > 0,$$

where $\Gamma(\alpha)$ is the gamma function.
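In Examples 2.1 to 2.3 the fitting step reduces to plugging sample statistics into formulas for the model parameters. A minimal Python sketch of those three plug-in fits is shown below, using only the standard-library `statistics` module. The function names are hypothetical, and the sample variance is assumed to use the divisor $n - 1$, as is usual for $S^2$. Because the text reports only summary values for the response times and lifetimes, the demonstration uses a small made-up sample rather than data from the examples.

```python
from statistics import mean, variance


def fit_binomial(x: int, n: int) -> tuple[int, float]:
    """Best fitting Bin(n, p) when n is known: estimate p by p_hat = x / n."""
    return n, x / n


def fit_normal(sample: list[float]) -> tuple[float, float]:
    """Best fitting N(mu, sigma^2): mean = sample mean, variance = sample
    variance S^2 (statistics.variance divides by n - 1; assumed to match S^2)."""
    return mean(sample), variance(sample)


def fit_exponential(sample: list[float]) -> float:
    """Best fitting exponential model: the model mean is 1 / lambda,
    so lambda is estimated by 1 / (sample mean)."""
    return 1 / mean(sample)


# Example 2.1: X = 12 of n = 20 cars show no visible damage -> Bin(20, 0.6).
print(fit_binomial(12, 20))      # (20, 0.6)

# Hypothetical toy sample (NOT data from the text), used to exercise the other fits.
toy = [2.0, 4.0, 6.0]
print(fit_normal(toy))           # (4.0, 4.0), i.e. the fitted model is N(4.0, 4.0)
print(fit_exponential(toy))      # 0.25
```

With the lifetime average of 113.5 hours from Example 2.3, the same rule gives $\hat{\lambda} = 1/113.5 \approx 0.0088$. The sketch covers only the plug-in fits of these three examples; fitting the two-parameter gamma family of Example 2.4 is not attempted here.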
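The introduction notes two model-based estimators that differ from their Chapter 6 counterparts. Under a normal model, estimates of percentiles and proportions depend on the data only through $\bar{X}$ and $S$. Under a Poisson model, the variance estimate differs from the sample variance. A minimal Python sketch of both contrasts follows. It assumes the plug-in rule described above: $\mu$ and $\sigma^2$ are estimated by $\bar{X}$ and $S^2$ as in Example 2.2, and the Poisson parameter $\lambda$ (which is both the mean and the variance) is estimated by the sample mean. The function names and the toy data are hypothetical, chosen for illustration only.

```python
from statistics import NormalDist, mean, stdev, variance


def normal_model_percentile(sample: list[float], q: float) -> float:
    """100q-th percentile of the fitted N(xbar, S^2): xbar + S * z_q.
    Depends on the data only through the sample mean and sample SD."""
    return mean(sample) + stdev(sample) * NormalDist().inv_cdf(q)


def normal_model_proportion(sample: list[float], c: float) -> float:
    """P(X <= c) under the fitted N(xbar, S^2), i.e. Phi((c - xbar) / S)."""
    return NormalDist(mean(sample), stdev(sample)).cdf(c)


def sample_proportion(sample: list[float], c: float) -> float:
    """Sample-based estimate: the fraction of observations <= c."""
    return sum(1 for x in sample if x <= c) / len(sample)


def poisson_model_variance(counts: list[int]) -> float:
    """Under a Poisson(lambda) model the variance equals lambda; assuming
    lambda is estimated by the sample mean, so is the variance."""
    return float(mean(counts))


# Hypothetical toy data (NOT from the text), for illustration only.
toy = [2.0, 4.0, 6.0]
print(normal_model_proportion(toy, 4.0), sample_proportion(toy, 4.0))  # 0.5 vs 2/3
print(normal_model_percentile(toy, 0.9))  # xbar + S * z_0.9

counts = [0, 1, 1, 2, 6]
print(poisson_model_variance(counts), variance(counts))  # 2.0 vs 5.5
```

On the toy data the two estimates of $P(X \le 4)$ differ (0.5 versus 2/3), as do the two variance estimates (2.0 versus 5.5). Deciding which kind of estimator to prefer is the purpose of the selection criteria developed in this chapter.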