Statistics for Analysis of Experimental Data

Catherine A. Peters
Department of Civil and Environmental Engineering
Princeton University
Princeton, NJ 08544

Published as a chapter in the Environmental Engineering Processes Laboratory Manual, S. E. Powers, Ed., AEESP, Champaign, IL, 2001.

Statistics is a mathematical tool for quantitative analysis of data, and as such it serves as the means by which we extract useful information from data. In this chapter we are concerned with data that are generated via experimental measurement. Experimentation often generates multiple measurements of the same thing, i.e., replicate measurements, and these measurements are subject to error. Statistical analysis can be used to summarize those observations by estimating the average, which provides an estimate of the true mean. Another important statistical calculation for summarizing the observations is the estimate of the variance, which quantifies the uncertainty in the measured variable. Sometimes we have made measurements of one quantity and we want to use those measurements to infer values of a derived quantity. Statistical analysis can be used to propagate the measurement error through a mathematical model to estimate the error in the derived quantity. Sometimes we have measured two different things and we want to know whether there really is a difference between the two measured values. Analysis of variance (t-tests) can be used to estimate the probability that the underlying phenomena are truly different. Finally, we may have measured one variable under a variety of conditions with regard to a second variable. Regression analysis can be used to come up with a mathematical expression for the relationship between the two variables. These are but a few of the many applications of statistics for analysis of experimental data. This chapter presents a brief
overview of these applications in the context of typical experimental measurements in the field of environmental engineering.

This chapter is necessarily brief in presentation. Students who seek a deeper understanding of these principles should study a textbook on statistical analysis of experimental data. The bibliography at the end of this chapter lists some useful textbooks, some of which are directly aimed at environmental engineers and scientists.

Error Analysis and Error Propagation

Errors in Measured Quantities and Sample Statistics

A very important thing to keep in mind when learning how to design experiments and collect experimental data is that our ability to observe the real world is not perfect. The observations we make are never exactly representative of the process we think we are observing. Mathematically, this is conceptualized as:

$$\text{measured value} = \text{true value} \pm \text{error} \qquad (1)$$

The error is a combined measure of the inherent variation in the phenomenon we are observing and the numerous factors that interfere with the measurement. Every effort should be made to reduce systematic errors through efforts such as calibration of measurement instruments. It is impossible to totally eliminate all measurement error. If the underlying error is truly random (not biased), then we can still gain useful information by making multiple observations (i.e., replicates) and calculating the average. In order for the sample to be truly representative of the underlying phenomenon that is being measured, it must be a random sample. For example, let's say that you are running an experiment in which you have set up eight batch reactors, and you plan to sacrifice one batch reactor every hour to measure the concentration of some chemical. Every time you select a batch reactor, you should randomly select from the remaining reactors. You should not sample the reactors in the same order as you prepared them, nor should you sample the reactors in the order in which they are positioned on your bench top. You never know
how these other factors may influence the controlling processes in the reactors. By randomly sampling the reactors, any systematic error due to other factors is randomly distributed across your measurements.

Randomness helps to ensure independence of the observations. When we say that we want independent observations, what we really mean is that we want the errors in the observations to be independent of each other. Aside from nonrandom sampling, there are other laboratory activities that could jeopardize independence of the observations. For example, if an inexperienced experimentalist gets better at making a certain type of measurement, then the error may get smaller over time. In this case, the error is a function of the order in which the measurement is made, and the errors are not independent. Similarly, if a measurement device wears out every time it is used, then the error may increase over time. This too would produce errors that are not independent. Random sampling and other efforts to make the observation errors independent help to ensure representativeness. If all the observations are truly representative of the same underlying phenomenon, then they all have the same mean and variance, i.e., the errors are identically distributed. Sometimes the acronym IID is used to collectively refer to the criteria that a sample of observations is independent (I) and identically distributed (ID).

Given a sample of $n$ observations, the sample average is calculated as:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad (2)$$

where $x_i$ represents the $i$th individual observation. The sample average is a statistic that is an estimate of the mean, or central tendency, of the underlying random variable. The sample variance is:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \qquad (3)$$

The sample variance is a statistic that is an estimate of the variance, $\sigma^2$, in the underlying random variable. Another useful statistic is the sample standard deviation, $s$, which is the square root of the sample variance. The quantity $n - 1$ is the number of degrees of freedom associated with the sample standard
deviation.

It is often the case that we are more interested in the estimate of the mean than in the individual observations. What we really want to know, then, is the variance in the average value. That is, how does the variance in $x$ translate into uncertainty in our ability to estimate the mean? The standard error of the mean is:

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} \qquad (4)$$

which also has $n - 1$ degrees of freedom. Clearly, when the number of observations, $n$, is large, the uncertainty in the estimate of the mean is small. This relationship demonstrates
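
As a minimal sketch of the sample statistics in Equations 2 through 4, and of the random reactor selection described earlier, the following Python example may help. The function names (`random_sacrifice_order`, `summarize`), the seed, and the replicate concentrations are hypothetical and chosen only for illustration; they do not come from the chapter.

```python
import math
import random


def random_sacrifice_order(n_reactors, seed=None):
    """Return reactor labels 1..n_reactors in a random sampling order.

    Hypothetical helper: each reactor is sampled exactly once, in an
    order unrelated to preparation order or bench position.
    """
    rng = random.Random(seed)
    order = list(range(1, n_reactors + 1))
    rng.shuffle(order)
    return order


def summarize(observations):
    """Compute the sample average, variance, standard deviation,
    and standard error of the mean for replicate observations."""
    n = len(observations)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(observations) / n                                       # Eq. 2
    variance = sum((x - mean) ** 2 for x in observations) / (n - 1)   # Eq. 3
    std_dev = math.sqrt(variance)
    std_error = std_dev / math.sqrt(n)                                 # Eq. 4
    return {
        "n": n,
        "dof": n - 1,  # degrees of freedom
        "mean": mean,
        "variance": variance,
        "std_dev": std_dev,
        "std_error": std_error,
    }


# Assumed sampling order for eight batch reactors (seed is arbitrary).
print(random_sacrifice_order(8, seed=1))

# Made-up replicate concentrations (mg/L), for illustration only.
replicates = [10.0, 12.0, 11.0, 9.0, 13.0]
stats = summarize(replicates)
print(f"mean = {stats['mean']:.3f}")            # mean = 11.000
print(f"variance = {stats['variance']:.3f}")    # variance = 2.500
print(f"std error = {stats['std_error']:.3f}")  # std error = 0.707
```

With these made-up values, the standard error (about 0.71) is smaller than the sample standard deviation (about 1.58) by a factor of $\sqrt{n}$, which is the dependence on $n$ expressed in Equation 4.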