
Inference for One Mean
PSTAT 5LS, Slide Set 6

Introduction

Now let's turn our attention to quantitative data and inferential procedures for their mean (average). The sample mean $\bar{x}$ is a statistic, so it has its own distribution. The mathematical model for the sampling distribution of the sample mean has been thoroughly studied, so we begin with that model. Once we have a sense of this model, we will apply it in settings for three different parameters: one mean $\mu$, the mean of the differences for two dependent groups $\mu_d$, and the difference in the means of two independent groups $\mu_1 - \mu_2$.

Sampling Distribution for a Sample Mean

We saw earlier that the sampling distribution for the sample proportion $\hat{p}$ is approximately normal, provided certain conditions are met. Now we have an analogous result: the sampling distribution of a sample mean is nearly normal, provided certain conditions are met.

Central Limit Theorem for a Sample Mean

When we collect a sufficiently large sample of $n$ independent observations from a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of $\bar{x}$ will be approximately normal with mean $\mu$ and standard error $\sigma/\sqrt{n}$.[1]

[1] Remember that the standard error is what we call the standard deviation of a sample statistic.

Using a normal distribution when working with sample means:
- If a quantitative variable has a normal distribution, then the sampling distribution of $\bar{x}$ will be normal.
- If a quantitative variable does not have a normal distribution, then, provided the sample is large enough, the Central Limit Theorem tells us that we can use a normal distribution to model the sampling distribution of $\bar{x}$.

A Couple of Things to Note

Before we get to inference about means, we need to address two things:
- As we did for proportions, we need to check conditions before we model $\bar{x}$ with a normal distribution. The conditions are a little more complex than they were for modeling $\hat{p}$; we will discuss them shortly.
- The standard error $\sigma/\sqrt{n}$ requires knowing the population standard deviation $\sigma$. However, it's unrealistic to think that we know the value of $\sigma$. We almost always need to estimate $\sigma$ with the sample standard deviation $s$. This estimation adds a little more variability to our methods, so we need a new distribution, called the t distribution, to take this additional variability into account.

Two Required Conditions for Using the CLT for $\bar{x}$

- Independence: The sample observations must be independent of one another.
- Normality: When the sample is small, we require that the sample observations come from a normally distributed population. We can relax this condition more and more for larger and larger sample sizes.

Conditions: Independence

The independence condition states that the observations must be independent of one another. In other words, the value of one observation does not affect the value of any other observation in the sample. This condition is satisfied when we have a random sample from the population. When we don't have a random sample, we need to think about whether it is reasonable to assume independence between the sample observations.

Conditions: Normality

The normality condition is a bit vague, so here are a couple of general rules:
- Small samples: When the sample size $n$ is small and there are no clear outliers, we typically assume that the data come from a nearly normal distribution.
- Large samples: When the sample size $n$ is large and there are no extreme outliers, the sampling distribution of $\bar{x}$ will be approximately normal even if the data come from a distribution that is not normal. (Thanks, Central Limit Theorem!)

Saying "small $n$" and "large $n$" is still vague, so here are a few guidelines:[2]
- slight skew is okay for sample sizes of about 15 or more;
- moderate skew is okay for sample sizes of about 30 or more;
- strong skew is okay for sample sizes of about 60 or more.

[2] When checking whether the population is nearly normal, we give the population distribution the benefit of the doubt. However, if the histogram is clearly not normal and/or there are extreme outliers, then the nearly normal condition is not reasonable.

Introducing the t Distribution

As mentioned above, it is extremely rare for us to know the value of the population standard deviation $\sigma$. As such, we rarely know the value of the standard error $\sigma/\sqrt{n}$. We can estimate the population standard deviation $\sigma$ with the sample standard deviation $s$, and estimate the standard error of $\bar{x}$ with $s/\sqrt{n}$.

When we estimate $\sigma$ with $s$, we add extra variability to our calculations. The standard normal $N(0, 1)$ distribution does not account for this additional variability. Instead, we need to use a new distribution, called the t distribution, which has thicker tails than the normal distribution, to take this additional variability into account.

[Figure: normal distribution vs. t distribution]

In Search of Better Beer (Aside)

The Student's t distribution came from a desire to brew better beer. Yes, you read that right! William Sealy Gosset joined the Guinness Brewery on 1 October 1899 as a junior brewer. He was appointed Brewer in charge of the newly established experimental brewery in 1907 and later established the statistical department, which he ran until 1936.[3]

In 1908, Gosset published his work "The Probable Error of a Mean" in Biometrika under the pseudonym "Student." One of the results of that paper concerns Gosset's distribution: it "has been shown that this curve represents the facts fairly well even when the distribution of the population is not strictly normal."[4]

[3] https://www.guinness-storehouse.com/content/pdf/archive-factsheets/general-history/wsgosset-and-students-t-test.pdf
[4] Student (March 1908). "The Probable Error of a Mean." Biometrika, 6(1), pages 1–25. https://doi-org.proxy.lib.umich.edu/10.2307/2331554

Characteristics of the t Distribution

The t distribution is a symmetric, bell-shaped distribution that is always centered at zero and has one parameter, the degrees of freedom, which describe the precise form of the t distribution.
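To make the "thicker tails" of the t distribution concrete, here is a minimal, standard-library-only Python sketch (not part of the original slides). It evaluates the usual t density with `df` degrees of freedom and approximates the upper-tail probability P(T > 2) with Simpson's rule, then compares it with P(Z > 2) for the standard normal from Python's `statistics.NormalDist`. The function names `t_pdf` and `t_upper_tail` are hypothetical, and the numerical integration is an illustrative stand-in; in practice a library routine such as SciPy's `scipy.stats.t` would be used.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with `df` degrees of freedom, centered at zero."""
    # Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)), computed on the log scale to avoid overflow.
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(x: float, df: int, upper: float = 200.0, steps: int = 20_000) -> float:
    """Approximate P(T > x) by Simpson's rule on [x, upper]; the tail beyond `upper` is ignored."""
    h = (upper - x) / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(x, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(x + i * h, df)
    return total * h / 3


if __name__ == "__main__":
    normal_tail = 1 - NormalDist().cdf(2.0)  # P(Z > 2) for the standard normal N(0, 1)
    for df in (2, 5, 30):
        print(f"df={df:2d}  P(T > 2) = {t_upper_tail(2.0, df):.4f}  vs  P(Z > 2) = {normal_tail:.4f}")
```

Running it should show each t tail probability exceeding the normal one, with the gap shrinking as the degrees of freedom grow, which is the thicker-tails behavior described above.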

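The Central Limit Theorem for a sample mean, and the estimated standard error $s/\sqrt{n}$, can be illustrated with a short simulation. This is a minimal Python sketch under stated assumptions, not part of the original slides: it assumes an exponential population with rate 1 (so $\mu = \sigma = 1$) purely as an example of a strongly skewed population, uses $n = 60$ in line with the strong-skew guideline, and the function names `simulate_sample_means` and `estimated_standard_error` are hypothetical.

```python
import math
import random
import statistics


def simulate_sample_means(n: int, reps: int, rate: float = 1.0, seed: int = 1) -> list:
    """Return `reps` sample means, each from n independent Exponential(rate) draws."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [statistics.fmean(rng.expovariate(rate) for _ in range(n)) for _ in range(reps)]


def estimated_standard_error(sample: list) -> float:
    """Estimate the standard error of x-bar as s / sqrt(n), using the sample SD s."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


if __name__ == "__main__":
    n, reps = 60, 5_000
    mu = sigma = 1.0  # assumed population: Exponential(rate=1) has mean 1 and SD 1
    means = simulate_sample_means(n, reps)
    print("mean of the sample means:", round(statistics.fmean(means), 3), "| mu =", mu)
    print("SD of the sample means:  ", round(statistics.stdev(means), 3),
          "| sigma/sqrt(n) =", round(sigma / math.sqrt(n), 3))

    # In practice sigma is unknown, so we estimate the standard error from a single sample.
    rng = random.Random(2)
    one_sample = [rng.expovariate(1.0) for _ in range(n)]
    print("s/sqrt(n) from one sample:", round(estimated_standard_error(one_sample), 3))
```

The simulated sample means should center near $\mu$ with spread near $\sigma/\sqrt{n}$, even though the assumed population is strongly skewed; the last line shows the single-sample estimate $s/\sqrt{n}$ that replaces the unknown $\sigma/\sqrt{n}$.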