ISU STAT 496 - Two Independent Samples

Two Independent Samples
W. Robert Stephenson
Department of Statistics, Iowa State University

One of the most commonly used statistical techniques is the comparison of two independent samples of measurement data, more specifically the comparison of the means of two independent samples. This is often the basis for making a decision to go with a particular method, process, or supplier. The validity of the procedure hinges on the random selection of items for each sample. The correctness of the subsequent decision rests with the stability of the processes that produce the items that can be selected for the samples.

Example: In the handout "Display and Summary of Data," data on the temperatures of electric irons set at 450°F are given. These data are reproduced below along with similar data from irons with thermostats from a second supplier.

        Supplier 1              Supplier 2
    445.0    438.0          441.8    450.7
    444.0    454.9          450.0    459.1
    453.0    435.1          459.7    448.4
    454.7    430.3          458.8    465.4
    451.1    448.7          464.6    448.3
    451.7    443.3          456.9    454.9
    434.7    453.8          454.5    455.0
    436.7    451.1          449.9    459.3
    463.1    454.8          444.6    444.8
    469.1    455.0          449.4    438.6
    452.9    460.3          458.4
    466.3                   438.8    436.1

One can use dot plots or box plots to make a visual comparison of the data from the two suppliers. Side-by-side box plots are given at the end of this handout. From that graph, the central values for the two samples look quite similar. The thermostats from Supplier 2 show slightly more variation than those from Supplier 1. Both data distributions appear to be fairly symmetric.

To formalize the comparison, one can summarize the data in terms of sample means and sample standard deviations. This is done in the table below (summary statistics are rounded).

        Supplier 1              Supplier 2
    n1 = 23                 n2 = 23
    Ȳ1 = 450.5°F            Ȳ2 = 451.1°F
    s1 = 8.28               s2 = 10.29

These summaries verify that the sample of thermostats from Supplier 2 is slightly more variable (s2 = 10.29 > s1 = 8.28). Also, the sample of thermostats from Supplier 2 has a slightly higher (further off target) mean (Ȳ2 = 451.1 > Ȳ1 = 450.5). The statistical question becomes: Is the difference in the two sample means an indication of a true difference in suppliers, or can such a difference be explained by random sampling error?

The comparison of interest is Ȳ1 − Ȳ2. This difference, −0.6°F in this example, is compared to the standard error of the difference of two sample means. This standard error is given by:

    se(\bar{Y}_1 - \bar{Y}_2) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
    \quad \text{where} \quad
    s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}

For our example,

    s_p = \sqrt{\frac{22(8.28)^2 + 22(10.29)^2}{44}} = \sqrt{87.22} = 9.34

and

    se(\bar{Y}_1 - \bar{Y}_2) = 2.754
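These hand calculations are easy to check with a few lines of code. The short Python sketch below is not part of the original handout; it simply recomputes the pooled standard deviation and the standard error from the summary statistics given above.

```python
from math import sqrt

# Summary statistics from the table above
n1, ybar1, s1 = 23, 450.5, 8.28    # Supplier 1
n2, ybar2, s2 = 23, 451.1, 10.29   # Supplier 2

# Pooled standard deviation: a weighted average of the two sample variances
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Standard error of the difference in sample means
se = sp * sqrt(1 / n1 + 1 / n2)

print(f"difference = {ybar1 - ybar2:.1f}")   # -0.6
print(f"sp = {sp:.2f}, se = {se:.3f}")       # sp = 9.34, se = 2.754
```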
The difference in sample means can be evaluated in two ways.

1. Confidence Interval

       (\bar{Y}_1 - \bar{Y}_2) \pm t_{df,\alpha/2} \, se(\bar{Y}_1 - \bar{Y}_2)

   where for all except very small samples, t_{df,\alpha/2} ≈ 2 for 95% confidence.

   The 95% confidence interval for the difference between the means for Supplier 1 and Supplier 2 is −0.6 ± 2(2.754), or (−6.1, 4.9). Since this interval contains zero, no difference between the mean temperatures for the two suppliers is inferred. If the entire interval is on one side of zero, then the difference between the two suppliers' means is said to be statistically significant.

2. Test of Hypothesis

   In a formal statistical test of hypothesis, the difference in sample means is standardized by dividing by the standard error to produce a test statistic. This test statistic is then compared to a value from the tabulation of the t-distribution in order to assess statistical significance. The form of the test statistic is:

       t = \frac{\bar{Y}_1 - \bar{Y}_2}{se(\bar{Y}_1 - \bar{Y}_2)}

   If the absolute value of this test statistic is greater than the tabulated value from a t-distribution, the difference in sample means is said to be statistically significant. Otherwise, the difference is attributed to random sampling error. The following rule of thumb can be used when a t-table is unavailable.

   • If |t| < 2, there is no statistically significant difference between the means of the two samples.
   • If |t| > 3, there is a statistically significant difference; that is, the difference is so large it cannot be explained by random variation alone.
   • If 2 ≤ |t| ≤ 3, statistical significance depends on the number of observations and the chance of making an error.

   Computer programs, like JMP, convert the t-test statistic into a probability value, the P-value. This is a measure of how likely it is to get a difference in sample means larger than the one observed when random sampling from identical frames. The smaller the P-value, the less likely random sampling can explain the difference. Thus small P-values lead one to declare the difference in sample means statistically significant.

   Below is the output of JMP: Basic Stats → Oneway with Temp as the Y, Response and Supplier as the X, Grouping. Choose Means/Anova/t Test from the red triangle pull-down menu.

   t Test
   Assuming equal variances

                 Difference    t Test    DF    Prob > |t|
   Estimate        −0.565      −0.205    44      0.8383
   Std Err          2.754
   Lower 95%       −6.12
   Upper 95%        4.99

   Note that there are slight differences in the calculated values since less rounding is done in JMP. Also, the high P-value (P = 0.84) indicates that it is very likely that the frames from which the samples were taken are identical (same mean, standard deviation, and shape). Implicit in the formal statistical analysis presented above is an assumption that the data are normally distributed. If this is not true, then the true P-value and the true confidence level will be different from what is reported. A short computational check of these values is given at the end of this handout.

A note on enumerative and analytic purposes

The comparison of two independent samples can be enumerative in that the difference, or lack of difference, seen in the samples can be inferred to the frames from which the samples are randomly selected. The standard error of the difference in sample means quantifies the uncertainty introduced by using random samples instead of complete coverage. Most comparisons do not stop at a simple description of the samples, or even with inference to the frames. Instead, based on a comparison like the one above, decisions are often made to keep or change suppliers. This has an analytic purpose since future production will be affected. Information on the stability of the processes producing the frames from which the random samples are taken is essential. The standard error does NOT quantify the uncertainty introduced by unstable processes.
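To tie the pieces together, here is a companion Python sketch (again, not part of the original handout) that finishes the analysis from the summary statistics: it forms the 95% confidence interval, the t statistic, and the two-sided P-value, using SciPy's t distribution in place of a printed t table.

```python
from math import sqrt
from scipy import stats

# Summary statistics from the handout
n1, ybar1, s1 = 23, 450.5, 8.28    # Supplier 1
n2, ybar2, s2 = 23, 451.1, 10.29   # Supplier 2

# Pooled standard deviation and standard error of the difference (as before)
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
se = sp * sqrt(1 / n1 + 1 / n2)
diff = ybar1 - ybar2
df = n1 + n2 - 2

# 95% confidence interval; t.ppf gives the exact critical value (about 2.02)
tcrit = stats.t.ppf(0.975, df)
print(f"95% CI: ({diff - tcrit * se:.2f}, {diff + tcrit * se:.2f})")
# prints (-6.15, 4.95); the handout's "t is about 2" shortcut gives (-6.1, 4.9)

# Test statistic and two-sided P-value
t_stat = diff / se
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(f"t = {t_stat:.3f}, P-value = {p_value:.3f}")
# prints t = -0.218, P-value = 0.829; JMP's less-rounded inputs give -0.205 and 0.8383
```

With the raw temperature measurements rather than summaries, scipy.stats.ttest_ind(sample1, sample2, equal_var=True) carries out the same equal-variance two-sample t test in a single call.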