FSU CIS 5930r - Lecture 4: Comparing Systems Using Sample Data

School: Florida State University

Course: CIS 5930r - Selected Topics in Computer Science

Pages: 34

Slides: Comparing Systems Using Sample Data; Comparison Methodology; What is a Sample?; Sample Statistics; Estimating Population from Samples; Estimating Error; Estimating the Value of a Random Variable; Confidence Intervals; Confidence Interval of Sample Mean; Estimating Confidence Intervals; The z Distribution; Example of z Distribution; Graph of z Distribution Example; The t Distribution; Example of t Distribution; Graph of t Distribution Example; Getting More Confidence; Making Decisions; Testing for Zero Mean; Comparing Alternatives; Comparing Paired Observations; Example: Comparing Paired Observations; Comparing Unpaired Observations; The t-test (1); The t-test (2); Comparing Proportions; Special Considerations; Selecting a Confidence Level; Hypothesis Testing; One-Sided Confidence Intervals; Sample Sizes; Choosing a Sample Size; Example of Choosing Sample Size

Comparing Systems Using Sample Data
Andy Wang
CIS 5930-03
Computer Systems Performance Analysis

Comparison Methodology
• Meaning of a sample
• Confidence intervals
• Making decisions and comparing alternatives
• Special considerations in confidence intervals
• Sample sizes

What is a Sample?
• How tall is a human?
  – Could measure every person in the world
  – Or could measure everyone in this room
• Population has parameters
  – Real and meaningful
• Sample has statistics
  – Drawn from population
  – Inherently erroneous

Sample Statistics
• How tall is a human?
  – People in Lov 103 have a mean height
  – People in Lov 301 have a different mean
• Sample mean is itself a random variable
  – Has its own distribution

Estimating Population from Samples
• How tall is a human?
  – Measure everybody in this room
  – Calculate sample mean $\bar{x}$
  – Assume population mean $\mu$ equals $\bar{x}$
• What is the error in our estimate?

Estimating Error
• Sample mean is a random variable
  ⇒ Mean has some distribution
  ⇒ Multiple sample means have a "mean of means"
• Knowing the distribution of means, we can estimate error

Estimating the Value of a Random Variable
• How tall is Fred?
• Suppose average human height is 170 cm
  ⇒ Fred is 170 cm tall
  – Yeah, right
• Safer to assume a range

Confidence Intervals
• How tall is Fred?
  – Suppose 90% of humans are between 155 and 190 cm
  ⇒ Fred is between 155 and 190 cm
• We are 90% confident that Fred is between 155 and 190 cm

Confidence Interval of Sample Mean
• Knowing where 90% of sample means fall, we can state a 90% confidence interval
• Key is the Central Limit Theorem:
  – Sample means are (approximately) normally distributed
  – Only if the samples are independent
  – Mean of sample means is the population mean $\mu$
  – Standard deviation of sample means (the standard error) is $\sigma/\sqrt{n}$

Estimating Confidence Intervals
• Two formulas for confidence intervals
  – Over 30 samples from any distribution: z-distribution
  – Small sample from a normally distributed population: t-distribution
• Common error: using the t-distribution for a non-normal population
  – Central Limit Theorem often saves us

The z Distribution
• Interval on either side of the mean: $\bar{x} \mp z_{1-\alpha/2}\,\dfrac{s}{\sqrt{n}}$
• Significance level $\alpha$ is small for large confidence levels
• Tables of z are tricky: be careful!

Example of z Distribution
• 35 samples: 10, 16, 47, 48, 74, 30, 81, 42, 57, 67, 7, 13, 56, 44, 54, 17, 60, 32, 45, 28, 33, 60, 36, 59, 73, 46, 10, 40, 35, 65, 34, 25, 18, 48, 63
• Sample mean $\bar{x}$ = 42.1. Standard deviation s = 20.1. n = 35.
• 90% confidence interval is $42.1 \mp (1.645)\dfrac{20.1}{\sqrt{35}} = (36.5, 47.7)$

Graph of z Distribution Example
[Figure: the samples on a 0 to 100 axis, with the 90% C.I. marked]

The t Distribution
• Formula is almost the same: $\bar{x} \mp t_{[1-\alpha/2;\,n-1]}\,\dfrac{s}{\sqrt{n}}$
• Usable only for normally distributed populations!
• But works with small samples

Example of t Distribution
• 10 height samples: 148, 166, 170, 191, 187, 114, 168, 180, 177, 204
• Sample mean $\bar{x}$ = 170.5. Standard deviation s = 25.1. n = 10.
• 90% confidence interval is $170.5 \mp (1.833)\dfrac{25.1}{\sqrt{10}} = (156.0, 185.0)$
• 99% interval is (144.7, 196.3)

Graph of t Distribution Example
[Figure: the height samples on a 0 to 250 cm axis, with the 90% and 99% C.I.s marked]

Getting More Confidence
• Asking for a higher confidence level widens the confidence interval
  – Counterintuitive?
• How tall is Fred?
  – 90% sure he's between 155 and 190 cm
  – We want to be 99% sure we're right
  – So we need more room: 99% sure he's between 145 and 200 cm

Making Decisions
• Why do we use confidence intervals?
  – Summarizes error in the sample mean
  – Gives a way to decide whether a measurement is meaningful
  – Allows comparisons in the face of error
• But remember: at 90% confidence, 10% of sample C.I.s do not include the population mean

Testing for Zero Mean
• Is the population mean significantly ≠ 0?
• If the confidence interval includes 0, the answer is no
• Can test for any value (mean of sums is sum of means)
• Our height samples are consistent with an average height of 170 cm
  – Also consistent with 160 and 180!

Comparing Alternatives
• Often need to find the better system
  – Choose fastest computer to buy
  – Prove our algorithm runs faster
• Different methods for paired/unpaired observations
  – Paired if the ith test on each system was the same
  – Unpaired otherwise

Comparing Paired Observations
• For each test, calculate the performance difference
• Calculate a confidence interval for the differences
• If the interval includes zero, the systems aren't different
  – If not, the sign indicates which is better

Example: Comparing Paired Observations
• Do home baseball teams outscore visitors?
• Sample from 9-4-96:
  – H     4   5   0  11   6   6   3  12   9   5   6   3   1   6
  – V     2   7   7   6   0   7  10   6   2   2   4   2   2   0
  – H-V   2  -2  -7   5   6  -1  -7   6   7   3   2   1  -1   6
• Mean 1.4, 90% interval (-0.75, 3.6)
  – Can't tell from this data
  – 70% interval is (0.10, 2.76)

Comparing Unpaired Observations
• CIs do not overlap ⇒ A > B (taking A as the system whose interval lies higher)
• CIs overlap and the mean of one is in the CI of the other ⇒ A ≈ B
• CIs overlap but neither mean is in the CI of the other ⇒ do a t-test
[Figure: four pairs of confidence intervals for A and B, with each mean marked, illustrating the cases above]

The t-test (1)
1. Compute sample means $\bar{x}_a$ and $\bar{x}_b$
2. Compute sample standard deviations $s_a$ and $s_b$
3. Compute the mean difference $\bar{x}_a - \bar{x}_b$
4. Compute the standard deviation of the difference: $s = \sqrt{\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}}$

The t-test (2)
5. Compute the effective degrees of freedom:
   $\nu = \dfrac{\left(\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}\right)^2}{\dfrac{1}{n_a+1}\left(\dfrac{s_a^2}{n_a}\right)^2 + \dfrac{1}{n_b+1}\left(\dfrac{s_b^2}{n_b}\right)^2} - 2$
6. Compute the confidence interval: $(\bar{x}_a - \bar{x}_b) \mp t_{[1-\alpha/2;\,\nu]}\, s$
7. If the interval includes zero, there is no difference

Comparing Proportions
• If k of n trials give a certain result, then with $p = k/n$ the confidence interval is: $p \mp z_{1-\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}$
• If the interval includes 0.5, can't say which outcome is statistically meaningful
• Must have k ≥ 10 to get
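
The Central Limit Theorem claim, that means of independent samples cluster around the population mean with standard error σ/√n even when the population is not normal, can be checked with a short simulation. This is a minimal sketch using only Python's standard library; the uniform [0, 1) population, the sample size of 36, and the 20,000 repetitions are illustrative assumptions, not values from the lecture.

```python
import math
import random
from statistics import mean, pstdev

def simulate_sample_means(n, repetitions, seed=1):
    """Return the means of `repetitions` independent samples of size n,
    each drawn from a uniform [0, 1) population (a non-normal distribution)."""
    rng = random.Random(seed)  # seeded so every run gives the same numbers
    return [sum(rng.random() for _ in range(n)) / n for _ in range(repetitions)]

n = 36
means = simulate_sample_means(n, repetitions=20_000)

sigma = 1 / math.sqrt(12)  # population std dev of uniform [0, 1)
print(f"mean of means    = {mean(means):.4f}  (population mean is 0.5)")
print(f"std dev of means = {pstdev(means):.4f}")
print(f"sigma / sqrt(n)  = {sigma / math.sqrt(n):.4f}")
```

The last two printed values should agree closely; raising n shrinks both by the same factor of √n.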
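
The large-sample (z) interval can be reproduced from the 35 samples in the "Example of z Distribution" slide. A minimal Python sketch, assuming Python 3.8+ for `statistics.NormalDist`; the function name `z_confidence_interval` is illustrative and not from the lecture.

```python
import math
from statistics import NormalDist, mean, stdev

def z_confidence_interval(samples, confidence=0.90):
    """Two-sided confidence interval for the population mean using the
    standard normal quantile z_{1 - alpha/2} (the over-30-samples case)."""
    n = len(samples)
    x_bar = mean(samples)
    s = stdev(samples)                         # sample std dev, n - 1 denominator
    alpha = 1.0 - confidence                   # significance level
    z = NormalDist().inv_cdf(1.0 - alpha / 2)  # about 1.645 for 90%
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# The 35 samples from the "Example of z Distribution" slide.
SAMPLES = [10, 16, 47, 48, 74, 30, 81, 42, 57, 67, 7, 13, 56, 44, 54, 17, 60, 32,
           45, 28, 33, 60, 36, 59, 73, 46, 10, 40, 35, 65, 34, 25, 18, 48, 63]

lo, hi = z_confidence_interval(SAMPLES, confidence=0.90)
print(f"mean={mean(SAMPLES):.1f} s={stdev(SAMPLES):.1f} 90% CI=({lo:.1f}, {hi:.1f})")
```

Run as is, it prints `mean=42.1 s=20.1 90% CI=(36.5, 47.7)`, matching the slide. Using `NormalDist().inv_cdf` instead of a printed z table sidesteps the "tables of z are tricky" warning, since the quantile is computed directly from 1 - α/2.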
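
The small-sample (t) interval for the 10 height samples follows the same pattern with a t quantile in place of z. Python's standard library has no Student-t inverse CDF, so this sketch assumes a hypothetical hand-copied lookup table `T_TABLE` holding only the two df = 9 entries the example needs (1.833 appears on the slide; 3.250 is the standard 99% table entry).

```python
import math
from statistics import mean, stdev

# Two-sided t quantiles t_[1 - alpha/2; df], keyed by (confidence, df).
# Hypothetical hand-copied table: only the df = 9 entries needed here.
T_TABLE = {
    (0.90, 9): 1.833,
    (0.99, 9): 3.250,
}

def t_confidence_interval(samples, confidence=0.90):
    """Confidence interval for the mean of a small sample that is assumed
    to come from a normally distributed population."""
    n = len(samples)
    x_bar = mean(samples)
    s = stdev(samples)
    t = T_TABLE[(confidence, n - 1)]  # raises KeyError if not tabulated
    half_width = t * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

HEIGHTS = [148, 166, 170, 191, 187, 114, 168, 180, 177, 204]

for level in (0.90, 0.99):
    lo, hi = t_confidence_interval(HEIGHTS, level)
    print(f"{level:.0%} CI: ({lo:.1f}, {hi:.1f})")
```

This prints (155.9, 185.1) for 90% and (144.7, 196.3) for 99%. The 90% bounds differ from the slide's (156.0, 185.0) only in the last digit because the slide rounds s to 25.1 before multiplying. The wider 99% interval illustrates "Getting More Confidence".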
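
The paired-observation method reduces to a one-sample interval on the per-test differences, then a check for whether zero lies inside. The sketch below applies it to the baseball scores from the slide. The t quantiles for 13 degrees of freedom (1.771 at 90%, 1.079 at 70%) are hard-coded from a standard t table as an assumption, since the standard library cannot compute them; `paired_interval` and `verdict` are illustrative names.

```python
import math
from statistics import mean, stdev

# Per-game scores from the "Example: Comparing Paired Observations" slide.
HOME    = [4, 5, 0, 11, 6, 6, 3, 12, 9, 5, 6, 3, 1, 6]
VISITOR = [2, 7, 7, 6, 0, 7, 10, 6, 2, 2, 4, 2, 2, 0]

# t_[1 - alpha/2; 13] for 14 games (df = 14 - 1), copied from a t table.
T_DF13 = {0.90: 1.771, 0.70: 1.079}

def paired_interval(a, b, t_quantile):
    """Confidence interval for the mean of the paired differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    d_bar = mean(diffs)
    half_width = t_quantile * stdev(diffs) / math.sqrt(len(diffs))
    return d_bar - half_width, d_bar + half_width

def verdict(lo, hi):
    """Zero inside the interval means no significant difference; otherwise
    the sign of the interval says which side is larger."""
    if lo <= 0 <= hi:
        return "no significant difference"
    return "home teams score more" if lo > 0 else "visitors score more"

for level in (0.90, 0.70):
    lo, hi = paired_interval(HOME, VISITOR, T_DF13[level])
    print(f"{level:.0%}: ({lo:.2f}, {hi:.2f}) -> {verdict(lo, hi)}")
```

The 90% interval comes out as (-0.75, 3.61), which contains zero, while the 70% interval (0.10, 2.76) does not, matching the slide's conclusion that the answer depends on how much confidence is demanded.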
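
The visual rules for unpaired observations can be written as a small decision function. This is a sketch under two assumptions: intervals are passed as (low, high) tuples, and the last case is read as "neither mean falls inside the other's interval". The name `compare_unpaired` is hypothetical; a result of "t-test needed" means falling back to the t-test steps.

```python
def compare_unpaired(mean_a, ci_a, mean_b, ci_b):
    """Apply the overlap rules for unpaired observations.

    ci_a and ci_b are (low, high) confidence intervals for the means of A and B.
    """
    overlap = ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
    if not overlap:
        # Non-overlapping intervals: whichever interval lies higher wins.
        return "A > B" if ci_a[0] > ci_b[1] else "B > A"
    if ci_b[0] <= mean_a <= ci_b[1] or ci_a[0] <= mean_b <= ci_a[1]:
        return "A ~= B"        # one mean falls inside the other interval
    return "t-test needed"     # intervals overlap, but neither mean is inside

# Hypothetical intervals illustrating the third case.
print(compare_unpaired(10, (8, 12.5), 14, (11.5, 16)))
```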
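
The seven t-test steps for unpaired samples map line by line onto code. In this sketch, the effective degrees of freedom use the form from the slide (n + 1 in the denominators and a trailing - 2). Because the standard library has no t inverse CDF, the caller supplies `t_quantile` as an assumption, for example a lookup into a printed table. Both function names and the sample data are hypothetical.

```python
import math
from statistics import mean, stdev

def effective_dof(s_a, n_a, s_b, n_b):
    """Step 5: effective degrees of freedom in the slide's form."""
    va, vb = s_a ** 2 / n_a, s_b ** 2 / n_b
    return (va + vb) ** 2 / (va ** 2 / (n_a + 1) + vb ** 2 / (n_b + 1)) - 2

def unpaired_t_interval(a, b, t_quantile):
    """Steps 1-6 of the t-test for unpaired samples a and b.

    t_quantile maps degrees of freedom nu to t_[1 - alpha/2; nu]. A printed
    table needs nu rounded to an integer; rounding down is conservative.
    Returns (low, high, nu); step 7 is checking whether 0 lies in [low, high].
    """
    n_a, n_b = len(a), len(b)
    x_a, x_b = mean(a), mean(b)                     # step 1
    s_a, s_b = stdev(a), stdev(b)                   # step 2
    diff = x_a - x_b                                # step 3
    s = math.sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)  # step 4
    nu = effective_dof(s_a, n_a, s_b, n_b)          # step 5
    half_width = t_quantile(nu) * s                 # step 6
    return diff - half_width, diff + half_width, nu

# Hypothetical measurements, for illustration only.
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]
# nu is 6 for these data; 2.447 is t_[0.975; 6] (a 95% interval) from a t table.
lo, hi, nu = unpaired_t_interval(a, b, lambda df: 2.447)
print(f"nu = {nu:.1f}, 95% CI for mean(a) - mean(b) = ({lo:.2f}, {hi:.2f})")
```

With these made-up numbers, the interval lies entirely below zero, so step 7 concludes that B's mean is larger.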
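
For proportions, this sketch assumes the standard normal-approximation interval p ∓ z_{1-α/2}·√(p(1-p)/n) with p = k/n, and then checks whether 0.5 lies inside. The function name `proportion_interval` and the counts are hypothetical. The slide's minimum-count condition is cut off in this copy, so the code does not enforce it.

```python
import math
from statistics import NormalDist

def proportion_interval(k, n, confidence=0.90):
    """Normal-approximation confidence interval for a proportion p = k/n,
    assumed here to be p -/+ z_{1-alpha/2} * sqrt(p * (1 - p) / n).
    The slides' minimum-count condition is not checked."""
    p = k / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2)
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width

# Hypothetical counts: 60 and 55 successes out of 100 trials.
for k in (60, 55):
    lo, hi = proportion_interval(k, 100)
    meaningful = not (lo <= 0.5 <= hi)
    print(f"{k}/100: ({lo:.3f}, {hi:.3f}) outcome meaningful: {meaningful}")
```

At 90% confidence, 60 of 100 yields an interval that excludes 0.5. In contrast, 55 of 100 does not exclude 0.5, so that outcome cannot be called statistically meaningful.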