UT Dallas CS 6313 - Chapter_8_2a(2)

This preview shows page 1-2-3-4-5 out of 14 pages.


PROBABILITY AND STATISTICS IN COMPUTER SCIENCE AND SOFTWARE ENGINEERING
Chapter 8: Introduction to Statistics

OVERVIEW
- We've seen the first of our descriptive statistics: the sample mean ($\bar{X}$).
- We saw that the sample mean is unbiased.
  - This means $E(\bar{X}) = \mu$, i.e. the expectation of the sample mean is the population mean.
- We saw that the sample mean is consistent.
  - The probability of a sampling error larger than any fixed $\varepsilon > 0$ goes to zero as the size of the sample goes to infinity: $P(|\bar{X} - \mu| > \varepsilon) \to 0$ as $n \to \infty$.
- We saw that the sample mean has asymptotic normality.
  - Its distribution is approximately normal for large samples.

NOTATION
- $\mu$ = population mean
- $\bar{X}$ = sample mean, estimator of $\mu$
- $\sigma$ = population standard deviation
- $s$ = sample standard deviation, estimator of $\sigma$
- $\sigma^2$ = population variance
- $s^2$ = sample variance, estimator of $\sigma^2$

MEDIAN
- Problem with the sample mean: it is very sensitive to outliers, i.e. sample data points that are extreme.
  - These can have a major impact on the sample mean, especially if n is not very large.
- Note also that the sample mean is often not a value that the real population variable takes on.
- Can we come up with another statistic that does a better job of measuring the location of the "middle value" of a population?
- The answer is yes: we can use the median.
- This concept can be applied to both the sample and the population, as we shall see.

MEDIAN
- Median means "central value".
- The sample median is a number that is preceded by at most half of the observations in the sample and exceeded by at most half of the observations.
  - It can be thought of as the "middle" observation if we order all of the sample points.
  - We'll see how to compute this shortly.
- The population median M is a number that is exceeded with probability no greater than 0.5 and preceded with probability no greater than 0.5. In other words, M is defined such that $P(X > M) \le 0.5$ and $P(X < M) \le 0.5$.

SHAPES OF DISTRIBUTIONS
- If we compare the population mean and the population median, we can learn something about the "shape" of the population's distribution.
- Suppose $M < \mu$. This would mean that the mean is "shifted" to the right by some extreme values, while the majority of the population lies more to the left.
  - We say the distribution is "right-skewed".
- Likewise, if $M > \mu$, we say the population distribution is "left-skewed".
- If the population median and mean are approximately equal, we say the distribution is "symmetric".
- See the figure on page 214.

COMPUTING THE MEDIAN
- Recalling the definition of the cdf, we can see how to compute the median for a population: $P(X > M) = 1 - F(M) \le 0.5$ and, for a continuous distribution, $P(X < M) = F(M) \le 0.5$.
- Combining these two, we see that M can be defined as the value such that $F(M) = 0.5$.
- Sometimes this is easy to solve, other times it is not.

COMPUTING THE MEDIAN
- First, we consider some continuous distributions.
- For the Uniform(a, b) distribution, the computation is easy:
  - Recall $F(x) = \frac{x - a}{b - a}$ for $a < x < b$.
  - Solving $F(M) = 0.5$ for M gets us $M = \frac{a + b}{2}$.
  - This coincides with the population mean, so the uniform distribution is symmetric.

COMPUTING THE MEDIAN
- Next, let's look at the exponential distribution. We know $F(x) = 1 - e^{-\lambda x}$, where $x > 0$.
- If we solve $F(M) = 1 - e^{-\lambda M} = 0.5$, we get $M = \frac{\ln 2}{\lambda} \approx \frac{0.6931}{\lambda}$, which is less than the population mean ($\mu = 1/\lambda$).
- The exponential distribution is right-skewed.

COMPUTING THE MEDIAN
- For discrete distributions, recall that the cdf is a step function.
- This means that the equation $F(M) = 0.5$ may have many roots or none.
  - In the first case, any value that solves the equation can be called the median, but usually the middle of the interval is chosen.
  - In the second case, we usually take M to be the value where the cdf jumps above 0.5.
- See the graphs on page 216.

COMPUTING THE MEDIAN
- If we look at the examples on pages 215-216, we can see how this works.
- For the first example, we can use Table A2 to see that 2 satisfies the definition of M.
  - In this case we can actually use any value between 2 and 3 for the median.
  - If we pick 2.4, for example, it is equally likely that we will have fewer than 2.4 successes (i.e. 2 or fewer) and fewer than 2.4 failures (i.e. 3 or more successes).
  - Any value between 2 and 3 would work; normally we just pick the middle, 2.5.
- In the second example, we see the distribution is asymmetric.
  - Here we calculate the median to be 2, since this is the point where the cdf jumps above 0.5.
- See the figures on page 216.

COMPUTING THE MEDIAN
- For sampled data, computing the median is done differently.
- The sample is always finite in size, and we can assume that all observations are equally likely.
- If n is odd, the median is just the $\frac{n+1}{2}$-th smallest observation.
  - In other words, if we order the observations, it is the middle one.
- If n is even, then we find the middle two observations: the $\frac{n}{2}$-th and the $\frac{n+2}{2}$-th smallest.
  - Any value in between can be the median, but it is usually taken to be the midpoint.
- See the example on page 217.

QUANTILES, PERCENTILES, QUARTILES
- We can generalize the notion of the median by replacing the "0.5" in its definition with any other probability p.
- A p-quantile of a population is a number x such that $P(X < x) \le p$ and $P(X > x) \le 1 - p$.
- A sample p-quantile is any number that exceeds at most 100p% of the sample and is exceeded by at most 100(1 - p)% of the sample.
- A $\gamma$-percentile is the $(0.01\gamma)$-quantile.
- The first, second, and third quartiles are the 25th, 50th, and 75th percentiles. They split a sample or a population into four equal parts.
- The median is thus seen to be the 0.5-quantile, the 50th percentile, and the 2nd quartile.

QUANTILES, PERCENTILES, QUARTILES
- The notations for these statistics are given on page 218.
- Example 8.14 shows how to compute quartiles for a sample.
- Consider Example 8.15:
  - In this example, we are trying to find the tenth percentile for a population that has a specific Gamma distribution (with the parameters given in the example).
  - Here we use the fact that the Gamma distribution can be approximated by a normal distribution for large $\alpha$; the value of $\alpha$ in the example should be large enough (see page 94, Central Limit Theorem).
  - We find the 10th percentile of the standard normal distribution using Table A4; to do this we find the probability closest to 0.10 and read off $z \approx -1.28$.
  - We then use the estimates of $\mu$ and $\sigma$ to translate the result to the normal distribution associated with our Gamma distribution, $x = \mu + z\sigma$; we see the x value is …
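
The computations described in these slides can be sketched in Python. This is an illustrative sketch, not code from the course or the textbook: the function names and the sample data are hypothetical, the parameters passed to the Gamma approximation are made-up values rather than those of Example 8.15, and it assumes the Gamma(α, λ) parameterization with mean α/λ and standard deviation √α/λ. `statistics.NormalDist` plays the role of Table A4.

```python
import math
from statistics import NormalDist


def sample_median(observations):
    """Sample median: middle order statistic (odd n) or midpoint of the middle two (even n)."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must be non-empty")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # (n+1)/2-th smallest, converted to a 0-based index
    lower = ordered[n // 2 - 1]           # (n/2)-th smallest
    upper = ordered[n // 2]               # ((n+2)/2)-th smallest
    return (lower + upper) / 2


def is_sample_quantile(x, observations, p, tol=1e-9):
    """True if x exceeds at most 100p% of the sample and is exceeded by at most 100(1-p)%."""
    n = len(observations)
    below = sum(1 for v in observations if v < x)
    above = sum(1 for v in observations if v > x)
    return below <= p * n + tol and above <= (1 - p) * n + tol


def exponential_median(lam):
    """Solve 1 - exp(-lam * M) = 0.5 for M."""
    return math.log(2) / lam


def gamma_percentile_normal_approx(alpha, lam, p):
    """Approximate p-quantile of Gamma(alpha, lam) via x = mu + z * sigma (large alpha assumed)."""
    mu = alpha / lam
    sigma = math.sqrt(alpha) / lam
    z = NormalDist().inv_cdf(p)  # standard normal quantile, e.g. about -1.28 for p = 0.10
    return mu + z * sigma


if __name__ == "__main__":
    data = [7, 1, 5, 3, 100]             # hypothetical sample with one outlier
    print(sum(data) / len(data))         # the mean is pulled toward the outlier
    print(sample_median(data))           # the median is not
    print(sample_median([1, 3, 5, 7]))   # even n: midpoint of 3 and 5
    print(exponential_median(2.0), 1 / 2.0)  # median is below the mean (right-skewed)
    print(gamma_percentile_normal_approx(alpha=100, lam=2.0, p=0.10))  # made-up parameters
```

For the even-sized sample `[1, 3, 5, 7]`, `is_sample_quantile(x, [1, 3, 5, 7], 0.5)` holds for every `x` between 3 and 5, which matches the remark that any value between the two middle observations can serve as the median.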

