UT Dallas CS 6313 - Chapter_8_1-2

PROBABILITY AND STATISTICS IN COMPUTER SCIENCE AND SOFTWARE ENGINEERING
Chapter 8: Introduction to Statistics

OVERVIEW
- We have reviewed the basics of probability, but in all of those examples we were given the distribution and its parameters.
- This rarely happens in practice: we usually do not have this kind of control over the system.
- More likely, we do not know the parameters (even if we suspect a particular underlying distribution).
- We can, however, draw conclusions about the underlying distribution and its parameters by collecting and analyzing data.

During the remainder of the course, we will:
- Learn to visualize data patterns and make quick assessments.
- Estimate distribution parameters and assess the reliability of our estimates.
- Learn how to test hypotheses.
- Test relations between variables.
- Fit models and use them to make forecasts.
This will, of course, require us to understand the underlying probability distributions and their characteristics, which is why we reviewed the first four chapters.

POPULATION AND SAMPLE
- A population consists of all units of interest. Any numerical characteristic of the population is a parameter.
- A sample consists of observed units collected from the population. It is used to make statements about the population. Any function of a sample is called a statistic.
- The fundamental idea of statistics: estimate a population parameter by collecting a sample and using it to approximate the parameter we are interested in.
- We could find the parameter directly by taking a census of the entire population, but this is often difficult or impossible.
- A population parameter is usually denoted by $\theta$, and its estimator (computed from the sample) by $\hat{\theta}$.
- See Figure 8.1 on page 209 for a diagram.

Note that a sample may give a misleading estimate:
- In Example 8.1, we suppose that exactly 80% of the population is happy with a service. We take a sample of 10 people and ask whether they are happy with the service. The number of unhappy people in the sample, $X$, is a random variable with a Binomial(10, 0.2) distribution. Using the table in Appendix A2, we find $P\{X \ge 5\} \approx 0.033$: there is a 3.3% chance that at least 5 of the 10 people (half the sample) report "failures", i.e. say they are unhappy with the service.
- In other words, there is always a possibility (hopefully small) that the sample gives a very misleading answer.
- This is called sampling error, and it cannot be eliminated: it reflects the uncertainty inherent in sampling.

ERRORS IN SAMPLING
- Sampling errors cannot be eliminated; they are a fact of life in an uncertain world.
- But we can also introduce non-sampling errors if we are not careful, for example by using bad sampling techniques or sampling the wrong population. You would not survey dog owners about their preference in cat food.
- It is also possible to take a sample in which the observations are dependent (see Example 8.3).
- Example 8.4 is more subtle. In general, when you sample a population, you are assuming that you select your sample at random. You must be careful that your sampling technique does not favor a particular segment of the population. Example 8.5 is another example of this error.

SAMPLING
For our purposes, we will assume simple random sampling.
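Simple random sampling can be sketched with Python's standard `random` module. This is a minimal sketch under stated assumptions, not the course's procedure: the ID range below is a hypothetical stand-in for UTD's active student IDs (not real data), the function name `simple_random_sample` is illustrative, and we draw directly from the list of active IDs rather than generating numbers over a numeric range, which avoids landing on unused ID numbers. `random.sample` draws without replacement, so no student is picked twice; when the population is much larger than the sample, such draws are very nearly independent.

```python
import random
from typing import List, Optional, Sequence


def simple_random_sample(population: Sequence[int], n: int, seed: Optional[int] = None) -> List[int]:
    """Draw n distinct units so that every unit is equally likely to be selected."""
    rng = random.Random(seed)          # seeded generator so the sample is reproducible
    return rng.sample(population, n)   # sampling without replacement


# Hypothetical stand-in for the list of active student IDs (illustrative only)
active_ids = range(2021000000, 2021030000)
sample_ids = simple_random_sample(active_ids, n=100, seed=6313)
print(len(sample_ids), len(set(sample_ids)))  # prints: 100 100
```

Passing a seed makes the draw repeatable, which is convenient when the same sample must be re-identified later; omit it for a fresh sample each run.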
This means that our samples consist of units collected from the population independently of each other, in such a way that every unit in the population is equally likely to be sampled.
- Example: To sample the UTD student population, we generate 100 random numbers in the range of active student IDs and use these numbers to identify the students in our sample.
- Note: This would not be a good way to sample the population of the DFW area! We are limiting ourselves to UTD in this example; does the student body represent the general population of the DFW area?

SIMPLE DESCRIPTIVE STATISTICS
We will first look at some simple descriptive statistics that can be computed from a sample. Assume we have collected a sample $X = (X_1, \ldots, X_n)$. What can we measure?
- Mean: the average value of the sample.
- Median: the "central value" of the sample.
- Quantiles and quartiles: these show where certain portions of the sample lie.
- Variance, standard deviation, and interquartile range: these measure how "spread out" the data are.
These statistics are themselves random variables (they are computed from random data), so they have sampling distributions. They will help us draw conclusions about the underlying population distribution.

SAMPLE MEAN
The sample mean is denoted by $\bar{X}$, and it estimates the mean of the overall population, $\mu = E(X)$. We compute this statistic as follows:
$$\bar{X} = \frac{X_1 + \cdots + X_n}{n}$$
We would expect the sample mean to be a good estimate of the population mean when $n$ is large. We now develop some terminology to make that expectation precise.

UNBIASED
We say that an estimator (or statistic) $\hat{\theta}$ is unbiased if $E(\hat{\theta}) = \theta$. In other words, the expected value of the estimator is the parameter it is trying to estimate.
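Unbiasedness can be illustrated by simulation: draw many samples, compute $\bar{X}$ for each, and average the results. The sketch below assumes a hypothetical Exponential population with mean $\mu = 2$ and samples of size $n = 5$; these values and the helper name `simulate_sample_means` are illustrative only. Because this population is skewed, the simulation also shows that an unbiased estimator need not be equally likely to fall below or above the parameter: for these settings, the exact probability that $\bar{X} < \mu$ is about 0.56.

```python
import random
import statistics
from typing import List


def simulate_sample_means(mean: float, n: int, reps: int, seed: int = 0) -> List[float]:
    """Return the sample mean of each of `reps` independent samples of size n from Exponential(mean)."""
    rng = random.Random(seed)
    rate = 1.0 / mean  # random.expovariate takes the rate lambda = 1 / mean
    return [statistics.fmean(rng.expovariate(rate) for _ in range(n)) for _ in range(reps)]


mu = 2.0  # hypothetical population mean
xbars = simulate_sample_means(mean=mu, n=5, reps=20_000)

avg_xbar = statistics.fmean(xbars)                    # Monte Carlo estimate of E(X-bar)
frac_below = sum(x < mu for x in xbars) / len(xbars)  # how often X-bar underestimates mu

print(f"average of X-bar:  {avg_xbar:.3f}")    # close to 2.0: no systematic bias
print(f"fraction below mu: {frac_below:.3f}")  # close to 0.56, not 0.5
```

The average of the simulated $\bar{X}$ values sits near $\mu$ (no bias), while the individual estimates fall below $\mu$ somewhat more often than above it, balanced by occasional large overestimates.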
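- The bias of the estimator is defined to be $\text{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$.
- Unbiased means just what it says: on average, our estimator neither overestimates nor underestimates the true value of the parameter, so its estimates are not systematically biased to one side. (This does not mean it is equally likely to underestimate as to overestimate; that would be a statement about the median of $\hat{\theta}$, not its mean.)
- Using the linearity of expectation, it is easy to see that the sample mean defined on the previous slide is unbiased: $E(\bar{X}) = \frac{E(X_1) + \cdots + E(X_n)}{n} = \frac{n\mu}{n} = \mu$.

CONSISTENCY
We say that an estimator (or statistic) $\hat{\theta}$ is consistent if the probability of its sampling error of any magnitude converges to 0 as the sample size increases to infinity. In other words, for any $\varepsilon > 0$,
$$P\{|\hat{\theta} - \theta| > \varepsilon\} \to 0 \quad \text{as } n \to \infty.$$
- This means that for large samples, the estimation error is unlikely to exceed $\varepsilon$, and it does so with smaller and smaller probability as $n$ goes to infinity.
- The sample mean is consistent; this is shown on page 213. (One way to see it: for independent observations with variance $\sigma^2$, $\operatorname{Var}(\bar{X}) = \sigma^2/n$, so Chebyshev's inequality gives $P\{|\bar{X} - \mu| > \varepsilon\} \le \sigma^2/(n\varepsilon^2) \to 0$.)

ASYMPTOTIC NORMALITY
- We saw (by the Central Limit Theorem) that the sum of $n$ independent observations has approximately a Normal distribution when $n$ is large.
- It immediately follows that the sample mean is approximately Normal for large $n$.
- We can in fact standardize the sample mean to create the statistic
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}},$$
and this statistic converges in distribution to the Standard Normal as $n \to \infty$. This is called asymptotic normality.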
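Asymptotic normality can be checked the same way: standardize many simulated sample means and compare them with the Standard Normal. This sketch assumes a hypothetical Exponential population (whose standard deviation equals its mean, so $\sigma$ is known exactly), samples of size $n = 50$, and the illustrative helper `standardized_means`; it compares the simulated frequency of $|Z| \le 1.96$ with the Standard Normal probability of about 0.95.

```python
import math
import random
import statistics
from typing import List


def standardized_means(mean: float, n: int, reps: int, seed: int = 1) -> List[float]:
    """Return Z = (X-bar - mu) / (sigma / sqrt(n)) for `reps` samples of size n from Exponential(mean)."""
    rng = random.Random(seed)
    mu = sigma = mean  # for an Exponential distribution, the standard deviation equals the mean
    rate = 1.0 / mean  # random.expovariate takes the rate lambda = 1 / mean
    zs = []
    for _ in range(reps):
        xbar = statistics.fmean(rng.expovariate(rate) for _ in range(n))
        zs.append((xbar - mu) / (sigma / math.sqrt(n)))
    return zs


zs = standardized_means(mean=2.0, n=50, reps=10_000)
empirical = sum(abs(z) <= 1.96 for z in zs) / len(zs)
std_normal = statistics.NormalDist()  # Standard Normal, mean 0 and standard deviation 1
normal = std_normal.cdf(1.96) - std_normal.cdf(-1.96)  # about 0.950

print(f"simulated P(|Z| <= 1.96): {empirical:.3f}")
print(f"Standard Normal value:    {normal:.3f}")
```

Even though the Exponential population is clearly skewed, the standardized sample mean already behaves much like a Standard Normal variable at $n = 50$; rerunning with smaller $n$ shows the agreement weakening.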