UT Dallas CS 6313 - Chapter_8_3(2)

This preview shows page 1-2 out of 7 pages.


PROBABILITY AND STATISTICS IN COMPUTER SCIENCE AND SOFTWARE ENGINEERING
Chapter 8: Introduction to Statistics

OVERVIEW
Looking at graphs of a distribution is often a good way to get an initial idea of the distribution's parameters.
Graphs may also help to identify outliers.
They can also be used to look for correlation between variables.
Several types of data plots are helpful.
We will explore these in this lecture; note that most statistical packages can create one or more of these plots.

HISTOGRAM
Histograms show the approximate shape of the pdf (or pmf) of the underlying population.
Standard construction: create a series of “bins” to collect the data points, then draw a column over each bin whose height is the number of data points in that bin.
The overall shape of the columns gives a hint as to the distribution.
A relative frequency histogram is a histogram in which the height of each column is the proportion of the collected sample that falls in that bin.

HISTOGRAM (continued)
In general, if the sample comes from a continuous distribution (e.g. time), the height of each column in the relative frequency histogram approximates the area under the pdf curve over that bin.
Look at the example on page 225 and the curves on page 226: we can make conjectures about the underlying distribution by looking at the shape of the histogram.
We may also get an indication that the underlying distribution is actually a mixture of distributions.
See the “two hump” distribution on page 226.
Be careful when selecting bin “widths”: not too wide, not too narrow.

STEM AND LEAF
These plots are similar to histograms but carry more information.
They give a better view of how the data are distributed within each column.
This works well with integer-valued variables, but other variables can be scaled or shifted to work too.
Look at the example on page 228: this is basically the histogram turned sideways, with the actual numbers added in.
See also Example 8.19 on page 229.

BOXPLOT
Boxplots summarize the sample through its minimum, maximum, median, and quartiles.
A typical five-number summary for a sample of data consists of the min, max, median, and first and third quartiles.
The mean may also be included with a special symbol, like a cross.
Observations more than 1.5 interquartile ranges below the first quartile or above the third quartile are usually shown as separate dots, indicating suspected outliers.
An example is shown on page 230; note that the sample mean is included.
Occasionally, we may break the data out into groups, for example group by days.
See the example on page 231; for stock prices, this kind of plot is also called a “candlestick” plot.

SCATTER AND TIME PLOTS
Scatter plots are used to plot multiple variables (usually two) against each other.
These plots can show a relationship between the variables, such as correlation.
We will also see (in Chapter 11) how statistical methods can be used to draw inferences about missing data from these plots by finding trend lines.
One specific case is the time plot: when one of the variables is time, we can plot the data over time and look for trends.
See the example on page
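
As a rough illustration of three ideas from the slides (the relative frequency histogram, the five-number summary, and the 1.5 interquartile range rule for suspected outliers), here is a minimal Python sketch. The function names, the sample data, and the bin settings are all made up for illustration and do not come from the textbook. The quartiles use the "inclusive" interpolation of Python's `statistics.quantiles`; other packages may use slightly different quartile conventions and therefore give slightly different fences.

```python
from collections import Counter
from statistics import median, quantiles


def relative_frequency_histogram(data, bin_width, start):
    """Map each non-empty bin's left edge to the proportion of the sample
    falling in [edge, edge + bin_width). Empty bins are omitted."""
    counts = Counter(start + bin_width * int((x - start) // bin_width) for x in data)
    n = len(data)
    return {edge: counts[edge] / n for edge in sorted(counts)}


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for the sample."""
    # Assumption: "inclusive" interpolation; quartile definitions vary by package.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, median(data), q3, max(data)


def suspected_outliers(data, k=1.5):
    """Observations more than k * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


# Hypothetical sample, chosen only to make the arithmetic easy to follow.
sample = [2, 3, 5, 7, 8, 9, 11, 12, 13, 40]

print(relative_frequency_histogram(sample, bin_width=10, start=0))
print(five_number_summary(sample))
print(suspected_outliers(sample))
```

For this sample, the bin starting at 0 holds 6 of the 10 observations, so its relative frequency is 0.6, and the value 40 lies above Q3 + 1.5 IQR = 11.75 + 9.375, so it is flagged as a suspected outlier. Changing `bin_width` shows how sensitive the histogram's shape is to the choice of bin width.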


