UW-Madison STAT 371 - Exploratory Data Analysis

This preview shows page 1-2-3 out of 10 pages.


Outline: Data, Example, Variables, Sample, Categorical Variables, Graphical Summaries, Dotplots, Histograms, Stemplots, Skewness, Numerical Summaries, Measures of Center, Quantiles, Boxplots, Measures of Dispersion, Empirical Rule, Samples and Populations

Exploratory Data Analysis
Bret Larget
Departments of Botany and of Statistics
University of Wisconsin–Madison
Statistics 371
13th September 2005

Student Data

Data is often represented with a matrix.

Sex     Level   Brothers  Sisters  MilesHome
Female  Fifth   0         0        102.7
Female  Fourth  2         0        51.3
Female  Fourth  1         0        1023.5
Male    Second  0         2        130.1
Male    Fourth  0         0        280.4
Male    Second  0         1        152.4
Female  Fourth  0         1        162.7
Female  Third   0         1        123.4
Female  Second  1         2        5.2
Male    Fourth  1         0        210.8

Units

- A unit is an object that can be measured, such as a person.
- All data on a single unit appears in a row.

Variables

- A variable is a characteristic of a unit that can be assigned a number or a category.
- There is a column for each variable.
- Variables are either quantitative (Brothers, Sisters, MilesHome) or categorical (Sex, Level).

Categorical Variables

- In a categorical variable, measurements are categories.
- Examples include blood type and sex.
- The variable year in school is an example of an ordinal categorical variable, because the levels are ordered.

Quantitative Variables

- Quantitative variables record a number for each unit.
- Examples include height, which is continuous, and number of sisters, which is discrete.
- Often, continuous variables are rounded to a discrete set of values (such as heights to the nearest inch or half inch).
- We can also make a categorical variable from a continuous variable by dividing the range of the variable into classes (so, for example, height could be categorized as short, average, or tall).
- Identifying the types of variables can be important because some methods of statistical analysis are appropriate only for a specific type of variable.

Samples

- A sample is a collection of units on which we have measured one or more variables.
- The number of observations in a sample is called the sample size.
- Common notation for the sample size is n.
- We typically use uppercase letters for variables and lowercase letters for observed values.

Summaries of Categorical Variables

- A frequency distribution is a list of the observed categories and a count of the number of observations in each.
- A frequency distribution may be displayed with a table or with a bar chart.
- For ordinal categorical random variables, it is conventional to order the categories in the display (table or bar chart) in the meaningful order.
- For non-ordinal variables, two conventional choices are alphabetical and by size of the counts.
- The vertical axis of a bar chart may show frequency or relative frequency.
- It is conventional to leave space between bars of a bar chart of a categorical variable.

Summary of Blood Type Data

Frequency table:

A   AB  B   O   NA's
22  6   9   21  28

National averages: O (46%), A (40%), B (10%), AB (4%)

[Bar chart of the counts for A, AB, B, O, and NA's]

Summary of Majors

Major                                 Count
Anthropology                          1
Bacteriology                          5
Biochemistry                          1
Biological Aspects of Conservation    3
Biology                               18
Biomedical Engineering                9
Botany                                1
Dairy Science                         5
Genetics                              19
Italian                               1
Kinesiology                           7
Medical Microbiology and Immunology   4
Nutritional Sciences                  1
Political Science                     1
Soil Science                          1
Undecided                             2
Wildlife Ecology                      3
Wildlife Ecology - Natural Resources  1
Zoology                               2
sociology                             1

A Dotplot of Hours of Sleep

- Quantitative variables from very small samples can be displayed with a dotplot.

[Dotplot of hours of sleep; horizontal axis from 5 to 9]

Histograms

- Histograms are a more general tool for displaying the distribution of quantitative variables.
- A histogram is a bar graph of counts of observations in each class, but no space is drawn between classes.
- If classes are of different widths, the bars should be drawn so that areas are proportional to frequencies.
- Selection of classes is arbitrary. Different choices can lead to different pictures.
- Too few classes is an over-summary of the data.
- Too many classes can cloud important features of the data with noise.

Miles from MSC

[Figure: Histogram of MilesClass; x-axis MilesClass (0 to 2000), y-axis Frequency (0 to 80)]

Corrected Miles from MSC

[Figure: Histogram of MilesClass[MilesClass < 20]; x-axis MilesClass[MilesClass < 20] (0 to 10), y-axis Frequency (0 to 40)]

Miles from Home

[Figure: Histogram of MilesHome; x-axis MilesHome (0 to 8000), y-axis Frequency (0 to 80)]

Miles from Home for Students within 250 miles

[Figure: Histogram of MilesHome[MilesHome <= 250]; x-axis MilesHome[MilesHome <= 250] (0 to 250), y-axis Frequency (0 to 12)]

Summary of Height

[Figure: two histograms of Height (inches), 55 to 75, drawn with different classes; y-axis Frequency (0 to 20 and 0 to 12)]

Stem-and-Leaf Diagrams

- Stem-and-leaf diagrams are useful for showing the shape of the distribution of small data sets without losing any (or much) information.
- Begin by rounding all data to the same precision.
- The last digit is the leaf.
- Anything before the last digit is the stem.
- In a stem-and-leaf diagram, each observation is represented by a single digit to the right of a line.
- Stems are shown only once.
- Show stems to fill gaps!
- Combining or splitting stems can lead to a better picture of the distribution.

Stem-and-Leaf Diagram of Brothers and Sisters

Brothers:

The decimal point is at the |

0 | 000000000000000000000000
1 | 000000000000000000000000000000000000000000000
2 | 00000000000
3 | 00000
4 | 0

Sisters:

The decimal point is at the |

0 | 000000000000000000000000000000000000000000
1 | 00000000000000000000000000000000
2 | 0000000000
3 | 0
4 |
5 |
6 | 0

Stem-and-Leaf of Miles from Class (< 5)

Data:

 [1] 0.25 0.32 0.33 0.34 0.40 0.42 0.47 0.47 0.48 0.55 0.55 0.55 0.57 0.58 0.58
[16] 0.59 0.60 0.62 0.62 0.62 0.63 0.64 0.64 0.65 0.65 0.65 0.67 0.68 0.70 0.71
[31] 0.73 0.75 0.75 0.75 0.75 0.75 0.76 0.79 0.80 0.81 0.81 0.83 0.86 0.87 0.92
[46] 0.98 0.99 1.00 1.02 1.03 1.03 1.03 1.06 1.08 1.09 1.10 1.11 1.11 1.12 1.13
[61] 1.14 1.21 1.22 1.24 1.25 1.26 1.26 1.29 1.29 1.40 1.52 1.83 2.25
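The stem-and-leaf rules from the slides (round to a common precision, take the last digit as the leaf and everything before it as the stem, and show empty stems to fill gaps) can be sketched in Python. This is only an illustration: the output in the slides looks like R, and the function name `stem_and_leaf`, its `decimals` parameter, and the variable `miles_class` are assumptions introduced here. The data are the 73 values printed above, which may be cut short by the preview. The sketch assumes non-negative values.

```python
from collections import defaultdict

def stem_and_leaf(values, decimals):
    """Return (stem, leaves) rows for non-negative values.

    Values are rounded to `decimals` places. The last kept digit is the
    leaf; everything before it is the stem. Stems with no observations
    are still returned (with an empty leaf string) so gaps stay visible.
    """
    scale = 10 ** decimals
    # Work with integers so stem and leaf split cleanly.
    scaled = sorted(round(v * scale) for v in values)
    leaves = defaultdict(list)
    for s in scaled:
        leaves[s // 10].append(s % 10)
    first, last = scaled[0] // 10, scaled[-1] // 10
    return [(stem, "".join(str(d) for d in leaves[stem]))
            for stem in range(first, last + 1)]

# Miles from class, as listed in the slide (hypothetical variable name).
miles_class = [
    0.25, 0.32, 0.33, 0.34, 0.40, 0.42, 0.47, 0.47, 0.48, 0.55, 0.55, 0.55, 0.57, 0.58, 0.58,
    0.59, 0.60, 0.62, 0.62, 0.62, 0.63, 0.64, 0.64, 0.65, 0.65, 0.65, 0.67, 0.68, 0.70, 0.71,
    0.73, 0.75, 0.75, 0.75, 0.75, 0.75, 0.76, 0.79, 0.80, 0.81, 0.81, 0.83, 0.86, 0.87, 0.92,
    0.98, 0.99, 1.00, 1.02, 1.03, 1.03, 1.03, 1.06, 1.08, 1.09, 1.10, 1.11, 1.11, 1.12, 1.13,
    1.14, 1.21, 1.22, 1.24, 1.25, 1.26, 1.26, 1.29, 1.29, 1.40, 1.52, 1.83, 2.25,
]

# With two decimals kept, the stem is in tenths of a mile and the leaf in
# hundredths, so the decimal point sits one digit to the left of the |.
rows = stem_and_leaf(miles_class, decimals=2)
for stem, leaf_digits in rows:
    print(f"{stem:>2} | {leaf_digits}")
```

Keeping two decimals gives one stem per tenth of a mile. Combining stems, as the slides suggest, would mean rounding to one decimal first; Python's `round` would then have to resolve ties such as 0.25, so that variant is left out of this sketch.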
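The frequency distribution for the blood type data can also be turned into relative frequencies and a rough text bar chart. The sketch below uses the counts from the slide. Excluding the 28 missing values (NA's) from the denominator is an assumption made here, not something the slides state, and the variable names are hypothetical.

```python
# Observed counts from the blood type slide; NA's are kept separately.
counts = {"A": 22, "AB": 6, "B": 9, "O": 21}
missing = 28

# Assumption: relative frequencies are taken over non-missing observations.
n = sum(counts.values())
relative = {blood_type: c / n for blood_type, c in counts.items()}

# National averages quoted on the slide, as proportions.
national = {"O": 0.46, "A": 0.40, "B": 0.10, "AB": 0.04}

print(f"{'Type':<5}{'Count':>6}{'Sample':>8}{'National':>10}")
for blood_type, c in counts.items():
    bar = "#" * c  # one mark per observation, a crude bar chart
    print(f"{blood_type:<5}{c:>6}{relative[blood_type]:>8.1%}"
          f"{national[blood_type]:>10.0%}  {bar}")
print(f"Missing (NA's): {missing} of {n + missing} students")
```

Putting frequency and relative frequency side by side shows the point made in the slides: the two scales give bars of the same shape, and only the labels on the vertical axis change.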
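The remark that the choice of histogram classes is arbitrary, and that different choices lead to different pictures, can be checked on the ten MilesHome values from the student data table. The sketch below counts observations in equal-width classes for two widths. The helper `class_counts`, the left-closed class convention [lower, upper), and the specific widths are assumptions chosen for illustration, and the values are assumed to be no smaller than `start`.

```python
import math

def class_counts(values, width, start=0.0):
    """Count observations in equal-width classes [lower, upper).

    Returns a list of (lower, upper, count) tuples from `start` up to the
    class holding the largest value, including empty classes.
    """
    counts = {}
    for v in values:
        k = math.floor((v - start) / width)  # index of the class holding v
        counts[k] = counts.get(k, 0) + 1
    return [(start + k * width, start + (k + 1) * width, counts.get(k, 0))
            for k in range(max(counts) + 1)]

# MilesHome column from the ten-student table.
miles_home = [102.7, 51.3, 1023.5, 130.1, 280.4, 152.4, 162.7, 123.4, 5.2, 210.8]

# Wide classes: one far-away student stretches the axis and hides the rest.
wide = class_counts(miles_home, width=250)

# Narrower classes on students within 250 miles, as in the later slide.
nearby = [m for m in miles_home if m <= 250]
narrow = class_counts(nearby, width=50)

for label, table in (("width 250", wide), ("width 50, within 250 miles", narrow)):
    print(label)
    for lower, upper, count in table:
        print(f"  [{lower:6.0f}, {upper:6.0f})  {'#' * count}")
```

With width 250, eight of the ten students fall into the first class, which is the over-summary the slides warn about. Restricting to nearby students and using width 50 spreads the same observations over five classes.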

