
Sample data is usually too big to digest by simply looking at it, so it is necessary to be familiar with methods that make the main features of a dataset understandable. We will review both numerical ways to summarize a dataset and graphical ways to visualize it.

Data comes in several types; for this review, these two categories are useful:
(1) Discrete (unordered categorical, ordered categorical, or numerical), and
(2) Numerical (discrete or continuous).
These categories are not perfect, since data can be both discrete and numerical. In that case, which type of method to use depends on the dataset; sometimes both types are helpful.

1 Discrete Variables

Subjects were students in grades 4-6 from three school districts in Ingham and Clinton Counties, Michigan. Chase and Dummer stratified their sample, selecting students from urban, suburban, and rural school districts, with approximately 1/3 of their sample coming from each district. Students indicated whether good grades, athletic ability, or popularity was most important to them. The questionnaire also asked for gender information. Data and story from http://lib.stat.cmu.edu/DASL/Datafiles/PopularKids.html. The data look something like this:

girl  Urban     Sports
girl  Suburban  Grades
boy   Rural     Popular
...

1.1 A numerical summary: Contingency Tables

A contingency table (or frequency table) simply records the count for each category.
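Counting categories is easy to automate. The following is a minimal Python sketch, not the course's own code. The function name `contingency_table` and the column positions are hypothetical, chosen for illustration. It tallies the three example rows shown above into a one-way table and a two-way table:

```python
from collections import Counter

# The three example rows shown in the notes: (gender, district, goal).
# The full survey is not reproduced here; these rows only illustrate the counting.
rows = [
    ("girl", "Urban", "Sports"),
    ("girl", "Suburban", "Grades"),
    ("boy", "Rural", "Popular"),
]
GENDER, DISTRICT, GOAL = 0, 1, 2  # hypothetical column positions for this sketch


def contingency_table(rows, *cols):
    """Count how many rows fall in each category (one column) or each
    combination of categories (two or more columns)."""
    if len(cols) == 1:
        return Counter(row[cols[0]] for row in rows)
    return Counter(tuple(row[c] for c in cols) for row in rows)


goals = contingency_table(rows, GOAL)                 # one-way table
goal_by_gender = contingency_table(rows, GOAL, GENDER)  # two-way table

# Summing the two-way table over Gender recovers the one-way Goals table (its margin).
margin = Counter()
for (goal, _gender), count in goal_by_gender.items():
    margin[goal] += count
assert margin == goals

print(goals)
print(goal_by_gender)
```

The final check reflects a general property: each one-way table is a margin of the two-way table. This is why the row sums of the survey's two-way table (for example 117 + 130 = 247) match the separate Goals counts.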
A contingency table can be made for a single variable by itself or for two variables together. For Goals and Gender separately:

Goals:
  Grades  Popular  Sports
     247      141      90

Gender:
  boy  girl
  227   251

For Goals and Gender together (a two-way contingency table):

             Gender
  Goals      boy  girl
  Grades     117   130
  Popular     50    91
  Sports      60    30

1.2 A graphical summary: Bar Graphs

For individual variables, a bar graph plots a bar for each category, with the height equal to the number in that category.

[Figure: bar graphs of Goals (Grades, Popular, Sports) and of Gender (boy, girl); vertical axis is the count.]

Would more students rather be popular or be good at sports?

For viewing two variables together, there are two common plots. The first is a bar plot with each bar split into the desired categories. The second is a mosaic plot, where the area of each tile corresponds to the frequency.

[Figure: bar plot of Goals (Grades, Popular, Sports) split by Gender, and mosaic plot of Gender (boy, girl) by Goals.]

Would more boys rather be popular or good at sports?
Would more girls rather be popular or good at sports?
Do you think it's likely that Gender and Goals are independent? Why or why not?

2 Single Numerical Variables

2.1 Graphical Methods

For univariate numerical data, we will look at three graphical methods for showing what the data look like: stem-leaf plots, histograms, and boxplots.

> ruth
 [1] 54 59 35 41 46 25 47 60 54 46 49 46 41 34 22
> maris
 [1]  8 13 14 16 23 26 28 33 39 61

[Figure: histograms of ruth and maris (vertical axis: Frequency), and side-by-side boxplots labeled Ruth and Maris.]

More complex boxplots (with outliers)...

2.2 Numerical Summary Statistics

The most common methods of numerically summarizing a dataset are the sample mean, the sample standard deviation, and the five-number summary, which consists of the minimum, first quartile, median, third quartile, and maximum.

The sample mean is a simple average and is usually denoted with a bar over the variable:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

The sample standard deviation is computed like the population standard deviation, except that it uses the observed data and divides by n - 1 rather than the n you might expect.
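To make the formulas concrete, here is a minimal Python sketch that computes the sample mean, the sample standard deviation with its n - 1 divisor, and the five-number summary for the ruth and maris data. The notes show R output, and this is only a translation. The function names are hypothetical. Quartiles come from `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between sorted values. Other software may use a slightly different quartile rule, so boxplot hinges can differ a little.

```python
import math
import statistics

# Data as printed in the notes.
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
maris = [8, 13, 14, 16, 23, 26, 28, 33, 39, 61]


def sample_mean(xs):
    """x-bar = (1/n) * sum of the x_i."""
    return sum(xs) / len(xs)


def sample_sd(xs):
    """s = sqrt(sum of (x_i - x-bar)^2 divided by n - 1, not n)."""
    n = len(xs)
    xbar = sample_mean(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))


def five_number_summary(xs):
    """(minimum, Q1, median, Q3, maximum), with linearly interpolated quartiles."""
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return min(xs), q1, med, q3, max(xs)


for name, xs in [("ruth", ruth), ("maris", maris)]:
    print(name, round(sample_mean(xs), 2), round(sample_sd(xs), 2),
          five_number_summary(xs))
```

For ruth this gives a mean of about 43.93, a standard deviation of about 11.25, and a five-number summary of (22, 38.0, 46.0, 51.5, 60). `sample_sd` agrees with `statistics.stdev`, which also uses the n - 1 divisor.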
The usual notationis s, wheres =vuut1n − 1nXi=1(xi− ¯x)2.Compare the sample mean and standard deviation with the boxplots for these made-up datasets:●●●●●●●A B C D0 1 2 3 4for A, mean=2.06, sd=1.07for B, mean=2, sd=0.41for C, mean=1.04, sd=0.99for D, mean=0.53, sd=0.463 Two or More Numerical VariablesDoes physical strength matter for physically demanding jobs? To answer this, 147 individualsworking as electricians, construction workers, auto mechanics, and other physically demandingjobs were measured in both strength and job performance. Their strength was measured with twostandard strength tests, one for arm strength (ARM), and one for grip strength (GRIP). For jobperformance, they were first evaluated by their employer and given a score (RATINGS), and alsoevaluated by simulating using a wrench (SIMS). Data and story fromhttp://www.ruf.rice.edu/~lane/case_studies/physical_strength/index.html.43.1 Graphical: ScatterplotWe can look at each variable individually by looking at stem-leaf plots, histograms, and boxplots.To look at two variables together, we need a new to ol, called the scatterplot. It simply puts thevariables on the two axes, and plots each data point. This helps us to determine how the twovariables are related. Are they linearly related, or is there a more complex pattern? If linear, isthe relationship positive or negative? 
How strong is the relationship?

Below are scatterplots of ARM against GRIP, RATINGS, and SIMS.

[Figure: three scatterplots with ARM on the horizontal axis and, respectively, GRIP, RATINGS, and SIMS on the vertical axis.]

Is arm strength related to grip strength? To the employer's rating? To the score on the simulation? How, and how strongly?

3.2 Numerical: Correlation

A numerical way to measure the degree of linear relationship between two



U of M STAT 4101 - Discrete Variables
