SJFC MSTI 130 - Understanding the Role of Data


Chapter 2
Understanding the Role of Data

© 2011 Kris H. Green and W. Allen Emerson

Quantifying the world is often a bit more involved than simply determining how much there is of variable A or how many there are of variable B. The complication: "it depends." There may be other variables C or D that need to be taken into consideration. For example, suppose you are the CEO of a large company and you want data on the salaries of your employees in order to ensure fairness and equity, provide incentives, control costs, and yet keep your company competitive. A simple approach: How much does employee 23 earn? Employee 24? And so on. This is certainly useful data to have at hand: you know how much of variable A and how many of variable B. But that is not enough. As CEO, it would be much more useful for you to know, in addition, each employee's department, years of experience at the company, job grade, educational level, age, and gender. What you really want to know is how much of A and how many of B broken down by categories C, D, E, F, G, and H.

Quantifying the world, then, does not necessarily mean thinking of the world in terms of numbers only, but also in terms of categories. We will learn how to distinguish and classify various kinds of variable data in the first section of the chapter. In the second section, we will practice coding these differing data in an Excel spreadsheet.

• As a result of this chapter, students will learn
  √ The differences between numerical and categorical data
  √ The importance of attending to units and categories
  √ How to extract data from a problem situation
  √ The purpose of identifiers in a data set

• As a result of this chapter, students will be able to
  √ Design data collection forms
  √ Code numerical and categorical data from a data collection form
  √ Set up an Excel spreadsheet
  √ Correctly enter data into an Excel spreadsheet
  √ Properly define the required variable names in Excel
  √ Insert comments about data variables

2.1 Extracting Data from the Problem Situation

In the previous chapter we learned how to define a problem. We recognized that a real-world problem is often embedded in an interconnected web of events taking place in time and space, usually involving people, objects, or machines. To gather meaningful data about a problem we must think of how the data is related to its surroundings. For example, in order to gather the kind of data that we can use to identify and then correct excessive wait times at Beef n' Buns, we need to consider when a "wait time" begins and when it ends, and then connect these wait times to the types of orders being filled during them, because not all orders are created equal with regard to wait times.

To use this data well, we also need to understand why not all orders are created equal with regard to wait times. One of the first things we recognize as we try to understand this connection is that there seems to be an inherent difference between wait-time data and type-of-order data. In this section we move ahead by learning how to recognize different types of data in a problem situation and how to record them on data collection forms. This is the process of extracting data from the problem situation.

Before we can complete the data extraction process by recording the data on data collection forms, we need to know exactly what type of data we are recording in order to know either "how many of what" to mark down or what category to check, depending on whether the data is numerical or categorical.

Types of Data

As we mentioned above, not all data has to do with numbers. Data that does have to do with numbers, that is, with counting or measuring something, is called numerical data, and data that has to do with classifying or categorizing something is called categorical data.
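As a rough illustration of the distinction, the sketch below records a few observations the way a data collection form might, with an identifier, one numerical field (wait time), and one categorical field (order type). It then breaks the numerical variable down by category. The field names, order types, and wait times are all made up for illustration; they are not taken from the Beef n' Buns problem.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: each record has an identifier, a numerical
# value (wait time in seconds), and a categorical value (type of order).
observations = [
    {"order_id": 1, "wait_seconds": 95, "order_type": "drive-through"},
    {"order_id": 2, "wait_seconds": 60, "order_type": "counter"},
    {"order_id": 3, "wait_seconds": 120, "order_type": "drive-through"},
    {"order_id": 4, "wait_seconds": 80, "order_type": "counter"},
    {"order_id": 5, "wait_seconds": 70, "order_type": "counter"},
]


def mean_wait_by_type(records):
    """Group the numerical wait times by order type and average each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["order_type"]].append(rec["wait_seconds"])
    return {order_type: mean(waits) for order_type, waits in groups.items()}


summary = mean_wait_by_type(observations)
for order_type, avg in summary.items():
    print(f"{order_type}: {avg:.1f} s")
```

Averaging makes sense for the wait times because they are numerical; the order type only decides which group each wait time falls into.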
Examples of numerical data are salaries, sales, heights, weights, number of customers, and number of children. Examples of categorical data are gender (male, female), job classification (e.g. office staff, management, vice president), day of week, and marital status. Sometimes it is obvious what type of data we are dealing with in a particular problem situation; other times we have to make a conscious decision as to whether we want to record our data numerically or categorically. In the latter case, we have to ask ourselves whether it would be more beneficial for our analysis to retain the numerical differences between the individual things we are observing or to group them into categories. Each has its advantages.

Almost any type of numerical data can be converted into categorical data by some sort of classification scheme. For example, individual numerical heights could be lumped into short, medium, tall, and very tall categories: all heights below 60 inches would be placed in the "short" category, all heights between 60 inches and 68 inches in the "medium" category, etc. Categorical data, however, cannot be converted to numerical data. Take, for example, gender. It would not make sense to find the add-up-and-divide average of the categories "female" and "male" even if we decided to think of a female as "0" and a male as "1." It would make no sense to talk about (0+1)/2, or 0.5, as a gender. In general, we can distinguish numerical and categorical data by this rule of thumb: if you can do meaningful arithmetic with the data, it is numerical; if not, it is categorical.

When coding data, note that numbers can be used as codes for categorical data, e.g. 0 for female and 1 for male, or 1-5 in opinion poll rankings.
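The height scheme and the rule of thumb can be sketched in code. The cutoffs for "short" (below 60 inches) and "medium" (60 to 68 inches) come from the text. The text does not say where "tall" ends or which category a height of exactly 68 inches belongs to, so the 74-inch cutoff and the boundary handling below are assumptions, and the sample heights are made up.

```python
from statistics import mean


def height_category(inches):
    """Classify a numerical height into a category.

    The 60- and 68-inch cutoffs follow the text; the 74-inch cutoff
    between "tall" and "very tall" is an assumption for illustration.
    """
    if inches < 60:
        return "short"
    if inches <= 68:
        return "medium"
    if inches <= 74:
        return "tall"
    return "very tall"


heights = [58.5, 62.0, 67.5, 70.0, 76.0]  # hypothetical measurements in inches
categories = [height_category(h) for h in heights]
print(categories)

# Arithmetic on the numerical data is meaningful: this is an average height.
print(mean(heights))

# Numeric codes for categories are only labels. Python will average them
# without complaint, but the result does not describe any category.
gender_codes = {"female": 0, "male": 1}
coded = [gender_codes[g] for g in ["female", "male"]]
print(mean(coded))
```

Note that the conversion only runs one way: once a height has been recorded as "medium", the original measurement cannot be recovered from the category.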
Without prior knowledge or additional information, it is often difficult to distinguish between numerical and categorical data. Consider, for example, Age: 59, 52, 58, 12, 43, 23. This data could be either numerical or categorical, depending on the purpose and design of the study. If age were considered numerical, 59 would have a different impact on the sum of all the ages, for instance, than would 52. If age were considered categorical, then both 59 and 52 might be lumped into the "middle-aged" category, whereas 70 and 80 might be counted in the "senior" category.

Each type of data, numerical and categorical, has two subtypes. Numerical data can be either discrete or continuous and categorical data

