SJFC MSTI 130 - Using Models to Interpret Data


Chapter 3 Using Models to Interpret Data¹

What is this chapter about? It’s about taking data, possibly thousands of numbers, and finding a few measures (values) that help you make sense of the data and represent it effectively. The main tools you will use are the mean, the standard deviation, and PivotTables (an Excel feature). The mean turns out to be the simplest and most commonly used model of data. The standard deviation can be thought of as a measure of how closely this model fits the data (or, equivalently, how appropriate the mean is for modeling the data). Thus, we have the two basic pieces of a model: the model itself (the mean) and a measure of how well the model fits (the standard deviation). Another way to think about this process is that we are taking a huge amount of information (the original data) and compressing it, reducing it to fewer pieces of information that give us a sense of the entire data set. Of course, we lose some of the information in the process, but we gain efficiency and a way of communicating and making decisions that would be extremely difficult using only the data itself. In this sense, the mean is the simplest possible model we can produce: we take all of the numerical data, no matter how numerous, and reduce it to one number for each of the numerical variables in the data set. To evaluate the quality of this model for each variable, we then compute the standard deviation of that variable.

Section 3.1 of the chapter shows you how to use the mean as a model for the data, and how the standard deviation measures how well this model represents the data.
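The two pieces of a model described above (the mean, plus the standard deviation as a measure of fit) can be sketched in a few lines of Python. This is a minimal illustration, not the book’s own procedure: it assumes the sample standard deviation (dividing by n - 1, as Excel’s STDEV.S does), since the chapter’s exact formula is not shown in this excerpt, and the helper name `mean_model` is hypothetical. The data are the Total Fat values from the Beef n’ Buns example in Section 3.1.

```python
from statistics import mean, stdev


def mean_model(values):
    """Hypothetical helper: reduce one numerical variable to a model (its mean)
    plus a measure of how well that model fits (its sample standard deviation)."""
    model = mean(values)  # the model: one number standing in for every observation
    fit = stdev(values)   # sample standard deviation (divides by n - 1)
    return model, fit


# Total Fat (g) for the ten Beef n' Buns sandwiches in Section 3.1
total_fat = [39, 47, 57, 65, 14, 18, 26, 34, 37, 10]

model, fit = mean_model(total_fat)
print(f"mean = {model:.1f} g, standard deviation = {fit:.1f} g")
# prints: mean = 34.7 g, standard deviation = 18.2 g
```

Swapping `stdev` for `statistics.pstdev` gives the population standard deviation (dividing by n) instead; which version the chapter adopts is not visible in this excerpt.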
Section 3.2 of the chapter shows you how to reduce data that has several variables, some of which are categorical, to several means using an Excel feature called PivotTables.

• As a result of this chapter, students will learn
  √ What a mean is and how it can be used to model the average or typical data point
  √ How to use the standard deviation as a tool for determining how well the mean represents the data
  √ What PivotTables are and how they are useful
• As a result of this chapter, students will be able to
  √ Compute means and standard deviations by hand, with Excel, and with add-ins like StatPro
  √ Make a PivotTable that cross-sections your data in order to help you analyze it

¹ © 2011 Kris H. Green and W. Allen Emerson

3.1 The Mean As A Model

Consider what we have so far: a lot of information in the form of spreadsheets filled with data that we arranged into variables and observations. But what do we do with all this? Unless you’re really special, you probably can’t learn a lot from looking at a list of one thousand numbers. You would probably learn even less from looking at a thousand observations for each of four different variables. Data sets in business and science are usually larger than this, so we need a more efficient analysis tool. The tool we will use is a model of the data. A model is a number or formula that represents a set of data; it is not the data itself, but it is meant to capture important features of the data that would otherwise not be recognizable in a long list of numbers.

Using models helps us understand or simplify a situation. Models can also help us make predictions about future events. For example, weather models help us analyze current weather and predict potential future weather patterns. Architectural models help us visualize the design of a building before we commit it to bricks and mortar.
In this section we will deal with what is possibly the simplest and most widely used model, called the mean of a set of data. Other commonly used models are given by graphs and equations, which we will develop in future chapters, eventually building models that include features such as categorical variables.

Rather than look at the entire set of data at once, we want to look at the data one variable at a time in order to find out what that one variable tells us about the situation we collected data on. To make things even easier, we want to reduce the data to one number that represents the typical data point for that variable. In general, a number used to represent an entire variable is called a statistic. If that statistic is meant to represent the typical data point, we call it an average.

Let’s look at an example. Shown below are the fat and protein counts for 10 of the most popular sandwiches sold at Beef n’ Buns.

Item                            Total Fat (g)   Protein (g)
Super Burger                         39              29
Super Burger w/ Cheese               47              34
Double Super Burger                  57              48
Double Super Burger w/ Cheese        65              53
Hamburger                            14              18
Cheeseburger                         18              20
Double Hamburger                     26              31
Double Cheeseburger                  34              35
Double Cheeseburger w/ Bacon         37              38
Veggie Burger                        10              14

We can reduce all this data to the following simplistic model, which tells us that the “typical” sandwich has 34.7 grams of fat and 32 grams of protein.

Statistic      Total Fat   Protein
Mean (g)         34.7        32.0

The question we should ask ourselves is how well the mean represents a given set of data. Looking at the data above, we see that although the typical sandwich has 34.7 grams of fat, some have much higher values than that and some have much less.

The first step in getting an overall measure of how the data values differ from the mean is to develop a standardized ruler for measuring how close the observations are to one another.
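The reduction from the ten-row table to the one-row model can be checked with a short Python sketch. It is only an illustration: storing the table as a dictionary of item name to (fat, protein) is an assumption made for convenience, and the second half simply lists which sandwiches fall above and below the fat mean, which is the spread the mean by itself hides.

```python
from statistics import mean

# Beef n' Buns data from the table above: item -> (total fat in g, protein in g)
sandwiches = {
    "Super Burger": (39, 29),
    "Super Burger w/ Cheese": (47, 34),
    "Double Super Burger": (57, 48),
    "Double Super Burger w/ Cheese": (65, 53),
    "Hamburger": (14, 18),
    "Cheeseburger": (18, 20),
    "Double Hamburger": (26, 31),
    "Double Cheeseburger": (34, 35),
    "Double Cheeseburger w/ Bacon": (37, 38),
    "Veggie Burger": (10, 14),
}

fat = [f for f, _ in sandwiches.values()]
protein = [p for _, p in sandwiches.values()]

# The model: one mean per numerical variable
fat_mean = mean(fat)          # 347 / 10
protein_mean = mean(protein)  # 320 / 10
print(f"Mean (g): Total Fat {fat_mean:.1f}, Protein {protein_mean:.1f}")
# prints: Mean (g): Total Fat 34.7, Protein 32.0

# What the mean hides: which items sit above and below it
above = [name for name, (f, _) in sandwiches.items() if f > fat_mean]
below = [name for name, (f, _) in sandwiches.items() if f < fat_mean]
print("Above the fat mean:", above)
print("Below the fat mean:", below)
```

In Excel, the same two means come from applying AVERAGE to each column; the Python version just makes the arithmetic (347/10 and 320/10) explicit.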
For example, in a crowd of people, the length of your arm is a good measuring stick for “closeness”: if someone is less than an arm’s length away from you, you would consider them “close.” However, this distance is not appropriate when driving down the freeway. A more appropriate measuring stick for that situation would be the length of a car. The Federal Aviation Administration has yet another definition of close: aircraft are not allowed within 1000 feet of each other without declaring a “near miss.”

These situations all describe ways of measuring “closeness” that refer to real physical distances. Seldom, however, do managers deal with these kinds of distances. More commonly, they collect data measured in dollars or years. Can we find a way to measure distance that will make sense for almost any situation that managers encounter?

As you’ve probably guessed, we can. To do so, however, we need to decide where to start measuring from. Most of the time we start measuring at zero, but this may not help very much when looking at

