Chapter 6
Interpreting Spatial Models¹

This chapter aims to do two things. Part A focuses on how to estimate statistics, particularly the mean and standard deviation, from data that is only presented in summary form (like a frequency table or a histogram). Part B takes this one step further by relating two different ways of picturing data: histograms and boxplots. Both give a picture of how the data is spread out. The difference is that a boxplot breaks the data into four chunks with the same number of observations in each chunk, but with each chunk of data having a different length. Histograms are the opposite: each chunk has exactly the same length, but probably has a different number of observations in it.

• As a result of this chapter, students will learn
  √ Why summarized data cannot be used to compute an accurate mean or standard deviation
  √ What a percentile is
  √ What a cumulative distribution is
• As a result of this chapter, students will be able to
  √ Estimate the mean from a set of summarized data
  √ Estimate the standard deviation from a set of summarized data
  √ Sketch a boxplot of the data underlying a histogram without having the data itself
  √ Sketch a rough histogram of the data based only on a boxplot of the data

¹ © 2011 Kris H. Green and W. Allen Emerson

6.1 Estimating Stats from Frequency Data

Many times we are presented with data, in newspapers, magazines, the Internet, or meetings, but these data are rarely presented in their entirety. After all, in many cases, there are thousands of observations of each variable. It is therefore more common to present summarized data in the form of tables or charts that show the number (or frequency) of observations that fall into a certain range (or bin). In the last chapter, we used this idea to create a graphical depiction of the data in the form of a histogram.
But what if you are starting from the summarized data and want to know something about the original data itself?

For example, what if you wish to compute the mean of the data? This is the most frequently used measure of central tendency and is often used as a model of the data. The way in which we compute this measure is based on having all of the individual data points in the set of data. In a summarized table of data, though, we do not have the actual values to add up. One thing is certain: we cannot simply average the frequency counts, as this does nothing to account for the actual values of the data, and the frequency counts are not (usually) even in the same units as the data itself. For example, in looking at the table below, we see data on salary distribution at a company. If we average the frequency counts (labeled "Number of Employees") we get 11.8, which means that if the distribution were uniform, there would be 11.8 employees in each salary range. But this number has units of number of people. The average salary must have units of dollars. Somehow, we must estimate the mean based on both the salary ranges and the number of observations in each range.

Salary Range             Number of Employees
$200,000 - $250,000       1
$150,000 - $199,999       2
$100,000 - $149,999       5
$50,000 - $99,999        13
$0 - $49,999             38

Unfortunately, as we’ll discover, once you have only the summarized data, there is no way to get the actual mean of the original data. At best, you are estimating the mean, and your estimate can carry a great deal of error, depending on the size (width) of each bin into which the data has been summarized. These same ideas hold true for estimating the standard deviation of the data, especially since we must first estimate the mean in order to compute the deviations of each observation (or, in this case, each group of observations) from the mean.

And while it is true that in many cases we have the actual data and can compute the true mean of the data, this is often not the case.
Have you ever filled in a customer satisfaction survey? Such surveys often collect demographic data, such as the age of the person filling in the form, but rarely do they ask you to write in your age. It is more common to check off a box marking a range where your age fits (for example, 31-40 years old). In situations like this, the data starts as a summarized frequency table; the company collecting the data never has the actual ages of each survey participant. So they must resort to estimating the mean if they need it for other calculations.

6.1.1 Definitions and Formulas

Summarized Data: Summarized data is data not presented in raw form. Instead, the data has been grouped (or summarized) into categories. For example, rather than listing the salaries of all 250 employees at a company, a summarized presentation of this data might simply tell you the number of employees in each salary range, such as 10 employees making $0 - $20,000, 34 employees making $20,001 to $40,000 and so forth.

Weighted Average: A weighted average is a type of mean where each item to be included in the average has a different weight depending on either its frequency or importance. One of the most common weighted averages is a student’s GPA in college. Each class is assigned a value based on the grade (usually a number from 0 to 4 quality points) and a weight based on the number of credit hours (3 for a three-credit course, 4 for a four-credit course, etc.). The overall GPA is then computed by weighting each grade (multiplying the quality points by the weight, that is, the number of credit hours), adding these weighted grades up, and dividing by the total number of credit hours (which is just the sum of all the weights). This means that a low grade in a high-weight course (one with more credit hours) is more damaging than a low grade in a course with few credit hours. Another common use of weighted averages is to estimate the mean of a set of data given by a frequency table.
In this case, the weight is determined by the frequency counts. For example, if 10% of a class scored 50 on an exam, 20% scored 60, 40% scored 70, 10% scored 80 and 20% scored 90, then the class average is

\[
\frac{0.10(50) + 0.20(60) + 0.40(70) + 0.10(80) + 0.20(90)}{0.10 + 0.20 + 0.40 + 0.10 + 0.20} = \frac{71}{1} = 71.
\]

More generally, if the data are given by \(x_i\) and the weights are given by \(w_i\), the weighted average of the data is given by

\[
\text{Weighted Average} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}
\]

Weight: Each item to be included in a weighted average is assigned a weight that identifies how much that item contributes to the overall
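
To make the weighted-average formula concrete, here is a minimal Python sketch. It first reproduces the exam example from the text and then applies the same formula to the salary table from Section 6.1. The function name `weighted_average` and the variable names are our own illustrative choices. Representing each salary range by its midpoint is an assumption made only for this sketch (the text so far says only that the estimate must use both the ranges and the counts), so the salary figure is an estimate and not the true mean.

```python
def weighted_average(values, weights):
    """Return sum(w_i * x_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Exam example from the text: each score and the share of the class earning it.
exam_scores = [50, 60, 70, 80, 90]
exam_shares = [0.10, 0.20, 0.40, 0.10, 0.20]
exam_mean = weighted_average(exam_scores, exam_shares)

# Salary table from Section 6.1 as (low, high, number of employees).
salary_bins = [
    (0, 49_999, 38),
    (50_000, 99_999, 13),
    (100_000, 149_999, 5),
    (150_000, 199_999, 2),
    (200_000, 250_000, 1),
]

# ASSUMPTION (not stated in the text): use each bin's midpoint as the
# representative salary for everyone in that bin.
midpoints = [(low + high) / 2 for low, high, _ in salary_bins]
counts = [count for _, _, count in salary_bins]

# Averaging the counts alone gives a number of people, not a salary.
mean_count = sum(counts) / len(counts)

# Weighting the midpoints by the counts gives an estimate in dollars.
estimated_mean_salary = weighted_average(midpoints, counts)

print(f"Exam average: {exam_mean:.1f}")
print(f"Average employees per bin: {mean_count:.1f}")
print(f"Estimated mean salary: ${estimated_mean_salary:,.2f}")
```

Under the midpoint assumption the salary estimate comes out at about $53,000. The true mean could be noticeably different, since every employee in a bin may sit near one end of that range.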



SJFC MSTI 130 - Interpreting Spatial Models
