Why Statistics?
Data, data everywhere!
Statistical techniques are used to make many decisions that affect our lives.

Definition of Statistics
Statistics: the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.
Statistical analysis: used to manipulate, summarize, and investigate data so that useful decision-making information results.

Statistics Means Never Having to Say You're Certain
"There are three kinds of lies: lies, damned lies, and statistics."
Attributed to British Prime Minister Benjamin Disraeli

A Brief History of Statistics
A systematic collection of data on the population and the economy was begun in the Italian city-states of Venice and Florence during the Renaissance.
The term "statistics", derived from the word "state", was used to refer to a collection of facts of interest to the state.
In 1662 the English tradesman John Graunt published a book entitled Natural and Political Observations Made upon the Bills of Mortality.
Graunt used the London bills of mortality to estimate the city's population in 1660.

Total Deaths in England
Year    Burials    Plague Deaths
1592    25,886     11,503
1593    17,844     10,662
1603    37,294     30,561
1625    51,758     35,417
1636    23,359     10,400
John Graunt (1662)

John Graunt's Survey
To estimate population, Graunt surveyed households in certain London parishes and discovered that, on average, there were approximately 3 deaths for every 88 people.
Since the London bills cited 13,200 deaths in London for that year, Graunt estimated the London population to be about 13,200 × 88/3 = 387,200.

Why is understanding the BS of Statistics important?
How does statistics, truthful or not, matter for big data, AI, and ML?
Being literate today means not just being able to read, but being able to understand the massive amounts of data thrown at us every day and being used to train our algorithms.

How to Mislead through Poor Collection of Data
Sampling
In order to analyze and interpret data, we must first collect it. The data that is collected is known
as a sample. The sample is collected from a population.

Computer Science Interest Sampling
We want to analyze the change in the interest level of high school students who might be interested in computing.
Our population was all high school students in the US.
Our sample was responses from high school seniors in a high-income area in Georgia over the last 10 years.

Computer Science Interest Sampling
Can we claim that our results were representative of:
    All high school students?
    All high school seniors?
    All high school seniors in the US?
    All high school seniors in Georgia?
No. That would be called biased sampling, and we could use it to lie, cheat, manipulate, or mislead the general public.

How to Mislead through Poor Analysis
Analyzing Data
Data analysis is a process of gathering, modeling, and transforming data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making.

[Figure: Graph A and Graph B, both titled "Unemployment Rates", plotting Percent against Year for 2010 to 2018. One graph's vertical axis runs from 0 to 10; the other's runs from 4.5 to 7.5.]

How to Mislead through Poor Analysis
The Department of Labor surveys 60,000 households. Each household is asked a series of questions:
    Are you currently working? (Note: no mention of part time or full time.)
        Yes: You are employed (e.g., 147 million).
        No: Have you looked for a job in the past 30 days?
            Yes: You are unemployed (e.g., 9 million).
            No: You are not in the labor force (e.g., 93 million).
Unemployment Rate (UR) = Unemployed / Labor Force = 9 / (147 + 9) ≈ 0.058 = 5.8%
Source: https://www.bls.gov/cps/cps_htgm.htm

Over the equivalent time period, the Department of Labor surveys 400,000 businesses and asks a different question:
    Are employees currently on your payroll? How many?
        Yes: Total Non-Farm Payrolls (e.g., 140 million).
Wait a minute, is that the same data that the household survey collected?
    Are you currently working? (Note: no mention of part time or full time.)
        Yes: You are employed (e.g., 147 million).
Source: http://www.nbcnews.com/id/15768195/ns/business-answer_desk/t/who-does-government-count-employed/

Household Survey vs. Establishment Survey
The household survey includes agricultural workers, self-employed workers, and private household workers. The establishment survey does not.
The household survey counts people on unpaid leave as employed; the establishment survey does not.
The household survey only counts people over the age of 16; the establishment survey is not limited by age.
The establishment survey will often double-count jobs (e.g., if an employee quits one job and starts another in the same payroll period).

Household Survey vs. Establishment Survey
The two surveys track each other reasonably well, but there are noticeable differences.
However, the establishment survey is subject to fairly large revisions.

[Figure: Revisions to Total Payroll Employment (Preliminary minus Current Estimate, thousands), 1990 to 2004. The vertical axis runs from -1,500 to 1,500; positive values are labeled "Jobs overestimated" and negative values "Jobs underestimated". Source: Bureau of Labor Statistics.]

How to Mislead through Poor Analysis
If we aren't careful, data analysis could mislead by miscalculating trends.

How to Mislead through Interpretation
Interpreting Data
Interpreting data often involves displaying it in some useful way (e.g., charts, graphs, etc.).
If your goal is to lie, cheat, manipulate, or mislead, graphical displays are your best friend.

How to Mislead with Graphical Displays
German economic development agency GTAI used this graph to boast that German workers are more motivated and work more hours than do workers in other EU nations.
Bar chart axes should include zero.
Source: https://callingbullshit.org/tools/tools_misleading_axes.html

How to Mislead with Graphical Displays
This published graph creates the immediate impression that gun deaths declined sharply after "stand your ground" legislation was enacted in Florida. In fact, gun deaths actually increased by about 50% in the subsequent two years.
The vertical axis is inverted.
Source: https://callingbullshit.org/tools/tools_misleading_axes.html

How to Mislead with Graphical Displays
Year    Number of students matriculating into CS
1       120
2       116
3       114
4       110
A faculty member argues that the number of students matriculating into computer science has dropped sharply over the past few years because the university has limited CS admissions.
The admission officer argues that the number of students has stayed roughly the same from year to year.
Graph A: Create a graph that supports the faculty member's argument.
Graph B: Create a graph that supports the admission officer's argument.

Students Enrolling in
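
One way to approach the Graph A / Graph B exercise is to draw the same four enrollment numbers twice and change only the axis range. The sketch below is an illustration, not the deck's own solution: it renders text bar charts, and the helper names (bar_lengths, text_chart) and the axis limits (108 to 120 for Graph A, 0 to 120 for Graph B) are assumptions chosen for the example.

```python
# Enrollment figures from the slide's table (year -> students matriculating into CS).
enrollment = {1: 120, 2: 116, 3: 114, 4: 110}


def bar_lengths(values, axis_min, axis_max, width=40):
    """Scale each value to a bar length in characters for the given axis range."""
    span = axis_max - axis_min
    return [round((v - axis_min) / span * width) for v in values]


def text_chart(data, axis_min, axis_max, width=40):
    """Render a horizontal text bar chart; only the axis range is varied."""
    lengths = bar_lengths(data.values(), axis_min, axis_max, width)
    return "\n".join(
        f"Year {year}: {'#' * n} {value}"
        for (year, value), n in zip(data.items(), lengths)
    )


# Graph A (faculty member): axis truncated to 108..120 (hypothetical limits),
# so a drop of 10 students fills most of the chart.
graph_a = text_chart(enrollment, axis_min=108, axis_max=120)

# Graph B (admission officer): axis starts at zero, so the same drop is
# a small fraction of each bar.
graph_b = text_chart(enrollment, axis_min=0, axis_max=120)

print("Graph A\n" + graph_a)
print("Graph B\n" + graph_b)
```

With these assumed limits the Graph A bars are 40, 27, 20, and 7 characters long, while the Graph B bars are 40, 39, 38, and 37. The data are identical; the truncated axis alone turns a decline of about 8% into what looks like a collapse, which is why bar chart axes should include zero.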
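
The two calculations that appear earlier in the deck, Graunt's population estimate and the household-survey unemployment rate, can be checked in a few lines. This is a minimal sketch using only the figures quoted on the slides; the function names are hypothetical and introduced here for illustration.

```python
def graunt_population(deaths, people_per_group=88, deaths_per_group=3):
    """Scale a death count up to a population using Graunt's surveyed ratio
    of about 3 deaths for every 88 people."""
    return deaths * people_per_group / deaths_per_group


def unemployment_rate(employed, unemployed):
    """Unemployed share of the labor force, where the labor force is
    employed + unemployed (people not in the labor force are excluded)."""
    return unemployed / (employed + unemployed)


# London bills of mortality: 13,200 deaths in the year Graunt studied.
print(f"Estimated London population: {graunt_population(13_200):,.0f}")

# Household survey example figures, in millions: 147 employed, 9 unemployed.
print(f"Unemployment rate: {unemployment_rate(147, 9):.1%}")
```

Note that the 93 million people counted as not in the labor force never enter the rate. How that group is defined therefore changes the headline number without any change in the underlying survey responses.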