22S:105 Statistical Methods and Computing

Introduction
Lecture 1, January 18, 2006
Kate Cowles
374 SH, 335-0727
kcowles@stat.uiowa.edu

What is statistics?

Statistics is the science of using data to make decisions and answer questions.

Statistics involves
  - designing studies
  - collecting data
  - organizing and analyzing data
  - interpreting and reporting results

The Challenger: How understanding of statistical methods might have prevented a tragedy

References:
  Dalal SR, Fowlkes EB, Hoadley B (1989). Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure. Journal of the American Statistical Association 84: 945-957.
  Tufte, Edward R (1997). The Decision to Launch the Space Shuttle Challenger. In Visual and Statistical Thinking: Displays of Evidence for Making Decisions. Graphics Press.

On 1/28/86, space shuttle Challenger exploded during launch
  - 7 astronauts killed
  - reason: gas leak through a joint that should have been sealed by two rubber O-rings
  - O-rings had lost resiliency due to cold temperature

On the previous day: extensive discussions of whether or not it would be safe to launch

The engineers' evidence
  - history of serious but non-catastrophic O-ring damage during previous cool-weather launches
  - predicted temperature for launch time: 26-29°F
  - physics of resiliency of rubber
  - no shuttle had ever been launched at a temperature lower than 53°F
  - experimental data

The decision
  - engineers who designed the rocket faxed to NASA a recommendation not to launch, due to risk of O-ring failure at low temperatures
  - NASA officials pointed out weaknesses of the engineers' evidence
  - after lengthy discussion, managers of the rocket-making company changed their minds and recommended launch

The engineers' plot of data from previous shuttle launches: joint temperature vs. number of O-rings having some temperature-related problems

What was missing from the engineers' argument?
  - quantification of the relationship between joint temperature and O-ring failure
  - prediction of the probability of O-ring failure at 29°F, with assessment of degree of uncertainty
  - an appropriate statistical method: logistic regression

Dalal et al. carried out such an analysis after the fact
  - using data from the 23 shuttle launches prior to the Challenger
  - found strong statistical evidence of a temperature effect on O-rings
  - we will analyze these data later in the semester

A plot showing data from all 23 previous launches, including those in which no O-rings were damaged

Subjects, observations, and variables

In statistical studies, we generally choose a set of individuals or subjects on whom data is collected.

We usually are interested in collecting a number of different kinds of information to describe each subject.

A variable is a particular characteristic that may take on different values for different subjects.

For example, age, gender, and diagnosis are three variables that might be included in a study of length of hospital stays of hospital patients.

For analysis by a computer, a set of data collected for a study is often organized as a table, with a row for each subject and a column for each variable.

  Pat id   age   sex   diagnosis
  101      25    F     hepatitis A
  102      38    F     cirrhosis
  103      76    M     hepatitis C

Each row in such a table, corresponding to the data for a single subject, is called an observation.

Types of variables

Qualitative (textbook calls this "categorical")
  - Nominal: values fall into unordered categories
      - numbers may be used to represent categories, but they are just labels
      - example: variable called "occupational area" coded as 1 = education, 2 = business, 3 = service, 4 = industry, etc.
      - special case: binary data, which can take on only 2 possible values
  - Ordinal: data representing ordered categories
      - example: variable called "prognosis" taking on possible values poor, fair, good

Quantitative
  - Discrete
      - both order and magnitude are important
      - numbers represent measurable quantities
      - possible values are restricted, often to be integers
      - example: count of number of homicides in Johnson County in 1998
  - Continuous
      - numbers represent measurable quantities and are not restricted to a set of specified values
      - examples: temperature, blood pressure, annual profit
  - Special case: censored data
      - continuous data in which values for some subjects are not observable; some values are known only to be larger or smaller than some observed value
      - example: time-to-failure data

What data type is each of the following?
  - a variable defined for each pre-Challenger shuttle launch as the answer to the question "Were any primary O-rings damaged during launch?" (yes/no)
  - a variable defined for each pre-Challenger shuttle launch as the total number of primary O-rings that were damaged, out of the 6 primary O-rings in a shuttle
  - a variable defined as outdoor temperature in degrees F at launch time of each shuttle

Exploratory data analysis

Initial examination to discover main features of data
  - should begin with examining each variable one at a time; may proceed to examining relationships between variables
  - should begin with graphs; may continue with numerical summaries

The distribution of a variable tells what values it takes and how frequently it takes them.

Describing binary, nominal, and ordinal data
  - tables of frequencies and percents
  - bar charts (also called bar graphs)
  - pie charts

Frequency distribution for nominal or ordinal data: a set of classes or categories, along with numerical counts of the number of members of each class

Example: New York Times New York City Poll, June 2003

What was the last grade in school that you completed?
  1 Not a high school grad
  2 High school grad
  3 Some college (trade or business)
  4 College grad
  5 Post-grad work or degree
  6 Refused

What is your sex?
  1 male
  2 female

In the last year, do you think life in New York City has generally gotten better, gotten worse, or stayed about the same?
  1 Better
  2 Worse
  3 Same
  9 DK/NA

How old are you?

How would you rate the condition of the NYC economy? Is it very good, fairly good, fairly bad, or very bad?
  1 Very good
  2 Fairly good
  3 Fairly bad
  4 Very bad
  9 DK/NA

What was your income in 2002? Was it under $15,000, or between $15,000 and $30,000, or over $30,000, etc., to obtain the following breakdown:
  1 under $15,000
  2 $15,000-$30,000
  3 $30,000-$50,000
  4 $50,000-$75,000
  5 $75,000-$100,000
  6 over $100,000
  7 Won't specify/refused

How much do you blame the terrorist attack of 9/11 for NYC's current budget problems?
  1 a lot
  2 some
  3 not much

How would you describe your views on most political matters? Generally, do you think of yourself as
  1 liberal
  2 moderate
  3 conservative

The New York Times. NEW YORK TIMES NEW YORK CITY POLL, JUNE 2003 [Computer file]. ICPSR version. New
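
The patient table and the idea of a frequency distribution from these notes can be sketched in a few lines of code. This is a minimal illustration, not part of the lecture: Python is used only as an example language, and the field names (`pat_id`, `age`, `sex`, `diagnosis`) and the helper `frequency_table` are assumptions introduced here. Each dictionary is one observation (a row), and each key is one variable (a column).

```python
from collections import Counter

# The three observations from the patient table in these notes.
# Each dict is one observation (row); each key is one variable (column).
patients = [
    {"pat_id": 101, "age": 25, "sex": "F", "diagnosis": "hepatitis A"},
    {"pat_id": 102, "age": 38, "sex": "F", "diagnosis": "cirrhosis"},
    {"pat_id": 103, "age": 76, "sex": "M", "diagnosis": "hepatitis C"},
]


def frequency_table(observations, variable):
    """Return {category: (count, percent)} for one qualitative variable.

    Hypothetical helper written for this sketch.
    """
    counts = Counter(obs[variable] for obs in observations)
    total = sum(counts.values())
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}


sex_table = frequency_table(patients, "sex")
for category, (count, percent) in sorted(sex_table.items()):
    print(f"{category}: count={count}, percent={percent:.1f}")
# prints:
# F: count=2, percent=66.7
# M: count=1, percent=33.3
```

The same helper works for any qualitative variable in the table, for example `frequency_table(patients, "diagnosis")`. It would not be a sensible summary for a continuous variable such as age, where nearly every value is distinct.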
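
Survey data like the poll questions in these notes are usually stored as numeric codes, and the codes are only labels. The sketch below, again in Python as an assumed illustration language, decodes one coded variable with a codebook before counting. The codebook comes from the "life in New York City" question as listed in the notes; the response values are hypothetical and invented for illustration, not actual poll data.

```python
from collections import Counter

# Codebook for the "life in New York City" question, as listed in the notes.
LIFE_IN_NYC = {1: "Better", 2: "Worse", 3: "Same", 9: "DK/NA"}

# Hypothetical coded responses, invented for illustration (not poll data).
responses = [2, 1, 2, 3, 9, 2, 3]


def decode(codes, codebook):
    """Translate numeric codes into their category labels."""
    return [codebook[code] for code in codes]


labels = decode(responses, LIFE_IN_NYC)
counts = Counter(labels)

# Report categories in codebook order; the codes themselves carry no magnitude.
for label in LIFE_IN_NYC.values():
    print(f"{label}: {counts[label]}")
```

Decoding before summarizing guards against treating the codes as quantities: averaging these responses would mix in the 9 used for "DK/NA" and produce a meaningless number.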