UI STAT 2010 - Statistical Methods and Computing


22S:105 Statistical Methods and Computing
Introduction
Lecture 1
January 19, 2011
Kate Cowles
374 SH, 335-0727
kcowles@stat.uiowa.edu

What is statistics?

Statistics is the science of using data to make decisions and answer questions.

Statistics involves:
- designing studies
- collecting data
- organizing and analyzing data
- interpreting and reporting results

The Challenger: How understanding of statistical methods might have prevented a tragedy

References:
- Dalal SR, Fowlkes EB, Hoadley B (1989). Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure. Journal of the American Statistical Association 84: 945-957.
- Tufte, Edward R. (1997). The Decision to Launch the Space Shuttle Challenger, in Visual and Statistical Thinking: Displays of Evidence for Making Decisions. Graphics Press.

On 1/28/86, space shuttle Challenger exploded during launch.
- 7 astronauts killed
- reason: gas leak through a joint that should have been sealed by two rubber O-rings
- O-rings had lost resiliency due to cold temperature

On the previous day:
- extensive discussions of whether or not it would be safe to launch
- engineers who designed the rocket faxed to NASA a recommendation not to launch, due to risk of O-ring failure at low temperatures
- NASA officials pointed out weaknesses of the engineers' evidence
- after lengthy discussion, managers of the rocket-making company changed their minds and recommended launch

The engineers' evidence:
- history of serious but non-catastrophic O-ring damage during previous cool-weather launches
- predicted temperature for launch time: 26-29°F
- physics of resiliency of rubber
- no shuttle had ever been launched at a temperature lower than 53°F
- experimental data

The engineers' plot of data from previous shuttle launches: joint temperature vs. number of O-rings having some temperature-related problems.

What was missing from the engineers' argument?
- quantification of the relationship between joint temperature and O-ring failure
- prediction of the probability of O-ring failure at 29°F, with assessment of degree of uncertainty
- an appropriate statistical method: logistic regression

Dalal et al. carried out such an analysis after the fact:
- using data from the 23 shuttle launches prior to the Challenger
- found strong statistical evidence of a temperature effect on O-rings
- we will analyze these data later in the semester

A plot showing data from all 23 previous launches, including those in which no O-rings were damaged.

Subjects, observations, and variables

In statistical studies, we generally choose a set of individuals or subjects on whom data is collected.

We usually are interested in collecting a number of different kinds of information to describe each subject. A variable is a particular characteristic that may take on different values for different subjects.

For example, age, gender, and diagnosis are three variables that might be included in a study of length of hospital stays of hospital patients.

For analysis by a computer, a set of data collected for a study is often organized as a table with a row for each subject and a column for each variable.

Pat id   age   sex   diagnosis
101      25    F     hepatitis A
102      38    F     cirrhosis
103      76    M     hepatitis C

Each row in such a table, corresponding to the data for a single subject, is called an observation.

Types of variables

Qualitative (textbook calls this "categorical")

Nominal
- values fall into unordered categories
- numbers may be used to represent categories, but they are just labels
- example: variable called "occupational area" coded as 1 = education, 2 = business, 3 = service, 4 = industry, etc.
- special case: binary data, which can take on only 2 possible values

Ordinal
- data representing ordered categories
- example: variable called "prognosis" taking on possible values poor, fair, good

Quantitative

Discrete
- both order and magnitude are important
- numbers represent measurable quantities
- possible values are restricted, often to be integers
- example: count of number of homicides in Johnson County in 1998

Continuous
- numbers represent measurable quantities and are not restricted to a set of specified values
- examples: temperature, blood pressure, annual profit

Special case: censored data
- continuous data in which values for some subjects are not observable
- some values are known only to be larger or smaller than some observed value
- example: time-to-failure data

What data type is each of the following?
- a variable defined for each pre-Challenger shuttle launch as the answer to the question "Were any primary O-rings damaged during launch?" (yes/no)
- a variable defined for each pre-Challenger shuttle launch as the total number of primary O-rings that were damaged (out of the 6 primary O-rings in a shuttle)
- a variable defined as outdoor temperature in degrees F at launch time of each shuttle

Exploratory data analysis
- initial examination to discover main features of data
- should begin with examining each variable one at a time; may proceed to examining relationships between variables
- should begin with graphs; may continue with numerical summaries

The distribution of a variable tells what values it takes and how frequently it takes them.

Describing binary, nominal, and ordinal data
- tables of frequencies and percents
- bar charts (also called bar graphs)
- pie charts

Frequency distribution for nominal or ordinal data: a set of classes or categories along with numerical counts of the number of members of each class.

Example: Study of nutrition in breakfast cereals

Abstract: This datafile contains nutritional information and grocery shelf location for 77 breakfast cereals. Data was obtained from the Data and Story Library, http://lib.stat.cmu.edu/DASL

Variable Names:
1. Name: Name of cereal
2. mfr: Manufacturer of cereal, where A = American Home Food Products, G = General Mills, K = Kelloggs, N = Nabisco, P = Post, Q = Quaker Oats, R = Ralston Purina
3. type: cold or hot
4. calories: calories per serving
5. protein: grams of protein
6. fat: grams of fat
7. sodium: milligrams of sodium
8. fiber: grams of dietary fiber
9. carbo: grams of complex carbohydrates
10. sugars: grams of sugars
11. potass: milligrams of potassium
12. vitamins: vitamins and minerals (0, 25, or 100, indicating the typical percentage of FDA recommended)
13. shelf: display shelf (1, 2, or 3, counting from the floor)
14. weight: weight in ounces of one serving
15. cups: number of cups in one serving
16. rating: a rating of the cereals

The FREQ Procedure

                                    Cumulative   Cumulative
type            Frequency  Percent   Frequency      Percent
Cold                   74    96.10          74        96.10
Hot                     3     3.90          77       100.00

                                    Cumulative   Cumulative
mfr             Frequency  Percent   Frequency      Percent
American Home           1     1.30           1         1.30
General Mills
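
The frequency table for cereal type in the FREQ Procedure output above can also be reproduced outside SAS. The following is a minimal Python sketch, not the course's own code. It assumes the "type" column can be rebuilt from the reported counts (74 cold, 3 hot) rather than read from the actual data file, and `frequency_table` is a hypothetical helper name introduced only for illustration.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (category, frequency, percent, cumulative frequency,
    cumulative percent), with categories in order of first appearance."""
    counts = Counter(values)  # keeps first-seen order on Python 3.7+
    total = sum(counts.values())
    rows = []
    cum_freq = 0
    for category, freq in counts.items():
        cum_freq += freq
        rows.append(
            (category, freq, 100 * freq / total, cum_freq, 100 * cum_freq / total)
        )
    return rows


# Assumption: the "type" column is rebuilt from the counts reported above
# (74 cold, 3 hot) instead of being read from the cereal data file.
cereal_type = ["Cold"] * 74 + ["Hot"] * 3
rows = frequency_table(cereal_type)

print(f"{'type':<6}{'Frequency':>10}{'Percent':>9}{'Cum. Freq.':>12}{'Cum. Pct.':>11}")
for category, freq, pct, cum_freq, cum_pct in rows:
    print(f"{category:<6}{freq:>10}{pct:>9.2f}{cum_freq:>12}{cum_pct:>11.2f}")
```

Each percent is the category count divided by the total of 77, and each cumulative column is a running sum down the rows, so the last row always ends at the full sample size and 100 percent.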

