Statistics, Variable, Population, Sample, Simple Random Sample

Statistics
The science in which inferences are made about specific random phenomena on the basis of relatively limited sample material.
Biostatistics – analysis of medical, biological and public health data. The methods are applicable to epidemiology, policy, community, environmental health and occupational health.
Descriptive statistics – summarize information
Inferential statistics – make a decision

Variable
A characteristic that takes on different values for different persons, places, or things.
a. Qualitative – categorized only
b. Quantitative – can be measured
c. Random – value arises as a result of chance factors
d. Discrete – takes only certain values. There are “gaps” between the values. Discrete variables that can take on only two values are called dichotomous.
e. Continuous – infinite number of possible values. There are no “gaps”.

Population
The population is the largest collection of entities in which we are interested and that have a common observable characteristic. Generally we think of a population as made up of persons or things. We can also think of a population as the observable characteristic, e.g., blood cholesterol levels of adults living in New Orleans or the ABO blood group of the students at TSPHTM. Some populations are finite while others are infinite. A parameter is a characteristic of a population.

Sample
A sample is a part of a population. A statistic is a measure from a sample and is used to estimate a parameter. Generally, values of parameters are not available.
Greek letters – parameter
Roman letters – statistic
Sometimes we will represent a statistic using the letter with a “hat” over it; e.g., if μ = mean (parameter), $\hat{\mu}$ will represent the statistic from a sample.

Sources of Data
1. Routine records
2. Surveys
3. Experiments
4. External sources

Measurement Scales
Assign numbers, letters, words, etc. to convey information about a characteristic under study.
1. Nominal – putting into non-overlapping categories; classification based on a qualitative assessment; no information regarding quantity or amount; e.g., gender, ABO blood group, country of birth; the code (number) used is an indicator or place holder
2. Ordinal – like nominal except includes the concept of “greater than” or “less than”; ranking; e.g., grade in school, “how do you feel today”
3. Interval – ordinal with concept of distance but no true zero point; 0 is just another point on the scale; how much more or how much less; e.g., temperature in °F or °C
4. Ratio – true zero point with a physical significance; e.g., blood pressure level, height, weight

Simple Random Sample
Every possible observation in the population has an equal chance of being selected for the sample.
1. With replacement
2. Without replacement

Measures of Central Tendency
1. Mean – average
   a. Properties
      i. Unique
      ii. Simple
      iii. Same units as original observations
      iv. Useful for inferential statistics
   b. Formula
      Let $x_1, x_2, \ldots, x_n$ be a sample of data
      $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
      Practice with Σ
      Let $X_1 = 6$, $X_2 = 3$, $X_3 = -1$
      $\sum_{i=1}^{3} X_i$, $\sum_{i=1}^{3} X_i^2$, $\sum_{i=2}^{3} X_i$, $\sum_{i=1}^{3} 2X_i$
      or $\sum_{i=1}^{n} cX_i = c\sum_{i=1}^{n} X_i$ where c is a constant
   c. Disadvantage – extreme values are a problem
   d. Used for ratio or interval data
2. Median – middle value; the value that divides the dataset into two equal parts
   a. Properties
      i. Unique
      ii. Simple
      iii. Less affected by extreme values
      iv. Same units as original observations
   b. Formula
      i. Order the observations from smallest to largest
      ii. The $\left(\frac{n+1}{2}\right)$th observation, if n is odd
      iii. The average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations, if n is even
3. Comparison of Mean and Median
   i. Distribution is symmetric
   ii. Distribution is positively skewed
   iii. Distribution is negatively skewed
4. Mode – value that occurs most often
   a. Properties
      i. Useful for qualitative data
      ii. Not unique or not necessarily in the “center”
      iii. Not stable for small samples
      iv. For nominal or ordinal data, the category which occurs most often

Example: Weights of animals (kg)
13.2   14.4
15.4   13.6
13.0   15.0
16.6   14.6
16.9   13.1

Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{145.8}{10} = 14.58$
Median: Ordered array
13.0   14.6
13.1   15.0
13.2   15.4
13.6   16.6
14.4   16.9
Median = average of 5th and 6th observations = (14.4 + 14.6)/2 = 14.5
Mode: No mode

Additional Properties of the Mean
Let $x_1, x_2, \ldots, x_n$ be a sample – original sample
Let $x_1 + c, x_2 + c, \ldots, x_n + c$ – translated sample
Def: $y_i = x_i + c$, $i = 1, \ldots, n$
$\bar{y} = \bar{x} + c$

Original sample   Translated sample
      -9                  1
      -7                  3
      -3                  7
      -2                  8
      -1                  9
       0                 10
       2                 12
       4                 14
  Σ = -16             Σ = 64

$\bar{x} = -2$
$\bar{x} + c = -2 + 10 = 8$
$\bar{y} = \frac{64}{8} = 8$

Let $x_1, x_2, \ldots, x_n$ be a sample – original sample
$cx_1, cx_2, \ldots, cx_n$ – scaled sample
Def: $y_i = cx_i$, $i = 1, \ldots, n$
$\bar{y} = c\bar{x}$

Original sample   Scaled sample
     0.013              13
     0.024              24
     0.010              10
     0.009               9
     0.014              14
  Σ = 0.070          Σ = 70

$\bar{x} = 0.014$
$c\bar{x} = 1000 \times 0.014 = 14$
$\bar{y} = 14$

We can do both operations in the same problem:
$y_i = c_1 x_i + c_2$, $i = 1, \ldots, n$
$\bar{y} = c_1\bar{x} + c_2$
Let $\bar{x}$ = average temperature in °C
$\bar{y} = \frac{9}{5}\bar{x} + 32$ = average temperature in °F
If $\bar{x} = 12$, then $\bar{y} = \frac{9}{5}(12) + 32 = 53.6$
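
The animal-weight example and the translated and scaled samples above can be checked numerically. The following is a minimal sketch in Python using the standard-library `statistics` module; it is an illustration, not part of the original notes. The helper `modes_or_none` and the function `c_to_f` are hypothetical names introduced here, and treating "every distinct value ties for most frequent" as "no mode" is an assumed convention that matches the example.

```python
import math
import statistics

# Animal weights (kg) from the example in the notes.
weights = [13.2, 14.4, 15.4, 13.6, 13.0, 15.0, 16.6, 14.6, 16.9, 13.1]


def modes_or_none(data):
    """Hypothetical helper: return the most frequent value(s), or [] when
    two or more distinct values all tie (assumed "no mode" convention)."""
    modes = statistics.multimode(data)
    if len(modes) > 1 and len(modes) == len(set(data)):
        return []
    return modes


mean_w = statistics.mean(weights)      # sum of the values divided by n
median_w = statistics.median(weights)  # n = 10 is even: average of 5th and 6th
modes_w = modes_or_none(weights)       # all ten weights are distinct

# Translation: y_i = x_i + c shifts the mean by c.
original = [-9, -7, -3, -2, -1, 0, 2, 4]
c_shift = 10
translated = [x + c_shift for x in original]

# Scaling: y_i = c * x_i multiplies the mean by c.
small = [0.013, 0.024, 0.010, 0.009, 0.014]
c_scale = 1000
scaled = [c_scale * x for x in small]


def c_to_f(celsius):
    """Both operations at once: y = c1 * x + c2 with c1 = 9/5, c2 = 32."""
    return 9 / 5 * celsius + 32


print("mean:", mean_w, "median:", median_w, "modes:", modes_w)
print("translated mean:", statistics.mean(translated),
      "vs mean + c:", statistics.mean(original) + c_shift)
print("scaled mean:", statistics.mean(scaled),
      "vs c * mean:", c_scale * statistics.mean(small))
print("12 degrees C in F:", c_to_f(12))
```

Because the weights and the scaled sample are floating-point values, comparisons against 14.58, 14.5, 14 and 53.6 are best made with `math.isclose` rather than `==`.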