Statistical Methods for BCS 9.07
9/8/2004

BCS 9.07
• MWF 10-11am
• Instructor: Ruth Rosenholtz
  – Office hours W 3-4 pm
• Textbook:
  – Basic Statistics for the Behavioral Sciences, Gary W. Heiman, 4th edition
  – There will likely also be outside readings, available on the MIT server
• PowerPoint lectures will generally appear by 9 pm the day before lecture.
  – This is contingent on class attendance!
• TA/instructor contact info, tentative schedule, homework, and extra handouts will also appear on the MIT server.
  – Your first homework is there now. Due Friday next week.
  – You can turn in homework on the MIT server, but are not required to do so.

Grading
• Class participation: 5%
• Homework*: 40%
• Mid-term: 20%
• Final: 35%
*We’ll be using MATLAB for much of the homework. It is available on the MIT server cluster machines. Please turn in a copy of your MATLAB code with each assignment. This helps us to assign partial credit if you make an error.

Academic honesty policy
• You may work with others on your homework
  – I.e., you may discuss your homework with other students
  – Write, at the top of your HW, “I worked with…”
  – But each student must solve each problem themselves (including writing their own MATLAB code), and write up the solutions themselves
  – It is never acceptable to copy from someone else’s solution
  – If it seems you have copied from someone else’s solution, you will get 0 points for that problem, and we will go over the rest of your HW with a fine-tooth comb
  – Since it is often difficult to tell who copied from whom, don’t let anyone else copy off of your homework!
• You are expected not to make use of solutions or assignments from previous years
• Obviously, don’t copy off anyone else’s exam, either

Policy on late homework
• Short extensions may be granted if you have a reasonable excuse (MIT sporting event, wedding, job interview, etc.), provided you notify us on or before the day the homework is assigned.
  – However, note that you will receive homework assignments nearly 2 weeks before they are due, so just being out of town for a day or two may not be a sufficient excuse.
• For more unforeseen difficulties, an extension may also be granted, provided you get a letter from your doctor, a Dean, Counseling & Support Services, or equivalent.
  – However, note that we may be limited in our options if we learn of your difficulty too late. Please let us know informally that there may be a problem as soon as you know of it (not 2 weeks after the assignment was due, and not at the end of the semester!).

Statistics

Flipping two fair coins
• What is the probability of getting two heads?
  – 0.5 * 0.5 = 0.25
• What is the probability of getting two tails?
  – 0.5 * 0.5 = 0.25
• What is the probability of getting one head and one tail?
  – 0.5 (head) * 0.5 (tail) + 0.5 (tail) * 0.5 (head) = 0.5
If you don’t remember how to do this, don’t worry, we’ll review probability next week.

Probability density function (PDF)
• Represents the true (in this case, theoretical) probability of occurrence of the set of possible events.

Frequency histogram
• Represents the actual frequency of occurrence of events in a sample.
• Here, I flipped a pair of coins 100 times.

Are my 2 coins fair coins?
• Frequency histogram
• PDF

Are my 2 coins fair coins?
• The frequency histogram doesn’t quite match the PDF.
• It’s difficult to tell from the data whether this is due to a systematic factor (unfair coins) or chance (or both). This is where statistics comes in.
• In this case, the coins are fair (the coin flips were generated in MATLAB).
• We only expect the distribution of coin flips to look like the PDF in the long run, not in a particular sample of 100 flips.
• An outcome can differ from what is expected just by chance.

Does a new drug cure cancer better than the old drug?
• The data:

Does a new drug cure cancer better than the old drug?
• There’s an empirical difference between the old drug and the new drug, but is it due to a systematic factor (e.g. the new drug works better) or due to chance?
• A related question: if we gave this drug to 100 more people, would we expect to continue to see improvement over the old drug? Do we expect this effect to generalize?

Alt: Is the difference between data & theory due to systematic factors + chance, or to chance alone?
• Data:
• “Theory” = no difference between the drugs

Chance vs. systematic factors
• A systematic factor is an influence that contributes a predictable advantage to a subgroup of our observations.
  – E.g. a longevity gain to elderly people who remain active.
  – E.g. a health benefit to people who take a new drug.
• A chance factor is an influence that contributes haphazardly (randomly) to each observation, and is unpredictable.
  – E.g. measurement error

Observed effects can be due to:
A. Systematic effects alone (no chance variation).
  – We’re interested in systematic effects, but this almost never happens!
B. Chance effects alone (all chance variation).
  – Often occurs. Often boring because it suggests the effects we’re seeing are just random.
C. Systematic effects plus chance.
  – Often occurs. Interesting because there’s at least some systematic factor.
An important part of statistics is determining whether we’ve got case B or C.

Systematic + chance vs. chance alone
• Likely systematic + chance variation:
• Likely due to chance alone:

No chance variation
On a scale from 1 to 10, rate your experience at MIT so far:
7, 7, 7, 7, 7, 7, 7, 7, 7, 7…
No chance variation is like when Robin Hood shoots his second arrow in exactly the same place as his first, so the second one splits the first arrow down the middle!

We have a natural tendency to overestimate the influence of systematic factors
• How well a baseball player does in a given at-bat depends on both chance and the skill of the batter. How much of each?
• I.e., there’s some amount of variation from at-bat to at-bat, regardless of whether it’s a new batter or the same batter trying again. What percent of total variation is accounted for by changing batters?
  – True: <0.5% of the variation is due to differences in skill of different batters.
  – But baseball fans estimated about 25% was due to differences in skill. (Abelson, 1985)
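
To make cases A, B, and C from the "Observed effects" slide concrete, here is a minimal simulation sketch in Python (the course itself uses MATLAB; Python is used here only for illustration). Everything in it is an assumption made for the example, not part of the lecture: the function name simulate_group_difference, the baseline score of 5.0, the Gaussian form of the chance factor, and the effect and noise sizes. Each observation is modeled as baseline + systematic effect + chance.

```python
import random
import statistics


def simulate_group_difference(effect, noise_sd, n_per_group, seed=None):
    """Return mean(treated) - mean(control) for one simulated study.

    Hypothetical model: observation = baseline + systematic effect + chance.
    """
    rng = random.Random(seed)
    baseline = 5.0  # arbitrary baseline score, assumed for illustration
    # Control group: chance variation only.
    control = [baseline + rng.gauss(0.0, noise_sd) for _ in range(n_per_group)]
    # Treated group: the same chance variation plus a systematic effect.
    treated = [baseline + effect + rng.gauss(0.0, noise_sd)
               for _ in range(n_per_group)]
    return statistics.mean(treated) - statistics.mean(control)


if __name__ == "__main__":
    # Case A: systematic effect alone (no chance variation).
    print("A:", simulate_group_difference(effect=1.0, noise_sd=0.0, n_per_group=50))
    # Case B: chance alone; the difference is usually not exactly zero.
    print("B:", simulate_group_difference(effect=0.0, noise_sd=1.0, n_per_group=50))
    # Case C: systematic effect plus chance.
    print("C:", simulate_group_difference(effect=1.0, noise_sd=1.0, n_per_group=50))
```

Running case B several times with different seeds gives nonzero differences that arise by chance alone, which is why a single observed difference cannot by itself tell case B from case C.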
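
Returning to the two-coin example from earlier in the lecture, the following is a rough Python sketch of how a sample's frequency histogram can be compared with the theoretical probabilities (0.25, 0.5, 0.25 for zero, one, and two heads). The lecture's own flips were generated in MATLAB; the names flip_pairs, relative_frequencies, and THEORETICAL are hypothetical and chosen only for this illustration.

```python
import random
from collections import Counter

# Theoretical probabilities for the number of heads in two fair-coin flips.
THEORETICAL = {0: 0.25, 1: 0.5, 2: 0.25}


def flip_pairs(n_pairs, seed=None):
    """Flip two fair coins n_pairs times; count pairs with 0, 1, or 2 heads."""
    rng = random.Random(seed)
    counts = Counter({0: 0, 1: 0, 2: 0})
    for _ in range(n_pairs):
        # Each coin lands heads with probability 0.5.
        heads = int(rng.random() < 0.5) + int(rng.random() < 0.5)
        counts[heads] += 1
    return counts


def relative_frequencies(counts):
    """Convert raw counts to proportions of the sample."""
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}


if __name__ == "__main__":
    # A sample of 100 pairs will usually deviate from theory;
    # a much larger sample should sit closer to it.
    for n in (100, 100_000):
        freqs = relative_frequencies(flip_pairs(n))
        print(n, freqs, "theory:", THEORETICAL)
```

With only 100 pairs the proportions typically miss the theoretical values by a few percentage points even though the simulated coins are fair; the larger sample illustrates the "in the long run" point from the slide.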