MIT 6.00 - Transcript – Lecture 23

MIT OpenCourseWare
http://ocw.mit.edu

6.00 Introduction to Computer Science and Programming, Fall 2008

Please use the following citation format: Eric Grimson and John Guttag, 6.00 Introduction to Computer Science and Programming, Fall 2008. (Massachusetts Institute of Technology: MIT OpenCourseWare). http://ocw.mit.edu (accessed MM DD, YYYY). License: Creative Commons Attribution-Noncommercial-Share Alike. Note: Please use the actual date you accessed this material in your citation. For more information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms

Transcript – Lecture 23

OPERATOR: The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: I want to pick up exactly where I left off last time, when I was talking about the various sins one can commit with statistics. I had been talking about the sin of data enhancement, where the basic idea is that you take a piece of data and read much more into it than it implies. In particular, a very common thing people do with data is extrapolate. I'd given you a couple of examples. In the real world, it's often not reasonable to say, I have a point here and a point here, therefore the next point will surely be there, and we can just extrapolate in a straight line. We saw examples earlier where I had an algorithm generate points, we fit a curve to them, used the curve to predict future points, and discovered the prediction was nowhere close. Unfortunately, we often see people do this sort of thing.

One of my favorite stories is about William Ruckelshaus, who was head of the Environmental Protection Agency in the early 1970s. At a press conference he spoke about the increased use of cars and the decreased amount of carpooling. He was trying to get people to carpool, since at the time carpooling was on the way down, and I now quote: "In 1960," he said, "each car entering the central city had 1.7 people in it. By 1970, this had dropped to less than 1.2. If present trends continue, by 1980, more than 1 out of every 10 cars entering the city will have no driver." Amazingly enough, the press reported this as a straight story and talked about how occupancy would drop dramatically. Of course, as it happened, it didn't occur. But it's an example of how much trouble you can get into by extrapolating.
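To see how little those two data points support such a prediction, here is a minimal Python sketch (Python being the language of 6.00) that fits a straight line through the quoted 1960 and 1970 figures and extrapolates it forward; the helper function and the target years are illustrative choices, not anything from the lecture.

```python
def linear_extrapolate(x0, y0, x1, y1, x):
    """Fit a straight line through (x0, y0) and (x1, y1), evaluate it at x."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)

# The quoted figures: 1.7 riders per car in 1960, about 1.2 by 1970.
for year in (1980, 1990):
    riders = linear_extrapolate(1960, 1.7, 1970, 1.2, year)
    print(year, round(riders, 2))
# Prints 0.7 riders per car for 1980 and 0.2 for 1990; the same line
# reaches 0.0 in 1994, when every car entering the city would be empty.
```

A line through two points says nothing about whether the trend can continue, which is exactly the absurdity Ruckelshaus was exploiting.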
The final sin I want to talk about is probably the most common, and it's called the Texas sharpshooter fallacy. Now, before I get into that, are any of you here from Texas? All right, you're going to be offended. Let me think, OK, anybody here from Oklahoma? You'll like it. I'll dump on Oklahoma, it will be much better then. We'll talk about the Oklahoma sharpshooter fallacy. We won't talk about the BCS rankings, though.

The idea here is a pretty simple one. A famous marksman fires his gun randomly at the side of a barn, so the barn ends up with a bunch of holes in it, and then he takes a can of paint and draws bullseyes around all the places his bullets happened to hit. People walk by the barn and say, God, he is good. Obviously not a good thing to do, but it is amazingly easy to fall into this trap. So here's another example.

In August of 2001, a paper which people took seriously appeared in a moderately serious journal called The New Scientist, announcing that researchers in Scotland had proven that anorexics are likely to have been born in June. I'm sure you all knew that. How did they prove this? Or demonstrate this? They studied 446 women, each of whom had been diagnosed anorexic, and they observed that about 30 percent more than average were born in June. Now, the monthly average of births, 446 divided by 12, is about 37, so that tells us about 48 were born in June. At first sight this seems significant, and in fact if you run tests and ask what the likelihood is of that many more being born in one particular month, you'll find it quite unlikely: the probability of it happening just by accident is only about 3 percent.

What's wrong with the logic here? Yes?

STUDENT: They only studied diagnosed anorexics.

PROFESSOR: No, because they were only interested in the question of when anorexics are born, so it made sense to study only those. Now, maybe you're right that we should check whether, in fact, more people are born in June, period. That could be true. This would be one of the fallacies we looked at before, right? That there's a lurking variable, which is simply that people are more likely to be born in June. So that's certainly a possibility. What else? Where's the flaw in this logic?

Well, what did they do? They committed the Oklahoma sharpshooter fallacy. They looked at 12 months, took the month with the most births in it, which happened to be June, and calculated the 3 percent probability. They didn't start with the hypothesis that it was June. They started with 12 months, and then they drew a bullseye around June. So the right question to ask is not what's the probability that June had 48 babies, but what's the probability that at least one of the 12 months had 48 babies. That probability is a lot higher than 3 percent, right? In fact, it's about 30 percent.
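Both of those figures are easy to check with a short Monte Carlo simulation, in the spirit of the simulations used throughout 6.00. Below is a minimal Python sketch, assuming each of the 446 birth months is independently and equally likely; the function name, trial count, and output format are my own choices, not part of the lecture.

```python
import random

def birth_month_simulation(num_patients=446, threshold=48,
                           num_months=12, num_trials=20000):
    """Estimate the probability that June alone, and that at least one
    month, receives `threshold` or more of `num_patients` births,
    assuming every birth month is equally likely."""
    june_hits = 0       # trials where June (index 5) reaches the threshold
    any_month_hits = 0  # trials where the busiest month reaches it
    for _ in range(num_trials):
        counts = [0] * num_months
        for _ in range(num_patients):
            counts[random.randrange(num_months)] += 1
        if counts[5] >= threshold:
            june_hits += 1
        if max(counts) >= threshold:
            any_month_hits += 1
    print('P(June >= %d)       ~ %.3f' % (threshold, june_hits / num_trials))
    print('P(some month >= %d) ~ %.3f' % (threshold, any_month_hits / num_trials))

birth_month_simulation()   # typically prints roughly 0.03 and roughly 0.31
```

The gap between roughly 3 percent and roughly 30 percent is precisely the painted-on bullseye: the first number tests a month chosen in advance, the second tests the month chosen after looking at the data.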


So what we see is, again, perfectly reasonable statistical techniques, but not looking at things in the right way, and answering the wrong question. Does that make sense to everybody? And you can see why people fall into this trap, right? It was a seemingly sensible argument. So the moral here is: be very careful about looking at your data, drawing a conclusion, and then asking how probable that conclusion was to have occurred, because you're probably, or at least maybe, drawing the bullseye around something that's already there. Now, if they had taken another set of 446 anorexics, and again June was the month, then there would be some credibility in it, because this time they would have started with the hypothesis not that there existed some month, but that June in particular was likely. But then they would also have to check and make sure that June isn't