Lecture 7 Boost

Last time

We started by finishing up our work on random number generation. We examined two kinds of techniques popular today: so-called true random numbers, based on a physical phenomenon, and pseudo-random numbers, which involve some deterministic mathematical formula.

We then took a look at another kind of randomized trial: A/B testing, from web design, is an incredibly popular technique for optimizing the layout and operation of sites.

Today

We will finish our example from the NY Times, this time considering transformations of variables, a dangling thread from our early discussions of skewed variables.

We will then consider a somewhat novel analysis of the 2003 California Recall Election. It will be a kind of natural experiment that will let us apply our newfound analysis skills.

An experiment at nytimes.com

We will now consider a more recent example of an A/B test, for The Travel Section of nytimes.com (we'll save the movie test for lab, or your midterm, or...).

On the next two slides we present samples of the A and B pages. The changes applied to all pages in The Travel Section, so as a visitor browsed the site, they would consistently see either A or B.

Have a look at the two designs. What differences do you see in terms of layout and content? What questions might the Times ask about how visitors react to these two options?

Another test

We will look at the data in a lot more detail, but to emphasize a concept, we'll consider a second test that the people who provided the data were interested in: is there a difference between Tabs and Lists in terms of the number of Pageviews?

Let's quickly see how we can address that question.

Hypothesis testing

Before we propose anything formal, let's recall the steps:

1. We begin with a null hypothesis, a plausible statement (a model or scenario) which may explain some pattern in a given set of data, made for the purposes of argument; we also select a complementary alternative hypothesis.

2. We then define a test statistic, some quantity calculated from our data that is used to evaluate how compatible the results are with those expected under the null hypothesis.

3. We specify a threshold, or significance level, of the test; at the end of the experiment, this threshold will be applied to determine if we can reject the null.

4. We then consider the distribution of the test statistic under the null hypothesis; we can get at it either with some probability calculation (remember the table fun from last time) or through computer simulation.

5. And finally, after the data are collected, we compute the P-value and apply the threshold: if our P-value is less than the threshold, we reject the null, finding that the data contain evidence for the alternative; if not, we say that we cannot reject the null and that the data do not contain sufficient evidence for the alternative.

Page views per visit

We will work with the absolute value of the difference between the average Page Views per visit computed for each group; we use the absolute value to indicate that we are looking for a big difference in either direction.

When judging how extreme our data are, we will consider how likely it is to see an absolute difference as big as or bigger than the one we see in our experimental data, if the null hypothesis is true.

The difference in averages is a reasonable metric for capturing a shift in one group or the other; the average is also a quantity that the science of web traffic focuses on, which further recommends it.

Page views per visit

With all that set, let's look at the data we collected:

    > mean(travel$Pageviews[travel$Variation == "Tabs"])
    [1] 1.997261
    > mean(travel$Pageviews[travel$Variation == "List"])
    [1] 1.980060

The mean number of Pages viewed per visit for Lists is 1.980, while it is 1.997 for the Tabs option. The absolute difference, 0.017, is, well, tiny, and as a practical matter it might not amount to anything important (although, as we have said, small differences multiplied over millions of visits might prove to be important).

Page views per visit

Finally, we need to come up with some way to evaluate the distribution of our test statistic under the null; again, the null is that there is no difference between Tabs and Lists.

If there really is no difference, then the value of 0.017 we saw is simply the result of the randomization that took place on the web server.

So, if the labels really have nothing to do with the number of Page Views per visit, then we can simulate other values of the test statistic under the null by simply reassigning visitors to treatment and control, to Tabs and Lists. Again, we re-randomize.

Page views per visit

On the next slide we present the results of re-randomization; here we plot the difference (Tabs minus List) between the average Page Views per visit in the two groups.

Our test statistic was really the absolute value of the difference, but we present the signed value, before taking the absolute value, to correspond with what we have been doing so far.

Our observed value is 0.017, and for a difference in our null distribution to be more extreme, it has to have an absolute value of 0.017 or greater, meaning -0.017 or smaller, or 0.017 or larger.

[Figure: histogram of differences (tabs - list) in average pv/visit, 1000 re-randomizations. x-axis: differences (tabs - list), from -0.06 to 0.06; y-axis: Frequency, from 0 to 200.]

Page views per visit

In this case, we don't even have to be very formal about the fact that the observed difference of 0.017 is well within the null distribution, meaning any difference in means we observed looks like it could be the result of our randomization process.

Formally, we would consider the proportion of tables having an absolute difference as large as or larger than 0.017. That turns out to be 0.32, or 32% of our 1,000 re-randomized tables.

That means we cannot reject the null hypothesis that Tabs and Lists perform the same in terms of Pageviews per visit.

The data

Testing is becoming a little tedious, so let's now move on to have a look at the rest of the data. We will have one more test for you to perform in lab, but for now let's consider some other topics.

So let's have a look at visit lengths. As you see on the next page, no matter how tightly we restrict the x-axis, we aren't getting a lot of new information; primarily we see a large …
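
Returning for a moment to the test on Page Views per visit: the re-randomization procedure can be sketched in a few lines of code. What follows is only an illustration, written in Python using the standard library (the output shown in the lecture appears to come from R). The function names abs_diff_in_means and rerandomization_p_value are hypothetical, and the page-view counts in the usage example are made up; they are not the nytimes.com data.

```python
import random
from statistics import mean


def abs_diff_in_means(values, labels):
    """Test statistic: |mean of the Tabs group - mean of the List group|."""
    tabs = [v for v, g in zip(values, labels) if g == "Tabs"]
    lists = [v for v, g in zip(values, labels) if g == "List"]
    return abs(mean(tabs) - mean(lists))


def rerandomization_p_value(values, labels, n_rerandomizations=1000, seed=0):
    """Estimate the P-value by re-randomizing the group labels.

    Under the null hypothesis the labels have nothing to do with the
    page views, so shuffling them simulates other values of the test
    statistic that the web server's randomization could have produced.
    """
    rng = random.Random(seed)          # seeded so the sketch is repeatable
    observed = abs_diff_in_means(values, labels)
    shuffled = list(labels)            # copy; the caller's labels are untouched
    as_extreme = 0
    for _ in range(n_rerandomizations):
        rng.shuffle(shuffled)          # reassign visits to Tabs and List
        if abs_diff_in_means(values, shuffled) >= observed:
            as_extreme += 1
    # Proportion of re-randomizations at least as extreme as the observed data
    return observed, as_extreme / n_rerandomizations


if __name__ == "__main__":
    # Made-up page views per visit (assumption: NOT the real experiment data)
    data_rng = random.Random(1)
    values = [1 + data_rng.randrange(4) for _ in range(200)]
    labels = ["Tabs"] * 100 + ["List"] * 100
    observed, p_value = rerandomization_p_value(values, labels)
    print("observed absolute difference:", observed)
    print("estimated P-value:", p_value)
```

The returned proportion plays the role of the 0.32 reported above: it is compared with the significance threshold chosen before the experiment, and the null is rejected only if the proportion falls below it.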