PSY 394U Do It Yourself Statistics

Counting data

Thus far, we have encountered data that were the results of measurements made on some process. Often, however, data take the form of counts of things, where a "thing" could be a quality, category, or any similar non-numeric division. For example, we might count the number of dendritic branchings that occurred on each of 5 types of growth media. The data (number of branchings) might look like the following:

    Medium    Number of branchings
    A         12
    B         15
    C         20
    D         19
    E         26

Obviously, we probably have scientific reasons for believing that one or more growth media will do better than the others, but a statistical analysis allows us to compute the probability that the observed pattern of results could be due to chance alone. The total number of branchings is N = 92, so intuitively we would expect about 92/5 branchings in each group if chance alone were operating. Of course, we won't always get 92/5 branchings per group because of the very chance fluctuations with which we are concerned; in fact, we will never get 92/5 branchings, because 92/5 = 18.4 and counts, by definition, are integers.

In each group, we observe Og branches and, hence, N - Og non-branches. If chance alone is operating, and all growth media are equally supportive, the number of counts, Og, in each cell on a given experiment would be a random number from a binomial distribution with N = 92 and p = 1/5.

To simulate how 92 branches would spread over 5 growth media by chance alone, we will generate 92 uniformly distributed random numbers, and then count how many numbers are in each of 5 even intervals between 0 and 1:

    branchloc = rand(92,1);
    % count the random numbers falling in each fifth of the unit interval
    simcounts(1) = sum(branchloc < 1/5);
    simcounts(2) = sum(branchloc >= 1/5 & branchloc < 2/5);
    simcounts(3) = sum(branchloc >= 2/5 & branchloc < 3/5);
    simcounts(4) = sum(branchloc >= 3/5 & branchloc < 4/5);
    simcounts(5) = sum(branchloc >= 4/5);

Or, alternatively:

    branchloc = rand(92,1);
    ngroup = 5;
    for g = 1:ngroup
        simcounts(g) = sum(branchloc >= (g-1)/ngroup & branchloc < g/ngroup);
    end

That's it! We now have simulated data from an
experiment in which chance alone is distributing the number of branches among cells, given 92 total branchings. To do a Monte Carlo simulation of this world in which chance alone is operating, we need merely repeat this many times over. What we need now is a test statistic that we can compute on each simulated experiment to develop a sampling distribution, and that we can also compute on our real data, so that we can see where the statistic for the data falls relative to the sampling distribution.

We know that the expected value in each cell is E = N(1/5) = 18.4, so a reasonable statistic could be formed by computing the difference between each cell count (real or simulated) and E, squaring it (so the negative differences don't cancel out the positive ones), and summing this up to compute a sum squared error, which by this time should be very familiar to us. An implementation in MATLAB is:

    ngroup = 5;
    totCount = 92;
    E = totCount/ngroup;              % expected count per group
    nrep = 1000;                      % number of simulated experiments
    simcounts = zeros(ngroup,1);
    simerrs = zeros(nrep,1);
    for i = 1:nrep
        branchloc = rand(totCount,1);
        for g = 1:ngroup
            simcounts(g) = sum(branchloc >= (g-1)/ngroup & branchloc < g/ngroup);
        end
        simerrs(i) = sum((simcounts - E).^2);
    end
    realcounts = [12 15 20 19 26];
    realerr = sum((realcounts - E).^2);
    hist(simerrs);
    line([realerr realerr], [0 300]); % mark the observed error on the histogram
    p = sum(simerrs >= realerr)/nrep; % fraction of simulations at least as extreme
    disp(p);

The resulting plot and p value that we got are below.

Figure 7.1: Sampling distribution of the sum squared error metric under the null assumption of equally supportive growth media. The observed error (113) is well within the distribution.

Let's return to our error metric, which is simply a sum squared difference. If we give it the arbitrary name zeta, it would be written mathematically as

    \zeta = \sum_{i=1}^{k} \left( O_i - \frac{N}{k} \right)^2

where O_i is the number of observed events (branchings, in this case) in each group, N is the total number of events, and k is the number of groups. Clearly, the distribution of this metric is going to change with both the number of groups, k (the more groups we sum over, the bigger we expect the metric to be overall), and the total number of
things we are counting, N. Intuitively, if we are counting how many times a neuron spiked in one of two conditions over a 10 sec interval, and the total number of spikes was 3000, then a discrepancy of 10 spikes between observed and expected would be relatively small. If, on the other hand, we were counting the number of times a rat chose to press each of two possible levers across 20 total trials, then a discrepancy of 10 between observed and expected would be the maximum discrepancy possible. We can modify our metric above very easily to standardize across all possible values of N by dividing the squared difference by the expected value, and in so doing, we obtain the traditional chi-squared goodness-of-fit statistic:

    \chi^2 = \sum_{i=1}^{k} \frac{\left( O_i - N/k \right)^2}{N/k}

The reason that the expected value is used (and not its square, or the binomial variance N(1/k)(1 - 1/k), etc.) is shown for the special case of k = 2 in the appendix.

Another common use of the chi-squared test is to assess the independence of two variables. Let us introduce this use with a somewhat whimsical example. Suppose a street entertainer approaches you and offers you the following wager. He has two decks of 40 cards each. One deck consists of 20 Kings and 20 Jokers, while the other consists of 20 Queens and 20 Jokers. You will pay him a dollar to play, and he will deal out 40 2-card hands, one card from each deck. If there are more than 10 "royal couples" (a King and a Queen), he keeps the dollar. If, on the other hand, there are 10 or fewer royal couples, he gives you two dollars.

Assuming the game is fair, we can easily figure out what we should expect to happen on average. For each hand, there should be a 0.5 probability of getting a King from the first deck, and a 0.5 probability of getting a Queen from the second deck. Thus, if the two cards for a given hand are truly drawn independently, then the probability of getting the hand (K, Q) is p(K) x p(Q) = 0.5 x 0.5 = 0.25. Over 40 hands, then, the expected value for the number of occurrences of (K, Q) is 0.25 x 40 = 10. This situation is
summarized below in a contingency table. The first deck is represented in the rows and the second deck is represented in the columns, such that each possible hand, or contingency, corresponds to one of the central four cells. Entered in these cells are the expected values, or most likely outcomes, for each hand.

Deck 2 Deck 1 K J Totals Q …
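
As a rough check on the expected value of 10 royal couples, a fair deal can be simulated in the same Monte Carlo style used for the branching counts. This is only a sketch: it assumes that "fair" means each deck is shuffled independently, and the variable names (deck1, deck2, ncouples, meanCouples, pDealerWins) are illustrative and do not come from the original handout.

```matlab
% Sketch: simulate the street entertainer's game under a fair deal.
% Assumption (not from the handout): each deck is shuffled independently.
nrep  = 10000;                        % number of simulated games
deck1 = [ones(20,1); zeros(20,1)];    % deck 1: 1 = King,  0 = Joker
deck2 = [ones(20,1); zeros(20,1)];    % deck 2: 1 = Queen, 0 = Joker
ncouples = zeros(nrep,1);
for i = 1:nrep
    hand1 = deck1(randperm(40));      % shuffled order of deck 1
    hand2 = deck2(randperm(40));      % shuffled order of deck 2
    ncouples(i) = sum(hand1 & hand2); % hands with both a King and a Queen
end
meanCouples = mean(ncouples);             % average royal couples per game
pDealerWins = sum(ncouples > 10) / nrep;  % fraction of games with more than 10
disp([meanCouples pDealerWins]);
```

If the decks really are shuffled fairly, meanCouples should land close to the expected value of 10 worked out above, and pDealerWins estimates how often the entertainer keeps the dollar.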


UT PSY 394U - Counting Data
