Neuron, Volume 51
Supplemental Data

Reduction of information redundancy in the ascending auditory pathway

Gal Chechik, Michael J. Anderson, Omer Bar-Yosef, Eric D. Young, Naftali Tishby and Israel Nelken

The stimulus set

Stimuli were natural and modified versions of sounds from the Cornell Laboratory of Ornithology. For detailed methods, see (Bar-Yosef, Rotman, & Nelken, 2002). All stimuli had a similar frequency distribution, with a peak near 4 kHz, a duration of about 100 ms, and temporal envelope fluctuations below 100 Hz. Figure 1 in the main text presents the spectrograms of all 15 stimuli.

Calculating mutual information from experimental data: A primer

The mutual information (MI) between two random variables, such as stimuli S and neural responses R, is defined in terms of their joint distribution p(S, R). When this distribution is known exactly, the MI can be calculated as

    I(S; R) \equiv I[p(S, R)] \equiv \sum_{s,r} p(s, r) \log \frac{p(s, r)}{p(s) p(r)}    (1)

where p(s) = \sum_r p(s, r) and p(r) = \sum_s p(s, r) are the marginal distributions over the stimuli and responses, respectively. Usually, neural responses are high-dimensional and complex, and only some simplified version of the responses f(R) can be considered. Examples of such simplifications are the total number of spikes in some window (the spike count), the latency of the first spike after stimulus onset, or a coarse-resolution binary pattern representation of the spiking activity.

Estimating MI from empirical data commonly involves two steps: first, estimating the joint distribution of stimuli and simplified responses, and then calculating the MI based on this estimated distribution. The first step requires estimating the distribution of neural responses for each stimulus. For example, when interested in the information carried by spike counts, one calculates the distribution of the number of spikes in the responses, as measured across repeated presentations of each one of the stimuli separately. Repeating this calculation for each stimulus yields the joint distribution of stimuli and responses. An example of this procedure (using what is known as the maximum likelihood estimator) is given in Fig. 3 of the paper. Figure 3b shows raster plots of the responses to five different stimuli; the number of spikes in each of the 20 presentations of the first stimulus is given in Table 1a below. The corresponding distribution of spike counts for the first stimulus is given in Table 1b, and the distributions of spike counts for five representative stimuli are depicted in Fig. 3c. Figure 3d assembles all of these distributions together, forming the empirical joint distribution of stimuli and spike counts.

a.
    trial no     1    2    3    4    5    6    7    8    9   10
    # spikes     6    6    6    6    5    6    5    7    3    6
    trial no    11   12   13   14   15   16   17   18   19   20
    # spikes     5    4    4    5    6    7    6    6    9    5

b.
    # spikes       1    2     3     4     5     6     7    8     9   10
    probability    0    0  0.05  0.10  0.25  0.45  0.10    0  0.05    0

Table 1: Maximum likelihood estimation of the spike count distribution. (a) The number of spikes in each of 20 stimulus presentations. (b) The resulting estimated distribution.

Other statistics of spike patterns can be used instead of spike counts. For example, spike trains can be viewed as binary "words" of some fixed length, and their distribution can be estimated similarly to the spike count distribution, by counting the number of appearances of each word across repeated presentations of each stimulus (Fig. 3e).
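To make the first step concrete, here is a minimal sketch in Python with NumPy (our illustration; the paper does not provide code, and the function and variable names are ours). It implements Eq. (1) for an arbitrary joint distribution matrix and builds the maximum likelihood estimate of the spike count distribution from the counts in Table 1a.

```python
import numpy as np

def mutual_information(joint):
    """MI in bits of a joint distribution (Eq. 1).

    joint: 2-D array of probabilities summing to 1, with rows
    indexing stimuli s and columns indexing responses r.
    """
    p_s = joint.sum(axis=1, keepdims=True)  # marginal over stimuli
    p_r = joint.sum(axis=0, keepdims=True)  # marginal over responses
    nz = joint > 0                          # 0 log 0 is treated as 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (p_s @ p_r)[nz])))

# Spike counts from Table 1a: 20 presentations of the first stimulus.
counts = [6, 6, 6, 6, 5, 6, 5, 7, 3, 6,
          5, 4, 4, 5, 6, 7, 6, 6, 9, 5]

# Maximum likelihood estimate of the spike count distribution
# (Table 1b): the fraction of trials in which each count occurred.
p_hat = np.bincount(counts, minlength=11)[1:] / len(counts)
print(p_hat)  # [0 0 0.05 0.1 0.25 0.45 0.1 0 0.05 0]

# Stacking one such row per stimulus, each weighted by the stimulus
# probability (uniform over the 15 stimuli), yields the empirical
# joint distribution of Fig. 3d, ready for mutual_information().
```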
The second step is to calculate the MI from the joint distribution. When the number of samples is very large relative to the number of bins in the joint distribution matrix, the observed empirical joint distribution provides a good estimate of the true underlying distribution, and the MI can be calculated by plugging the empirical distribution \hat{p} into the MI formula,

    I[\hat{p}(S, R)] \equiv \sum_{s,r} \hat{p}(s, r) \log \frac{\hat{p}(s, r)}{\hat{p}(s) \hat{p}(r)}    (2)

where \hat{p}(s) = \sum_r \hat{p}(s, r) and \hat{p}(r) = \sum_s \hat{p}(s, r) are the empirical marginal distributions over the stimuli and responses, respectively. Unfortunately, in common experimental settings the number of samples is often not sufficient, and this naive MI estimator is positively biased: it tends to produce overestimates of the MI relative to the MI of the true distribution,

    I[\hat{p}(S, R)] > I[p(S, R)].    (3)

In addition, the variability of the estimator due to finite sampling is considerable. It has been shown that a first-order approximation of the bias is

    \frac{\#\mathrm{bins}}{2 N \log(2)}    (4)

where #bins is the number of degrees of freedom and N is the number of samples (Panzeri & Treves, 1996; Treves & Panzeri, 1995). Subtracting this estimate of the bias from the empirical MI estimate substantially reduces the bias.
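Continuing the sketch above, the bias correction can be expressed as follows. This is again our illustration; in particular, counting the degrees of freedom as the number of occupied bins of the joint matrix is an assumption, since the text does not spell out the convention used.

```python
import numpy as np

def plugin_mi(count_matrix):
    """Plug-in MI estimate (Eq. 2) from a matrix of raw counts,
    rows indexing stimuli and columns indexing response bins."""
    p_hat = count_matrix / count_matrix.sum()
    return mutual_information(p_hat)  # from the previous sketch

def bias_corrected_mi(count_matrix):
    """Plug-in MI minus the first-order bias approximation (Eq. 4)."""
    n_samples = count_matrix.sum()
    # Assumed convention: degrees of freedom = number of occupied bins.
    n_bins = np.count_nonzero(count_matrix)
    bias = n_bins / (2.0 * n_samples * np.log(2))  # in bits
    return plugin_mi(count_matrix) - bias
```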
Since the bias is roughly proportional to the number of bins in the joint distribution matrix, we have applied a procedure that iteratively unites rows or columns of the matrix. At each step, the row or column with the minimum marginal probability was united with whichever of its neighbours had the lower marginal probability. The MI was determined as the largest bias-corrected estimate among all tested reduced matrices. This matrix reduction discards some of the information in the matrix, but at the same time reduces the bias, and therefore makes it possible to obtain higher and more reliable estimates of the MI. The performance of this algorithm is discussed in detail in (Nelken, Chechik, King, & Schnupp, 2005).
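A possible reconstruction of this merging procedure is sketched below. It is hypothetical: the description above does not specify tie-breaking or when to stop merging, so those details are our assumptions. Each reduced matrix is scored with bias_corrected_mi() from the previous sketch, and the largest score is returned.

```python
import numpy as np

def merge_with_neighbour(m, i):
    """Unite row i with whichever adjacent row has the lower
    marginal probability, returning the reduced matrix."""
    marg = m.sum(axis=1)
    if i == 0:
        j = 1
    elif i == m.shape[0] - 1:
        j = i - 1
    else:
        j = i - 1 if marg[i - 1] <= marg[i + 1] else i + 1
    lo, hi = min(i, j), max(i, j)
    return np.vstack([m[:lo], (m[lo] + m[hi])[None, :], m[hi + 1:]])

def reduced_matrix_mi(count_matrix):
    """Iteratively unite rows/columns of the joint count matrix and
    keep the largest bias-corrected MI over all reduced matrices."""
    m = count_matrix.astype(float)
    best = bias_corrected_mi(m)
    while m.shape[0] > 2 or m.shape[1] > 2:  # assumed stopping rule
        row_marg = m.sum(axis=1) / m.sum()
        col_marg = m.sum(axis=0) / m.sum()
        # Unite the row or column with the minimum marginal probability.
        merge_rows = ((row_marg.min() <= col_marg.min() and m.shape[0] > 2)
                      or m.shape[1] <= 2)
        if merge_rows:
            m = merge_with_neighbour(m, int(row_marg.argmin()))
        else:
            m = merge_with_neighbour(m.T, int(col_marg.argmin())).T
        best = max(best, bias_corrected_mi(m))
    return best
```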
Interpretations of mutual information

The MI value can be interpreted in a number of equivalent ways:

(i) The MI is the reduction in uncertainty about the stimulus after a response is observed. This is the standard information-theoretic interpretation. In the context studied here, without observing the response, the stimulus can be any one of the 15 stimuli, resulting in an uncertainty that is quantified by the entropy of the set of stimuli, here equal to log2(15) = 3.9 bits. If the mutual information between a neuron and the stimuli is, e.g., 0.5 bit, observing the responses of the neuron reduces this entropy to 3.4 bits. Thus, the a-posteriori distribution over possible stimuli is less variable than the initial distribution. Observing more non-redundant neurons would reduce this uncertainty even further. If the uncertainty about the stimulus is 0, the stimulus is known exactly. Thus, theoretically, the responses of 8 totally non-redundant neurons, each carrying 0.5 bit/stimulus, are sufficient to fully specify the stimulus. In practice, neurons may be redundant, and the actual number of neurons required to uniquely identify the stimulus may be substantially higher.
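The arithmetic in interpretation (i) can be checked directly; the short snippet below (ours, purely for illustration) reproduces the numbers quoted above.

```python
import math

prior_entropy = math.log2(15)            # 3.907 bits, quoted as 3.9
after_one_neuron = prior_entropy - 0.5   # 3.407 bits, quoted as 3.4
# Number of totally non-redundant 0.5-bit neurons needed to drive the
# uncertainty about the stimulus to zero:
neurons_needed = math.ceil(prior_entropy / 0.5)  # 8
print(round(prior_entropy, 1), round(after_one_neuron, 1), neurons_needed)
```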
(ii) The MI is the log2 of the