Ralf's advice on computing KLs

I assume you have spike times for all your channels. Let's call P and Q the two distributions that are to be compared.

    1. Choose a time bin size (e.g. 10 ms) such that there aren't too many bins with more than 1 spike per channel and there aren't too many bins containing only zeros. The state which represents no spikes in all channels is likely to be the most common one, but you don't want it to be orders of magnitude more common than the other states.

    2. Convert bins into states. Plot your state distributions separately for the two distributions P and Q you'd like to compare. This will:

      • tell you whether your choice for bin size was a good one.

      • give you a feel for the shape of the respective distribution. (Typically it should look like a power-law where most probability mass is in the all-zero bin and the least is in the all-ones bin.)

      • tell you what roughly to expect from the entropy measure for P and Q. With n binary channels there are 2^n possible states, so the maximum entropy is n bits (reached when all states are equally likely). Since your distributions are likely very skewed, the entropy will be much smaller.

      • tell you what roughly to expect from the KL divergence between P and Q, i.e. whether it should be small (if the histograms for P and Q are very similar) or large (if they are very different). "Small" and "large" here are relative to the entropy.

    3. Compute KL and entropy for your distributions. Compare to
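
As a rough illustration of steps 1 to 3, here is a minimal Python sketch using NumPy. It assumes spike times come as one array per channel, in seconds, and that spike counts above 1 per bin are clipped to 1 so that each bin becomes a binary word. The function names (spikes_to_states, state_distribution, entropy_bits, kl_bits) and the pseudocount that keeps the KL finite for states never seen in Q are assumptions made for this example, not part of the advice itself. These are plain plug-in estimates and will be biased when data is scarce.

```python
import numpy as np


def spikes_to_states(spike_times, t_start, t_stop, bin_size):
    """Bin spike times and encode every time bin as one integer state.

    spike_times: list of 1-D arrays (seconds), one per channel.
    Bit c of a state is set if channel c fired at least once in that bin.
    """
    n_bins = int(round((t_stop - t_start) / bin_size))
    edges = t_start + bin_size * np.arange(n_bins + 1)
    states = np.zeros(n_bins, dtype=np.int64)
    for channel, times in enumerate(spike_times):
        counts, _ = np.histogram(times, bins=edges)
        # Clip counts to 0/1 and place the bit at this channel's position.
        states |= (counts > 0).astype(np.int64) << channel
    return states


def state_distribution(states, n_channels, pseudocount=0.5):
    """Normalised histogram over all 2**n_channels states.

    The pseudocount is an assumption of this sketch: it avoids zero
    probabilities, which would make the KL divergence infinite.
    """
    counts = np.bincount(states, minlength=2 ** n_channels).astype(float)
    counts += pseudocount
    return counts / counts.sum()


def entropy_bits(p):
    """Plug-in entropy in bits; at most n_channels bits."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def kl_bits(p, q):
    """Plug-in KL(P || Q) in bits; q must be positive wherever p is."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


# Tiny made-up example: 2 channels, 40 ms of data, 10 ms bins.
bin_size = 0.010
spikes_p = [np.array([0.001, 0.025]), np.array([0.002, 0.003, 0.035])]
spikes_q = [np.array([0.001, 0.012, 0.025, 0.031]), np.array([0.035])]

states_p = spikes_to_states(spikes_p, 0.0, 0.04, bin_size)
states_q = spikes_to_states(spikes_q, 0.0, 0.04, bin_size)

p = state_distribution(states_p, n_channels=2)
q = state_distribution(states_q, n_channels=2)

print("H(P) =", entropy_bits(p), "bits")
print("H(Q) =", entropy_bits(q), "bits")
print("KL(P || Q) =", kl_bits(p, q), "bits")
```

Plotting p and q side by side, ideally with a logarithmic probability axis, gives the state histograms described in step 2.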

General guides:

    • The more channels you include in your analysis, the more likely you are to see differences between the two conditions. However, this comes at the cost of higher uncertainty in your KL estimate. In the extreme of including only 1 channel for P and Q, you're effectively comparing spike rates.

    • About 1 data point per state (on average) should be the absolute minimum amount of data you'll want - the more the better.


    • Start with few channels to get a feel for your data before increasing the number of channels.

    • Check for time-inhomogeneities in your data. The computation assumes that P and Q don't change over time. In fact it will shuffle your bins for the computation.
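
Two of these guides lend themselves to quick checks. The sketch below (Python with NumPy) computes the average number of bins per possible state, and compares the state distributions of the first and second half of a single recording as a crude test for drift. The split-half comparison, the function names and the pseudocount are assumptions chosen for illustration; this is not the shuffling procedure mentioned in the last guide.

```python
import numpy as np


def points_per_state(n_bins, n_channels):
    """Average number of time bins per possible binary state."""
    return n_bins / 2 ** n_channels


def split_half_kl(states, n_channels, pseudocount=0.5):
    """KL (bits) between the state distributions of the two halves.

    states: integer state per time bin, in temporal order.
    A value close to 0 is consistent with a stationary recording.
    The pseudocount (an assumption of this sketch) keeps the result finite.
    """
    half = len(states) // 2
    dists = []
    for part in (states[:half], states[half:]):
        counts = np.bincount(part, minlength=2 ** n_channels) + pseudocount
        dists.append(counts / counts.sum())
    first, second = dists
    return float(np.sum(first * np.log2(first / second)))


# Synthetic example with 2 channels (4 states); values are made up.
rng = np.random.default_rng(0)
stationary = rng.integers(0, 4, size=2000)
drifting = np.concatenate([
    rng.integers(0, 2, size=1000),   # first half: only states 0 and 1
    rng.integers(0, 4, size=1000),   # second half: all four states
])

print("bins per state:", points_per_state(len(stationary), n_channels=2))
print("split-half KL, stationary:", split_half_kl(stationary, 2))
print("split-half KL, drifting:  ", split_half_kl(drifting, 2))
```

If points_per_state falls below about 1, the guide above suggests using fewer channels or more data. A split-half KL that is large compared to the KL between P and Q hints that the distribution changes over time.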