Much of science boils down to putting numbers on things, and then figuring out why we got the numbers we did, as opposed to other numbers we might have gotten. Usually, when we think about the numbers we got, and might have gotten, we think about the biological, measurement, and sampling processes that together generated the data. For instance, a population ecologist tracking changes over time in the abundances of different species at a site would think about the birth, death, and dispersal rates of those species, and about sources of sampling error and bias in her sampling methods.
Sometimes we also think about physical and measurement constraints that make certain data impossible. For instance, negative abundances are physically impossible, so if I discover a negative abundance in my dataset, I know it’s a typo. As an example of a measurement constraint, standard techniques for measuring dissolved phosphate concentration in lake water can’t detect concentrations below a threshold level.
This post is the first in a series about the importance of an often-overlooked class of reasons why you got the numbers you got, as opposed to other numbers: mathematical constraints.
Physical systems, and our measurements of them, obey mathematical rules as well as physical rules. And the interplay of mathematical rules with physical and measurement processes can be tricky. Even when you recognize the existence of some mathematical constraint that your data have to obey, it can be difficult to decide what, if anything, to do about it. Mathematical constraints often complicate the interpretation of our data, which is annoying, because they often seem trivial or scientifically uninteresting. So our first instinct is often to try to get rid of them, so that they stop distorting or obscuring whatever “interesting” signal there is in our data. But in many cases, you can’t just get rid of a mathematical constraint in your data while keeping everything else the same, any more than you can just chuck one of Euclid’s axioms while keeping Euclidean geometry otherwise unchanged.
So starting today, I’m going to do a series of posts on cases in which mathematical constraints affect ecological data and their interpretation, or have been claimed to do so. As you’ll see, one of the striking things about mathematical constraints is how often smart people disagree on what to do about them (or even on whether they exist!). But those disagreements have proceeded independently of one another. My hope is that a comparative study of mathematical constraints in ecology will reveal some common threads across different disagreements, from which some broad insights will emerge. I hope!
Our first mathematical constraint: every species can’t negatively covary with every other species.
I have an old post on this, but I’ll summarize here. Briefly, imagine that you have data on two or more variables—say, the abundances of two or more species at various sites or times (the variables don’t have to be species abundances). If species 1 is rare at times or places where species 2 is abundant, and vice-versa, those two species covary negatively. As an ecologist, you might be interested in the extent to which species covary negatively with one another, because that might be a sign that they’re competitors (but see), and because negative covariation contributes to stability of total biomass. And you might be interested in how negative covariation among species changes with the number of species, for instance because higher biodiversity might be “stabilizing” for some reason.
One obvious measure of covariation is the correlation coefficient. As is well known, there are mathematical constraints on the values of correlation coefficients; they can’t be greater than 1 or less than -1. What’s perhaps less well known is that there are narrower mathematical constraints on the possible values of a matrix of correlation coefficients.
For instance, if you have N species, you can summarize all their pairwise correlations across sites or times in the form of an N×N matrix of correlation coefficients, with the coefficient in row i and column j giving the correlation between species i and j. This matrix is necessarily symmetrical around the diagonal, because the correlation of species i and j is necessarily the same as the correlation of species j and i. This matrix also necessarily has 1’s on the diagonal, because the correlation of any species with itself is 1. And there’s another, less obvious but more important constraint: it’s mathematically impossible for the average value of the off-diagonal correlation coefficients to be too negative. Specifically, with N species it can’t be less than -1/(N-1). Every species can have a correlation of +1 with every other species, but it’s mathematically impossible for every species to have a correlation of -1 with every other species unless there are only two species. Indeed, in the limit as the number of species goes to infinity, the minimum possible average correlation goes to 0. This is true no matter how strongly or weakly species compete, or how similarly or differently they respond to environmental fluctuations, and so on. (UPDATE: It’s still true even if there are priority effects, or each site can only have one species, or all species are competitively equivalent, or whatever other weird ecology you can dream up. If your first reaction to what I just wrote is to try to think of some sort of weird ecology that would prove me wrong, try this exercise: try to write down three columns of numbers that all have correlations of -1 with each other. Let me know when you succeed. I’ll wait. 🙂 )
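If you'd rather check this numerically than take my word for it, here is a minimal Python sketch. The reasoning behind it: standardize each of the N variables and add them up. The variance of that sum equals N + N(N-1) times the average off-diagonal correlation, and a variance can't be negative, so the average can't fall below -1/(N-1). The function names and the toy data are made up for illustration, and the eigenvalue check is just one convenient way to ask whether a matrix could have come from real data.

```python
import numpy as np


def min_mean_correlation(n_species):
    """Lower bound on the average off-diagonal correlation among n_species variables."""
    return -1.0 / (n_species - 1)


def equicorrelation_matrix(n_species, r):
    """Matrix with 1s on the diagonal and the same value r everywhere else."""
    return (1.0 - r) * np.eye(n_species) + r * np.ones((n_species, n_species))


def is_valid_correlation_matrix(m, tol=1e-10):
    """Real data can only yield a matrix with no negative eigenvalues."""
    return bool(np.linalg.eigvalsh(m).min() >= -tol)


def mean_offdiag_correlation(data):
    """Average pairwise correlation; data has shape (samples, species)."""
    r = np.corrcoef(data, rowvar=False)
    n = r.shape[0]
    return (r.sum() - np.trace(r)) / (n * (n - 1))


# The bound itself is attainable; anything below it is not.
for n in (2, 3, 5, 10, 100):
    bound = min_mean_correlation(n)
    at_bound = is_valid_correlation_matrix(equicorrelation_matrix(n, bound))
    below = is_valid_correlation_matrix(equicorrelation_matrix(n, bound - 0.01))
    print(f"N={n:3d}  bound={bound:+.4f}  valid at bound: {at_bound}  valid below: {below}")

# Toy "abundances" (hypothetical data) forced to sum to a constant in every
# sample, which is about as compensatory as dynamics can get.
rng = np.random.default_rng(0)
toy = rng.normal(size=(200, 5))
toy = toy - toy.mean(axis=1, keepdims=True)
print("toy mean correlation:", mean_offdiag_correlation(toy))
print("bound for N=5:", min_mean_correlation(5))
```

Note that the three-column exercise above corresponds to asking for `equicorrelation_matrix(3, -1.0)`, which fails the validity check.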
That the possible range of the average pairwise correlation between species depends on the number of species really throws a spanner in the works if you’re trying to relate the average pairwise correlation to possible ecological drivers like the number of species and the strength of competition among them. See the comment thread on that old post for discussion of what, if anything, to do about this mathematical constraint. It turns out that it is not easy (I’d say impossible) to somehow transform a correlation matrix so as to get rid of this constraint while leaving everything else about the matrix unchanged.
This seems to me like a case in which we should just learn to live with mathematical constraints, such as by choosing a different measure of covariation that has the same range of variation, independent of the number of variables you’re considering. For instance, in Vasseur et al. 2014 we chose a measure of covariation known as the wavelet modulus ratio. It focuses not on covariation among pairs of species, but on the extent to which fluctuations in the abundances of different species (at a given frequency) cancel out at the level of total abundance. This measure of covariation has its own interpretive challenges, and won’t be appropriate for every question. But it has the major advantage that its mathematically possible range is from 0 to 1, no matter how many species you have. That’s still a mathematical constraint, but it’s one that to my mind aids rather than hinders interpretation. Others have also suggested measures of covariation that range from 0 to 1, independent of the number of species (e.g., Loreau and de Mazancourt 2013).
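To make the idea of a range that doesn't depend on the number of species concrete, here is a minimal sketch of one simple index of that general kind: the variance of total abundance divided by the square of the summed standard deviations of the individual species. It equals 1 when all species fluctuate in perfect lockstep and 0 when their fluctuations cancel exactly in the total. To be clear, this is an illustration only. It is not the wavelet modulus ratio, and I'm not claiming it is the exact statistic used in either paper cited above; the function name and the toy data are hypothetical.

```python
import numpy as np


def variance_ratio_synchrony(abund):
    """var(total abundance) / (sum of species SDs)^2.

    abund has shape (times, species). The result lies between 0 and 1
    whatever the number of species, because the SD of a sum can never
    exceed the sum of the SDs.
    """
    total_var = abund.sum(axis=1).var(ddof=1)
    sd_sum = abund.std(axis=0, ddof=1).sum()
    return total_var / sd_sum**2


rng = np.random.default_rng(0)
for n_species in (2, 5, 50):
    base = rng.normal(size=(100, 1))

    # Every species is a scaled copy of the same fluctuation: lockstep.
    lockstep = base * np.arange(1, n_species + 1)

    # Independent fluctuations, then forced to sum to a constant each time.
    noise = rng.normal(size=(100, n_species))
    compensating = noise - noise.mean(axis=1, keepdims=True)

    print(n_species,
          variance_ratio_synchrony(lockstep),
          variance_ratio_synchrony(noise),
          variance_ratio_synchrony(compensating))
```

The point of the loop is only that the two extremes stay put at 1 and 0 as the number of species changes, unlike the floor on the average pairwise correlation.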
p.s. Addressing the mathematical constraint here still doesn’t let you easily infer anything about species interactions from interspecific covariation, in particular whether species compete or how strongly. As Loreau and de Mazancourt (2013) note, even in a very simple Lotka-Volterra-type model, “[P]opulation synchrony can either increase or decrease as interspecific competition gets stronger…our analysis does not support the intuitive hypothesis that interspecific competition stabilises aggregate ecosystem properties through compensatory dynamics between species”.