Much of science boils down to putting numbers on things, and then figuring out why we got the numbers we did, as opposed to other numbers we might have gotten. Usually, when we think about the numbers we got, and might have gotten, we think about the biological, measurement, and sampling processes that together generated the data. For instance, a population ecologist tracking changes over time in the abundances of different species at a site would think about the birth, death, and dispersal rates of those species, and about sources of sampling error and bias in her sampling methods.

Sometimes we also think about physical and measurement *constraints* that make certain data impossible. For instance, negative abundances are physically impossible, so if I discover a negative abundance in my dataset, I know it’s a typo. As an example of a measurement constraint, standard techniques for measuring dissolved phosphate concentration in lake water can’t detect concentrations below a threshold level.

This post is the first in a series about the importance of an often-overlooked class of reasons why you got the numbers you got, as opposed to other numbers: *mathematical constraints*.

Physical systems, and our measurements of them, obey mathematical rules as well as physical rules. And the interplay of mathematical rules with physical and measurement processes can be tricky. Even when you recognize the existence of some mathematical constraint that your data have to obey, it can be difficult to decide what if anything to do about it. On the one hand, mathematical constraints often complicate the interpretation of our data. Which is annoying, because mathematical constraints often seem trivial or scientifically uninteresting. So our first instinct is often to try to get rid of them, so that they stop distorting or obscuring whatever “interesting” signal there is in our data. But in many cases, you can’t just get rid of a mathematical constraint in your data while keeping everything else the same. Any more than you can just chuck one of Euclid’s axioms while keeping Euclidean geometry otherwise unchanged.

So starting today, I’m going to do a series of posts on cases in which mathematical constraints affect ecological data and their interpretation, or have been claimed to do so. As you’ll see, one of the striking things about mathematical constraints is how often smart people disagree on what to do about them (or even whether they exist!). But those disagreements have proceeded independently of one another. My hope is that a comparative study of mathematical constraints in ecology will reveal some common threads across different disagreements. From which some broad insights will emerge. I hope!

Our first mathematical constraint: **every species can’t negatively covary with every other species.**

I have an old post on this, but I’ll summarize here. Briefly, imagine that you have data on two or more variables—say, the abundances of two or more species at various sites or times (the variables don’t have to be species abundances). If species 1 is rare at times or places where species 2 is abundant, and vice-versa, those two species covary negatively. As an ecologist, you might be interested in the extent to which species covary negatively with one another, because that might be a sign that they’re competitors (but see), and because negative covariation contributes to stability of total biomass. And you might be interested in how negative covariation among species changes with the number of species, for instance because higher biodiversity might be “stabilizing” for some reason.

One obvious measure of covariation is the correlation coefficient. As is well known, there are mathematical constraints on the values of correlation coefficients; they can’t be >1 or <-1. What’s perhaps less well-known is that there are narrower mathematical constraints on the possible values of a matrix of correlation coefficients.

For instance, if you have N species, you can summarize all their pairwise correlations across sites or times in the form of an N×N matrix of correlation coefficients, with the coefficient in row i and column j giving the correlation between species i and j. This matrix necessarily is symmetrical around the diagonal, because the correlation of species i and j necessarily is the same as the correlation of species j and i. This matrix also necessarily has 1’s on the diagonal, because the correlation of any species with itself is 1. And there’s another, less obvious but more important constraint: it’s mathematically impossible for the average value of the off-diagonal correlation coefficients to be too negative. Specifically, it can’t be less than -1/(N-1). Every species can have a correlation of +1 with every other species, but it’s mathematically impossible for every species to have a correlation of -1 with every other species unless there are only two species. Indeed, in the limit as the number of species goes to infinity, the minimum possible average correlation goes to 0. This is true no matter how strongly or weakly species compete, or how similarly or differently they respond to environmental fluctuations, and so on. (UPDATE: It’s still true even if there are priority effects, or each site can only have one species, or all species are competitively equivalent, or whatever other weird ecology you can dream up. If your first reaction to what I just wrote is to try to think of some sort of weird ecology that would prove me wrong, try this exercise: try to write down three columns of numbers that all have correlations of -1 with each other. Let me know when you succeed. I’ll wait.🙂 )
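The lower bound on the average correlation can be checked numerically. The sketch below relies on a standard argument that is not spelled out in the post: the variance of a sum of N standardized variables is N + N(N-1) times the average off-diagonal correlation, and a variance can't be negative, so that average can't fall below -1/(N-1). The function names are hypothetical, and the "every pair has the same correlation" matrix is used only because it is the simplest case to test.

```python
import numpy as np

def min_average_correlation(n_species: int) -> float:
    """Lower bound on the mean off-diagonal correlation among n_species variables."""
    return -1.0 / (n_species - 1)

def equicorrelation_matrix(n_species: int, r: float) -> np.ndarray:
    """n x n matrix with 1s on the diagonal and r everywhere else."""
    return np.full((n_species, n_species), r) + (1.0 - r) * np.eye(n_species)

def is_valid_correlation_matrix(matrix: np.ndarray, tol: float = 1e-9) -> bool:
    """A real correlation matrix can have no negative eigenvalues (up to rounding)."""
    return bool(np.linalg.eigvalsh(matrix).min() >= -tol)

for n in (2, 3, 10, 100):
    bound = min_average_correlation(n)
    # Exactly at the bound the matrix is still achievable...
    at_bound = is_valid_correlation_matrix(equicorrelation_matrix(n, bound))
    # ...but pushing every correlation slightly lower is not.
    below_bound = is_valid_correlation_matrix(equicorrelation_matrix(n, bound - 0.05))
    print(n, round(bound, 4), at_bound, below_bound)
```

With two species the bound is -1, with three it is -0.5, and it creeps toward 0 as species are added, which is the sense in which the minimum possible average correlation goes to 0.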

That the possible range of the average pairwise correlation between species depends on the number of species really throws a spanner in the works if you’re trying to relate the average pairwise correlation to possible ecological drivers like the number of species and the strength of competition among them. See the comment thread on that old post for discussion of what, if anything, to do about this mathematical constraint. It turns out that it is not easy (I’d say impossible) to somehow transform a correlation matrix so as to get rid of this constraint while leaving everything else about the matrix unchanged.

This seems to me like a case in which we should just learn to live with mathematical constraints, such as by choosing a different measure of covariation that has the same range of variation, independent of the number of variables you’re considering. For instance, in Vasseur et al. 2014 we chose a measure of covariation known as the wavelet modulus ratio. It focuses not on covariation among pairs of species, but on the extent to which fluctuations in the abundances of different species (at a given frequency) cancel out at the level of total abundance. This measure of covariation has its own interpretive challenges, and won’t be appropriate for every question. But it has the major advantage that its mathematically possible range is from 0 to 1, no matter how many species you have. That’s still a mathematical constraint, but it’s one that to my mind aids rather than hinders interpretation. Others have also suggested measures of covariation that range from 0 to 1, independent of the number of species (e.g., Loreau and de Mazancourt 2013).
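To make the idea of a 0-to-1 measure concrete, here is a minimal sketch of one simple variance ratio: the variance of total abundance divided by the squared sum of the species' standard deviations. This is not the wavelet modulus ratio from Vasseur et al. 2014, and treating it as the kind of measure the Loreau and de Mazancourt citation has in mind is an assumption. The function name and the toy time series are hypothetical.

```python
import numpy as np

def synchrony(abundances: np.ndarray) -> float:
    """Variance of total abundance over the squared sum of species' standard deviations.

    `abundances` has one row per time point and one column per species.
    The ratio is 1 when all species fluctuate in perfect synchrony and 0 when
    their fluctuations cancel exactly, whatever the number of species.
    """
    total_variance = np.var(abundances.sum(axis=1), ddof=1)
    summed_sd = np.std(abundances, axis=0, ddof=1).sum()
    return float(total_variance / summed_sd**2)

# Toy time series (made up for illustration).
signal = np.sin(np.arange(20.0))
in_step = np.column_stack([signal, 2 * signal, 3 * signal]) + 10.0
cancelling = np.column_stack([signal, -signal]) + 5.0

print(round(synchrony(in_step), 6))     # species rise and fall together
print(round(synchrony(cancelling), 6))  # total abundance stays constant
```

The ratio can't exceed 1 because each covariance is at most the product of the two standard deviations, so the range does not shrink or shift as species are added.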

p.s. Addressing the mathematical constraint here still doesn’t let you easily infer anything about species interactions from interspecific covariation, in particular whether species compete or how strongly. As Loreau and de Mazancourt (2013) note, even in a very simple Lotka-Volterra-type model, “[P]opulation synchrony can either increase or decrease as interspecific competition gets stronger…our analysis does not support the intuitive hypothesis that interspecific competition stabilises aggregate ecosystem properties through compensatory dynamics between species”.

Hi Jeremy,

You are fine with the case where all the values are 1, in which case all species can co-exist locally. But what if there is zero local species coexistence? In a 3 species matrix with all (-1) values on the non-diagonal, everything can completely outcompete everything else. This could be a case where priority effects completely dominate.

Happy to be totally wrong about this! Maybe I missed something? What would the non-diagonal component of a 3-species matrix look like if the species were completely equivalent, and all that mattered was priority effects at the local scale?

You can’t have a correlation matrix with more than two species and all -1s on the off-diagonals. Doesn’t matter if there are priority effects or equivalent species or whatever. This is math, not biology. You can’t write down three columns of numbers for which all the pairwise correlations are -1.

For instance, take a case of 3 species, only two of which can coexist at any site. Say that species 1 and 2 have a correlation of -1, because at every site at which species 1 is present, species 2 is absent, and vice-versa. Now, what pattern of presences and absences can species 3 have at those same sites so that it has correlations of -1 with *both* species 1 and species 2? The answer is “none”. For instance, if you want species 3 to have a correlation of -1 with species 1, it needs to be present wherever species 1 is absent, and absent wherever species 1 is present. That is, it needs to have a correlation of *+1* with species 2.
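The argument in this reply is easy to check with made-up presence-absence data. In the sketch below the six sites and the variable names are hypothetical; the only thing taken from the reply is the logic that a species perfectly complementary to species 1 must be identical in pattern to species 2.

```python
import numpy as np

# Hypothetical presence (1) / absence (0) of species 1 at six sites.
species_1 = np.array([1, 0, 1, 0, 1, 0])
# Species 2 is present exactly where species 1 is absent: correlation -1.
species_2 = 1 - species_1
# To get -1 with species 1, species 3 must also be its exact complement.
species_3 = 1 - species_1

# np.corrcoef treats each row of the stacked input as one variable.
correlations = np.corrcoef([species_1, species_2, species_3])
print(np.round(correlations, 3))
```

The printed matrix has -1 between species 1 and each of the others, but +1 between species 2 and 3, so the third off-diagonal pair can't also be -1.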

Got it. My example was slightly more nuanced. None of the 3 species can co-exist (if any one of the species is present, neither of the other two species can be). There are 3 local communities, each with one species present. I believe you’re saying the community where species 3 is present, and BOTH species 1 and 2 are absent, decreases the negative correlation among species 1 and 2, as your pairwise abundance matrix of species 1 and 2 now has the coordinates (0,1), (1,0), and (0,0), generating a correlation of -0.5 among the abundances of species 1 and species 2. In this example the off-diagonal correlations are all -0.5. This makes sense, though it is counter-intuitive. Thanks for making me think about this harder!

Even if there’s only one species per local community, all the off-diagonal correlations cannot be -1. Try it. See if you can write down pres-abs data for 3 spp with all the correlations -1.

In your 3 community example, none of the correlations are -1. For every pair of species, there’s one site where they are both absent. So their correlation is greater than -1. (EDIT: I just checked, it’s -0.5).
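The three-community example from this exchange can be reproduced in a few lines. The array layout (sites as rows, species as columns) is a choice made here for illustration.

```python
import numpy as np

# Rows are sites, columns are species 1-3; each site holds exactly one species.
presence = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])

# rowvar=False tells NumPy that each column (species) is a variable.
correlations = np.corrcoef(presence, rowvar=False)
print(np.round(correlations, 3))
```

Every off-diagonal entry comes out as -0.5, which is also the most negative average that three variables can have.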

Perhaps this is just semantics but I would argue that this mathematical constraint is actually (or also) a biological/physical constraint and should help us think about the system in a clearer way. I think this is in general true about many (any?) mathematical constraint – it’s not just an artifact of the numbers it IS reality (i.e., physical and biological reality). And in this way, these constraints should help us formulate better theories about why things in the world exist as they do. If we think about it as “just” a mathematical artifact that needs to be circumvented somehow then we miss out on the opportunity to gain some insight and clarity about the process.

I agree! Future examples in the series will include some in which recognizing a mathematical constraint leads to real biological insight. As you say, biological and physical systems have to play by the mathematical rules, so there’s a sense in which the mathematical rules *are* biological/physical rules.

Ahh, the beauty of mathematics! As your exchange with colinaverill above illustrates, even this relatively simple example can lead to clarity of thought about the biology.

Nice post. No need for complicated wavelet modulus time series analyses, though.

Just invert the covariance matrix, rescale it to give correlations, and multiply by -1. This gives you a partial correlation matrix, which describes the effect of species i on species j (or vice-versa) *after controlling for the other species in the community*. All the partial correlations will be negative when species reduce one another’s fitness.
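The recipe in this comment (invert, rescale, flip the sign) can be sketched as follows. This is only an illustration of the basic calculation, not the more rigorous methods in the commenter's paper. The function name and the simulated data are hypothetical, and the sketch assumes there are more sites or time points than species so that the covariance matrix can be inverted.

```python
import numpy as np

def partial_correlations(abundances: np.ndarray) -> np.ndarray:
    """Partial correlation of each pair of columns, controlling for all other columns.

    `abundances` has one row per site (or time point) and one column per species.
    """
    covariance = np.cov(abundances, rowvar=False)
    precision = np.linalg.inv(covariance)           # invert the covariance matrix
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)   # rescale, then multiply by -1
    np.fill_diagonal(partial, 1.0)                  # by convention, 1s on the diagonal
    return partial

# Simulated abundances for three species at 200 sites (made up for illustration).
rng = np.random.default_rng(0)
abundances = rng.normal(size=(200, 3))
print(np.round(partial_correlations(abundances), 3))
```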

Ecologists would never use raw correlation matrices to describe most other phenomena that we’re interested in; we always control for other factors. For some reason, we seem to have forgotten that we can do that for species interactions.

My paper on this was recently promoted to the “accepted articles” section of Ecology’s website, and goes into more detail as well as presenting some more rigorous ways to accomplish the same thing. http://onlinelibrary.wiley.com/doi/10.1002/ecy.1605/abstract

Sure. But whether you want to be looking at partial correlations depends on your question. For some questions, the raw correlations are indeed what you want. In particular, if what you’re interested in is how temporal covariances among species contribute to stabilizing or destabilizing their total abundance or biomass, the raw correlations (or covariances) are what you want, not the partial correlations. You don’t care about the effect of one species on another, controlling for everything else. You just care about whether they all go up and down in synchrony or not, no matter who affects whom. The variance of a sum of variables equals the sum of their variances plus twice the sum of the covariances between each distinct pair.
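The variance identity at the end of this reply can be verified on any dataset. The sketch below uses simulated abundances, which are made up purely for illustration, and hypothetical variable names.

```python
import numpy as np

# Simulated abundances: 50 time points (rows) for 4 species (columns).
rng = np.random.default_rng(1)
abundances = rng.gamma(shape=2.0, scale=5.0, size=(50, 4))

covariance = np.cov(abundances, rowvar=False)

# Left-hand side: variance of total abundance through time.
variance_of_total = np.var(abundances.sum(axis=1), ddof=1)

# Right-hand side: sum of variances plus twice the sum of pairwise covariances.
sum_of_variances = np.trace(covariance)
sum_of_pair_covariances = np.triu(covariance, k=1).sum()  # each distinct pair once
reconstructed = sum_of_variances + 2 * sum_of_pair_covariances

print(np.isclose(variance_of_total, reconstructed))
```

Because only the raw variances and covariances appear in this sum, negative raw covariances are what reduce the variability of the total, regardless of which species affects which.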

I look forward to reading these posts; you are off to an excellent start! I hope you address the positive sum constraint induced when one measures relative abundances or species compositions. I feel like this is often ignored and can produce misleading results, particularly in multivariate methods.

Thanks! Re: the “positive sum constraint”, I haven’t heard it called that. But yes, there’s a post in the queue on the interesting consequences of the fact that relative abundances have to sum to 1, and the closely-related fact that average relative fitness of all competing types necessarily has to equal 1. I use an evolutionary example of the consequences of these mathematical constraints, just because it’s my favorite example. But hopefully readers will chime in with ecological examples illustrating the consequences of these constraints.
