# Advice: on choosing among different indices of the "same" thing

Lots of terms in ecology are only loosely defined, or can have somewhat different meanings depending on the context. That can make it difficult to measure those things, because different measures will often behave at least slightly differently. “Diversity” is a good example: there are lots of different “diversity” indices. So how do you choose the “right” index, or the “best” index, of whatever it is you want to measure? And what do you do if your results differ depending on what index you choose?

This issue is one that I think many ecologists worry about a lot more than they should. It’s really very simple:

• If you’re testing a precisely-defined model or hypothesis (which basically means a mathematical model, or a hypothesis derived from a mathematical model) that predicts the behavior of a particular index, then that’s the index you need to measure if you want to test that model.  For instance, if you want to test Bob May’s classic complexity-stability model, then your measure or index of stability needs to be the one May used, or one that you can show is tightly correlated with the one May used. And if that index of stability is difficult or impossible to measure (which in the case of May’s model, it is), then you either have to find some other way to test the model that doesn’t involve measuring stability, or you have to go ask some other question entirely.
• If you’re testing an imprecisely-defined model, like a verbal or “conceptual” model, that doesn’t specify a choice of index, then the choice of index is completely arbitrary, so just pick one and don’t sweat it. Worrying (or arguing with colleagues) about which index is “best” in such contexts is totally pointless. There’s no way to choose the “best” index of something that’s imprecisely defined. You can’t choose the “best” measure of something unless you know, on independent grounds, exactly what that something is. Yes, this means your results may well depend on your choice of index. If that bothers you (and in many situations, it should), you should pick or develop a more precisely-defined model or hypothesis to test.
• The only reason to calculate various indices of the “same” thing and then compare your results across those indices is if different indices give you complementary ecological information. For instance, if your hypothesis predicts that experimental treatment X will increase species richness but reduce Simpson’s diversity, then measuring both those “diversity” indices (species richness and Simpson’s diversity) helps you test your hypothesis. But it is not interesting or useful to calculate various indices simply to see if your results vary across different indices. Different indices are different. Of course they can behave differently. If they couldn’t, they wouldn’t be different indices.
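
To make the last point concrete, here is a minimal sketch of how species richness and Simpson’s diversity can move in opposite directions for the same pair of communities. The abundance vectors are made up for illustration, and “Simpson’s diversity” is assumed here to mean the Gini-Simpson form, 1 − Σ pᵢ² (other forms, such as the inverse Simpson index, are also in use).

```python
def species_richness(abundances):
    """Count the species that are actually present (abundance > 0)."""
    return sum(1 for n in abundances if n > 0)


def simpson_diversity(abundances):
    """Gini-Simpson index: 1 - sum(p_i^2), with p_i the relative abundances.

    Assumption: this is the form of "Simpson's diversity" intended;
    the post does not say which variant it has in mind.
    """
    total = sum(abundances)
    return 1.0 - sum((n / total) ** 2 for n in abundances)


# Hypothetical data: individuals counted per species in each treatment.
control = [25, 25, 25, 25]        # 4 species, perfectly even
treatment = [90, 4, 2, 2, 1, 1]   # 6 species, one strongly dominant

for name, community in [("control", control), ("treatment", treatment)]:
    print(name, species_richness(community), round(simpson_diversity(community), 4))
```

In this toy case the treatment has more species but lower Simpson’s diversity, because one species dominates. Neither index is “wrong”; they simply summarize different aspects of the community.
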
This entry was posted in Advice by Jeremy Fox.

I'm an ecologist at the University of Calgary. I study population and community dynamics, using mathematical models and experiments.

## 14 thoughts on “Advice: on choosing among different indices of the "same" thing”

1. Of course I guess one could simply want to investigate the mathematical properties of various indices under simulation to gain some insight into what some set(s) of observed values thereof might represent. Not sure where that falls in your framework, if anywhere.

• Yes, one can investigate the properties and behavior of various indices, if for instance formulas for the indices can’t be written down (as in the case of many measures of “stability”), or are too complicated to easily interpret. That seems to me to be a theoretical task. Basically, you know that different indices will behave differently, and you want to quantify those differences. The post is empirically focused.

2. I’m sorry, but I have to disagree with you on “Worrying (or arguing with colleagues) about which index is ‘best’ in such contexts is totally pointless.”

Russ Lande showed otherwise in a smart paper in Oikos (1996, I think), arguing rightly that if the only thing you want to measure is species diversity but you have no underlying theory (and thus no obvious choice of index), you should stick with indices that are well-behaved (here, concave) and for which unbiased estimators exist.

• Always good to have pushback!

Yes, I know the Lande 1996 paper. It is a good paper, but I don’t think it settles the larger issue. The issue is whether, in the absence of any underlying theory or agreed definition of whatever it is you’re trying to measure, one can just fall back on “desirable properties” of indices. I’m not so sure that thinking in terms of “desirable properties” is all that useful, except in that an index that lacked some key property or properties might not be considered an index of what we’re trying to measure at all. For instance (to pick a deliberately silly example), a “diversity” index that always decreased as species richness increased could hardly be considered a “diversity” index.

I note that there often are arguments in the literature about what properties are desirable in an index. For instance, Lande 1996 uses an additive partition of gamma diversity into alpha and beta diversity, but there are many who would argue that a multiplicative partition of gamma diversity is more desirable. I don’t have an opinion on this particular debate myself. I just note it because it suggests that, in the absence of an a priori definition of what you’re trying to measure, arguments about the “desirable properties” of indices are ultimately fruitless.

For me, the bottom line is that, if you don’t know exactly what it is you’re trying to measure, that vagueness is inevitably going to “contaminate” your interpretation of your data, no matter what index you pick and no matter what “desirable” properties it has. To me, that vagueness is by far a more important obstacle to doing good science than the statistical properties of different indices.
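
For readers unfamiliar with the additive versus multiplicative debate mentioned above, here is a minimal sketch of the two partitions. It assumes species richness as the diversity measure and equal weighting of communities; the three communities are invented for illustration.

```python
# Hypothetical communities, each given as the set of species present.
communities = [
    {"A", "B", "C", "D"},
    {"C", "D", "E", "F"},
    {"A", "F", "G", "H"},
]

# Gamma diversity: richness of the pooled set of communities.
gamma = len(set().union(*communities))

# Alpha diversity: mean within-community richness (equal weights assumed).
mean_alpha = sum(len(c) for c in communities) / len(communities)

# Additive partition: gamma = alpha + beta.
beta_additive = gamma - mean_alpha

# Multiplicative partition: gamma = alpha * beta.
beta_multiplicative = gamma / mean_alpha

print(gamma, mean_alpha, beta_additive, beta_multiplicative)
```

Both “beta diversities” are computed from exactly the same alpha and gamma, yet they are different numbers with different units (species versus a dimensionless ratio), which is part of why the argument over which is preferable is hard to settle without first saying what beta diversity is supposed to mean.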

• Great topic. I think both of you have valid points (see, I can waffle with the best of ’em! Can we get a group hug?)

I think maybe the critical point with regard to finding a robust metric (of anything) comes most into play in the context of some type of model. Then, you do want to pick the specific metric that is most robust (i.e. insensitive) to certain mathematical formulations inherent in the model (let’s call them “model artifacts”) but at the same time most sensitive to the actual real-world drivers of the chosen index. This may mean the index doesn’t have all the desirable properties that you would like in a perfect world (including perfectly measuring just what it is you’re trying to measure), but it also doesn’t go wildly off the rails under certain model situations. I wish I had a ready example to illustrate this, but alas, I am brain dead as usual.

Pedantic point: a negatively correlated index is still useful, but I’m sure you know that and maybe it wasn’t your point anyway.

• Thanks Jim. I agree that Francois’ point is a good one, even if I don’t completely agree with him on its implications or lack thereof.

I’m not quite clear what you mean by a “robust” metric. If you can think of an example, that would help. I guess I’m a little unclear because, when we have a mathematical model that we’re trying to test, that generally implies that we have a precise definition of whatever it is we want to measure. In my experience, problematic uses of indices by ecologists tend to crop up in two contexts:

One is when someone knows exactly what he wants to measure, but can’t measure it for whatever reason, and so makes the mistake of choosing some index more or less arbitrarily, without actually checking if it is, or is at least likely to be, correlated with what he really wanted to measure. “Stability” is a good example of this. There are many, many “tests” of theoretical predictions about “stability” that aren’t actually tests of the theory at all, because the authors mistakenly assumed that any ol’ index of “stability” would do. Wrong: just because some index measures something that could be called “stability” in some sense doesn’t mean it’s actually an index of “stability” in the specific sense defined by the theory that you’re trying to test. I find it strange that people do this. Nobody would use, say, an index of caribou abundance (track density, or scat density, or whatever) unless they had good reason to think it was actually correlated with caribou abundance. I guess people just aren’t as aware as they should be that different indices of anything typically behave differently.

The other is when you don’t know exactly what you want to measure, because what you want to measure isn’t precisely defined. This leads people to worry about the choice of index, or else to use multiple indices and then act surprised and treat it as an interesting empirical result when those indices give different answers. I’d argue that there’s no reason to worry, and no reason to be surprised.

Re: pedantic point: yes, I know, and no, that wasn’t my point. I thought about illustrating my point with a more complicated example (say, a diversity index that necessarily declines with increasing species richness over some range of species richness values, but not for all species richness values), but decided just to keep things simple.

• I agree with everything you said there. I think there’s a third category though. I have an example but let me try to come up with a better one, and also make sure that it’s truly a third category.

• Hmmm, let’s see. I think I’ll approach this more as a thinking out loud exercise.

Suppose we want to measure the change in the “central tendency” of some population of something or other, under some observed forcing/driver. We know that the mean and the median both qualify as a suitable measure thereof, and that we can choose either as our “index” of the population’s response. We suspect that there might be some wild outlier values amongst the individual responses, based on past experience, and/or because we are relying on data collected by who knows who with uncertain measurement accuracy and an even more uncertain sampling scheme. But we’re not really sure whether these represent ‘real’ responses or not. So we decide, knowing how the mean and median are defined exactly, that the median is probably the better index to use. We might even choose something even more outrageous, like the mode, given the heretical and troublemaking genes that we most likely carry.

Which of these three possible indices should we choose, and how should we make the decision? Does this fall into your first category, where you just simply haven’t defined well enough what you mean exactly by “measure of central tendency”, and if you did so, the ‘problem’ would resolve itself instantly?

• Ok Jim, I see what you mean. Yes, this seems like a good example of the sort of thing Francois was getting at. It’s true that, if “measurement of central tendency” were more precisely defined, the problem would resolve itself. But I agree that, even in the absence of a precise definition of “central tendency”, the choice of the median makes sense in this case, for the reasons you suggest. So yes, I’ll concede that there are cases when those sorts of basically statistical considerations can guide one’s choice of index.
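
The mean/median/mode example in this thread can be sketched in a few lines. The response values and the single wild outlier below are invented for illustration; the only point is to show how far each index of “central tendency” moves when one suspect observation is added.

```python
import statistics

# Hypothetical individual responses to some driver.
responses = [1.8, 2.0, 2.0, 2.1, 2.3, 2.4]

# The same data plus one wild value of uncertain provenance.
with_outlier = responses + [40.0]


def shift(index):
    """How much a given index changes when the outlier is included."""
    return index(with_outlier) - index(responses)


for index in (statistics.mean, statistics.median, statistics.mode):
    print(index.__name__, round(index(responses), 3), round(shift(index), 3))
```

With these numbers the mean moves by several units, while the median and mode barely move or do not move at all. That is the statistical sense in which the median is the more robust choice here, though it says nothing about whether the outlier is a “real” response.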

