Ecologists, especially community ecologists, are always looking for shortcuts. By which I mean, they’re always looking for ways to make strong inferences about mechanisms and processes based solely on clever analysis of easily-collected or already-existing observational data. In the past, I’ve criticized various such shortcuts: randomization of species × site matrices to infer interspecific competition, plotting coexisting species onto a phylogeny to infer contemporary coexistence mechanisms, plotting local vs. regional species richness to infer whether local communities are closed to invasion, and using the shape of the species-abundance distribution to infer whether the world is neutral. And there are many others I’ve never mentioned, some of which have been refuted by others, such as the use of ordination methods to infer the process dominating metacommunity dynamics (refuted by Gilbert et al. 2010), the use of power-law distributions of movement lengths to infer whether foraging animals follow Lévy walks (e.g., Auger-Méthé et al. 2011), and the use of body size ratios of co-occurring species to test for stable coexistence via resource partitioning (the latter is an old, “classic” example of a shortcut).
It’s a pain to shoot down dead-end “shortcuts” one at a time. So in the interests of efficiency, I’m going to pose a question: has any shortcut in ecology ever worked as advertised? Can anyone name one?
I emphasize that I’m not talking about methods that are merely suggestive, or one potentially-useful tool among many, or that are useful in conjunction with other lines of evidence, or that are powerful but mostly impractical (e.g., attractor reconstruction, which if memory serves requires time series thousands of points long), or that don’t comprise a relatively simple “recipe” that pretty much anyone can apply in any system (e.g., there’s no set recipe for how to “develop and parameterize a dynamical model of your study system”). I’m talking about methods that, caveats aside, were originally sold and used something like how phylogenetic community ecology was originally sold and used: as a simple, broadly-applicable, straightforward and yet powerful way to infer process from pattern. Methods that were sold and used as a way to cut through the complexity of community ecology and provide a clear, short path to major insights. That’s more or less how every one of the shortcuts listed above was sold and used originally, and none of them panned out. Hence my question: has any shortcut in ecology ever panned out?
Another way to pose the question is to ask: what would have to be the case for a method based solely on observational data to allow more-or-less reliable inferences about underlying causal processes? Is it likely that any such method exists in ecology? If it did, is it possible that it would be simple, broadly-applicable, straightforward, and yet powerful?
Based on the example of successful observation-based sciences like astronomy, which I’ve discussed previously, I’ll tentatively suggest that no such methods exist in ecology. If you don’t have a quantitative, well-validated theory that incorporates all the processes affecting the observations of interest, then you basically have no hope of solving the “inverse problem” of inferring the underlying processes from the observations.
Note that I do think there are cases outside the physical sciences where valid shortcuts exist. I’m thinking of methods like the HKA test and its derivatives, used to test for selection in population and evolutionary genetics just based on gene sequence data (here is a brief review). Notably, much as with the case of astronomy, this is a case where we can write down a more or less complete list of the underlying processes that affect the observational data of interest, where we have a rigorous, quantitative theory of how those processes will affect the observed data, and (crucially) where different processes (here, selection vs. drift) are predicted to affect the data in quite different ways, leaving quite different “signatures”. In contrast, a common failing of putative shortcuts in ecology is that the process for which the shortcut purports to test often is only one of many that could give rise to data that look a certain way.
Can anyone name any shortcuts in ecology that actually turned out to be shortcuts, in the sense defined above?
Not off-hand no.
But I feel the need to provide a quick defense of my field of observational community ecology, even though I recognize that you are not making a direct attack on it. Observational data in community ecology *have* been useful for (1) ruling out causal hypotheses (though as you say not for ruling out all incorrect hypotheses, and I completely agree that ‘accepting an unjustified alternative on the grounds of a rejected null’ is far too common), (2) reducing predictive uncertainty (i.e. you don’t need to have a perfect causal understanding to make good predictions), and (3) generating natural history knowledge that is essential for designing good experiments. And I’m probably forgetting other important uses of observational community ecology.
Yes, absolutely, there are plenty of things that observational data are good for. Your #1 is particularly important. For instance (to pick just one example of many), I think my fellow blogger Brian once raised the example of how paleo data showing that the latitudinal species richness gradient is at least 100 million years old rule out the possibility that that gradient is just a post-ice age transient.
But as you noted, the post isn’t about legitimate uses of observational data. Everybody grants that observational data are useful in various ways, and everyone is right, so it would be boring to post about that. 😉
Right…I just wanted to try to temper the tone of the post a bit.
One more comment. Observational statistical methodologists often (not always) find themselves in a bit of a bind. On one hand, no one reads their stuff unless they make either big or cleverly ambiguous claims about what their methods can do. On the other hand, the theoretical issues with such claims will always bubble to the surface after a large enough group of researchers have used the methods. A better model would be to stop selling observational statistical methods for their ability to identify causal mechanisms, and instead promote research that clarifies what information can be gleaned from observational data. Sorry…I’m about to get a bit cynical…but researchers want to tell and hear ‘experiment-style’ stories with observational data. This is where the ‘short-cut’ problems start, I suspect. Observations just give us a different kind of story, which I wish were better appreciated.
A very tentative proposal might be that the simple theory of community biomass fluctuations developed <10 years ago has been pretty successful at predicting the (mostly positive) diversity–biomass stability relationships observed from microcosms to field/semi-natural systems. It's arguable that many of the simplifying model assumptions are not met in those systems.
I've recently been working on a couple of projects showing that negative diversity–stability patterns are certainly possible in relatively simple communities (apologies for the self-promotion, but hey – that’s how this blog works, right?), which raises the question: why do the simpler models get things right even though we (at least, I) think they make the wrong assumptions?
Mike:
You might be interested in this recent discussion of how bad models can lead to good inferences,
http://andrewgelman.com/2012/10/another-reason-why-you-can-get-good-inferences-from-a-bad-model/
Re: bad models leading to good inferences, Tony Ives had a paper a little while back (sorry, too lazy to look it up–I think in Ecology, maybe in EcoLetts?) about how ARMA(p,q) models can be used to make robust inferences about things like the return rate of the time series to its stationary distribution. The key is that, even though the true values of p and q are difficult to choose accurately and precisely, any choice that leads to a reasonably well-fitting model is going to give you about the same estimate for the return rate.
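The robustness claim can be illustrated with a toy sketch (my own construction, not Ives’ actual method, and the simulation settings are invented for illustration): fit AR models of deliberately wrong order to AR(1) data and compare the dominant root of each fit, which governs the asymptotic return rate to the stationary distribution.

```python
import math
import random

def simulate_ar1(b, n, seed=0):
    """Simulate x_t = b * x_{t-1} + e_t with standard normal noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(b * x[-1] + rng.gauss(0.0, 1.0))
    return x

def fit_ar(x, p):
    """Ordinary least-squares fit of an AR(p) model; returns [b1, ..., bp]."""
    A = [[0.0] * p for _ in range(p)]  # normal equations A b = c
    c = [0.0] * p
    for t in range(p, len(x)):
        row = [x[t - 1 - j] for j in range(p)]
        for i in range(p):
            c[i] += row[i] * x[t]
            for j in range(p):
                A[i][j] += row[i] * row[j]
    # Gaussian elimination (no pivoting; fine for this small, well-behaved system)
    for i in range(p):
        for j in range(i + 1, p):
            f = A[j][i] / A[i][i]
            for k in range(i, p):
                A[j][k] -= f * A[i][k]
            c[j] -= f * c[i]
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (c[i] - sum(A[i][k] * b[k] for k in range(i + 1, p))) / A[i][i]
    return b

def dominant_root(b):
    """|largest root| of lambda^p = b1*lambda^(p-1) + ... + bp, for p <= 2."""
    if len(b) == 1:
        return abs(b[0])
    b1, b2 = b
    disc = b1 * b1 + 4.0 * b2
    if disc >= 0.0:
        return max(abs((b1 + math.sqrt(disc)) / 2.0),
                   abs((b1 - math.sqrt(disc)) / 2.0))
    return math.sqrt(-b2)  # complex conjugate pair: |lambda|^2 = -b2

x = simulate_ar1(0.6, 2000)
r1 = dominant_root(fit_ar(x, 1))  # "correct" model order
r2 = dominant_root(fit_ar(x, 2))  # overparameterized model
print(r1, r2)  # both should land near the true value, 0.6
```

The point being that the two fits disagree about the individual coefficients, but any reasonably well-fitting model pins down roughly the same dominant root, and hence the same return rate.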
Thanks Steven, hadn’t seen that one!
Can you explain a little more what theory you’re thinking of Mike, and what conclusions it gives us an easy, generally-applicable shortcut to inferring? Because my reading of that literature is that it’s yet another failed shortcut. That is, despite occasional claims to the contrary, you actually can’t use patterns of pairwise covariation among species’ biomasses over time to infer much of anything about the underlying processes driving those patterns. In particular, you can’t say “These species covary negatively–they must be competing!” But you know all this, so I’m guessing that’s not what you’re getting at…
Re: self-promotion, I know you’re just teasing (about which, no worries), but your joke touches on a serious point I may post on at some point. If this blog were self-promoting, no one would read it. Self-promoting blogs basically attract no audience. When I was at Oikos blog, my least popular posts were the ones plugging cool new Oikos papers, even though I went out of my way to “add value” to those posts rather than letting them just function as adverts for the journal. Don’t get me wrong, this blog has greatly raised my “profile” in ecology–but only because I’ve made no particular effort to use it to raise my profile.
Jeremy, linking your response to the Ives paper you mentioned (could it be http://www.esajournals.org/doi/abs/10.1890/08-0487.1?) might be quite a useful way for me to respond.
I don’t know the ARMA approach Abbott et al use quite so well as the more basic AR approach that Tony Ives has employed a lot before (I’ve only skimmed the Abbott et al paper, and that was a while ago), but I think they are both relatively simple (and smart) approaches for starting to partition variation in population time-series.
The earlier AR approach, leading to understanding of the statistical “mechanisms” of overyielding, portfolio/insurance and covariance effects, was an extremely important breakthrough. The next step should have been to get into what biological mechanisms lead to these statistical patterns (hence the quotation marks above), which I guess might be what you mean when you talk about “failure”. I think that the mechanisms behind overyielding are pretty well understood, but I agree that some prominent research still confuses the other statistical patterns with biological mechanisms (which has even been illustrated by some previous discussion on this blog!) and the field’s probably not moved forward as fast as it might have. But I don’t think the insight gained from the AR approach, with all its simplifying assumptions, has been a failure. There are three statistical patterns involved in (steady-state*) total community biomass stability. We can now get to work on understanding the specific biological features of our system that are most likely to generate those patterns.
One important feature that had been lacking with these approaches is comparison of alternative AR (and/or ARMA) models (e.g., in an AIC framework; see recent work by Jonas Knape). Perhaps the Abbott and Knape papers will start people thinking about this more carefully, by encouraging identification of the dimension of the focal system and comparison of alternative models. Just don’t mention the war…dispersal…yet.

*And definitely don’t mention transients based on initial conditions while you’re at it.
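The statistical partitioning behind the covariance effect rests on a simple identity: the variance of total community biomass equals the sum of species variances plus the sum of pairwise covariances. A minimal numerical check with toy simulated data (nothing here comes from any particular paper):

```python
import random
import statistics as st

# Two simulated "species" biomass series that covary positively.
rng = random.Random(1)
n = 500
a = [rng.gauss(10.0, 2.0) for _ in range(n)]
b = [10.0 + 0.5 * (x - 10.0) + rng.gauss(0.0, 1.0) for x in a]
total = [x + y for x, y in zip(a, b)]

def cov(u, v):
    """Sample covariance (n-1 denominator, matching statistics.variance)."""
    mu, mv = st.fmean(u), st.fmean(v)
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / (len(u) - 1)

# Var(a + b) = Var(a) + Var(b) + 2*Cov(a, b): positive covariance (e.g.,
# species responding alike to the environment) inflates community-level
# variance; negative covariance damps it.
lhs = st.variance(total)
rhs = st.variance(a) + st.variance(b) + 2.0 * cov(a, b)
print(lhs, rhs)  # identical up to floating-point error
```

The identity itself is exact; the “shortcut” danger lies in reading a specific biological mechanism (say, competition) directly off the sign of the covariance term.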
Before Steve Walker points it out, just let me note for the record that this post could be viewed as Bayesian. I could be viewed as saying that, based on past experience, I assign a very low “prior probability” to the hypothesis “New shortcut method X works as advertised.”
I decline to say whether or not I endorse a Bayesian reading of the post. Perhaps I’m just trying to mess with Steve’s new statistical philosophy classification system. 😉
Space for time substitution in community dynamics perhaps?
Not that it’s perfect of course; there are many potential potholes in its use. But it can at least give an approximation of temporal dynamics that would otherwise not be had, or would take a very long time indeed to obtain for some communities (forests in particular).
That’s probably as good a guess as any. Although my colleague Ed Johnson wrote a review in EcoLetts a few years ago arguing that it works terribly (I don’t know nearly enough to judge).
So here’s a question: what’s the appropriate thing to do when one comes across something that would likely fall into the Fox Shortcut category? This new paper seems compelling, but I think it would be classified by you as a shortcut (correct me if I’m wrong); should I put it aside and not think about it anymore?
http://www.sciencemag.org/content/early/2012/09/19/science.1227079.short
“what’s the appropriate thing to do when one comes across something that would likely fall into the Fox Shortcut category?”
Evaluate it on its merits. As I said, in other fields, including life science fields, there certainly are powerful, observation-based methods for making inferences about process. And maybe there are some in ecology, I just don’t know about them or forgot them or they haven’t been invented yet.
The paper you linked looks intriguing. It’s by some really good people (which won’t affect my evaluation of the paper, which I have yet to read, but does make me particularly inclined to read it). Sadly, my library doesn’t subscribe to Science Express, so I’ll have to wait until the paper is published in the journal to read it. Just judging from the abstract, my first question would be how data-hungry the approach is. Is it something like attractor reconstruction or embedding dimension estimation, which really does work–but only if you have much longer time series than ecologists typically have or can obtain in less than several lifetimes? My second question would be, it’s great that it works for nonseparable, nonlinear dynamical systems not covered by Granger causality–but only “weakly connected” ones? How serious a limitation is that? Probably both these questions and more are addressed in the paper–but we’ll have to wait and see. At least I will!
Ah, okay; I think I missed the main gist of your post, which (now I think) is: Fox Shortcuts don’t (yet) exist in ecology — unless a reader can point one out that I didn’t think of — but they do exist in other fields of science.
My first question was the same as yours; the authors are mainly fisheries guys and they use some fisheries examples with data of about 50 time points (years), which means it may be tractable in systems where we have long-term data (or in microcosms where we can run an experiment for 50 days and take measurements each day).
Your second question is only somewhat addressed in the paper. And I’m not sure if the 45-page supplement has more details, as I haven’t scoured that yet.
I just skimmed the paper (thanks for sending it!), it’s very cool. Various practical questions do remain, but yeah, this looks like a very important paper. I’ve already sent it to a collaborator to ask if we should think about trying the approach out.
A bit of googling reveals a related paper. As I suspected, the paper you linked to does seem to be related to attractor reconstruction and Takens’ Embedding Theorem.
Judging by a skim of the paper I just found, the core idea is clever and *potentially* very powerful, almost a sort of free lunch. Basically, if you have multiple time series for different, possibly causally-connected variables, then having those “extra” variables can actually make state space reconstruction *easier* rather than harder. Having information about more variables can “substitute” for having a really long time series for any one variable. Which, if it works in practice, would be awesome, because it’s often way easier and faster to collect a bunch of short time series of a bunch of variables than to collect one long time series of one variable. But I’m just digging into this, so don’t take anything I just wrote as gospel.
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0018295
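To make the “extra variables substitute for time-series length” idea concrete, here is a toy sketch (my own construction, not the authors’ algorithm; the coupled logistic maps with weak bidirectional coupling are the standard two-species toy model from this literature). A one-nearest-neighbor forecast of x from the mixed state (x_t, y_t) works about as well as from the univariate delay state (x_t, x_{t-1}):

```python
import math

def coupled_logistic(n, x0=0.4, y0=0.2):
    """Two weakly coupled chaotic logistic maps (toy two-species system)."""
    xs, ys = [x0], [y0]
    for _ in range(n - 1):
        x, y = xs[-1], ys[-1]
        xs.append(x * (3.8 - 3.8 * x - 0.02 * y))
        ys.append(y * (3.5 - 3.5 * y - 0.1 * x))
    return xs, ys

def nn_forecast(lib_states, lib_targets, query):
    """Predict via the single nearest neighbor in the reconstructed state space."""
    best, pred = float("inf"), None
    for s, t in zip(lib_states, lib_targets):
        d = sum((a - b) ** 2 for a, b in zip(s, query))
        if d < best:
            best, pred = d, t
    return pred

def corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    da = math.sqrt(sum((u - ma) ** 2 for u in a))
    db = math.sqrt(sum((v - mb) ** 2 for v in b))
    return num / (da * db)

xs, ys = coupled_logistic(1200)
split = 1000  # library = first 1000 points, forecasts on the rest

# (a) univariate delay embedding of x: state = (x_t, x_{t-1})
uni_states = [(xs[t], xs[t - 1]) for t in range(1, split)]
uni_targets = [xs[t + 1] for t in range(1, split)]

# (b) multivariate embedding: state = (x_t, y_t) -- the "extra" observed
# variable substitutes for a second lag of x
multi_states = [(xs[t], ys[t]) for t in range(split)]
multi_targets = [xs[t + 1] for t in range(split)]

truth, pred_uni, pred_multi = [], [], []
for t in range(split, len(xs) - 1):
    truth.append(xs[t + 1])
    pred_uni.append(nn_forecast(uni_states, uni_targets, (xs[t], xs[t - 1])))
    pred_multi.append(nn_forecast(multi_states, multi_targets, (xs[t], ys[t])))

# both correlations should be high for this deterministic system
print(corr(truth, pred_uni), corr(truth, pred_multi))
```

Of course, in this toy case both embeddings have plenty of data; the practical promise is for the regime where each individual series is too short for a deep delay embedding, which the sketch doesn’t explore.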
Here is a compilation of relevant background reading, from Cosma Shalizi:
http://masi.cscs.lsa.umich.edu/~crshalizi///notebooks/state-space-reconstruction.html
Thanks for providing the link–this looks very interesting, and could well be something I want to blog about at some point.
Thanks for the additional references!
Mesocosms are the ultimate shortcut – if used to test explicit questions they can be extremely valuable, but they are still a shortcut to elucidating pattern generation outside of a beaker.
Interesting. Though I’d suggest that microcosms and mesocosm experiments can be useful for various reasons, and elucidating the mechanisms of pattern generation that operate in nature is only one of them. We have various old posts on this, including a very nice guest post from Britt Koskella that articulates the sort of use of microcosms & mesocosms that you’re thinking of:
https://dynamicecology.wordpress.com/2011/06/10/objections-to-microcosms-in-ecology-and-their-answers/
https://dynamicecology.wordpress.com/2013/06/03/microcosms-guest-post/
And I also have an old post arguing that model systems in general are a “shortcut” in the sense that they make it tractable for us to ask the questions we want to ask:
https://dynamicecology.wordpress.com/2012/10/18/ecologists-should-quit-making-things-hard-for-themselves-and-focus-more-on-model-systems/
Model systems are valuable for generating testable hypotheses that can then be tested across systems – our recent Ecology Letters paper is a good example of that. It examines the global robustness of the Stress Gradient Hypothesis, which was developed in the rocky intertidal, and finds that it is broadly applicable across systems – at least those based on foundation species.
This is exactly the sort of setting where theoretical computer science can help (I know, I know… I’m a one-trick pony). The usual issue I see with mechanism-from-data inference techniques in biology is that they usually assume a ridiculously simple or restrictive set of possible mechanisms, parametrized in a very basic way, but then draw much broader verbal conclusions from it (I also see this a lot in ‘network science’). The advantage of results based on computational complexity is that you only have to assume that the underlying theory is mechanistic (no non-deterministic magic, although probabilistic or stochastic is fine) and, if you want something more fine-grained, some very generous resource constraints (say, exponential time is ruled out, because there hasn’t been enough time in the universe, or some such). Apart from that, no further restrictions are needed, and you can start to show very general results and use your data to rule out whole classes of mechanisms without further a priori assumptions on the underlying theory.