So I have been arguing that for ecology to progress as a science, we need to stick our necks out and make risky predictions that might actually be wrong (here and here). That's all well and good, but the obvious question is how to make such risky predictions.
In particular, many comments on previous posts have raised the issue of whether predictions should be mechanistic or phenomenological. The mainstream view in ecology is very reductionist: to explain communities we have to work in terms of populations; to explain populations, in terms of individual behavior and physiology; to explain behavior and physiology we have to look at endocrine systems, proteins, etc. With evolution mixed in there somehow. This is almost a holy doctrine in ecology. Extended to prediction, it says we have to make predictions that build up from the little pieces, with a thorough understanding of what is causing things. At the other extreme is the Rob Peters instrumentalist point of view. Peters held that we can never know mechanism (he told a colleague of mine at McGill University that we don't know that inheritance works by genes, and that genes are just a human construct). His solution is a bunch of regressions: variable x is related to variable y, and if we know x then we can predict y. For both of the readers who have followed my work closely, it will come as no surprise that I take a stance somewhat outside the mainstream – namely that mechanism is a nice-to-have, but prediction is a must-have. Or, in a more nuanced version: mechanism is a lot more slippery and less black-and-white than we ecologists like to give it credit for.
Before arguing my case, I want to detour to an example far enough outside our field that we won't get emotional about it. I was put onto this topic by a great post at the Mermaid's Tale blog, which takes up the question of predicting which individual humans will contract a particular disease – obviously something of high practical relevance, but also something that really tests the progress of medical science. Based on some papers mentioned there, I am going to abstract the problem a little to predicting the height of an individual, since this is something we know a great deal about. One can imagine several approaches to tackling this:
- Big data – collect a bunch of data about an individual's geographic ancestry (different groups of people do have different average heights), per capita GDP in the country of birth at time of birth (diet quality influences height), gender, etc., and build a regression model.
- Reductionist – use QTL mapping or more modern methods to identify which genes most strongly influence height, assess the presence or absence of those genes in an individual, and predict height.
- Phenomenological – use Galton's regression approach based on mid-parent height and heritability.
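The phenomenological approach (#3) is simple enough to sketch in a few lines. Here is a minimal illustration on entirely made-up heights; the 0.65 slope in the data-generating step is an assumed regression-toward-the-mean effect, not an estimate from real data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: mid-parent heights (cm) and adult offspring heights.
midparent = rng.normal(170, 6, size=500)
# Offspring regress toward the mean: slope < 1 plus noise (assumed values).
offspring = 170 + 0.65 * (midparent - 170) + rng.normal(0, 5, size=500)

# Galton-style regression: offspring height on mid-parent height.
slope, intercept = np.polyfit(midparent, offspring, 1)

def predict_height(midparent_height):
    """Predict adult height (cm) from the average of the two parents' heights."""
    return intercept + slope * midparent_height
```

The fitted slope below 1 is the "regression toward mediocrity" Galton observed: tall parents have tall-but-less-extreme children, and the scatter around the line is an honest statement of predictive uncertainty.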
All of these methods have been used to predict human height. First question: which of these models is most "mechanistic"? Second question: which of these models is most predictive?
Most mechanistic? Most ecologists would say #2, the reductionist approach, is most mechanistic. This is because of our (trained) intuition that mechanism comes from smaller things, not from things of the same size (our parents in #3) or larger (the environmental context in #1). But is it really? The chain of causality from gene presence/absence to adult height is incredibly complex (and inherently a limited part of the picture – diet really does matter). Doesn't approach #3, the phenomenological one, really tell us the same story (genes and environment) but in a much more useful form (a regression and the variance around the line)? And isn't #1 in some ways more comprehensive, covering both genes and environment as causal factors? I have argued, along with Jeff Nekola, that ecology is really causing itself grief by ignoring mechanisms right in front of our faces because of our reductionist biases.
Best prediction? I couldn't find a paper that actually takes route #1 (although it's easy to find tables of average height by ethnicity and gender, which take into account two of the three factors I mentioned), but there was a great paper that held a showdown between #2 and #3. #3 won going away: #2 (despite an extraordinarily extensive effort) explained 4-6% of the variance, while #3 explained 40%. A more recent paper using hundreds of thousands of SNPs (yes, that's right – 400,000 regions of DNA) was only able to explain 15-30% of the variance in height in the test data set. Galton's Victorian-era regression is still the undisputed champion!
A similar result was found recently on the specific question of predicting future diseases in individuals: for low-frequency, more specialized diseases like Crohn's, the genetic SNP approach worked better, but for common diseases like heart disease, family history worked better.
Before returning to ecology and prediction, I want to return to meteorology, which I cited previously as a model for prediction. As I explained, the 1-3 day forecasts come from highly reductionist models that use fluid-flow equations and have improved thanks to better data input and smaller grid sizes – a clear victory for the mechanistic, reductionist approach. But much of our improvement in longer-term forecasts (e.g. monthly, yearly) has come from a completely different source: raw, naked correlation! The major breakthrough was the discovery of teleconnections – cases where weather at one location is correlated with weather at a faraway location. El Niño, or ENSO, is the oldest and best known. Then the Pacific Decadal Oscillation (a 20-30 year cycle) was discovered through studies of salmon productivity on the Pacific coast of the US. But the real breakthrough was the paper "Classification, seasonality and persistence of low-frequency atmospheric circulation patterns" by Barnston and Livezey in 1987. This paper was nothing more than a giant principal components analysis (across space and time, and therefore called empirical orthogonal function analysis by meteorologists) of spatially gridded time series of atmospheric pressures. Out of it popped half a dozen major teleconnections with frequencies ranging from months to decades. Although some mechanistic understanding of why these teleconnections occur has since been provided, current models are poor at accurately reproducing many of these patterns. But understanding these spatiotemporal correlations lets us say things like: the frequency of intense snow events in the NE US (a bit of a personal interest right now) is strongly regulated by the PNA and NAO patterns. So monitoring and predicting this half dozen patterns has greatly improved our longer-term (climatological) forecasting, almost entirely thanks to empirical correlation (#3 above). A victory for the phenomenological/big-data approaches.
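For the curious, the core of an EOF analysis like Barnston and Livezey's is just a principal components decomposition of a time-by-gridpoint matrix. A minimal sketch on synthetic data – the "oscillation" and 50-point grid are invented stand-ins for real pressure fields:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical gridded data: 240 monthly time steps x 50 pressure grid points.
n_time, n_grid = 240, 50
# Plant one synthetic oscillation (a stand-in for e.g. the NAO) plus noise.
pattern = np.sin(np.linspace(0, 2 * np.pi, n_grid))     # spatial pattern
index = np.sin(2 * np.pi * np.arange(n_time) / 60.0)    # slow oscillation in time
pressure = np.outer(index, pattern) + 0.3 * rng.normal(size=(n_time, n_grid))

# EOF analysis = PCA: remove the time mean, then take an SVD.
anomalies = pressure - pressure.mean(axis=0)
U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)

eofs = Vt                         # rows = spatial patterns ("teleconnections")
pcs = U * s                       # columns = time series of each pattern
var_frac = s**2 / np.sum(s**2)    # fraction of variance per EOF
```

The leading EOF recovers the planted pattern and its time series, with no physics anywhere in sight – which is exactly the point: the patterns fall out of pure correlation structure.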
As an aside, I just want to note that physics has nothing like ecology's expectation that mechanism be reductionist. We still have no reductionist mechanism for gravity (gravitons are hypothesized but undetected). Indeed, all we really have is a phenomenological description.
Now back to ecology.
I'm not sure what the exact analogies to #1-#3 are in ecology. But let's try for one case – predicting species diversity around the globe:
- Big data – throw in NDVI (a satellite proxy for productivity), mean annual temperature, temperature seasonality, water balance, and maybe a few other variables, and develop a regression model.
- Mechanism – use coexistence theory or other theories of species interactions to predict diversity from first principles
- Phenomenological – not sure exactly what this looks like – maybe predicting bird diversity from tree diversity or insect diversity?
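To make #1 concrete, here is a toy version of the big-data route: invent a set of grid cells with environmental covariates, assume a made-up relationship to richness, and fit an ordinary least-squares model. Every number here is fabricated for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical grid cells with environmental predictors (all values invented).
n = 300
ndvi = rng.uniform(0.1, 0.9, n)        # satellite productivity proxy
mean_temp = rng.uniform(-5, 28, n)     # mean annual temperature (deg C)
seasonality = rng.uniform(2, 20, n)    # temperature seasonality (sd, deg C)

# Assumed data-generating model: richness rises with productivity and
# temperature, falls with seasonality (made up for this sketch).
richness = (50 + 120 * ndvi + 2.0 * mean_temp - 1.5 * seasonality
            + rng.normal(0, 10, n))

# Approach #1: ordinary least-squares regression on the covariates.
X = np.column_stack([np.ones(n), ndvi, mean_temp, seasonality])
coef, *_ = np.linalg.lstsq(X, richness, rcond=None)

predicted = X @ coef
r2 = 1 - (np.sum((richness - predicted) ** 2)
          / np.sum((richness - richness.mean()) ** 2))
```

Nothing in the fit "knows" why productivity or temperature matters; the regression simply quantifies how much of the pattern the covariates carry – which, in the real-world versions of this exercise, turns out to be quite a lot.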
As the reader will probably know, all three of these have been done. In terms of accuracy, by and large #3 > #1 >> #2. Still think we need to be reductionist for prediction?
To my mind the hierarchy is simple:
- accurate prediction > mechanism
- knowing mechanism > ignorance about mechanism
If you adopt this view, then the big-data (#1) and certainly the phenomenological (#3) methods become viable – and often the quickest – routes to prediction. The main argument against #1 and #3 as predictive approaches is that, because they are missing mechanism, they cannot accurately extrapolate into new conditions (for example, see Dunham, Arthur E., and Steven J. Beaupre. "Ecological experiments: scale, phenomenology, mechanism, and the illusion of generality." Experimental ecology: issues and perspectives. Oxford University Press, New York, New York, USA (1998): 27-49 – I think they're wrong, but it is a provocative read I recommend to every grad student). I think this argument is given a lot more weight than it deserves. First, who says there is extrapolation? In the example of global patterns of diversity there was no extrapolation. Second, yes, under true extrapolation the regression approaches can fail – but so, often, do the mechanistic ones! Ecology is highly contingent, and when you change contexts enough, regression relationships fall apart – but so do basic assumptions about which processes are most important.
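The failure mode the extrapolation argument worries about is real and easy to demonstrate in a toy setting. Below, an assumed "true" relationship saturates, and a linear regression fit only on a narrow observed range degrades badly outside it. My point is not that this never happens – it is that mechanistic models face the same contingency when their own assumptions stop holding:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed true relationship that saturates (e.g. a richness-productivity
# curve invented for this sketch).
def true_response(x):
    return 100 * x / (x + 1)

# Fit a linear regression only inside a narrow "observed" range of x.
x_train = rng.uniform(0.2, 1.0, 200)
y_train = true_response(x_train) + rng.normal(0, 2, 200)
slope, intercept = np.polyfit(x_train, y_train, 1)

def linear_prediction(x):
    return intercept + slope * x

# Interpolation error stays small; extrapolation error explodes as the
# true curve bends away from the fitted line.
err_in = abs(linear_prediction(0.6) - true_response(0.6))
err_out = abs(linear_prediction(5.0) - true_response(5.0))
```

A mechanistic model built on the wrong functional form would fail out there too; the regression at least advertises the range of conditions it was fit on.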
So in summary, I would argue that there is more than one way to make a prediction, and they're all viable routes. Mechanism is a nice-to-have but by no means a must-have for advancing science. Or, as I prefer to think about it, the problem is not so much the pursuit of mechanism as the pursuit of reductionist mechanism (explaining everything by smaller things). #1 and #3 are arguably as mechanistic as, if not more mechanistic than, #2 once you let go of the reductionist paradigm. People will say #2 (in either the height or diversity examples) is more mechanistic because it gets closer to ultimate causes. But really, genes and species interactions are both so "ultimate" that they lack much direct link to the topics at hand – the links in the regressions are much more obvious.
I know this is a non-mainstream view and I'm expecting a lot of discussion (with Jeremy in the lead), which is great. But please – intelligent comments. Don't argue by religious fervor. Don't just say "reductionist mechanistic predictions work better" (specify by what measure and give specific examples), and don't just say "it's not real science if it doesn't have reductionist mechanism" (go tell that to the physicists, the climatologists, and the epidemiologists).