So I have been arguing that for ecology to progress as a science, we need to stick our necks out and make risky predictions that might actually be wrong (here and here). That’s all well and good, but the obvious question is how to make such risky predictions.
In particular, many comments on previous posts have raised the issue of whether the predictions are mechanistic or phenomenological. The mainstream view in ecology is very reductionist: to explain communities we have to make our explanations in terms of populations; to explain populations, in terms of individual behavior and physiology; to explain behavior and physiology, in terms of endocrine systems, proteins, etc. With evolution mixed in there somehow. This is almost a holy doctrine in ecology. Extended to prediction, it says we have to make predictions that build up from the little pieces, with a thorough understanding of what is causing things. At the other extreme is the Rob Peters instrumentalist point of view. Peters said that we can never know mechanism (he told a colleague of mine at McGill University that we don’t know that inheritance works by genes and that genes are just a human construct). His solution is a bunch of regressions: variable x is related to y, and if we know y then we can predict x. For both of the readers who have followed my work closely, it will come as no surprise that I take a somewhat out-of-the-mainstream stance – namely that mechanism is a nice-to-have, while prediction is a must-have. A more nuanced version is that mechanism is a lot more slippery and less black-and-white than we ecologists like to give it credit for.
Before arguing my case, I want to detour to an example far enough outside our field that we won’t get emotional about it. I was put on this topic by a great post at the Mermaid’s Tale blog. They talk about the question of predicting which individual humans will contract a particular disease – obviously something of high practical relevance, but also something that really tests the progress of medical science. Based on some papers mentioned there, I am going to abstract the problem a little bit to predicting the height of an individual, since this is something we know a great deal about. One can imagine several approaches to tackling this:
- Big data – collect a bunch of data about an individual’s geographic ancestry (different groups of people do have different average heights), per capita GDP in their country of birth at time of birth (diet quality influences height), gender, etc., and build a regression model
- Reductionist – use QTL mapping or more modern methods to identify which genes most strongly influence height, assess the presence or absence of these genes in an individual and predict height.
- Phenomenological – Use Galton’s regression approach of looking at mid-parent height and heritability.
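To make #3 concrete, here is a minimal sketch of the mid-parent regression in Python. All numbers are simulated stand-ins (Galton of course used measured family data), and the generating heritability of 0.65 is an assumption for illustration, not an estimate from any real dataset:

```python
import numpy as np

# Minimal sketch of Galton's mid-parent regression on simulated data.
rng = np.random.default_rng(0)
midparent = rng.normal(170, 5, 500)                 # mid-parent heights (cm)
child = 170 + 0.65 * (midparent - 170) + rng.normal(0, 5, 500)

# Ordinary least squares of child height on mid-parent height.
slope, intercept = np.polyfit(midparent, child, 1)  # slope ~ heritability
pred = intercept + slope * midparent
r2 = 1 - np.sum((child - pred) ** 2) / np.sum((child - child.mean()) ** 2)

print(f"fitted slope (heritability estimate): {slope:.2f}")
print(f"variance explained (r^2): {r2:.2f}")
print(f"predicted height for 180 cm mid-parents: {intercept + slope * 180:.1f} cm")
```

Note that the fitted slope is itself an estimate of narrow-sense heritability, which is why this humble regression carries more biology than it may appear to.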
All of these methods have been used to predict human height. First question: which of these models is most “mechanistic”? Second question: which of these models is most predictive?
Most mechanistic? Most ecologists would say #2, the reductionist approach, is most mechanistic. This is because of our (trained) intuition that mechanism comes from smaller things, not from things of the same size (our parents, in #3) or larger (the environmental context of #1). But is it really? The chain of causality from gene presence/absence to adult height is incredibly complex (and inherently a limited part of the picture – diet really does matter). Doesn’t approach #3, the phenomenological one, really tell us the same story (genes and environment) but in a much more useful way (a regression and the variance around the line)? And is not #1 in some ways more comprehensive, covering both genes and environment as causal factors? I have argued, along with Jeff Nekola, that ecology is really causing itself grief by ignoring mechanisms right in front of our faces because of our reductionist biases.
Best prediction? I couldn’t find a paper that actually takes route #1 (although it’s easy to find tables of average height by ethnicity and gender, which takes into account two of the three factors I mentioned), but there was a great paper that held a showdown between #2 and #3. #3 won going away: #2 (despite an extraordinarily extensive effort) explained 4-6% of the variance, while #3 explained 40%. A more recent paper using hundreds of thousands of SNPs (yes, that’s right, 400,000 regions of DNA) was only able to predict 15-30% of the variance in height in the test data set. Galton’s Victorian-era regression is still the undisputed champion!
A similar result was found recently for the specific question of predicting future diseases in individuals. What the authors found is that for low-frequency, more specialized diseases like Crohn’s, the genetic SNP approach worked better, but that for common diseases like heart disease, family history worked better.
Before returning to ecology and prediction, I want to return to meteorology, which I cited previously as a model for prediction. As I explained, the 1-3 day predictions come from highly reductionist models that use fluid-flow equations and have improved due to better data input and smaller grid sizes. A clear victory for the mechanistic, reductionist approach. But much of our improvement in longer-term forecasts (e.g. monthly, yearly) has come from a completely different source – raw, naked correlation!

The key discovery was that of teleconnections – cases where weather at one location is correlated with weather at a faraway location. El Niño (ENSO) is the oldest and best known. Then the Pacific Decadal Oscillation (a 20-30 year cycle) was discovered from studies of salmon productivity on the Pacific coast of the US. But the major breakthrough was the paper “Classification, seasonality and persistence of low-frequency atmospheric circulation patterns” by Barnston and Livezey in 1987. This paper was nothing more than a giant principal component analysis (across space and time, and therefore called empirical orthogonal function analysis by meteorologists) of spatially gridded time series of atmospheric pressures. Out of it popped half a dozen major teleconnections with frequencies ranging from months to decades. Although some later mechanistic understanding of why these teleconnections occur has been provided, current models are poor at accurately reproducing many of these patterns. But understanding these spatiotemporal correlations lets us say things like: the frequency of intense snow events in the NE US (bit of a personal interest in that right now) is strongly regulated by the PNA and NAO patterns. So monitoring and predicting these half dozen patterns has greatly improved our longer-term (climatological) forecasting, almost entirely because of empirical correlation (#3 above). A victory for the phenomenological/big data approaches.
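For readers curious what an empirical orthogonal function analysis boils down to, here is a minimal sketch: PCA (via SVD) of a space-by-time anomaly matrix. The gridded “pressure” field below is simulated, with one embedded oscillation standing in for a teleconnection; Barnston and Livezey of course used real observations:

```python
import numpy as np

# Sketch of EOF analysis on a synthetic gridded field containing one
# large-scale oscillating pattern plus noise.
rng = np.random.default_rng(1)
n_months, n_cells = 360, 100                          # 30 years x 100 grid cells
pattern = np.sin(np.linspace(0, 2 * np.pi, n_cells))  # spatial structure
index = np.sin(2 * np.pi * np.arange(n_months) / 60)  # 5-year oscillation
field = np.outer(index, pattern) + rng.normal(0, 0.5, (n_months, n_cells))

# EOFs are the principal components of the spatial covariance, obtained
# here via SVD of the time-by-space anomaly matrix.
anom = field - field.mean(axis=0)
u, s, vt = np.linalg.svd(anom, full_matrices=False)
explained = s**2 / np.sum(s**2)

print(f"variance explained by leading EOF: {explained[0]:.2f}")
# vt[0] is the leading spatial pattern (the teleconnection map);
# u[:, 0] * s[0] is its amplitude time series -- the thing you would
# monitor and forecast.
```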
As an aside, I just want to note that physics has nothing like ecology’s expectation that mechanism be reductionist. We still have no reductionist mechanism for gravity (gravitons and other hypothesized particles remain untested). Indeed, all we really have is a phenomenological description.
Now back to ecology.
I’m not sure what the exact analogies to #1-#3 are in ecology. But let’s try for one case – predicting species diversity around the globe:
- Big data – throw in NDVI (a satellite proxy for productivity), mean annual temperature, temperature seasonality, water balance, and maybe a few other variables, and develop a regression model (see the sketch after this list)
- Mechanism – use coexistence theory or other theories of species interactions to predict diversity from first principles
- Phenomenological – not sure exactly what this looks like – maybe predict bird diversity from tree diversity or insect diversity?
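As a concrete sketch of #1, here is roughly what the big-data regression might look like. Everything below – predictors, coefficients, noise – is simulated purely for illustration; a real analysis would use actual NDVI and climate layers:

```python
import numpy as np

# Hypothetical big-data regression of species richness on environmental
# predictors. All data simulated for illustration.
rng = np.random.default_rng(2)
n = 1000
ndvi = rng.uniform(0, 1, n)            # satellite productivity proxy
mat = rng.uniform(-10, 30, n)          # mean annual temperature (C)
seasonality = rng.uniform(0, 20, n)    # temperature seasonality
water = rng.uniform(0, 1, n)           # water balance index
richness = 50 + 80 * ndvi + 1.5 * mat - 2 * seasonality + rng.normal(0, 10, n)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), ndvi, mat, seasonality, water])
coef, *_ = np.linalg.lstsq(X, richness, rcond=None)
pred = X @ coef
r2 = 1 - np.sum((richness - pred) ** 2) / np.sum((richness - richness.mean()) ** 2)
print(f"r^2 of the big-data regression: {r2:.2f}")
```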
As the reader will probably know, all three of these have been done. In terms of accuracy, by and large #3>#1>>#2. Still think we need to be reductionist for prediction?
To my mind the hierarchy is simple:
- accurate prediction>mechanism
- knowing mechanism>ignorance about mechanism
If you adopt this view, then the big data (#1) and certainly the phenomenological (#3) methods become viable, and often the quickest, routes to prediction. The main argument against #1 and #3 as predictive approaches is that, because they are missing mechanism, they cannot accurately extrapolate into new conditions (for example, see Dunham, Arthur E., and Steven J. Beaupre. “Ecological experiments: scale, phenomenology, mechanism, and the illusion of generality.” Experimental Ecology: Issues and Perspectives. Oxford University Press, New York, New York, USA (1998): 27-49 – I think they’re wrong, but it is a provocative read I recommend to every grad student). I think this argument is given a lot more weight than it deserves. First, who says there is extrapolation? In the example of global patterns of diversity there was no extrapolation. Second, yes, in true extrapolation the regression approaches can fail – but so, often, do the mechanistic ones! Ecology is highly contingent, and when you change contexts enough, regression relationships fall apart, but so do basic assumptions about what the most important processes are.
So in summary, I would argue that there is more than one way to make a prediction, and they’re all viable routes. Mechanism is a nice-to-have but by no means a must-have for advancing science. Or as I prefer to think about it, the problem is not so much the pursuit of mechanism as the pursuit of reductionist mechanism (explaining everything by smaller things). #1 and #3 are arguably as mechanistic as #2, if not more so, once you let go of the reductionist paradigm. People will say #2 (in either the height or diversity examples) is more mechanistic because it gets closer to ultimate causes. But really, genes and species interactions are both so “ultimate” that they lack much direct link to the topics at hand – the links in the regression are much more obvious.
I know this is a non-mainstream view and I’m expecting a lot of discussion (with Jeremy at the lead). Which is great. But please – intelligent comments. Don’t argue by religious fervor and just say “reductionist mechanistic predictions work better” (please specify by what measure, and give specific examples) or just say “it’s not real science if it doesn’t have reductionist mechanism” (go tell that to the physicists and the climatologists and the epidemiologists).
Hi Brian,
Thanks for the insightful post. I often struggle with this issue of mechanistic versus correlative science. I wonder, is just explaining more variance really our goal? For example, in your height example, the mid-parent height method is “best”, but what have we learned from that method? Mid-parent height is likely just a proxy for numerous other mechanisms. If we were interested in height for some reason (e.g., it provided some adaptive advantage and I wanted to increase that advantage throughout the world), wouldn’t we want to be more clear about what was actually causing height?
I am struggling with this issue right now in the context of climate-envelope models. Climate-envelope models are a perfect example of a big-data correlative approach. However, many recent papers (including one of your own to some degree) have shown that much of the predictive ability of these models can be attributed to chance associations between species distributions and climate variables (where spatial autocorrelation is the likely culprit of those chance associations). Hence, these models appear to have strong predictive ability, but are likely misleading if conditions change (as we expect) and/or they are used outside of the study area where they were developed. I would argue that in this case, the big-data correlative approach hinders scientific progress (or at least the overuse of climate-envelope models is hindering scientific progress).
I was somewhat convinced by your example in climatology, but is the goal of ecology the same as climatology? Or, would climatology not benefit from learning the mechanisms behind the correlations? Long story short, I see scientific progress in ecology moving as a two-step process: first correlation, then causation. Hence, all three methods of prediction you describe are useful, and too much of any one method slows progress.
I agree with your first two posts, especially our need to focus more on r-squared and effect size. I just think in some instances these metrics (and ones like them) can be misleading.
Thanks to you and your collaborators for spending your time helping us all learn about these issues through this blog. For better or worse, Dynamic Ecology has me addicted to ecology blogs 🙂 !
-Chris
I’ve got no problem with “correlation then causation” as an approach to science. And I definitely agree r2 is not the only metric. As just one example, I always argue that a simpler regression method that more clearly highlights the relationships between variables is better than a 0.05 increase in r2 from a black-box regression. And I’ve got a paper out there on the allometry of optimal foraging with Gary Mittelbach where I argue that a mechanistic model with an r2 of 0.5 is way better than a purely correlative model with an r2 of 0.7 (or something around those numbers). It’s a trade-off, and the answer probably varies from person to person and with objective, and usually lies somewhere in the middle.
WRT climate envelope models, I’ve got to agree with you. But I can save myself by pointing out that the r2 is close to zero when spatial autocorrelation is properly accounted for, so they’ve been tried and have failed as predictive models.
The only place I’m really going to disagree with you is on mid-parent height. That is a pretty clear statement of inheritance as mechanism. I would argue in fact that it is a lot closer to mechanism than the few thousand SNPs (single nucleotide polymorphisms – single-base variants that serve as markers for DNA regions) of approach #2. The ONLY reason to choose #2 is if you have a reductionist view (i.e. explanation by smaller things is inherently better). And to your example, mid-parent height tells us a lot more about what to do to increase height in future generations (i.e. a selective breeding program, frighteningly known as eugenics in humans but practiced daily on farm animals) than the SNPs or QTLs do.
Nice post Brian, worth the wait! I have a few random thoughts, and I’ll try to avoid repeating things I’ve said in other contexts (well, mostly).
Re: your three examples of how to predict height, you ask which is most mechanistic. I’m tempted to say “none of them”! They all seem like regressions to me, just with different predictor variables. Even in the genetic case, it’s just a regression. Think of Fisher’s notions of average effects and marginal effects of alleles. I think the distinctions between those three approaches have less to do with how “mechanistic” they are, and more to do with things like how “proximal” or “distal” the chosen predictor variables are to the dependent variable in some hypothetical network of causes. The notions of “screening off”, and of correlations among predictors, are relevant here too. Maybe regressing on the mid-parent value works well because it screens off other causal factors, and/or is correlated with other predictors.
Re: Robert Peters, all I can say is at least he was clear and consistent in his instrumentalism! A snarky philosopher, David Stove, once said of something John Stuart Mill wrote, “But this is just Mill doing us his usual service, making important mistakes *clearly*.” One might say something similar of Peters’ remark that genes are just a human construct. And of course, in his Critique for Ecology he writes off evolution by natural selection as a tautology. I’m kind of morbidly curious about his other views. Did he, like many of Galileo’s contemporaries (and like a few instrumentalist philosophers even today), doubt that we really “see” through a telescope? Or a microscope? His was a unique mind in ecology, and one I admit I find difficult to understand.
Your remark that we don’t have a mechanism for gravity is a very good one. Historian of science Peter Dear has an interesting little book, The Intelligibility of Nature, which talks about how what counts as an “explanation” has changed over the history of science. Initially, Newton’s inverse square law was indeed widely criticized because it implied “action at a distance” in the absence of any mechanism.
I didn’t know the story of teleconnections, that’s interesting.
We’ve talked a lot in the past about how pretty much any parameter in any model can be regarded as mechanistic from one perspective, and phenomenological from another (e.g., consumer conversion efficiency, usually specified by a single parameter in simple “mechanistic” consumer-resource models, is actually a summary of the hugely complex underlying physiology of digestion and assimilation). So I’ll just note that in passing for any readers who’ve missed our earlier discussions. And similarly, we’ve talked a lot in the past about cases where building more explicitly mechanistic models does seem to have improved our predictive abilities (modeling population cycles is one obvious one; see the comments on http://andrewgelman.com/2012/01/the-last-word-on-the-canadian-lynx-series/). So again, I’ll just note that for the record. As you say, there are cases where being explicit about the mechanisms aids prediction, and cases where it doesn’t. Not sure how much would be learned by trying to count up instances of each sort of case, either in ecology or in other fields. Not much, probably.
The overall message I take away from your posts is that making good predictions is a very pragmatic business. Being wedded to one approach on grounds of principle–highly mechanistic models, purely instrumentalist Peters-type regressions, whatever–is likely to lead to bad predictions in many circumstances.
Don’t fall off your chair Jeremy, but I agree with everything you said! 🙂
If you call the QTL/SNP approaches regression models (which I agree you might), then it is an interesting exercise to think through what would really be a mechanism for height. Specific genes -> specific proteins -> specific roles in cells -> increased rates of cell division -> …? It kind of boggles the mind! Even the seemingly obvious link of good diet -> greater height goes through an awful lot of complex processes that are not well understood. Exactly how useful are such mechanisms? And how predictive are such things?
And yes, I would say my two favorite words for prediction (or for science for that matter) are “pragmatic” and “balanced” (answer usually lying in the middle, no extremes). It can seem rather boring and unexciting that way, but I argue it works!
No, I’m still sitting in my chair just fine. 😉 I actually can’t think of any cases where you and I have had really serious disagreements. Especially in this case, where my comments are mostly of the innocuous, chin-scratching variety.
Very thought-provoking post for me, given that I am (was? Probably wouldn’t go that far. Yet?) a strong advocate of the reductionist / mechanistic approach. I need to think about that a bit more.
For now I think I get your argument in a prediction scenario. But prediction is only one step. Often you want to change or prevent what you predicted, and I guess for that you really need a deep mechanistic understanding of the processes behind your patterns. Take your height example in your last comment: if you knew all, or at least some, of the detailed mechanistic steps leading from alleles to a condition X, including gene-environment interactions and such, you would know which screws to turn, and how far, to step in and do good. What medication to develop, where intervention would be most effective, etc. I don’t think a pure big data / regression approach can ever give you that. A similar argument holds in ecology when you, for example, want to prevent or mitigate harm to a population or ecosystem.
By the way, I don’t really see a difference between big data and the third approach – isn’t the first just regression writ large?
Thanks Arne
I want to repeat I am not opposed to mechanism – I want to know why as much as the next person. But I am not as convinced it is as necessary as it has been made out to be. What if we could all just agree mechanism was fun?
Even in your example of drug discovery, how many drugs have been created de novo to target a particular gene/protein vs. how many have been discovered by random trial and error? The tide is changing, but the vast majority are still trial and error. A lot of conservation’s most effective tools are also phenomenological: island biogeography/species-area relationships, niche modelling for climate change. Certainly there are cases – like conserving shrikes by knowing they like fenceposts – that are much more detailed, but is that really a mechanism or a correlation? It’s a slippery line – much more slippery than we give it credit for. Could you expand on your example – what kinds of mechanisms do you see as useful in conservation?
I guess the difference to me is that #3 is in between #1 and #2 in terms of mechanism (i.e. in some ways more mechanistic than #1). The breeder’s equation and most other equations of quantitative genetics derive directly from that insight of treating things as a linear regression. It also has a very direct 1-1 mapping between two variables of interest, not so different in that regard from, say, Newton’s inverse-square law for gravity (albeit with more unexplained variance). In short, it involves a lot more thinking about what explains what, whereas in real big data scenarios, the machine does the thinking about which variables to include. I’m not being particularly clear on this, but I hope I’ve clarified a little.
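For readers who haven’t met it, the breeder’s equation referred to here is R = h²S: the response to selection equals the narrow-sense heritability times the selection differential. A trivial sketch with made-up numbers:

```python
# Breeder's equation R = h^2 * S, with purely illustrative numbers.
h2 = 0.65  # narrow-sense heritability (assumed, not from any real study)
S = 4.0    # selection differential: chosen parents average 4 cm above the mean
R = h2 * S # expected shift in the next generation's mean height
print(f"expected response to selection: {R:.1f} cm")
```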
Hi Brian,
I didn’t take your post as being anti-mechanistic (as you and Jeremy have now pointed out, “mechanistic” is somewhat a matter of perspective). It just challenged my rather habitual thinking, and one reaction to that is of course often opposition 🙂
One particular scenario where regressions between, for example, an environmental variable and community variables break down is when you have alternative stable states in community composition. Basically, for one and the same value of an environmental driver you get contrasting communities. Knowing the environmental parameter value doesn’t really help you much anymore when you want to predict the community state at a site or forecast into the future. Alternative stable states are rather common, I argue, and a result of species interactions or environment-species interactions. An understanding of the mechanistic processes can save you money and time when trying to do something about an undesired state, or even make actions feasible at all.
In restoration ecology, alternative stable states are more and more recognised as important, I believe. How much that concept is really worth for the practitioner on site, I don’t know. Probably less than I like to think (but see http://islandpress.org/ip/books/book/islandpress/N/bo8023657.html for some examples).
I agree with the overarching sentiment that we need more predictive modeling in ecology. But, I think you’ve drawn “predictive” vs. “mechanistic” as a bit too distinct from one another. For example, is the relationship between parent and offspring height mechanism or pattern? I think the answer depends on the context in which we wish to use that relationship: For a physiologist or molecular geneticist, it might be pattern — just a correlation — to be explained by other mechanisms. For a population ecologist or evolutionary biologist, it might be a mechanism for explaining patterns at the population level.
As a modeller, I often ask myself “what mechanisms do I include here?” and “what pattern should I get right here?” almost interchangeably. Whether I start with one or the other purely depends on context.
My two cents: any trustworthy predictive model must necessarily be capable of shining at least a glimmer of light on mechanism, or be built on some mechanistic ideas of how things work and what data should be fed into the model.
Also, there’s a really big and really important distinction to be made between the predictive performance of mechanistic models in general, versus the predictive performance of mechanistic models that are built upon incorrect or incomplete mechanisms. Some mechanistic models simply aren’t meant to be predictive in the same sense as other data-driven models. If a model gets qualitative patterns right, or simply clarifies how certain mechanisms lead to certain patterns, it might be somewhat of a mistake to disregard such a model simply because it performs poorly at a task it wasn’t built for.
The examples discussed above highlight something really important IMO — that both approaches (understanding mechanism vs. making good predictions) are valuable and benefit one another. We make better, or at least more trustworthy (or at the very least, more inexpensive) predictions when we have a sense of the mechanisms involved. On the other hand, predictions are going to provide a reality check to let us know if we’ve got it right, or are at least on the right track, and can be an excellent source of ideas for how to go about improving our mechanistic understanding of how nature works. Which one is more valuable? It depends on the context. I think Brian rightly points out that in many instances, if we could only have one or the other, we probably want good predictions. I’m arguing that, in many of those situations, we need at least some mechanism-based guidance to create those models. We need both (at least eventually) to do good science. 🙂
One additional comment to further illustrate that context matters:
Looking at the paper on the Victorian vs. genomic comparison of heights, you could call the better performance of the Victorian approach a win, but only in certain contexts, e.g., which does a better job of getting heights right. But I bet there are more than a few approaches that would blow both out of the water! Ex: a regression-based model using all the long-bone measurements of an individual, or a precise and accurate length of their shadow along with similarly high-quality time-of-day and lat/long data. But change the ultimate reason (context) for asking the question in the first place, and we might get a new winner. The Victorian approach told us inheritance matters and maybe environment does too. The genomic approach tells us which genetic material might be involved, and that (at least those particular) parts of the human genome don’t 100% determine our heights. Under different scenarios, either of these might be the more useful model. After all, if all we wanted to do was find the most effective way of determining a person’s height, we’d just measure them. 😉 For anything more complicated than that, “the best we can do” is going to depend on what we can measure, what we know we don’t need to measure, and (importantly!) the ultimate goal motivating us to ask the question in the first place. 🙂
You are right – anthropology journals (especially forensic anthro) are full of height vs. long-bone regression models. Context and goals definitely matter!
Hi Paul
I didn’t mean to imply mechanistic as a contrast to predictive. I agree fully with your bolded statement that it’s hard to do good work that is purely mechanistic without something useful coming out that is predictive, or vice versa.
What I think I’m really taking on is how ecology has defined mechanistic in the past.
I agree that there are types of models (May called them strategic) that are just intended to increase understanding and are at best qualitatively predictive. I don’t think such models are useless, but I think I am saying maybe we’ve leaned on them a bit too much in ecology. They are in a certain sense rather safe and unrisky.
I wonder if part of what drives the dichotomy is two different goals for scientific knowledge. One is very applied, the other theoretical. Tacit in the assertion of the importance of prediction is the importance of applied knowledge. In that way, regression and its younger, more emotionally complex sibling, machine learning, are a useful paradigm of inference. Imagine a marketer using big data. Do they care why people who like to drink lattes also shop at Ikea? No, studying the mechanism is a waste of time. They just care about how they can target Ikea advertisements to me. Concern with mechanism also has its own tacit assumptions. I think ecologists’ concern over mechanism isn’t just a search for the mechanisms in a particular system, but for universal mechanisms. A trivial example might be wanting to know how long it will take a ball to fall from different heights. A regression approach would be to run around measuring a whole bunch of balls falling from various heights, regressing time vs. height, and then extrapolating to the height we want to predict. But because we have a good understanding of the underlying mechanism for how long it takes to fall from a given height (gravity), we can just use this universal mechanism to predict the time.
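To make that contrast concrete, here is a minimal sketch (with simulated measurements) comparing a naive linear regression of fall time on height against the mechanistic law t = sqrt(2h/g):

```python
import numpy as np

# Falling-ball example: noisy "measured" fall times for drops of 1-20 m,
# then predictions for an unobserved 100 m drop. Data are simulated.
g = 9.81                                           # m/s^2
rng = np.random.default_rng(3)
heights = np.linspace(1, 20, 30)                   # drop heights (m)
times = np.sqrt(2 * heights / g) + rng.normal(0, 0.05, heights.size)

# Phenomenological route: fit a line to time vs. height and extrapolate.
slope, intercept = np.polyfit(heights, times, 1)
h_new = 100.0
print(f"regression extrapolation:   {intercept + slope * h_new:.2f} s")
print(f"mechanistic t = sqrt(2h/g): {np.sqrt(2 * h_new / g):.2f} s")
# The line fits well inside 1-20 m but overshoots badly at 100 m, while
# the universal square-root law keeps working -- which is the point of
# the example.
```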
Right now ecology has nothing like the ball example. But I’m not sure that universal prediction is a good reason to engage in theoretical understanding of mechanism. I doubt that Newton sought to elucidate the laws of gravity to send rockets into space, or that Watson and Crick wanted to create personalized medicine with their discovery of DNA’s structure. Instead, theoretical discovery led to many unpredicted applied byproducts. My point is that you draw this distinction between A) mechanistic/theoretical understanding and B) practical predictive ability when I don’t think it really exists; they are fundamentally two different goals that aren’t by necessity tied together.
A) can lead to B), but B) doesn’t need A). So why even get bogged down in the debate between regression and mechanistic models? If prediction is what we care about because we want to predict things like changes to ecosystem services with climate change (an applied task), who cares why? In the end we want to save those services, just like the marketer wants to sell me Ikea furniture, and neither of us really cares why, just as long as the outcome is the same. It seems like we’d be better off just acknowledging that mechanism is pretty poorly understood, so let’s go with the better approach and revisit the question of mechanism later.
Thanks Ted – lots of food for thought.
You are right that big data approaches originate in a world that is completely devoid of caring about mechanism. And this can make a lot of sense in an applied context.
I guess the only thing I might disagree with is that sometimes the challenge to make predictions can be really healthy for basic science and the search for general mechanisms too (your falling-ball example is a case in point).
Hi Brian,
This is a really interesting contribution and part of a great series of posts. And in reading the comments, I gained the great insight that Rob Peters was way ahead of his time in the field of internet marketing!
From both your original post and from the comments, I think the central problem that we ecologists still struggle with is the problem of circular causality, which Hutchinson described pretty well all the way back in 1948. As you point out, the reductionist approach looks to lower levels and smaller scales for the “mechanisms” to explain ecological phenomena, but in hierarchically organized systems like ours (even if the hierarchy is messy), the higher-level and larger-scale constraints are just as important, as they may constantly modify the behavior of the lower-level mechanisms. Thus, while we tend to think of a model moving from the bottom up as mechanistic and one looking from the top down as phenomenological, both can potentially illuminate important ecological processes. And as you and Jeremy have both pointed out, one model’s mechanistic parameter is another’s phenomenology.
In general, like you, my feelings about prediction and model building more generally are quite pragmatic – we should use what works for any particular situation. But while I agree that there are situations in which prediction is key, more generally I think the scientific endeavor succeeds best when we try to jointly maximize both predictive ability and understanding. With really successful models for some systems (perhaps your example of short-term weather forecasting), the two go hand in hand; in other cases there are very definite trade-offs. We can in some cases gain more insight from models that make less accurate (or less precise) predictions. So predictive power is one important criterion for evaluating scientific models, but insight is another, and it is one that is probably a lot more difficult to quantify – especially since we can learn from both the “successes” and “failures” of models, but we only increase predictive power when they are “right.”
Great comments. I especially like your discussion of bottom up vs top down mechanism.
I certainly agree that prediction and understanding are both important goals. I think the motivations for my posts (which are clearer thanks to your clear writing) are: a) in ecology we’ve swung too far toward favoring understanding; and b) I don’t think we (or any science?) have come to terms with what to do in areas that are really complicated and chaotic, and therefore by definition not tractably understandable by current mathematical methods (be it turbulent fluid flow, including long-term atmospheric dynamics, or an ecosystem). Can we understand these systems ultimately? Or is it much more realistic to just try to predict them?
I know what you mean, and I share a lot of those feelings. In some ways, what I was trying to say is that if we find models that are highly predictive, hopefully they point towards some avenues for further understanding (maybe in their underlying assumptions).
Cheers,
Drew
Brian,
Reading through the comment thread got me thinking about predictability more generally. How well do you think we understand the limits of predictability that exist for the kinds of systems we’d like to make predictions about? Or, before embarking on an information-gathering study to try and parameterize a model to make predictions about a system, is there a way to use less info to simply ask whether or not this is going to be a predictable system?
For example, we know of very noisy systems and a few chaotic systems, so clearly some systems just aren’t going to be predictable – at least not as predictable as we might like. Has anyone tried to address predictability itself by a means other than model-predict-validate-evaluate? I can think of examples like those in Lande, Engen & Saether, but these all strike me as a “model it and see if it works” approach. Has there ever been a review that attempted to categorize different systems (populations, or communities, or both) and somehow quantify which seem to be more or less predictable? It seems like there is some room to just tackle the question “Is this going to be a very predictable or unpredictable system?” without attempting to make and validate predictions.
-Paul
PS: While I suspect this approach won’t give slam dunk answers, it would undoubtedly be interesting to see if predictability varied with habitat types, net productivity, trophic levels, diversity, etc.
Paul – good questions. I would probably have to say a clear understanding of what is not predictable is just as good as good prediction ability in terms of moving science forward. And, no, I don’t know of clear work in this area. There is the obvious fact that chaos has a mathematically well-worked-out theory of exponentially increasing effects of small initial differences (with the rate of exponential growth set by the Lyapunov exponent, which can be measured empirically). So one could imagine a program of measuring Lyapunov exponents. However, many systems have multiple causal factors, and we have not yet even written down the dynamical equations covering the system, let alone measured whether it is chaotic or not, and if so how big the Lyapunov exponents are. But we do know things that make chaos more likely, including, for example, discrete-time-like population dynamics, very high r (reproductive rates), etc. I suspect a real analysis would have to go beyond chaos theory. Not unlike weather: chaos theory points to smaller grids and more accurate data, but that only carries us out a few days; longer-term predictions tend to need methods beyond the dynamical-systems approaches exactly because of chaos. Two factors involved in predictability might include the role of dispersal (dispersal is stochastic, so highly dispersal-dependent systems would be less predictable) and large environmental variability in systems that are strongly forced by the environment (although then a prediction conditional on the environment is possible).
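As a sketch of what “measuring a Lyapunov exponent” means in the simplest possible setting, here is the standard calculation for the discrete logistic map (a toy model, not any real ecological system):

```python
import numpy as np

# Lyapunov exponent of the logistic map x[t+1] = r * x[t] * (1 - x[t]),
# estimated as the long-run average of log|f'(x)| along a trajectory.
def lyapunov(r, x0=0.4, n=10_000, burn=1_000):
    x, total = x0, 0.0
    for t in range(n + burn):
        x = r * x * (1 - x)
        if t >= burn:
            total += np.log(abs(r * (1 - 2 * x)))  # |f'(x)| = |r(1 - 2x)|
    return total / n

for r in (2.8, 3.5, 3.9):
    print(f"r = {r}: Lyapunov exponent ~ {lyapunov(r):+.3f}")
# Negative exponents (r = 2.8, 3.5) mean small perturbations die out and the
# dynamics are predictable; the positive exponent at r = 3.9 signals chaos,
# with forecast errors growing exponentially at that rate.
```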
But in all I suspect this is one of those ignored questions that shouldn’t be ignored!
Dear Brian,
It strikes me that you missed the importance of your option #2 — making mechanistic predictions. If we make a mechanistic prediction and test the prediction, we test our understanding of the mechanisms. If a mechanistic prediction is rejected, then our understanding of the mechanisms is wrong and needs to be revisited. Thus, only by using your option #2 do we have a chance of learning something new or reinforcing our understanding of mechanisms that work. It’s called the scientific method and, son of a gun, it works.
peace , , , , ,
roger . . . .
Yes, I too find sarcasm peaceful?!
Seriously, if you want to engage in any meaningful way, I’d be happy to do so.
Otherwise, I’d refer you to the last two sentences of the OP: “But please – intelligent comments. Don’t … just say “it’s not real science if it doesn’t have reductionist mechanism” (go tell that to the physicists and the climatologists and the epidemiologists).”
Related to the post: this new Ecosphere paper (open access) from a bunch of sharp people (Kim Cuddington, Alan Hastings…) argues for process-based models to manage ecosystems in a changing world, as opposed to either phenomenological statistical models, or detailed simulation models.
http://www.esajournals.org/doi/abs/10.1890/ES12-00178.1
Hi Brian and all, one of the things that I struggle with is the distinction between prediction and understanding. And it’s cropped up here several times. As if a mechanistic model (and the understanding it contains) can somehow be assessed without prediction. And maybe it can, but I’m just trying to sort out how. If latitude predicts species richness very well (it doesn’t, but let’s say it does) and a mechanistic model arising from, say, metabolic theory doesn’t, it seems to me that the latitude-species richness model has provided more understanding. My impression is that predictions are only seen as inherently valuable in an applied setting, but I’m convinced that they are the only way to assess the understanding contained in a model, mechanistic or otherwise. All this to ask: is there a way to demonstrate understanding other than making predictions that are better than those you would make by chance? Further, isn’t there only one way to demonstrate increased understanding – by making a better prediction with the new model than you could make with the old one?
One exception might be that you can increase understanding by ‘unpacking’ the box a little bit. For example, you find A predicts B very well. But you think C and D are more proximate causes of change in B, and so you develop a model containing C and D as drivers of B, but it doesn’t predict B quite as well as A does. However, C and D do a good job of predicting A, and so you can make connections from C and D to A and then to B in a way you previously couldn’t – you’ve added another layer of understanding without making better predictions of B. But even here, we’ve increased our ability to predict A using C and D, so even adding the extra layer of understanding has required prediction.
I guess I am struggling to understand why prediction isn’t seen as absolutely necessary to the practice of science – shouldn’t every claim of understanding come with successful predictions? Best, Jeff Houlahan
Hi Jeff – great comments. I agree with everything you’ve said. You’ve just said it more clearly than I have! Let me see if I can clarify my own thinking.
First, mechanism is a slippery word in ecology. I think it is important to distinguish mechanism sensu stricto (what I’ve called reductionist mechanism), which is often what is implied in ecology when mechanism is used without qualification, from mechanism sensu lato. I (and Jeremy and other commenters) have argued for the sensu lato version, where there are many possible sources of mechanism (e.g. the system in which an organism is embedded, various parameters that can be unpacked in detail or treated phenomenologically, etc.). Understanding is another slippery word, but it mostly equates to “learning more about mechanism” and is therefore just as slippery as mechanism.
Once you have opened up mechanism to the sensu lato meaning, then yes, I completely agree, prediction is not just for applied scientists. Indeed, it is what I think is the key missing ingredient in basic ecology. Taking our understanding and using it to make risky predictions and then testing them is the essence of science. This is basically what Platt said. It is certainly what Lakatos said. But you just have to look at the example of using mid-parent height to predict offspring height to see why you need a sensu lato meaning of mechanism (and understanding) to say that our understanding has been improved by this successful prediction. A reductionist (sensu stricto) interpretation of mechanism would say mechanistic understanding has not been enhanced by this phenomenological prediction – that it is just a parlor trick for the applied world. But I think even inherently applied scenarios like the weather prediction of my last post show that risky prediction for applied purposes still greatly improves basic science understanding.
On the flip side, I think ecology makes a lot of “understanding-based predictions” that don’t really predict much and don’t really advance science much. Many of the strategic models that May speaks of fall into this category – they only claim to make qualitative predictions. The Lotka-Volterra competition and predator-prey models are like this, in my opinion (there may be competitive exclusion, there may be oscillatory cycles). Such models and predictions were a great starting point many decades ago, but these models, with their very weak predictions, have not furthered our understanding beyond that. This was really the subject of my first post, on the evils of ANOVA (factor X has a significant but unspecified effect on factor Y). Your paper on compensatory dynamics is a great example of this. To me it was a risky prediction that “failed” and led to understanding (we need to incorporate environmental variability as a very strong force), but many people I have talked to have treated it as an “unreasonable” test of existing theory, or one that went beyond the mechanisms incorporated in the models that made the prediction. I would say the latter is exactly the point. To me, a failed prediction is a great opportunity for science and understanding.
Thanks for the insightful comments and the opportunity to clarify myself!
More food for thought:
Ecosystems: Time to model all life on Earth. Drew Purves, Jörn P. W. Scharlemann, Mike Harfoot, Tim Newbold, Derek P. Tittensor, Jon Hutton, & Stephen Emmott. Nature, 17 January 2013. doi: 10.1038/493295a
Yes, I have an old post on that piece.
https://dynamicecology.wordpress.com/2013/01/16/the-road-not-taken-for-me-and-for-ecology/
Thanks for another good one Brian. I’m fairly well in line with your philosophical views on science generally.
Just a short comment that I’ve always found the mechanistic vs (assumedly) non-mechanistic dichotomy a bit odd and unreal. I don’t think much of what goes on, either as scientists or as humans more generally, is done without an attempted explanation in the back, or front, of the mind. All observations constrain the set of possible causes if we’re paying attention, especially if we’ve kept a catalog of previous experiences, informally or formally, singly or as a community. Some observations will do the job better than others though, and some minds will do the constraining better than others. But it’s all geared toward getting at cause and effect in the end.
Several people have mentioned the ‘fuzzy’ distinction between mechanistic and phenomenological, and it’s reassuring that lots of other people are wrestling with what seems like a bit of an artificial dichotomy. So, it seems to me that the main divide is between models that have few (or maybe even no) inputs that simply come from observed patterns in the data and models that are completely based on observed patterns. So, you could have a model for the relationship between latitude and species richness that is simply based on the observed relationship between latitude and species richness, and another that builds that relationship from the bottom up (i.e. using relationships between latitude and temperature, productivity, mutation rates, etc.). The more a model uses observed patterns to make decisions about which drivers to include, functional relationships, or parameter estimates (rather than sorting out drivers, functional relationships, and parameter estimates from first principles), the further it moves along the continuum from mechanistic to phenomenological. So, it would be possible to have a phenomenological latitude-species richness model and a mechanistic latitude-species richness model. One reason I would end up preferring the mechanistic model is that it would make it much less likely (I think) that your inferences are mistaken due to confounding variables or chance correlations in your dataset. That is, one of the ways we select our ‘best’ phenomenological models is how well they fit the data, and so we invariably select models that are, in part, a reflection of chance correlations. Which, at least partly, explains why models constructed strictly from statistical relationships almost always predict more poorly on a new set of data (i.e. data they weren’t constructed from). This wouldn’t be the case for models built from the bottom up. So, it seems to me that mechanistic versus phenomenological is about how the model was built, not what variables are in the model. I suspect I am stumbling on what is, for many of you, well-trod ground, but it’s starting to clarify things a little in my own mind. Best, Jeff Houlahan.
Pingback: Answers to reader questions, part 3: what we’d say to Congress, tropical vs. temperate systems, and more | Dynamic Ecology
Pingback: Ecologists need to do a better job of prediction – Part IV – quantifying prediction quality | Dynamic Ecology
Pingback: Bayesian state-space models lead to biased parameter estimates when applied to chaotic population dynamics (or so it seems) | theoretical ecology
Pingback: True models, predicitve models, and consistent Bayesian state-space estimators for chaotic dynamics | theoretical ecology
Pingback: In praise of exploratory statistics | Dynamic Ecology
Pingback: Why ecologists might want to read more philosophy of science | Dynamic Ecology
Pingback: In praise of a novel risky prediction – Biosphere 2 | Dynamic Ecology
Pingback: Friday links: does Gaad exist, stories behind classic ecology papers, evolution of chess, and more | Dynamic Ecology
Pingback: Friday links: blog comments = papers, SIR model vs. Beliebers, and more | Dynamic Ecology
Pingback: Prediction Ecology Conference | Dynamic Ecology