Trying to understand ecological data without mechanistic models is a waste of time

Note from Jeremy: this is a guest post from Greg Dwyer.

**************

Jeremy invited me to do a guest post because he saw my 2014 ESA Ignite talk, in which I argued that data are almost worthless without some connection to mechanistic models (Jeremy posted interesting comments on the session shortly after it happened).   That statement is a little stronger than what I actually believe, but the status of mechanistic models in ecology is so weak that it is hard for me to avoid losing my patience when confronting ecological research that is unconnected to an explicit model.   For the purposes of this essay, I am therefore going to stand by my talk title: Trying to understand ecological data without mechanistic models is a waste of time.   I think that the only caveat that I would add would be “except in cases where the underlying question is too trivial to be interesting.”

The basis of my argument is that ecological data are almost invariably influenced first by stochasticity, and second by strong nonlinearities, due to interactions between species, or to interactions between species and nutrients.   Understanding the effects of stochasticity and nonlinearities is very difficult without explicit mechanistic models. We also need models to know whether a data set includes enough information to characterize a stochastic process, but arguably that statement presupposes that we are using models in the first place.

If you disagree that models are essential for understanding data, maybe I can use other arguments to convince you of the usefulness of mechanistic models. First, there are many ideas and concepts that we cannot learn without models. Theoretical ecology textbooks have many examples (my favorite texts are Mark Kot’s “Elements of Mathematical Ecology”, James D. Murray’s “Mathematical Biology”, and Leah Edelstein-Keshet’s “Mathematical Models in Biology”), but the examples that mean the most to me are the ones that helped me in my own work, or that collaborators and I learned on our own.

For example, a paper by Anderson et al. showed that any variability in a host’s infection risk, due to any mechanism whatsoever, will lower transmission rates and reduce epidemic severity, even if the mean transmission rate is unchanged (Anderson et al. 1986). Likewise, collaborators and I showed that adding stochasticity and a generalist predator to a model of a specialist pathogen produces wildly stochastic population cycles, whereas adding stochasticity alone produces only mild variation, unless the stochasticity is so high that it causes the host to go extinct (Dwyer et al. 2004). Finally, we showed that, in host-pathogen interactions, a tradeoff between average transmission and variation in transmission (as measured by the scale-less C.V.) can allow for the coexistence of pathogen strains (Fleming-Davies et al. 2015). This last result is new enough that I am not quite sure that I understand it, but we have recently made simpler models of the problem, and those simpler models have helped.
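To make the first of these results concrete, here is a minimal sketch in Python. It is not the model from Anderson et al. (1986); it is a deliberately stripped-down illustration that assumes every host faces the same fixed cumulative exposure to the pathogen, and that individual infection risk follows a gamma distribution with a given mean and C.V. All function and parameter names are hypothetical. Under those assumptions, the fraction of hosts infected has a closed form, and it is always lower than in the homogeneous case with the same mean risk.

```python
import math
import random


def fraction_infected_homogeneous(mean_rate, exposure):
    """Fraction infected when every host has the same infection risk."""
    return 1.0 - math.exp(-mean_rate * exposure)


def fraction_infected_gamma(mean_rate, cv, exposure):
    """Fraction infected when risk is gamma-distributed across hosts.

    The gamma distribution is parameterized by its mean and its
    coefficient of variation (cv), so the mean risk is the same as in
    the homogeneous case. The result is one minus the expected
    probability of escaping infection, averaged over hosts.
    """
    shape = 1.0 / cv ** 2
    scale = mean_rate * cv ** 2
    return 1.0 - (1.0 + scale * exposure) ** (-shape)


def fraction_infected_simulated(mean_rate, cv, exposure,
                                n_hosts=100_000, seed=1):
    """Monte Carlo check of the closed form above."""
    rng = random.Random(seed)
    shape = 1.0 / cv ** 2
    scale = mean_rate * cv ** 2
    escaped = 0.0
    for _ in range(n_hosts):
        rate = rng.gammavariate(shape, scale)  # this host's risk
        escaped += math.exp(-rate * exposure)  # chance it escapes infection
    return 1.0 - escaped / n_hosts


if __name__ == "__main__":
    # Illustrative values only: mean risk 1.0, cumulative exposure 2.0.
    print("homogeneous:", fraction_infected_homogeneous(1.0, 2.0))
    for cv in (0.5, 1.0, 2.0):
        print("cv =", cv, "->", fraction_infected_gamma(1.0, cv, 2.0))
```

The reason this works is Jensen's inequality: the probability of escaping infection is a convex function of risk, so averaging it over variable hosts gives a larger escape probability than evaluating it at the mean. A full epidemic model adds feedback between infections and exposure, which this sketch leaves out.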

Models can also make it possible for us to see that organisms in disparate taxonomic groups can have very similar ecology, a point first made by Si Levin years ago. Anyone who works on animal diseases can give you examples of how models of human diseases are useful for a wide range of pathogens, but the best example from my own research is that we used models of HIV dynamics in humans to explain virus epidemics in insects (Dwyer et al. 1997). This latter work led to useful approximations, which we in turn used to compare different vaccination strategies in response to smallpox bioterrorism (Elderd et al. 2006).

Models can also allow us to achieve a mechanistic understanding that is otherwise unavailable. I think a basic idea in most ecological modeling is that we are attempting to explain phenomena at relatively large spatial and temporal scales, in terms of mechanisms acting at relatively small spatial and temporal scales. (Again, I learned this idea from Si Levin, but I suspect that every applied mathematician has the same idea in their head. I am often appalled at the number of ecologists who do not understand this point, and for all that I do not want to point fingers, I will say that the empirical literature on animal population cycles often seems to be unaware of it. Jeremy is not among them, but see Brian’s very interesting response.) This approach is basic to almost every paper I have ever written, but the example that made the strongest impression on me came from models of spatial spread, which made it possible to use small-scale measurements of insect movement to explain large-scale patterns of virus spread (Dwyer et al. 1994).

In fact, the same models illustrate the usefulness of models for focusing our attention AWAY from complex mechanisms that may have little effect on the phenomenon of interest. For years, there was speculation in the literature that insect viruses are spread in bird poop, or on the feet of spiders, but the models showed that we don’t need those kinds of fancy mechanisms.   That’s not to say that there aren’t infectious virus particles in bird poop (there are), but it looks like we don’t need bird poop to explain spatial spread.

My former post-doc David Paez and I will soon submit a paper in which we use models to extrapolate from David’s experimental data to gypsy moth outbreaks. In the process of writing the paper, David asked something like, Won’t the model take over the paper, causing the reader to lose track of the three years I spent doing experiments? My response was, David, you wish! That is, the model is not that interesting without the data, but WITH the data, the model does a beautiful job of showing how a general theory can be used to understand a real problem. The reader may therefore only remember the model, but remembering the model may just cause them to actually remember the basic idea of the paper.

I am always interested in opposing viewpoints, or alternative ways of looking at this problem. I would therefore love to hear what you think.

30 thoughts on “Trying to understand ecological data without mechanistic models is a waste of time”

  1. Hi,

    very interesting comment. I am curious about your opinion on empirical dynamic modeling, in which you recover the mechanistic relationships between response and predictors from long-term data (or just a lot of data) with an equation-free, phenomenological (statistical) modeling approach.

    Recent papers by Hao Ye and collaborators are making me think a lot about how useful empirical models are (in forecasting patterns, for example).

    I don’t know if I understood it correctly, but the empirical approach would be fundamentally distinct from the mechanistic approach, since you don’t need to know what really affects your response variables. But it could be very helpful in real and complex situations, and not just “except in cases where the underlying question is too trivial to be interesting.”

    • Yes I had the same question. To focus the question even more, how would you respond to:

      Perretti, Munch, and Sugihara. 2013. Model-free forecasting outperforms the correct mechanistic model for simulated and experimental data. Proceedings of the National Academy of Sciences of the United States of America 110:5253–7.

      Sugihara and his collaborators are making almost the opposite claim, that mechanistic models do not produce very useful predictions about reality, and that model-free approaches can be more useful in data rich situations…

    • There’s a major difference in the aims of the two approaches. Mechanistic models can make predictions, of course, but they’re usually used to test hypotheses. EDM is focused on forecasting, period, and in my view belongs in the huge class of “biologically dumb” statistical approaches. (This isn’t to say we can’t use EDMs, Bayesian networks, etc., to generate hypotheses.)

      Mechanistic models may be better at prediction too. Because they are based on state-space reconstruction, EDMs will never be able to predict the existence of new dynamical regimes. In contrast, if you have an accurate mechanistic model, you can (in theory) predict big shifts in dynamics.

      • My concern is actually with field biologists, not modelers. That’s important because my take is that the comments here, beginning with Rafael’s, are concerned with the question, how should modelers proceed when they are faced with highly limited data sets? My hope instead is to force field biologists to determine if they have enough data to construct interesting models, and if not, to ask, what data should I collect next?

        Completely dodging Rafael and Andrew’s questions, however, is disingenuous, because in fact I was pretty psyched to read Florian’s group’s response to the Sugihara paper (sadly I have not yet read the papers that Rafael was referring to, but the first author is now in my awareness, which is a start). I therefore clearly have a stake in the issue. I am not the one to comment on the relative usefulness of mechanistic and non-mechanistic models, but as a field biologist, I am dismayed by theoreticians who come into modeling projects with the notion that the data are fixed. If you are working with the right biologists, it ought to be true that more data can be collected, and that might solve your problems (Sarah, isn’t that your approach to immune-system modeling?). That’s an issue because mechanistic models tend to have very high data requirements. So, my answer to the question, are mechanistic models really better at explaining nature? is, if you can collect more data, they are! That may be more of a philosophical statement about how to do science than a solid answer, but hopefully it’s still interesting. For what it’s worth, I once argued the relative worth of mechanistic/non-mechanistic with Sugihara, and I at least had the sensation that he conceded that wisenheimers like me who collect their own data can run utterly wild with mechanistic models.

        As for Florian’s request that I define “mechanistic model”, there are models that I can identify as clearly NOT mechanistic (linear regression, for example), and models that clearly ARE mechanistic (agent-based models, for example), but there are too many in the middle that are hard to put in one category. I once spent 3 days outside of Vienna listening to Ulf Dieckmann and Sebastian Bonhoeffer try to define “virulence”. No. Thank. You.

  2. Hi Greg,
    Great post! I see Jeremy has gotten fresh recruits in this longstanding discussion on this topic!

    So I definitely agree with your identification that stochasticity and nonlinearities interacting are what make ecology challenging/interesting. And I also agree that mechanistic models can give us new insights (your examples included).

    My questions are:
    1) What counts as “mechanistic” – specifically is lotka volterra mechanistic? A lot of people I think would say yes. But I increasingly (especially in the n-species generalization) don’t find this mechanistic. That may be a side argument, but I increasingly doubt the value of the quasi-mechanistic models that many people claim are mechanistic (and which I see as distinct from some of the more truly mechanistic disease transmission models you describe).
    2) You already highlighted my post, but I really don’t believe a mechanistic model leaps across scales. I find it interesting that you actually see this as a general principle. Could you say a little more about how you see scale working with respect to these mechanistic models?

    • “I see Jeremy has gotten fresh recruits in this longstanding discussion on this topic!”

      I like the image of guest posters as fresh troops being thrown into a battle. That analogy works better than you know, since I’m also trying to get Karen Abbott to post on something modeling-related.

      “is lotka volterra mechanistic?”

      I’d say no. Or maybe better, not very. Ok, it’s not just a statistical model–it’s framed in terms of state variables and the processes (births and deaths) that change the values of those state variables. So if that’s what you mean by “mechanistic”, then I guess it is. But I think it’s more helpful to think of L-V models as phenomenological summaries of, or approximations to, some underlying biology. That’s how A J Lotka thought of them. So for instance, the competition coefficient for the competitive effect of species j on i just means “there is some underlying mechanism that causes the per-capita growth rate of species i to decline by this amount when an individual of species j is added to the system.”

      That’s consistent with how most population ecologists think of, say, density dependence (the simplest example of which is just a single-species L-V model, i.e. the logistic equation). Density dependence is a “dynamical mechanism” (to use Ed McCauley’s phrase) that can arise from lots of different underlying biological mechanisms.

      There’s obviously a mechanistic continuum here, of course. One person’s mechanistic parameter is another person’s phenomenological summary.

    • Brian, I agree, the first step should be to properly define what “mechanistic” means, otherwise any further examination of the virtues of mechanisms is doomed.

      I was just about to expand on this question when Jeremy’s link to Amy Hurford’s blog made me realize I had already responded to the same issue there. Surprisingly, I still agree with myself four years later (I’m not claiming this shows anything but stubbornness on my part), so ceterum censeo https://theartofmodelling.wordpress.com/2012/02/19/mechanistic-models-what-is-the-value-of-understanding/#comment-127

      • I guess Jeremy, Florian and I are all on the same page that the line between mechanistic and phenomenological is blurry, or perhaps better that there is a continuum with those two endpoints.

        So for me the question is how happy are people with phenomenological models. Do we want to make the same strong claim that “Trying to understand ecological data without phenomenological models is a waste of time”? If so, I would have to disagree. If not, then what to do with the fact that there is not a sharp demarcation in the qualifier mechanistic (indeed I would say it is partly in the eye of the beholder) but we are making a sharp demarcation of importance/utility of the resulting models?

      • @Florian & Brian:

        Re: the continuum from phenomenological to mechanistic models, it’s definitely a continuum. Recall Simon Wood’s notion of “partially specified” or “semiparametric” models (well, it’s not just his notion, but he’s their leading proponent in ecology). You write down a mechanistic submodel for those bits of the system that you know well enough to do so. And you estimate from the data one or more flexible nonparametric functions (splines or whatever) to stand in for the remainder of the model.

        See Wood 2001 Ecol Monogr (not his most recent paper on the topic, but a useful entry point into the literature): http://onlinelibrary.wiley.com/doi/10.1890/0012-9615%282001%29071%5B0001:PSEM%5D2.0.CO;2/abstract

    • I am repeating myself here, but I don’t think Brian’s question, what counts as “mechanistic”? is really answerable. I enjoyed reading the discussion about it, though, so I agree that it’s worth thinking about. For me, mechanistic models are minimally nonlinear and dynamic, but there are some nonlinear time series models out there that seem very non-mechanistic, so the word “minimally” is key. But as to whether L-V predator-prey is truly mechanistic, I’m at a loss for a useful answer. There are times when I’ve used SEIR models to explain data that were previously only analyzed using logistic regression, and I thought that the SEIR was mechanistic. But my student Libby Eakin made an agent-based model (http://tinyurl.com/EakinEtAl2015) that makes our stochastic SEIR models look like a chi-square test.

      Meanwhile, Brian, it may well be true that you could find a model that did not extrapolate across scales, but that I would consider mechanistic. That said, for me the goal of mechanistic modeling is to extrapolate across scales. When you ask me to say more, I suspect what you are asking is, what constitutes extrapolation across scales? Some might argue that if we are not explaining phenomena at the scale of at least weeks and meters in terms of quantum orbitals, we’re not done. I suppose that’s fair, but for me any effort to use a model to extrapolate from individuals to populations, or from populations to communities, or communities to ecosystems, constitutes extrapolation across scales. Likewise for spatial and temporal scales, but my criteria are more vague. I am interested in extrapolations from square centimeters (I’m a terrestrial ecologist, sorry) to square meters, square meters to square kilometers, or from days to months, and from months to years. Why I prefer that kind of thing maybe leads to a not-too-interesting discussion about psychology and philosophy, but does it answer your question?

      • Greg – fair enough. I agree mechanistic is hard to define and will vary a bit from person to person.

        But then, what do I make of the title “Trying to understand ecological data without mechanistic models is a waste of time”?

        From what I’ve gathered so far it should say “Trying to understand ecological data without dynamic, nonlinear models is a waste of time”?

        It’s certainly less punchy, but leaving that aside, I think it gets tricky to define. For example optimization models are not dynamical (all the dynamics are left implicitly to evolution), yet I suspect many readers would support those.

        I’m really not trying to be difficult. I’m just trying to figure out how to operationalize the title, or the heuristic of what kinds of models one needs to do ecology well.

        So far, I think I’ve heard “not a regression.” I might argue against this as there are some pretty successful areas of allometry that rely just on regressions, but I do take your point with regression. Are there other commonly used models you would rule out?

      • On the other point, thanks. I think we are on the same page of what scaling up means and why it would be nice to be able to do.

        I find it interesting that you think this is a commonly shared goal across mechanistic modelers. You could well be right, but I’ve just never heard it stated so directly. I think modellers are not much different from field ecologists in having been rather scale vague until quite recently. Regardless of the outcomes, I think getting scale explicit is a good thing!

        I’m on record that nice as scaling up would be to do, I’m not convinced it’s so easy. But no need to rehash that here.

      • Hey Brian, I’m a little concerned that my tone was dismissive, which was not my intent, so apologies for that!

        But yes, strictly speaking, I would say that it would not be unreasonable to change my title to “Trying to understand ecological data without dynamic, nonlinear models is a waste of time”. Your version is sufficiently clunky that I like my version better, but your version is certainly more accurate. More broadly, though, the reason why I’m not so concerned about what constitutes mechanistic, and why I’m not so concerned about my title, is that I think that the situation is dire. That is, my belief is that the use of dynamic, nonlinear models is so unusual among empirical ecologists, that the issue of what constitutes a mechanistic model is minor. I’m not worried about convincing you, or Andrew, or Rafael, or Florian, I’m concerned about the researchers who publish in Ecology, and who do not reference theory. That is, when I say that the situation is dire, I am thinking of Sam Scheiner’s dirge about the use of theory: http://tinyurl.com/ScheinerEcolLett2013.

        As may be obvious, I don’t have statistics to back up my claim that “mechanistic” models aren’t used in the way that I might wish. Indeed, Katia Koelle’s point in the original Ignite session (Jeremy’s summary: http://tinyurl.com/IgniteSession2014) was roughly that, although there is lots of modeling, there is not much theory. I may therefore be wrong, and God willing someday I’ll actually check. But wouldn’t you agree that mechanistic models are very rarely used to understand ecological data?

    • Brian, I should be careful about the claims I make about who believes scaling up is a key goal! Clearly it’s not everyone’s goal, but I have spent a lot of time around applied mathematicians, only some of whom were ecologists. My belief has been that the applied mathematicians I know are a good sample of the larger population, but honestly their Si-Levin Numbers (like an Erdos number, https://en.wikipedia.org/wiki/Erd%C5%91s_number, except that I just made it up) are probably almost all less than 3.

      Meanwhile, Jeremy alerted me to your previous post about how scaling up is hard to do (http://tinyurl.com/ScalingUpIsHard, Jeremy’s response: http://tinyurl.com/FoxBackAtU, both are brilliant), which I really loved. As you point out, scaling up is expensive, to which my rejoinder is that it is not as expensive as the damage that forest insects cause to forests. Whaddya think?

      More substantively, in your post you talk about scaling up as a process of averaging, and you point out the problems with such an approach, all of which is very nicely laid out. When I refer to scaling up, however, I mean something that is perhaps specific to population ecology, so we may be talking past each other. As an example of what I mean, we measure disease transmission using 25 insects on a square meter of foliage, we plug the resulting transmission estimate into an SEIR model, and the model can successfully predict virus epidemics in forests of 1-10 hectares in size. It’s not entirely obvious to me that such an approach is going to help in macroecology, but I’d love to hear what you think, with the proviso that I can appreciate that you might need to move on for now!

      • Just as a late comment to this discussion – maybe (weak) emergence is a safer word than scaling up, because “scaling up” can be understood as a purely statistical concept as well.

        I would side with Greg in that emergence is a key, maybe THE key property of a mechanism, at least if we adopt the classical reductionist view of what mechanism / scientific explanation means (and I don’t think we have any serious alternative to this view in science). Maybe the emergent element doesn’t always need to be dynamic, it can be implicit (as in some physiological niche models), but I’d be hard pressed to find something that I call mechanistic and that doesn’t have this aspect of creating a pattern from smaller parts (sensu reductionism).

    • That post is really nice, but my favorite line is this one: “If my statistical model didn’t do so well in the new setting, I might not have much to go on if I wanted to try and figure out why.” If you’re a field biologist, fixing a mechanistic model is often quite easy.

  3. Mechanistic models are just descriptions. They must be tested. But it is as rare as “a good hair day for Donald Trump” to have papers explicitly state how a model could be tested. Responses when I have questioned this have been “I leave that to the field biologists” (Nick Barton personal communication) or “I never got to that” (Dwyer personal communication). Most field biology is based on testable hypotheses that progress science. Mechanistic descriptions that are not testable do not.

    • Judy, I’m a little puzzled by your comment. Are you suggesting that non-mechanistic models are more testable than mechanistic ones in some sense? Are you complaining about division of labor, and arguing that anyone who develops a model ought to also test it? (and if so, does that mean that those who test models ought also to develop them?) Are you questioning whether Greg’s own work involves untestable models? Because his post sure sounds to me like it describes some successful tests of models.

      More broadly, without wanting to question the value of testing hypotheses, there are more ways than that to learn how nature works. For instance, Tony Ives is no one’s idea of a pie in the sky theoretician. All of his work sets out–very successfully–to figure out why specific natural systems behave as they do. And yet Tony himself suggests that testing assumptions of one’s models is usually much more valuable than testing predictions, because if the assumptions are correct (or sufficiently close to correct), then it’s necessarily the case that the predictions will be correct too. See here:

      ESA Monday review: Tony Ives rocks (UPDATED)

      I hope you’ll take the time to clarify your comments, because I’m afraid I don’t follow your argument at all and so I worry that I’ve badly misunderstood (in which case apologies).

      • In the Krebs and Myers comment on my post, there are 3 points that I want to respond to. First, Krebs (given the use of first-person singular, it seems likely that Krebs wrote the first part and Myers wrote the second part) makes a comment that I would translate as, “Simple data on population cycles are not sufficiently informative to allow us to choose between competing models of small mammal outbreaks.” I agree that data on population cycles are often insufficient to say much about mechanistic models (Alison Hunter and I wrote a paper that makes this point, although sadly in a journal that later died: http://tinyurl.com/HunterDwyer99). From what I have seen of Krebs’s data, however, I suspect that the data are sufficiently informative to choose between quite complex models.

        I further suspect that what we are seeing here is the effects of generational change, as described by Thomas Kuhn, on a scientific revolution. That is, it seems to me that Krebs and Myers are reacting to how mechanistic models were used more than 20 years ago. Currently, most modelers I know use data to choose between models, usually formally, while keeping in mind that any model is an approximation to the truth. Given that we know that nature is quite complex, the interesting question is then, how complex a model will our data support? I would therefore argue that recent increases in computing power are producing what Kuhn might identify as a scientific revolution, even if the current revolution is not quite as rapid or as dramatic as those identified by Kuhn. Otherwise, Krebs’s comment that “Scientific progress and understanding can only be made…” is arguably the justification for the work in my lab, and so I think he is arguing in favor of the kind of quantitative ecology that we are already doing.

        As for Myers’s negative comments on my work, most of what she complains about is countered in the Supplemental Information section of Elderd et al. (http://tinyurl.com/ElderdEtAl2013). The phrase “hidden in supplemental information” therefore strikes me as an oxymoron. I am sympathetic to the idea that the long, sprawling papers of yesteryear are a sad loss, but I am beginning to believe that it is a good thing that supplemental information files make it easier to produce shorter papers. It is perhaps worth adding that recent noisiness in gypsy moth defoliation data can be reproduced by a wide range of stochastic insect-outbreak models that include generalist predators, although personally I suspect that the introduction of the fungal pathogen Entomophaga maimaiga is also part of the explanation.

      • Amusing hypothesis but wrong. The work of women is frequently considered to be that of men.

        “Any model is an approximation of the truth”. How do we know if that is the case or if the model has nothing to do with the truth? Back to how do we test models?

        Glad we agree on the supplemental material. I couldn’t find the answers I wanted there. Why 46% oaks and 15% pines in separate blocks?

        I agree that Entomophaga maimaiga is undoubtedly totally changing the dynamics of gypsy moth populations and this makes it difficult to know what dynamics to test models against. Thanks to Anne Hajek and colleagues for continuing to collect data on this important pathogen.

        And finally, a search for understanding should not be confused with negative comments. If we can’t question we will need to give up in science.

  4. Greg

    Thanks for all the comments. I agree that ecology has way too much data collection and analysis that is theory free. I think I might personally choose to emphasize hypothesis-free (except trivial statistical null hypotheses of X will affect Y) rather than mechanistic-model-free (or any specific mode of generating strong informative hypotheses). But I think we’re definitely in the same general frame of mind. And theory sensu lato is definitely woven through what is missing.

    I guess I remain not convinced that “we measure disease transmission using 25 insects on a square meter of foliage, we plug the resulting transmission estimate into an SEIR model, and the model can successfully predict virus epidemics in forests of 1-10 hectares” is a trivial or obvious thing to achieve. If it works, then hurray! A big win. But for example plant people have been talking about trying to scale leaf photosynthesis to ecosystem productivity for decades (it was the title of a book when I was in graduate school) but it has not been achieved in any substantial way. To take the insect disease example, I don’t think it is a given that the most important process is the same at the 1 m scale and the 1 hectare scale. For example, at the 1 m scale, insect-insect encounter transmission might be the key determinant. But at the 1 ha scale, tree health might be the dominant variable. In which case there is no reason to expect a model that performs well at 1 m to explain much of the variance at 1 ha.

    • We’re probably in such complete agreement that I should keep this brief. Yup, it’s kind of amazing that 1m scale processes predict hectare scale processes, I’m with you there (in fact I worry about it being luck, but actually we’ve done it in more sophisticated ways, and it looks like not). As for photosynthesis, does anyone know why photosynthesis can’t be scaled up? And does the failure of what I presume are models tell us anything? My knee-jerk reaction is maybe insects play a key role, but I’m sufficiently aware of my ignorance to have no faith in that…and if you’re too busy to answer, no worries.

    • Giles Hooker is a smart guy (not easy to figure out who it is, by the way). My concern about statisticians who don’t like mechanistic models is that they don’t know enough biology to write down a reasonable model. I think it’s easier if you are a biologist.

  5. Pingback: Hypothesis testing using field data and experiments is definitely NOT a waste of time | Ecological Rants

  6. Pingback: A Follow Up on Plotting Everything Against Everything Else | Chao's Blog

  7. Pingback: Cause, mechanism and prediction in ecology – biologyforfun
