Have ecologists ever successfully explained deviations from a baseline “null” model?

In an old post I talked about how the falsehood of our models often is a feature, not a bug. One of the many potential uses of false models is as a baseline. You compare the observed data to data predicted by a baseline model that incorporates some factors, processes, or effects known or thought to be important. Any differences imply that there’s something going on in the observed data that isn’t included in your baseline model.* You can then set out to explain those differences.

Ecologists often recommend this approach to one another. For instance (and this is just the first example that occurred to me off the top of my head), one of the arguments for metabolic theory (Brown et al. 2004) is that it provides a baseline model of how metabolic rates and other key parameters scale with body size:

The residual variation can then be measured as departures from these predictions, and the magnitude and direction of these deviations may provide clues to their causes.

Other examples abound. One of the original arguments for mid-domain effect models was as a baseline model of the distribution of species richness within bounded domains. Only patterns of species richness that differ from those predicted by a mid-domain effect “null” model require any ecological explanation in terms of environmental gradients, or so it was argued. The same argument has been made for neutral theory: we should use neutral theory predictions as a baseline, and focus on explaining any observed deviations from those predictions. Same for MaxEnt. I’m sure many other examples could be given (please share yours in the comments!).

This approach often gets proposed as a sophisticated improvement on treating baseline models like statistical null hypotheses that the data will either reject or fail to reject. Don’t just set out to reject the null hypothesis, it’s said. Instead, use the “null” model as a baseline and explain deviations of the observed data from that baseline.

Which sounds great in theory. But here’s my question: how often do ecologists actually do this in practice? Not merely document deviations of observed data from the predictions of some baseline model (many ecologists have done that), but then go on to explain them? Put another way, when have deviations of observed data from a baseline model ever served as a useful basis for further theoretical and empirical work in ecology? When have they ever given future theoreticians and empiricists a useful “target to shoot at”?

Off the top of my head, I can think of only a few examples. And tellingly to my mind, in most (not all) of those examples the baseline models were very problem-specific. For instance, there’s Gary Harrison’s (1995) wonderful use of a nested series of baseline models to explain Leo Luckinbill’s classic predator-prey cycles dataset. The simplest baseline model explains certain features of the data, a second baseline model is then introduced to explain additional features, and so on (with additional validation steps along the way to avoid overfitting). Or think of the many “random draws” experiments on plant diversity and total plant biomass that use the Loreau-Hector (2001) null model as a baseline in order to subtract out sampling effects, going on to partition the deviations from the null model into effects of “complementarity” and “selection”. And in the context of allometric scaling, I’m sure there’s work on why particular species or phylogenetic groups deviate as they do from baseline allometric relationships (e.g., higher vertebrates have larger brains for their body size than lower vertebrates).
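
As an aside for readers who haven’t seen it, the Loreau-Hector additive partition is simple enough to sketch in a few lines of code. The partition itself is the published one (net biodiversity effect = complementarity + selection); the yields and the function name below are invented purely for illustration:

```python
import numpy as np

def loreau_hector(mix_yield, mono_yield, expected_ry):
    """Additive partition of the net biodiversity effect (Loreau & Hector 2001).

    mix_yield[i]  : species i's observed yield in mixture
    mono_yield[i] : species i's yield in monoculture
    expected_ry[i]: species i's expected relative yield (e.g. its
                    planted proportion in the mixture)
    """
    mix_yield = np.asarray(mix_yield, dtype=float)
    mono_yield = np.asarray(mono_yield, dtype=float)
    expected_ry = np.asarray(expected_ry, dtype=float)
    n = len(mix_yield)
    # Deviation of each species' observed relative yield from expectation.
    d_ry = mix_yield / mono_yield - expected_ry
    complementarity = n * d_ry.mean() * mono_yield.mean()
    # Population covariance (divide by n), as in the original partition.
    selection = n * np.cov(d_ry, mono_yield, bias=True)[0, 1]
    return complementarity + selection, complementarity, selection

# Two-species toy example: net effect 0.5 = 0.675 (complementarity)
# plus -0.175 (selection).
net, comp, sel = loreau_hector([3.0, 2.0], [4.0, 5.0], [0.5, 0.5])
```

The net effect equals the observed mixture yield minus the yield expected from the monocultures, so the two terms “subtract out” the sampling effect exactly.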

But in most cases I can think of in ecology where someone’s proposed some “generic” null model like MaxEnt or neutral theory, or some null model based on constrained randomization of the observed data, it hasn’t turned out to be very productive to try to explain deviations of the observed data from the null model. All we usually end up with is a list of cases in which the data either do or don’t match the null model, with no obvious rhyme or reason to the occurrence, size, or direction of those deviations. See, e.g., Xiao et al. 2015 for MaxEnt models of species-abundance and species-size distributions. In general, deviations of observed data from the predictions of some generic “null” model do not seem to be a very good source of stylized facts.

Assuming for the sake of argument that I’m right about this, why is that? I honestly don’t know, but I can think of a few possibilities:

  • Our baseline models don’t correctly capture all and only the effects of the processes or factors they purport to capture. So that deviations of the observed data from the baseline models aren’t interpretable. I think that’s usually what’s going on in cases where the baseline model is some constrained randomization of the observed data.
  • Multicausality. It’s hard to build a baseline model that subtracts out all and only the effects of processes A & B if processes A & B are far from the only ones that matter. Indeed, insofar as many processes matter, we should expect any patterns in our data to take the form of “statistical attractors” that exist independent of the nature and details of those processes. More subtly, even if there are just one or two dominant processes that we can capture with a baseline model, the deviations of the observed data from the baseline model are going to be hard to interpret unless they too are dominated by one or two processes.
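
To make the “constrained randomization” case concrete, here is a minimal sketch of how such a null model typically works: permute each species’ occurrences across sites (preserving each species’ total frequency), recompute a summary statistic, and treat the observed statistic’s departure from the null distribution as the deviation to be explained. The matrix, the statistic, and the constraint are all arbitrary choices for illustration, not anyone’s published method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 10-site x 5-species presence-absence matrix.
obs = rng.integers(0, 2, size=(10, 5))

def statistic(m):
    # Example statistic: variance of species richness across sites.
    return m.sum(axis=1).var()

def null_draw(m, rng):
    # Constrained randomization: permute each species' occurrences
    # across sites, preserving that species' total frequency.
    out = m.copy()
    for j in range(out.shape[1]):
        out[:, j] = rng.permutation(out[:, j])
    return out

obs_stat = statistic(obs)
null_stats = np.array([statistic(null_draw(obs, rng)) for _ in range(999)])

# The "deviation" one would then try to explain, plus a randomization p-value.
deviation = obs_stat - null_stats.mean()
p = (np.sum(null_stats >= obs_stat) + 1) / (len(null_stats) + 1)
```

The worry raised above is precisely that `deviation` here has no guaranteed interpretation: the permutation constraint does not cleanly correspond to the absence of any particular ecological process.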

If I’m right, then I think ecologists shouldn’t be so quick to recommend the approach of developing a baseline model and then explaining deviations of the data from it. And reviewers and readers should probably default to skepticism of this approach.

p.s. Nothing in this post is an argument against deliberately-simplified models that omit some processes, factors, or effects in order to focus on others. I’m just arguing that we should default to skepticism of one particular use of deliberately-simplified models. They have other uses (e.g.).

*Note that a lack of differences doesn’t imply that your baseline model is actually correct, or even a good approximation to the correct model. But leave that (common) scenario aside for purposes of this post.

20 thoughts on “Have ecologists ever successfully explained deviations from a baseline “null” model?”

  1. Thinking about it further, I bet folks modeling range shifts under climate change have done this. E.g., built a baseline model just based on knowledge of a focal species’ physiological tolerance limits, and then explained observed deviations from the model via interspecific interactions.

    And of course, if the model is purely statistical, ecologists do this all the time. If you have a regression model, and you add a second predictor variable that substantially improves the model, you can think of the original regression as the “baseline” model and the additional predictor as explaining residual variation that was unexplained by the baseline model.
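
That purely statistical version is easy to sketch: fit the baseline regression, extract its residuals, and ask whether a candidate second predictor explains them. The data below are simulated solely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: y depends on x1 (the baseline predictor) and x2.
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 + 1.5 * x2 + rng.normal(scale=0.5, size=n)

def ols_residuals(y, x):
    # Ordinary least squares of y on an intercept plus x; return residuals.
    design = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

# Baseline model: y ~ x1. Its residuals are the "deviations".
resid = ols_residuals(y, x1)

# Does the candidate process (x2) explain those deviations?
r = np.corrcoef(resid, x2)[0, 1]
```

(Equivalently, one can just compare the one- and two-predictor models directly; the residual route merely makes the “baseline plus deviations” framing explicit.)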

  2. The literature on Kleiber’s Law (allometric scaling of basal metabolic rate with body mass) has several examples in which deviations (residuals) from the general pattern have ecological interpretations, based on diet and social structure. Ditto for the cost of transport, where the residuals relate to the mode of transport.

    • Thanks Tom. I don’t know that literature well, but it’s my impression that it’s one of the best examples in ecology of the “explain deviations from a baseline model” approach.

      • Brian McNab (http://scholar.google.com/citations?user=FLqfWx8AAAAJ&hl=en)
        has done this with a great many variables of ecological interest, particularly for birds and mammals. Indeed, for decades the standard inferential practice in comparative physiology, life histories, habitat coupling for energy, etc. has been to first remove the effects of body size [usually by fitting a power function, an allometry] and then look for what the residuals correlate with. Sometimes the removal of body size is VERY well motivated [read: theory, like the Brown quote] and sometimes we just have a gut feeling, or empirical experience, that it’s appropriate. For example, primates have very long lifespans, compared to other mammals, FOR THEIR ADULT BODY SIZE. Humans have big brains, for their body size, compared with mammals in general. Irish elk males have big antlers. And so forth.
        The approach has led to several general advances; I will mention two: 1] build a theory for the allometry/power function itself, using formal dimensional analysis as a guide [i.e., power functions often imply that something is held constant for all body sizes, and we ought to guess what it is], or 2] the life-history observation that the residuals around SEVERAL LH allometries correlate with each other, and their pattern of correlation is a strong hint as to what is involved in setting the allometry (or allometries) in the first place.

        Many more things to discuss here; see McNab’s book, or my 1993 book, Life History Invariants.
        Ric Charnov
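
The body-size-removal recipe described in this thread (fit a power function, which is linear on log-log axes, then study the residuals) can be sketched with simulated data. Everything below is invented for illustration; the coefficients do not come from any real allometry:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data: log lifespan scales with log body mass, with one
# group (call it "primates") shifted above the all-mammal allometry.
n = 100
log_mass = rng.uniform(0.0, 6.0, size=n)      # log10 body mass
is_primate = rng.random(n) < 0.2              # 20% of species
log_lifespan = (0.5 + 0.2 * log_mass + 0.3 * is_primate
                + rng.normal(scale=0.1, size=n))

# Step 1: remove the body-size effect by fitting the allometry.
slope, intercept = np.polyfit(log_mass, log_lifespan, 1)
resid = log_lifespan - (intercept + slope * log_mass)

# Step 2: ask what the residuals correlate with.
primate_excess = resid[is_primate].mean() - resid[~is_primate].mean()
```

Here `primate_excess` recovers the group's built-in shift above the allometry, which is exactly the kind of "long lifespan for their body size" residual pattern described above.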

  3. I tried this once, though maybe not “successfully”. There is a lot of arm-waving (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0058241). But others in the savanna ecology community did similar work (e.g. Moncrieff et al. in Ecology, http://onlinelibrary.wiley.com/doi/10.1890/11-0230.1/abstract). The idea is that other selective forces, like fire and browsing, push savanna trees outside the allometric bounds predicted by metabolic scaling theory. I think these examples kind of worked because the dominant forces are pretty well known in savannas, perhaps side-stepping the second issue (multicausality) you proposed. I don’t know of people trying to include these forces in MST to then make new, testable predictions, though.

  4. Thinking about it further, mathematical epidemiology provides a few examples. The ones I know are like Harrison (1995). We have, say, some long-term time series data on measles cases, and a simple SEIR model, parameterized from a combination of those data and independent sources of information, captures key features of the time series. But it doesn’t explain, or misdescribes, other features of the data, or can’t explain/misdescribes other datasets (e.g., regarding the age structure of disease incidence). But a more biologically elaborate model, like an age-structured model or a metapopulation model, can explain those other features of the data, or those other datasets, in addition to still explaining the things the original SEIR model explains.
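
For readers who haven’t met it, the SEIR model referred to above fits in a few lines. This is a toy sketch with arbitrary parameters (R0 = 3, five-day latent and infectious periods, forward-Euler integration), not a parameterization of any real measles dataset:

```python
import numpy as np

def seir(beta, sigma, gamma, s0, e0, i0, days, dt=0.1):
    """Minimal SEIR model in proportions, integrated with forward Euler.

    beta  : transmission rate
    sigma : rate of progression from exposed to infectious (1/latent period)
    gamma : recovery rate
    """
    s, e, i = s0, e0, i0
    traj = []
    for _ in range(int(days / dt)):
        ds = -beta * s * i
        de = beta * s * i - sigma * e
        di = sigma * e - gamma * i
        s += ds * dt
        e += de * dt
        i += di * dt
        traj.append((s, e, i, 1.0 - s - e - i))  # recovered closes the system
    return np.array(traj)

# One epidemic wave with R0 = beta/gamma = 3.
traj = seir(beta=0.6, sigma=0.2, gamma=0.2,
            s0=0.999, e0=0.0, i0=0.001, days=200)
```

Age structure or spatial coupling would then be layered onto this skeleton to explain the features the simple model misses.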

  5. I mainly see hand-waving in community ecology. Very few serious attempts to explain deviations from the null.

    I have a paper that used a very simple combinatoric “null” that explained most of what was going on with a particular beta-diversity metric. Lots of people had proposed biologically-based explanations for variation in this metric, but it turned out that counting species’ occurrences explained most of it. The Appendix then used Taylor expansion to predict the exact discrepancy between observed and predicted values.

    I’m not sure if this counts, though: I’d argue that any “ecological” pattern that can be explained perfectly with a 2nd order Taylor expansion can’t be very biologically interesting.

    link, for anyone that’s interested: https://www.jstor.org/stable/10.1086/658990?seq=1#page_scan_tab_contents
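
For the curious, the second-order Taylor trick mentioned above is the familiar delta-method idea: approximate the expectation of a nonlinear function of a random variable from its mean and variance alone. A toy illustration with an arbitrary function and distribution, unrelated to the linked paper’s metric:

```python
import numpy as np

rng = np.random.default_rng(3)

# Second-order Taylor ("delta method") approximation:
#   E[f(X)] ~ f(mu) + 0.5 * f''(mu) * Var(X)
mu, sigma = 10.0, 1.0
x = rng.normal(mu, sigma, size=100_000)

exact = np.log(x).mean()                               # Monte Carlo estimate
taylor = np.log(mu) + 0.5 * (-1.0 / mu**2) * sigma**2  # f''(x) = -1/x^2 for log
```

If an approximation like this reproduces a pattern almost exactly, the pattern arguably carries little information beyond means and variances, which is the commenter’s point.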

    • Taylor expansion is a good example of the sort of thing I have in mind in this post. Perhaps it’s telling that Taylor expansion rarely gets used in ecology.

  6. Most excellent post, Jeremy! There is a lot of red meat to chew on here. I love the conversation, though, because I believe the points you have raised get to the very heart of why theoretical work in ecology has yet to approach the precision and predictive power of what we see in chemistry and physics, for example.

    1. “But in most cases I can think of in ecology where someone’s proposed some “generic” null model like MaxEnt or neutral theory, or some null model based on constrained randomization of the observed data, it hasn’t turned out to be very productive to try to explain deviations of the observed data from the null model.”

    I agree with this statement completely. We do not have many good examples in ecology, and I believe that has much to do with how we have built models previously. Although there is a shift happening now, historically models have been based upon the variables we toss into the PCA hopper. Churn the butter, parse out the explained variation of your chosen variables, and shazaam, you have a model. No, very likely you don’t. And the reason it does not work is that you have likely left out the variables that really have the most to do with the processes you are attempting to explain.

    2. “Our baseline models don’t correctly capture all and only the effects of the processes or factors they purport to capture. So that deviations of the observed data from the baseline models aren’t interpretable.”

    Bingo, Jeremy! See my response in (1) above.

    3. “Multicausality. It’s hard to build a baseline model that subtracts out all and only the effects of processes A & B if processes A & B are far from the only ones that matter.”

    Absolutely, Jeremy, you are right at the crux of the problem that ecologists rarely overcome when it comes to constructing models. I think there are three primary reasons for this difficulty in ecology: a) Ecologists are not trained as, and therefore do not think like, physicists and chemists. Education in the disciplines of chemistry and physics is virtually 100% model-based. At least from my own educational experiences, spanning many decades, ecology comes nowhere near this kind of model-based educational efficiency. b) Ecologists are not schooled in the nuances of constructing null models. As you say, most ecologists set out to reject the null. I can say with great confidence that this mindset is a yuge, yuge stumbling block when it comes not only to constructing a meaningful null hypothesis, but even more so a meaningful null model. c) Ecologists are not educated in the traditions of model construction used in disciplines like chemistry and physics. Overwhelmingly, the models in those disciplines hinge upon thermodynamics and classical and quantum mechanics. Physicists and chemists probe new processes using an established set of null models, and ecology has essentially none. It is, quite frankly, a whole other way of thinking compared to the manner in which ecologists are educated. Ecologists tend to deal with decision-making trees, i.e., ones and zeros. It is, or it isn’t.

    In my own work I have found it rather difficult to conceive of a useful null model in ecology. I fell down that rabbit hole years ago, and feel like *maybe* I am getting close now. It is really very difficult. In my case, I’ve been chasing down a variety of issues related to variation at the community level. What I ultimately needed was a null model devoid of spatial autocorrelation and based upon observational data. Sheesh… I think I might have it, but I am told I still have one major flaw in the model. In essence, while it appears I have constructed a null model that is free of autocorrelation, I may have violated a rule in getting there. And so I am compelled to believe I have left something out, and I am struggling to figure out what it might be.

    That’s the hard part, because you know, imagine where Newton would have gone if he omitted friction from the processes of mass, force and momentum in trying to explain the laws of motion…

    This schtuff ain’t easy.

    • Thanks John!

      Since I’ve got you here, and since it seems like you mostly agreed with the post, I want to use your comments as a jumping-off point to ask you a few questions. Let me start with this one: We’ve talked here in the past about how ecology seems to be moving away from general theory and towards system- and case-specific models that can be more tightly linked to data (https://dynamicecology.wordpress.com/2014/09/22/theory-vs-models-in-ecology/). I think of MaxEnt as perhaps the most prominent exception to this trend. Do you think that’s right? Are ecologists actually getting better at building, evaluating, and learning from models, but at the cost (if it is a cost) of narrowly tailoring our models to specific systems or cases?

      • I would agree ecology has shifted toward modeling and away from theory generally. I believe it was a necessary shift because if we do not understand the processes that underlie our observations, then I think we have built a cage with nothing inside of it. That is, we end up with a sound peripheral explanation but no understanding of what created that observed periphery.

        In other words, we have the ability to make observations, and often we see similar patterns across divergent systems, but not always. Consider, for example, that temperate rainforests are much less diverse than tropical ones, but much more productive in terms of biomass/turnover. We can model these trends fairly easily by simply measuring biodiversity and productivity. But then what?

        Do we derive a theory of biodiversity based on temperature regimes, and then a production theory based upon organic soil layers? Or is it that evolution was constrained by some number of factors in the temperate rainforests that culminated in lowered biodiversity, perhaps having nothing to do with temperature? Are temperate rainforests more productive not because of the enriched organic soil layer compared to the tropics, but because of unknown mycorrhizal associations enhancing nutrient uptake? You can see where this is going…

        And so I believe another reason for the shift away from theory has a lot to do with a fear of the unknown, induced by the exponentially growing known. In other words, back in the day of MacArthur, we had more theory and less modeling because there were fewer scientists, simpler technology, and a lot less data. Fast forward to the present and, my goodness, the sheer amount of data at our disposal is stunning by comparison. Thus, there is a much greater chance a theory will be debunked today than fifty years ago, and that it will be debunked much faster. So modeling specific phenomena and systems is a much safer bet nowadays.

        From your prior post: “And Morgan Ernest has expressed mixed feelings about how we’re becoming more rigorous but less creative, better at answering questions but less good at identifying questions worth answering.”

        I could not agree any more with that statement. We need the models, so yes, by all means keep on modeling. But I would say that every ecologist should force themselves to strive for elucidating theoretical implications in everything they do. Applied work is very good, and we need as much of it as we can get, but we also need ecologists to fit these outcomes into much broader contexts. And why not?

        So what if it turns out you are wrong… most of us are most of the time anyway.

      • I had to think a bit about the “cost” if any, of narrowly-tailored models. My thoughts took me way back to my undergraduate days of some 30+ years ago. I had an advisor (an ecologist) who frequently said to me that there is always a temptation in science to make things far more complicated than they really are. That humans have a natural tendency to introduce complexity where none in reality exists. He said to me again and again, “the simplest answer is not only the best answer, it is the right answer”.

        It is also the most difficult answer to find. So yes, I would say there is a cost of having a great many system-specific models. That cost is deriving what is ultimately the simplest, best and therefore “right” answer.

  7. John Harte doesn’t call them null models but his MaxEnt approach is rather explicitly based on a minimum information model and identifying places where the model fails and trying to understand these failures. I respect that he has been very consistent in applying this approach to his work. A lot of people say model failures are the most interesting, but John actually lives it.

    Also, MaxEnt is arguably a much more productive minimal information model than various randomizations for a number of reasons.

    • I agree Brian. I think baseline models have various desirable features, and “being able to explain the deviations of observed data from them” is only one of those features. And I agree that it remains to be seen if explaining deviations from MaxEnt models will eventually prove to be fruitful, even though it hasn’t yet.

  8. What about the null models that test whether the phylogenetic diversity of an area is just another extension of taxonomic diversity, given that these diversity measures are highly correlated (depending on what metrics we are using)? I think this simple approach of eliminating the species-richness effect from phylogenetic diversity measures works quite well. There is no baseline model for this topic anyway; people are still exploring things.

  9. [Long-time reader, first-time commenter :)]
    I’m a bit late to the party here, but I’ve done this! The null model predicted species richness based on random draws of plants from a larger pool. Deviations at the year-to-year scale are related to temperature, and deviations at the plot-to-plot scale are related to plant density. Of course, there’s some hand-waving thrown in as well – at the site-to-site scale, n=1, so who knows what’s going on, plus the usual correlation-versus-causation caveats.

    The paper is here: http://onlinelibrary.wiley.com/doi/10.1111/j.1461-0248.2005.00855.x/full
