The two ways to keep your mathematical model simple

When you’re building a mathematical model, for any purpose (prediction, understanding, whatever), you have to keep it simple. You can’t literally model everything, any more than you can draw a map as big as the world itself. And if you want to get analytical results, as opposed to merely simulating the model, you have to keep it relatively simple.

There are two ways to keep things simple: leave stuff out, and leave stuff out while implicitly summarizing its effects.

The first strategy is the more obvious and familiar one. If you’re modeling competition, omit predators from your model. And if you don’t care about the effects of spatial or temporal abiotic environmental variation on competitive outcomes in your model, assume that the abiotic environment is constant in time and space. Etc.

The second strategy is just as common (indeed, both strategies are unavoidable), but in my experience readers don’t always recognize when it’s been used. You leave stuff out of your model, but implicitly summarize its effects on the stuff you’re modeling.

Indeed, pretty much any bit of any mathematical model can be thought of as implicitly summarizing the effects of some unspecified process(es). For instance, predator-prey models typically assume a constant conversion efficiency: a fixed number of predators (or units of predator biomass) produced per number or unit of prey consumed. But that conversion efficiency parameter actually summarizes a massive amount of unspecified underlying biology! All the digestive physiology and gut microbiota and gene expressiony…um, things and biochemical [mumble mumble] that go into turning food into predators just get black-boxed into one number.* Which is often fine! For many predators it really is the case that there’s a pretty fixed ratio of predators produced per prey consumed. Fully explaining why that ratio takes on the value it does in any particular case might require the work of many lifetimes–but who cares if all you’re trying to model is predator-prey dynamics?

In evolution, think of quantitative genetics and what Alan Grafen calls the “phenotypic gambit”. Don’t model the underlying genetic architecture of the phenotypic trait you’re modeling. Just assume the trait is the result of a normally-distributed genetic effect plus a normally-distributed environmental effect, plus a GxE interaction if you’re feeling fancy. It often works! For further examples, see this old post on how the concept of “genetic drift” (or in ecology, “demographic stochasticity”) is just a way of summarizing the consequences of lots of unspecified “low level” events, and this old post on trait-based ecology of phytoplankton.

High on the list of any theoretician’s pet peeves is when readers mix up these two strategies, thinking that something important has been omitted from the model when in fact its effects have been implicitly summarized. For instance, say you have a Tilman-esque resource competition model that obeys the R* rule. There’s no possibility of stable coexistence; whichever species has the lowest R* for the limiting resource outcompetes all the others. Now imagine that you add in intraspecific density dependence for the best competitor: make its per-capita growth rate decline as its own density increases (as well as remaining an increasing function of resource availability). The resulting model now will potentially allow stable coexistence of two species. But you have not thereby shown that two species can coexist without any niche differentiation. Because implicitly, you added in niche differentiation when you added in intraspecific density dependence. One of the species now competes intraspecifically for some unspecified reason that’s independent of competition for the shared resource. Which means that it must have some sort of “niche difference” from the other species. The effects of that unspecified niche difference are summarized by the intraspecific density dependence.
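To make the R* example concrete, here is a minimal numerical sketch in Python. Everything specific in it is an assumption made for illustration: the chemostat-style resource supply, the Monod growth functions, the parameter values, and the names `monod`, `simulate`, and `self_limitation` are hypothetical and not taken from any particular published model. With `self_limitation = 0` the species with the lower R* should exclude the other; with a positive value for that same species, both should persist.

```python
# Illustrative sketch only: the model form, parameter values, and names
# are hypothetical choices, not a specific published model.

def monod(r, k, resource):
    """Resource-dependent per-capita growth rate (Monod form)."""
    return r * resource / (k + resource)


def simulate(self_limitation, t_end=1000.0, dt=0.01):
    """Euler-integrate two consumers competing for one resource.

    self_limitation is the strength of the extra intraspecific density
    dependence imposed on species 1, the species with the lower R*.
    Returns the final (resource, n1, n2).
    """
    supply, dilution, mortality = 10.0, 0.1, 0.1
    r1, k1 = 1.0, 1.0  # species 1: R* = m*k1/(r1 - m), about 0.111
    r2, k2 = 1.0, 2.0  # species 2: R* = m*k2/(r2 - m), about 0.222
    resource, n1, n2 = supply, 0.1, 0.1
    for _ in range(int(round(t_end / dt))):
        g1 = monod(r1, k1, resource)
        g2 = monod(r2, k2, resource)
        # Resource is supplied at a constant rate and consumed by both species.
        d_resource = dilution * (supply - resource) - g1 * n1 - g2 * n2
        # Only species 1 gets the extra self-limitation term.
        d_n1 = n1 * (g1 - mortality - self_limitation * n1)
        d_n2 = n2 * (g2 - mortality)
        resource += dt * d_resource
        n1 += dt * d_n1
        n2 += dt * d_n2
    return resource, n1, n2


if __name__ == "__main__":
    print("pure R* model:       ", simulate(0.0))
    print("with self-limitation:", simulate(0.05))
```

Note what the `self_limitation * n1` term is: a cause of death or reduced growth in species 1 that depends only on species 1's own density and not on the shared resource. That is exactly the unspecified niche difference being implicitly summarized.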

There’s a lot more that could be said here, in particular about the circumstances in which each of these two simplification strategies is best deployed. But I’ll leave that for others to say in the comments. 🙂

*I am not a physiologist, microbiologist, biochemist, or developmental biologist, in case you can’t tell.

21 thoughts on “The two ways to keep your mathematical model simple”

  1. Hi Jeremy; thoughtful post. thank you. there is more here. let me give an example; of course it’s sex ratio.
    In the 1950s a UCLA grad student RICHARD SHAW did a PhD thesis on sex ratio. His selection experiments did not work, so he turned to theory. His 1953 American Naturalist paper with Mohler invented the ESS technique, and applied it to sex ratio, getting sterling results. He more or less LUMPED the genetics into a measure of quantitative contribution to the next generation [I speak qualitatively here]. This result is well known, and in 1982 I named the whole class of fitness results for sex ratio the ‘SHAW-MOHLER EQUATION’.
    Less well known is the rest of his thesis, published in 58; In it he studied many detailed population genetic models for dynamics of sex ratio selection; basically he set up the genetic determination of gender [ sex] and let it rip…. did it on a hand calculator. After some iterations the population reached an equilibrium state.
    He GOT a capital result: almost any autosomal specification of sex (ratio) resulted in a population equilibrium of 1:1. If the genetic coding was not autosomal, biased sex ratios, sometimes pop extinction, was the equil [ sic]. WD Hamilton’s famous 1967 Science paper cited Shaw’s work as the very first citation!
    so we have every reason to believe that the simple genetic models of sex specification will find the pop phenotypic equil sex ratio… that will be true for giant classes of population genetic models. This allows us to study rather complex ecological and population structure/dynamic settings and keep the genetics simple.
    BEUKEBOOM & PERRIN 2014 show that the 1:1 ESS sex ratio, independent of details of genetics, is a key to understanding evolutionary dynamics of the sex determining mechanisms themselves[ their book is EVOLUTION OF SEX DETERMINATION, OXFORD].
    Around 1990 Brian Charlesworth and I, in separate papers, showed the approximate equivalence of the ESS/optimization approach to normalizing selection with much deeper, more fine-grained genetic specification of phenotypes. At least the average phenotypes.
    Such population genetic invariance lies at the heart of ESS theory for normalizing selection. for everything, really. It does not always work, but a great BET.


    • In a previous draft of this post I had a paragraph in there about how the “implicitly summarize stuff you’re leaving out” approach makes sense when the implicit summary is “robust” sensu Levins. That is, many different assumptions about the stuff you’re leaving out can all lead to the same implicit summary. Your example of many different population genetic models of sex ratio evolution all leading to the same ESS summary model is a good one.

  2. Has anyone created a model which attempts to predict the rate at which the knowledge explosion will accelerate?

    I’m particularly interested in the evolving relationship between 1) the amount of power human beings have available for manipulating our environment and, 2) our ability to successfully manage whatever powers we have.

    My common sense guess is that knowledge/power would expand at a faster rate than our management ability, opening an ever wider gap between the two. But I have no data to base such a guess on, so I’d be interested in any projects which attempt to quantify that relationship.

      • Well, ok, so given the importance of the issue, we might try to make it less vague.

        For instance, could one track the number of published papers, patents, or some other metric of discovery?

        I’d agree that at the moment I have no useful suggestion on how to track our management ability.

    • “And also probably writing reviews for other modelling papers.”

      How come? Can you elaborate? Do you feel like authors and reviewers often mix up these two approaches to simplification, or use one when they should use the other?

      • I generally think they use language that exacerbates the misconception. They may know what they are doing, but I think I remember many papers that write sentences where they claim to “ignore” something they actually include extremely coarsely. And that is almost always the direction the mix-up is in for authors. For reviewers the mix up is similar, they claim I ignore something that I’ve included through a constant parameter or simple function that may not explicitly incorporate the thing they are interested in.

  3. the 3rd main way to keep your model simple is to employ dimensional analysis: basically this says that you can reduce the complexity of your problem by working in dimensionless variables/parameters. there are formal rules for doing this, and I believe you mentioned a key ecology reference [ Stephens/Dunbar] in an old post.
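As a small illustration of the dimensional-analysis point, here is a hedged Python sketch using the logistic equation as a stand-in example (the choice of model, the function names, and the parameter values are all assumptions for illustration). Rescaling density as x = N/K and time as tau = r*t turns dN/dt = r*N*(1 - N/K) into dx/dtau = x*(1 - x), which has no free parameters at all, so trajectories for any r and K collapse onto a single curve once rescaled.

```python
import math


def logistic(t, r, k, n0):
    """Closed-form solution of dN/dt = r*N*(1 - N/K) with N(0) = n0."""
    return k / (1.0 + (k - n0) / n0 * math.exp(-r * t))


def scaled_logistic(tau, x0):
    """Solution of the dimensionless form dx/dtau = x*(1 - x), x(0) = x0."""
    return 1.0 / (1.0 + (1.0 - x0) / x0 * math.exp(-tau))


def rescale(t, r, k, n0):
    """Map a dimensional point (t, N(t)) to dimensionless (tau, x)."""
    return r * t, logistic(t, r, k, n0) / k


if __name__ == "__main__":
    # Two hypothetical populations with very different r and K but the
    # same initial fraction of carrying capacity (10%).
    for r, k in [(0.5, 200.0), (3.0, 1e6)]:
        for t in (0.5, 1.0, 2.0):
            tau, x = rescale(t, r, k, 0.1 * k)
            print(r, k, round(tau, 3), x, scaled_logistic(tau, 0.1))
```

Two parameters (r and K) have disappeared from the problem; only the initial fraction x0 is left to vary.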

  4. I think one thing worth mentioning is that it is not always clear how to judge the simplicity of a model. How do we compare a model composed of X equations, Y parameters, and Z ‘effects modelled’ to one with A, B, and C of these things respectively? It is also not clear how to compare the complexity or simplicity of things that are assumed away when comparing models. In a predator-prey system in a very spatially heterogeneous environment, is it simpler to neglect (summarising effects or not) the spatial distribution of the populations, or the influence of, e.g., allochthonous resources, or multiplicity of inter-species interactions? To me these are quite interesting questions to explore, even if the answers will always be somewhat unsatisfactory.

    In terms of mathematical classifications, we have notions of simplicity that are relatively clear: linear equations are simpler than nonlinear, scalars are simpler than systems, ODE are simpler than PDE, Gaussian noise is simpler than non-Gaussian, etc. But even in the abstract yet simple setting of mathematics, exceptions exist, and hierarchies of complexity are heuristic at best.

  5. As for when to just leave stuff out vs. when to implicitly summarize, here’s one clear-cut rule: leave out anything that changes much slower than the stuff you’re trying to model. Implicitly summarize anything that changes much faster, by assuming that it goes to equilibrium instantly. The jargon for this is “separation of timescales”.

    For instance, no ecological model needs to include the fact that the sun is slowly growing in size and in billions of years will become a red giant that will engulf the earth. That process is far too slow to matter for any ecological purpose, so just ignore it.

    Now imagine you’re modeling the population dynamics of a predator that consumes prey with a much shorter generation time than the predator. Rather than explicitly modeling the prey, just summarize the effects of that predator-prey interaction on the predator’s dynamics as intraspecific density dependence in the predator. You do this by assuming that the prey density instantly goes to equilibrium in response to any change in predator density. This is why the bacterivorous ciliates I use in my lab can be described as growing approximately logistically–the bacteria on which they feed have much shorter generation times than they do.
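The fast-prey argument can be checked numerically. The sketch below is only an illustration under assumed forms: logistic prey growth, a linear (type I) functional response, and made-up parameter values and names. With prey dynamics dR/dt = R*(r*(1 - R/K) - a*P) and predator dynamics dP/dt = P*(e*a*R - m), setting the prey to its quasi-equilibrium R = K*(1 - a*P/r) and substituting gives dP/dt = P*((e*a*K - m) - (e*a^2*K/r)*P), which is logistic growth of the predator. The code integrates both versions side by side and tracks the largest gap between them.

```python
# Hypothetical parameter values, chosen so the prey turn over much
# faster than the predator.
R_PREY, K_PREY = 50.0, 100.0                   # prey growth rate, carrying capacity
ATTACK, EFFICIENCY, MORTALITY = 1.0, 0.01, 0.2

# Logistic parameters implied by the quasi-equilibrium substitution.
R_PRED = EFFICIENCY * ATTACK * K_PREY - MORTALITY
K_PRED = R_PREY * R_PRED / (EFFICIENCY * ATTACK ** 2 * K_PREY)


def compare(p0=1.0, t_end=20.0, dt=0.001):
    """Euler-integrate the full model and the reduced logistic model.

    Returns (final predator density in the full model,
             final predator density in the logistic model,
             largest absolute gap between the two along the way).
    """
    prey = K_PREY * (1.0 - ATTACK * p0 / R_PREY)  # start prey at quasi-equilibrium
    pred_full = p0
    pred_logistic = p0
    max_gap = 0.0
    for _ in range(int(round(t_end / dt))):
        d_prey = prey * (R_PREY * (1.0 - prey / K_PREY) - ATTACK * pred_full)
        d_full = pred_full * (EFFICIENCY * ATTACK * prey - MORTALITY)
        d_logistic = R_PRED * pred_logistic * (1.0 - pred_logistic / K_PRED)
        prey += dt * d_prey
        pred_full += dt * d_full
        pred_logistic += dt * d_logistic
        max_gap = max(max_gap, abs(pred_full - pred_logistic))
    return pred_full, pred_logistic, max_gap


if __name__ == "__main__":
    print("implied logistic r and K:", R_PRED, K_PRED)
    print("full, logistic, max gap: ", compare())
```

The approximation should degrade as `R_PREY` is lowered toward the predator's own rate of change, which is one way to see that the implicit summary is only as good as the separation of timescales behind it.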

    • This separation of time scales is the trick for density dependent life history evolution theory too; we assume dN/dt~0, due to density dependence somewhere in the life history working constantly, and then we study life history optimization of Ro, the appropriate fitness measure in non growing pops. The trick may fail for some forms of density dependence, but it forces us to consider where DD is present, and how it works.

      • Hi Matthew; thanx; I dont know Steve’s work very well. I talked with him once many yrs ago, and i know he finds my pop dynamic assms pretty simple[ simplistic?]
        returning to my first comment above;
        There is some work on sex ratio adaptation to complicated population dynamics; work of Jon Seger [ 93 Nature] and John Werren [ indeed werren’s phd thesis, about 1980; our 1978 nature paper started it all].. this work exploits the invariance of the genetic determination and focuses on ESS responses to pop fluctuations.
        One of the more interesting findings was that pop fluctuations per se did not affect the ESS sex ratio; BUT a plastic sex ratio response that could be properly timed was selected for. Both these chaps are on Google Scholar.
        One of mine that may interest you is here:

      • Ed McCauley here at Calgary has been doing interesting, ambitious work (as yet unpublished, I think?) on the feedbacks between Daphnia life history evolution (e.g., allocation to resting eggs) and nonlinear, non-equilibrium population dynamics. Involves parameterizing models of individual allocation to reproduction, growth, and survival from clever life history assays and population dynamic data.

  6. “that conversion efficiency parameter actually summarizes a massive amount of unspecified underlying biology!”

    We’ll turn you into a macroecologist yet! 🙂

    So basically I am agreeing with you. But “any bit of any mathematical model can be thought of as implicitly summarizing the effects of some unspecified process(es)” – yes, but the converse is NOT true. Not any unspecified process can be found in a model.

    Take the Rosenzweig-MacArthur predator-prey model since you did. Yes the conversion efficiency and the handling time neatly capture (in a linear first order approximation) lots of biology (digestion and anabolism in the former and basically behavior sensu lato in the latter) and that is very effective. And you can even start to find correlates of those parameters (conversion efficiencies depend on prey type – insects are less efficient to digest than nuts) (ooh hey I’m starting to sound like a macroecologist again – who needs detailed models when you can slap down a phenomenological curve with a few parameters). But other dimensions are nowhere captured in a simplified model. Nothing in RM captures competition or spatial structure. So it is important to note what is being captured (and what biological processes those roughly map to) and what cannot realistically be swept into a parameter.

    I have to say it’s not just readers who get confused. I’ve seen more than one theoretician make overly sweeping statements about how much they can say is incorporated in their model. It often sounds very technical like “well this is just a linearization around the equilibrium so it should apply generally” – true on one level, but that is not an appropriate answer to “is competition included”.

    • “but the converse is NOT true. Not any unspecified process can be found in a model.”

      Of course. That’s why the post says that. 😉

      Ok, my non-flip response is to ask you (and any other interested readers) how common they think it is for theoreticians to make overly-sweeping statements about what processes or factors their models implicitly incorporate. I agree with you that “this model is a linearization around equilibrium, so it can be thought of as an approximation to any nonlinear model” isn’t very convincing unless it’s shown that it’s a *sufficiently good* approximation. But I dunno, do theoreticians these days often make that argument? Honest question (coincidentally, I just saw someone make that argument…)

      And I guess you sometimes see theoreticians just stick a diffusion term on a model and thereby claim to have captured demographic and environmental stochasticity, or even just unspecified “complexities”. Karen Abbott for one doesn’t like this; she’s been arguing that ecological theoreticians need to be more thoughtful and sophisticated about how they incorporate stochasticity into their models (

      I suspect (correct me if I’m wrong) that part of what you’re bothered by is theoreticians claiming that a simple model that omits X nevertheless provides insight or intuition about systems in which X is at work? Not that a model that omits X provides insights or intuition *about the effects of X*, of course, but that a model that omits X provides insights that remain valid despite the operation of X. My sense is that this is the biggest gap between theoreticians and non-theoreticians. Theoreticians believe that intuition and insight about complex reality comes from studying simple limiting cases that omit lots of the complexities. Non-theoreticians don’t. Old post on this:

      • No I think it’s more your next to last paragraph. I get that a simplified model of predator prey can be useful. We sometimes disagree on this blog about how useful. And that depends on one’s goals. But it’s certainly a legit approach.

        I guess what bugs me is when theoreticians don’t acknowledge that it is a simplified model by instead claiming it effectively includes everything by arguments like linearization or a stochastic term.

        And I’m sure that they make such claims in part because they deal with a lot of field people that say well if it doesn’t include X it’s not a good model. Which is wrong on the part of the field people. But theoreticians should get better at pointing out why that argument is wrong, not just claim that it is somehow accounted for in their simplified model and move on.

        As to how often theoreticians do it, I don’t know. But it seems like every presentation of a model to a general audience gets a question like “but does it include X”. And “yes in a very general way” is one of the 3 or 4 main responding gambits. But maybe I’m way off base.
