In a variable world, are averages just epiphenomena? (UPDATED)

The value of any variable we measure likely is affected by, or correlated with, lots of others. And the effect of any variable on another likely depends on the values of other variables.

I tell my intro biostats students that this is why we care about averages. In a noisy world about which we never have perfect information, it’s mostly unhelpful to think in terms of certainties, because there aren’t any. But we can still think about what the world is like on average. That’s tremendously useful. For instance, a medication might not completely cure the disease in every patient, but it’s still well worth asking whether it improves patient health on average. On this view, variability around the average is of secondary interest to the average itself.

But there’s an alternative view. In a variable world, averages mostly are just meaningless, unimportant epiphenomena. Andrew Gelman articulates this well in a recent post:

Some trends go up, some go down. Is the average trend positive or negative? Who cares? The average trend is a mixture of + and – trends, and whether the avg is + or – for any given year depends…The key is to escape from the trap of trying to estimate a single parameter

On this view, variability is primary and averaging across some or all sources of variability is of at best secondary interest, or even harmful.

This rather abstract-sounding philosophical debate comes up often in everyday practice in ecology and evolution. Most obviously, it comes up in debates over how to interpret main effects in ANOVA-type models in which there are significant interaction terms. But the same issue comes up outside of purely statistical contexts.

For instance, think of the debate over the interpretation of interspecific allometric scaling exponents. When you plot, say, metabolic rate vs. body size for a bunch of species, a power law relationship with an exponent of 0.75 explains a lot of the variation. Individual species and clades deviate from this average relationship, of course, but that’s the average. One school of thought sees this as a hugely significant biological fact (e.g., Brown et al. 2004). We can develop alternative models to try to explain this average exponent. And we can use the average metabolic rate-body size allometry as a baseline and try to explain why particular species or clades deviate from it. An opposing school of thought notes that different clades deviate from the average allometry in different ways and concludes that the average allometry is a meaningless epiphenomenon (e.g., Reich et al. 2006). There is no “universal” metabolic rate-body size allometry. Rather, the clade-specific allometries are real, and different from one another. It’s those clade-specific allometries we should seek to explain and predict. Presumably with clade-specific explanations that don’t involve deviations from some purportedly-universal baseline.

As another example, a lot of debate about the “units” or “levels” of selection in evolution comes down to the interpretation of the average fitnesses of different entities (see Charles Goodnight’s entire blog). As a third example, one of the arguments for doing macroecology is that a lot of uninterpretable, idiosyncratic variability washes out at large scales, revealing average behavior that’s worth studying.* On the other hand, such averages arguably are highly uninformative about the underlying biological processes, and so arguably aren’t very helpful, at least not if your goal is to learn something about biology. Outside of ecology and evolution, think of the debate in psychology over whether g (“general intelligence”) is a real and important human trait, or a meaningless statistical artifact. And I’m sure you can think of other examples. Indeed, there are whole fields that have a methodological commitment to focusing on variability, while others have the opposite commitment (think evolutionary biology vs. developmental biology).

For each of the examples mentioned above, there’s a big literature on the pros and cons of taking some average as primary, vs. taking that same average as an unimportant epiphenomenon. But is there anything that can be said in general about when the former approach or the latter approach makes more sense as a research strategy?**

For instance, in ecology and evolution a lot of valuable theoretical and empirical research on allometric relationships has come out of the school of thought that sees average allometries as important, universal biological phenomena. Even if much of that work turns out to be incorrect, I suspect that the hypothesis of a universal, meaningful allometric exponent was the more fruitful working hypothesis. That is, I suspect we wouldn’t have learned as much about organismal form and function if we’d instead gone with the working hypothesis that variation is primary. But on the other hand, I’m sure you, like Andrew Gelman, can think of situations in which focusing on estimating and explaining the “true” average value of some quantity was never a promising research strategy. And I’m sure you can think of cases in which one can make significant progress either by taking the average as primary, or by taking variation around the average as primary.

Anyone know if some historian or philosopher of science has done a comparative study of debates about the interpretation of averages, trying to identify the circumstances in which “average-focused” vs. “variability-focused” research strategies are most fruitful?** If so, I’d love to read it. If not, somebody should do it.

UPDATE: You should totally check out the comment thread; it’s very good. Especially this comment from Simon Hart, reminding us that in a nonlinear, nonadditive world, it’s often essential to focus on averages (and understand how said averages are affected by the nonlinearities and nonadditivities you’re averaging across). I have a series of old posts on this in the context of modern coexistence theory; starts here.

*We have lots of old posts on this.

**Beyond the obvious point that focusing on the average probably (not always!) makes sense if there’s hardly any variation around the average.

31 thoughts on “In a variable world, are averages just epiphenomena? (UPDATED)”

    • Yes, the distinction between “how does your own h-index change as you age” and “how does the average h-index of all scientists vary as a function of their age” is the sort of distinction I’m thinking of.

      The h-index example seems to be one where the distinction doesn’t matter all *that* much. The h-index for most individuals seems to change over time pretty similarly to the average h-index.

      • “The h-index for most individuals seems to change over time pretty similarly to the average h-index.”

        Isn’t that one similar to those classic newspaper headlines such as “New study shows most people are of average intelligence”? 🙂

        Seriously, I think it’s the individual variability that’s of interest here: we all have unique career trajectories.

      • “we all have unique career trajectories.”

        But not all *that* unique, I don’t think, not as measured by the h-index. But we could be getting into eye of the beholder territory here…

      • I really don’t think we have the data to say that as there are few (if any) published h-index trajectories, as I pointed out in the post.

      • Ok fair enough. But given the mathematical constraints on the possible behavior of h-index trajectories (they can’t go down), given just how much of the average h-index behavior can be predicted simply by how many papers you’ve published and the average IF of the journals in which you publish, and in light of the fact that most people publish at a fairly steady rate (not, say, a dozen papers one year and then none for the next several years), I’d be surprised if individual h-index trajectories vary all that much. But I’ve been surprised before!

      • I can see why you’d say that, but I suspect publication behaviours vary more than you suppose: look at the left-hand graph in the second figure of my post, for instance, and certainly the IFs of the journals in which I publish vary by an order of magnitude. The data in Alex Bateman’s figure are, I think, made up; but even if they are, he’s assuming considerable variability in trajectories.

      • In reading all the comments about the h-index, I wondered if the # of pages published, vs. the # of publications, might vary more over a career? It seems early in many careers the push is for volume of publications in order to move up the career ladder. Once established, could it be investigators publish fewer articles, but actually publish more pages as their work evolves and becomes more complex?

      • UPDATE: I contacted Alex Bateman this morning and apparently those three h-trajectories on his post WERE from real scientists, which convinces me even more that individual h-trajectories can be much more variable than we might suppose.

  1. There’s been some interesting work thinking about whether we talk about and focus on “global warming” (on average, whole planet is getting warmer) vs. “climate change” (individual experiences will vary), mostly about public perception, though. Averages can be really important for distilling lots of complex information, but can lead to over-generalizations or misconceptions about how a system actually works.

    • Yes, good point. This issue crops up everywhere! Though in the context of public perception of climate change, I’d think (?) that the weather vs. climate distinction would loom larger than climate change differing from place to place.

  2. This is a big topic in functional trait ecology too. For a long time the name of the game was to get species averages and compare them. But increasingly people are realizing that this hid enormous within-species variation. And that variation is critical to understanding what is going on. Probably the most high-profile paper on this topic is Jim Clark’s 2010(?) paper on how coexistence is easier to understand when you realize that just because on average one species is better than another at something (e.g. growth rate), the relationship may still be reversed between any two individuals from those two species.

    I have claimed before, maybe even in the comments of this blog, that as a science matures it focuses more on the variation.

    • That Clark paper is puzzling, I’ve never been able to understand exactly what the stabilizing mechanism is supposed to be. And from conversations I’ve had with other folks doing modern coexistence theory, I’m not the only one who doesn’t get it. Which doesn’t mean it’s wrong, of course. Near the top of my long list of little side projects that I’ll probably never get to is “try to code up the verbal model of Clark 2010 and see if it actually works”.

  3. Interesting post, thanks. I was about to suggest the first chapter of Jim Clark’s ‘Models for ecological data’ (available for free here). His Science 2010 paper is great too, but this book chapter is much easier to understand. I think it makes a compelling case for considering variation beyond averages (and not necessarily from a Bayesian point of view). Just a quick look at the figures in that chapter serves to show how an excessive reliance on averages (e.g. species-average growth, shade tolerance, etc.) leads to misunderstanding. Good read! That said, I think both averages and variation are interesting.

    • I had a look at Jim’s book. Afraid I still don’t see how the coexistence mechanism in his 2010 paper is supposed to work. I agree that it *sounds* intuitive–but the more you think about it, the less it seems like it should work. At least, that’s been my experience as I’ve thought about it more.

      Don’t misunderstand: it’s absolutely correct to say that intraspecific variation *could* affect species coexistence, changing the outcome of interspecific competition from what would’ve occurred in an otherwise-equivalent world lacking intraspecific variation. Dan Bolnick and colleagues have a review paper in TREE that covers some of these mechanisms. But I’m not convinced that the *specific mechanism* Jim’s proposing is among the mechanisms by which intraspecific variation can create stable coexistence (i.e. each species can increase when rare, on average) that wouldn’t otherwise occur.

  4. Hey Jeremy- Primo topic! Statistics are my morphine.

    I have found some papers in recent years that suggest some pretty compelling, alternative approaches that I found to be of benefit in expanding my consciousness on statistics:

    Adler S., Hübener T., Dreßler M., Lotter A. F. and Anderson N. J. 2010. A comparison of relative abundance versus class data in diatom-based quantitative reconstructions. Journal of Environmental Management 91: 1380-1388.

    Jacquier E., Kane A. and Marcus A. J. 2003. Geometric or arithmetic mean: A reconsideration. Financial Analysts Journal 59(6): 46-53.

    Johnson D. H. 1999. The insignificance of statistical significance testing. Journal of Wildlife Management 63(3): 763-772.

    Limpert E., Stahel W. A. and Abbt M. 2001. Log-normal distributions across the sciences: Keys and clues. BioScience 51(5): 341-352.

    The paper by Limpert et al. 2001 is especially important for ecologists, I think, because it challenges the manner in which many of us traditionally assess the garden-variety log-normal species distribution.

    Your points about means and what they mean really hit home for me. So too did your examples from medicine. Increasingly, since about 1990 or so, medicine has learned time and again that means can be misleading. More often than not, these “one size fits all” approaches are not productive. This explains, for example, why effective medicinal approaches for women and children generally lag behind those for men: men used to predominate in those “averages”. It also explains why cancer treatment efficacy outcomes were so poor for so long: the means were virtually useless in predicting individual patient outcomes.

    I always advise using the mean of any variable as a guide, a starting point for your process of inquiry. And when comparing means of variables, consider that the apparent lack of an association can often be more compelling than any degree of apparent dependency. ANOVA is a very useful tool, but as is the case with regression analyses, it will not detect all possible variation and dependencies within a given model. I cannot say enough about the various likelihood procedures, and I always urge investigators to apply them whenever their data allow it. Visual pattern analysis can also be very helpful, so consider spending considerable time observing your scatter plots; at times you will observe a phenomenon that points you toward another batch of analyses.

  5. Really interesting post. I’m not sure that the most important issue is the level where one takes the average, though.

    Adding group-level means will always increase one’s R-squared. However (depending on our goals), we might not have learned *any* useful biology until we have a scientific explanation for *why* the means vary. And this is the really tricky part.
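That first point can be seen in a toy numerical sketch (numbers made up): giving each group its own fitted mean can only shrink the residual sum of squares relative to a single grand mean, so R-squared can only go up.

```python
# Made-up data: two groups whose means differ.
groups = {"A": [1.0, 2.0], "B": [5.0, 7.0]}
ys = [y for g in groups.values() for y in g]

grand_mean = sum(ys) / len(ys)
ss_tot = sum((y - grand_mean) ** 2 for y in ys)  # residuals around the grand mean

# Residuals after giving each group its own mean:
ss_res = sum((y - sum(g) / len(g)) ** 2
             for g in groups.values() for y in g)

r_squared = 1 - ss_res / ss_tot  # ~0.89 here
```

The jump in R-squared is guaranteed by construction; on its own it says nothing about *why* the group means differ, which is the commenter’s point.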

    In my experience, biologists are extremely good at coming up with post-hoc reasons for why two means are different, but it’s much harder to make predictions about unobserved groups.

    It’s also worth noting that biological explanations aren’t always better than neutral ones, especially if the biology really does tend to wash out for the pattern of interest. There are still areas of macroecology where it’s hard to do much better than John Harte’s maximum entropy approach. Or, if I can use an example from my own research, there’s a subfield of invasion ecology where biologists have proposed lots of explanations for differences in group-level means (time since invasion, dispersal ability of the invader, etc.), but it turns out that 95% of the variation can be explained by how widespread the invader is relative to the native species (Harris et al. 2011, Am Nat) and that the residuals don’t seem to have all that much to do with the biology of the invader.

    • Yes to all this. Your example from invasion biology gets repeated in a lot of areas of ecology. There’s some pattern in the data for which people propose all sorts of interesting (and usually not-mutually-exclusive) biological explanations. None of which really explain much of the variation, at least not after you account for some “uninteresting” effect or predictor.

      Although this gets into another interesting issue I keep meaning to post on: when should you say that some purportedly “uninteresting” effect or predictor “explains” the pattern of interest? Because “uninteresting” and “explains” are often in the eye of the beholder.

      For instance, think of the “more individuals” hypothesis for explaining why more productive areas have more species. Well “of course” those areas “obviously” have more species: they have more individuals, and every individual is just another sample from the “species pool”. But personally, I don’t think “of course” or “obviously” is appropriate here. The more-individuals hypothesis assumes that high- and low-productivity areas all have species pools that are sufficiently similar in relevant respects. Which, if so, is far from obvious; in fact it is a highly non-obvious fact that would itself require explanation. So at best, I think the “more individuals” hypothesis just changes the question rather than actually answering it. One could of course say something similar about many applications of MaxEnt. Which isn’t necessarily a criticism; sometimes changing the question is just what the doctor ordered. But I do think it’s important to distinguish between answering the question and changing the question.

      • Thanks Jeremy! I’m really enjoying this post & the discussion it raised.

        Regarding richness/abundance/productivity, the assumptions you mentioned are relatively easy to relax. For example, one might discover some interesting biology by tweaking the model to allow for correlations between the number of individuals that occur and the kinds of individuals that occur. But those correlations might be much more difficult to detect if we started from a different modeling approach that didn’t already emphasize the role of abundance.

        From my perspective, this isn’t so much about changing the question as it is about refocusing it.

        I feel the same way about MaxEnt. If we’re trying to explain some aspects of plant community structure and we *don’t* account for constraints imposed by the state variables, our predictions might end up being as bad as if we didn’t account for botany.

    • Hey Dave- You raise many thought-provoking issues in your comment. Means, what they mean in relation to other means, discernible patterns, and biologically-relevant (i.e., usefully applied) explanations cover a heck of a lot of ground, although this and more, at least statistically, has been the focus of my research for 6 years running. Later this month, I have a publication coming out (Quantum & Classical Mechanics of the Vegetative Complex Health Index… yes, that was a shameless plug… sorry, I just could not resist;-)) that examines these issues ad nauseam.

      To highlight some of my findings concerning your points: there are certainly instances where it seems NO RELATIONSHIP (i.e., correlation) exists between variables based upon regression tests, ANOVA, F-tests, etc. One such occurrence was the apparent absence of a relationship between community structure and biodiversity. In this case, I had to dig very deeply, and rely upon visual pattern analysis of scatter plots to develop alternative statistical approaches, to reveal what was, at least in my warped mind, a profound and biologically relevant relationship.

      Another really interesting finding was that percent cover of exotic species did not vary between study sites stratified for the exotic community vs. those not stratified for the exotic community. Again, an apparent lack of correlation between variables that seemed counter-intuitive. However, that apparent independence was “swamped out” not by the multitude of variation in the environment, but by the distinctly disparate biogeographic structure of two dominant invasive species. Together, their percent cover estimates made the relationship look uninteresting, when in fact it was biologically relevant. What’s more, even though percent covers of exotic species did not differ between these two groups, biodiversity varied significantly, apparently suggesting that stratification of the exotic community (a community biogeographic variable) acted upon biodiversity independently of percent cover of exotic species.

      The take home message, I think, is that we should not be dissuaded when our first, second or third looks fail to reveal interesting patterns, statistically significant outcomes, or biologically-relevant explanations. If anything, a failure to reject the null should motivate one to dig more deeply.

  6. Jeremy,

    This is a great post and one I’m glad you’ve written, because I too struggle with when means are useful and when they are not. I do a lot of reductionist, ANOVA-type designs with discrete factors, and the mean seems fine there (as long as you avoid Simpson’s Paradox). But I also do more and more observational and gradient experiments, where the mean seems less and less useful for such continuous data. This completely goes against my training, and largely the doctrine in my field (soil/ecosystem ecology), so it’s easy to slip back “into the mean” when I know I shouldn’t.

    I’m a huge fan of Gelman’s (so glad you used him in this post), especially his “Rich State, Poor State, Red State, Blue State” 2007 paper, and (similar to Brian McGill’s comment above) I think Jim Clark’s 2010 Science paper is perhaps the best ecological example where mechanistic explanation and prediction break down entirely when you look at the mean. But the paper that really slapped me awake to the false promises of the mean for continuous data was Robinson’s 1950 paper in American Sociological Review. What struck me hardest was the fact that a social scientist was writing about one of the main statistical fallacies (I realize this sounds like a Greek myth) known as…wait for it…“ecological correlation”. Ouch. Why did we get tarred with that fallacy? Even Robinson acknowledged that no real ecologist would be interested in using such mean correlations to infer mechanism (I’m paraphrasing his remark about wanting to learn about individuals).

    I think a bigger issue here, not yet addressed in the post/comments (but maybe you guys have elsewhere), is how a focus on the mean might influence experimental design and so narrow mechanistic inferences by never revealing the variation that might suggest alternative conclusions.

    Best wishes,


  7. Okay, for all you who believe that there is more to ecology/biology than averages, I suggest you consider quantile regression as an extremely flexible method for extending your statistical modeling to other quantities (or intervals of values) that might be more illuminating. Check out Cade and Noon (2003) for an introduction to the method (obviously a shameless plug).
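For readers unfamiliar with the method, here is a minimal stdlib sketch of the idea behind quantile regression (not Cade and Noon’s implementation; a real analysis would use a dedicated package such as R’s quantreg). The key is the “pinball” (check) loss, whose minimiser is a chosen quantile rather than the mean, shown here in the simplest possible case of fitting a constant.

```python
# Sketch of the loss behind quantile regression: the "pinball" (check) loss.
# Minimising it over a constant recovers the tau-th sample quantile, so the
# fit targets a chosen part of the distribution rather than the mean.
def pinball(residual, tau):
    return tau * residual if residual >= 0 else (tau - 1) * residual

def fit_constant_quantile(ys, tau):
    # Brute force over candidate constants drawn from the data themselves;
    # a tau-th sample quantile minimises the summed pinball loss.
    return min(ys, key=lambda c: sum(pinball(y - c, tau) for y in ys))

data = [1, 2, 3, 4, 100]                    # made-up, skewed data
median = fit_constant_quantile(data, 0.5)   # 3: robust to the outlier
upper = fit_constant_quantile(data, 0.9)    # 100: tracks the upper tail
```

Swapping in different values of tau is what lets the method model the edges of a data cloud, not just its centre.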

    • Thanks for the plug.

      Though are you suggesting that quantile regression addresses the conceptual issue raised in the post? In all of the examples I gave (interpreting allometric exponents, levels of selection, etc.), everyone already knows that there’s variation around the average, often substantial variation, and can put numbers on that variation. But the question of interpretation still remains. That’s what makes this issue difficult, I think–it’s conceptual rather than technical. It has to do with the meaning or interpretation of numbers that we already know how to estimate/model/etc.

  8. Hi Jeremy,

    Apologies for being late to the discussion (on this post, and in general).

    Great post.

    I would contend that averages are not just epiphenomena and that the key issue is often (though not always) how variation contributes to the average, and which ‘average’ is most appropriate.

    Using Gelman’s quote as a pivot:

    “Some trends go up, some go down. Is the average trend positive or negative? Who cares?”

    It’s almost trivial, but a species at low density cares: its persistence depends on whether its average growth rate is positive. However, whether its average growth rate is positive depends on the variation in its annual growth rate. So with respect to extinction risk, species care about both variation and averages.

    But it’s more complicated because where variation is present one average does not always equal another. The arithmetic mean of the variable annual growth rates gives a misleading picture of the long-term growth rate and, therefore, extinction risk. This was articulated for population biologists by Lewontin and Cohen in 1969. Because growth is multiplicative, species that are vulnerable to extinction care about geometric means, which appropriately account for the effects of variable annual growth rates on long-term (average) growth rates.
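Lewontin and Cohen’s point is easy to see numerically. A sketch with made-up growth rates: the arithmetic mean suggests growth, but because growth compounds multiplicatively, the geometric mean (here below 1) governs the long run.

```python
import math

# Hypothetical annual growth rates: one good year (x1.5), one bad year (x0.6).
rates = [1.5, 0.6]

arith = sum(rates) / len(rates)              # 1.05: suggests growth
geom = math.prod(rates) ** (1 / len(rates))  # ~0.949: long-run decline

# Over 100 alternating years the population collapses despite arith > 1:
n = 1.0
for r in rates * 50:
    n *= r
```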

    More generally, the arithmetic mean does not appropriately account for the effects of variation when dynamics are nonlinear. And nonlinearities in ecology and evolution are pervasive. Therefore, nonlinear averaging becomes key, but this realization has not penetrated the discipline as far as it needs to.
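A minimal sketch of nonlinear averaging (Jensen’s inequality), using a made-up convex response: averaging the responses and responding to the average give different answers, and for this quadratic the gap equals the variance of the inputs.

```python
import statistics

def response(x):
    # Hypothetical nonlinear (convex) response; stands in for any
    # nonlinear dependence of a rate on a fluctuating variable.
    return x ** 2

densities = [1.0, 2.0, 3.0, 4.0]  # made-up fluctuating inputs

avg_of_response = statistics.mean(response(x) for x in densities)  # 7.5
response_of_avg = response(statistics.mean(densities))             # 6.25
```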

    Finally, how you care about variation depends critically on the response variable of interest. In a very good paper, Lloyd-Smith et al. (2005) show how individual variation in disease transmission contributes to epidemics. Where epidemics are concerned, we are rarely interested in average outcomes, because the event of greatest interest (an epidemic) is not the average event. The average event (mean, median, and mode) when a disease emerges is rapid extinction; the epidemic is the outlier, and so Lloyd-Smith et al. appropriately assess the likelihood of the outliers (the mean outlier?). In contrast, dealing with exactly the same kinds of individual variation, it is not appropriate to concentrate on outlier results if you are concerned with the ability of species to coexist. Here, we are rightly interested in the average event. Acknowledging that there will be variable outcomes, on average, do we expect species to coexist? Coexistence criteria are, therefore, based on averages. And these criteria explicitly rely on averages that appropriately account for variation where nonlinearities are present; indeed, a combination of variation and nonlinearities is often necessary for coexistence to occur.

    I think this is an important topic that can be resolved. And when it is resolved, ecology will be better for it.


  9. Pingback: How far can the logic of shrinkage estimators be pushed? (Or, when should you compare apples and oranges?) | Dynamic Ecology
