I wish I could’ve titled this post “The ~~call~~ heterogeneity is coming from inside the ~~house~~ primary studies”, but WordPress doesn’t allow strikethrough text in post titles. 🙂
Ecology is full of variability, and most of it isn’t just sampling error that would go away if only we had large enough sample sizes. For instance, in a typical ecological meta-analysis, something like 85% of the variation in effect size is attributable not to sampling error, but rather to “heterogeneity”: real variation in the true mean effect size (Senior et al. 2016).
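For readers who want the mechanics behind that "85%" figure: the share of variation attributable to heterogeneity rather than sampling error is usually summarized with the I² statistic from a random-effects model. Here's a minimal sketch in Python using the classic DerSimonian-Laird moment estimator on simulated data. All the numbers are made up for illustration; this is not the Senior et al. dataset.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical meta-analysis: n effect sizes with known sampling variances.
# True effects vary (tau2_true > 0), so most observed variation is heterogeneity.
n = 30
tau2_true = 0.10          # true between-effect variance (heterogeneity)
v = np.full(n, 0.01)      # known sampling variance of each effect size

y = rng.normal(0.5, np.sqrt(tau2_true), size=n) + rng.normal(0.0, np.sqrt(v))

# DerSimonian-Laird: Q statistic, tau^2 moment estimate, and I^2.
w = 1.0 / v
ybar = np.sum(w * y) / np.sum(w)
Q = np.sum(w * (y - ybar) ** 2)
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2_hat = max(0.0, (Q - (n - 1)) / c)
I2 = max(0.0, 100.0 * (Q - (n - 1)) / Q)  # % of variation not due to sampling error

print(f"tau^2 estimate: {tau2_hat:.3f}")
print(f"I^2: {I2:.0f}%")
```

With the simulated values above (true heterogeneity ten times the sampling variance), I² comes out around 90%, in the same ballpark as what Senior et al. report is typical for ecology.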
Variation cries out for explanation. If it’s not sampling error, then that means there must be some reason(s) for it. We’d like to know the reason(s)! That’s why ecological meta-analyses routinely include moderator variables–covariates that might explain some of the heterogeneity in effect size. Perhaps effect sizes vary because some primary studies are observational and others are experimental. Maybe some primary studies were conducted on birds and others were conducted on mammals. Maybe some primary studies were conducted on islands and others were conducted on continents. Maybe different primary studies were conducted at different latitudes, or using different methods, or etc.
But what if most of the variation isn’t among primary studies? Rather, what if most of the variation is within primary studies? After all, many primary studies (i.e. single research papers) report multiple effect sizes. The investigators conducted the same experiment on each of three related species, or in each of two different habitats, or etc. Now, it might seem far-fetched to worry that effect sizes from the same primary study will be all that heterogeneous. After all, effect sizes reported in the same primary study ordinarily have a lot in common. They’re based on data collected by the same investigators, using the same methods, usually at the same time, and usually at the same or nearby locations. That’s why effect sizes from the same primary study generally share the same values for most or even all of the moderator variables in a typical meta-analysis. How much within-study heterogeneity in effect size could there possibly be?
About as much as there is among studies, actually! Below is a graph from my fairly comprehensive compilation of over 450 ecological meta-analyses. For each meta-analysis, I used a hierarchical random effects model to partition the variation in effect size into variation among primary studies, within primary studies, and sampling error. The graph below plots the % of variation in effect size attributable to among-study variation vs. the % attributable to within-study variation. There’s one point for each meta-analysis.
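To make the three-way partition concrete, here's a minimal sketch in Python of the same kind of decomposition (among studies, within studies, sampling error). For simplicity it uses a balanced simulated dataset and ANOVA-style moment estimators rather than the hierarchical random-effects fit that actual meta-analysis software would use, and every number in it is made up:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical balanced dataset: k primary studies, each reporting m effect sizes.
k, m = 200, 5
tau2_among = 0.04   # true among-study variance
tau2_within = 0.04  # true within-study variance
v = 0.02            # known sampling variance of each effect size

study_means = rng.normal(0.0, np.sqrt(tau2_among), size=k)
true_effects = study_means[:, None] + rng.normal(0.0, np.sqrt(tau2_within), size=(k, m))
y = true_effects + rng.normal(0.0, np.sqrt(v), size=(k, m))

# Method-of-moments variance components for a balanced one-way layout:
# E[MS_within]  = tau2_within + v
# E[MS_between] = m * tau2_among + tau2_within + v
ms_within = np.mean(np.var(y, axis=1, ddof=1))
ms_between = m * np.var(y.mean(axis=1), ddof=1)

tau2_within_hat = max(ms_within - v, 0.0)
tau2_among_hat = max((ms_between - ms_within) / m, 0.0)

total = tau2_among_hat + tau2_within_hat + v
pct_among = 100.0 * tau2_among_hat / total
pct_within = 100.0 * tau2_within_hat / total
pct_sampling = 100.0 * v / total

print(f"among-study:  {pct_among:.1f}%")
print(f"within-study: {pct_within:.1f}%")
print(f"sampling:     {pct_sampling:.1f}%")
```

Each meta-analysis in the graph contributes one such (among, within) pair of percentages; the simulation above was set up so that among- and within-study heterogeneity are equal, which is roughly the average situation in the compiled meta-analyses.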
The first thing you’ll notice is that most of the observations fall close to an imaginary boundary line with a slope of -1, running from the upper-left corner to the lower-right corner. That reflects the fact that, for most meta-analyses, most of the variation in effect size is due to heterogeneity (the sum of among-study and within-study heterogeneity), not sampling error. The boundary line marks all combinations of among-study heterogeneity and within-study heterogeneity that add up to 100% of the total variance in effect size.
But that’s not the important thing to notice for purposes of this post. For purposes of this post, the important thing to notice is that most points are not clustered in the upper-left corner. Rather, they’re spread out pretty uniformly from the upper-left to the lower-right, just below the boundary line. Which means that within-study heterogeneity is about as large, on average, as among-study heterogeneity. Effect sizes reported in the same primary study are just as different from one another, on average, as are effect sizes reported in different primary studies.
Now, there are some meta-analyses for which the heterogeneity is entirely among studies; within-study heterogeneity is estimated to be zero or close to zero. But don’t get too excited about that, for two reasons. First, there are also some meta-analyses for which the heterogeneity is entirely within studies! Second, most of the meta-analyses with zero within-study heterogeneity (or zero among-study heterogeneity) are small meta-analyses that only include a handful of studies. Here’s a graph of within-study heterogeneity, as a function of the number of primary studies in the meta-analysis:
Notice that most of the meta-analyses with 0% (or 100%) of variation in effect size attributable to within-study heterogeneity have <25 studies, and all but two have <75 studies. That strongly suggests that, if more studies of those topics were conducted, substantial within- and among-study heterogeneity would be revealed.
If you’re someone who wants to explain variation in effect size, I think these results should worry you. In a typical ecological meta-analysis, something like 50% of the variance in effect size is variance among effect sizes within studies. Sources of within-study variation are going to be difficult or impossible to identify! Many of the usual moderator variables aren’t going to help, because they don’t vary within studies.
These results make me wonder how much distributed experiments like NutNet cut down on heterogeneity. One reason to conduct a distributed experiment is to eliminate some sources of heterogeneity in effect size.* Investigators at many different locations all perform the same experiment, at the same time, using the same methods, on organisms that are sufficiently similar to one another in various ways (body size, behavior, etc.) that one can study them all using the same methods. But of course, “same experiment,” “same time”, “same methods”, and “sufficiently similar organisms” all apply to pretty much every primary study in ecology. Apparently, all that sameness within a given primary study still leaves considerable scope for heterogeneity among the effect sizes reported by that study. So I think it’d be really interesting to quantify how much heterogeneity there is among effect sizes in a single distributed experiment like NutNet, as compared to heterogeneity among effect sizes in a meta-analysis of a bunch of primary studies.
UPDATE: see the comments, where a commenter makes a very good point that in retrospect I probably should’ve made in the post: these estimates of within- and among-study heterogeneity are just that, estimates. They have error bars–quite possibly big ones. See the comments for discussion of this and its implications for the points made in the post. /end update
p.s. I’ve made the points in this post before. But I haven’t shown the graphs before, so I decided to give them a standalone post.
*There are of course other reasons to do distributed experiments. Follow that last link for a great interview with NutNet co-founder Elizabeth Borer, addressing this point and much more.