Recently I polled y’all on how many effect size estimates an ecological meta-analysis needs to include to be “big enough”. That is, how many it needs to have for you to be reasonably confident that the estimated mean effect size won’t change too much as more studies are published in the future. (I’m paraphrasing; follow that last link if you want to see the exact wording of the poll question.)
Here are the poll responses to the question, along with an answer drawn from my pretty-comprehensive database of 476 ecological meta-analyses.
tl;dr: Ecological meta-analyses need to be waaaaaaaaaay bigger than most respondents think they need to be. They also need to be waaaaaaaaaay bigger than most of them actually are! At least, that’s sure how it looks to me, but have a look yourself and tell me if I’m wrong!
We got 45 poll respondents–thank you to everyone who took the poll. Not a big sample, and surely not a random sample of ecologists, or even of our regular readers. But it seems like a big enough sample to be worth talking about, particularly since most responses fell within a pretty narrow range.
Nine poll respondents said “it depends”, so we’ll set them aside. One said (in so many words) that ecological meta-analyses are meaningless because there’s too much methodological variation among ecological studies. We’ll come back to that in a future post! Of the 35 respondents who gave a number of effect sizes that’s “enough” (or a range of numbers, in a couple of cases), 29 gave some number ≤60. The modal answer was 50, and the minimum was 10. Five respondents gave some number from 100-150 effect sizes, and one said 500.
It’s worth noting that most of those answers are comparable to, or smaller than, the size of a typical ecological meta-analysis. The median ecological meta-analysis includes 60 effect size estimates. So as a group, most poll respondents are pretty confident in the estimated mean effect size from a typical ecological meta-analysis.
Whereas I’m pretty confident that they shouldn’t be so confident! As best I can tell from my admittedly-preliminary analyses, it looks to me like most ecological meta-analyses include waaaaaaay too few effect size estimates for us to be confident that the estimated mean effect size has stabilized, and wouldn’t change too much in future if further effect size estimates were published.
But maybe I’m wrong! Here are my exploratory analyses–you tell me if you buy them.
Exploratory meta-analysis of ecological meta-analyses
My compilation includes 476 ecological meta-analyses. For each meta-analysis, I have every effect size estimate included in the meta-analysis, the year of publication of every effect size estimate, an identifying label for the study (i.e. paper) in which the effect size estimate was published, the sampling variance of the effect size estimate, what kind of effect size measure it was (e.g., Hedges’ g, log-transformed response ratio, etc.), and a few other bits of information. The database includes over 20,000 studies and 114,000 effect size estimates in total.
For each of the 476 meta-analyses in my compilation, I used the metafor package in R to fit a meta-analysis with hierarchical random effects of study, effect size within study, and sampling error. (Note that a few of the meta-analyses couldn’t be fit.) One output is an estimate of the grand mean effect size for each meta-analysis. Here’s a graph of mean effect size for each meta-analysis vs. the number of effect size estimates in the meta-analysis:
Now, this is a pretty crude graph, because it lumps together meta-analyses that used different effect size measures (Hedges’ g, log-transformed response ratio, Fisher’s z-transformed correlation, etc.). But the pattern is clear enough: mean effect size estimates vary widely among meta-analyses with just a few hundred effect size estimates or fewer. Mean effect size estimates for larger meta-analyses cluster closer to one another (and to zero). That surely implies that, if those small meta-analyses were made bigger by publication of additional effect size estimates, their means would tend to get pulled closer to one another (and to zero).
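If you're curious what the pooling step involves, here's a deliberately simplified sketch in Python. It's a two-level DerSimonian-Laird random-effects estimator, not the three-level hierarchical model I actually fit with metafor in R, and the numbers in the example call are invented for illustration:

```python
import math

def dl_random_effects(y, v):
    """DerSimonian-Laird random-effects pooled mean.

    y: list of effect size estimates; v: their sampling variances.
    Returns (pooled mean, its standard error, estimated tau^2).
    """
    w = [1.0 / vi for vi in v]                                # inverse-variance weights
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw         # fixed-effect mean
    Q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))   # Cochran's Q
    C = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (Q - (len(y) - 1)) / C)                   # between-study variance, >= 0
    w_re = [1.0 / (vi + tau2) for vi in v]                    # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return mu, se, tau2

# Two discrepant effect sizes -> nonzero tau^2 and a wider-than-naive SE:
mu, se, tau2 = dl_random_effects([0.0, 1.0], [0.1, 0.1])
```

The detail that matters for the rest of this post: the between-study variance tau2 gets added into every weight, so heterogeneous data yield a less precise pooled mean than the same number of homogeneous effect sizes would.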
Put another way, if you say that 10 or 50 or even 150 effect size estimates is “enough”, then what you’re saying is that you believe most of the mean effect size estimates in the graph above. Which means you think it’s an amazing coincidence that the biggest ecological meta-analyses rarely or never report mean effect sizes all that far from zero. You think it just so happens that the phenomena for which ecologists have published the most effect size estimates are phenomena that have small effects on average.
But I’m guessing that some of you won’t find this convincing. After all, variances are hard to eyeball from graphs like the one above. On the left-hand side of the graph, there are a bunch of means far from zero–but also a bunch of means close to zero. And there are many fewer large meta-analyses than small ones. Maybe if we had more really large meta-analyses, we’d have some for which the mean effect size is very far from zero. So maybe everything I said in the previous paragraph is wrong.
So let’s do another exploratory analysis that makes more complete use of the information in the data. Let’s do 476 cumulative meta-analyses. That is, for each of the 476 meta-analyses, do a hierarchical random effects meta-analysis that only includes the effect sizes from the first two published studies (i.e. the two earliest studies). Then add in the effect size(s) from the third published study. And so on, until the meta-analysis incorporates all the effect sizes, in the temporal order in which they were published. This allows us to look at how the mean effect size estimate, and its 95% confidence interval, changed over time as more and more effect sizes were published.* (Again, it’s actually not quite 476 meta-analyses because a few of them couldn’t be fit.)
Here’s a graph of every cumulative mean effect size estimate from every cumulative meta-analysis, as a function of the number of effect sizes included in the cumulation. Each cumulative mean is expressed as a percentage of the final cumulative mean for that meta-analysis. So for instance, if for meta-analysis #1 the cumulative mean effect size after the first two studies was based on a total of 5 effect sizes, and was twice as big as the final mean for meta-analysis #1, then that point would fall at a value of 5 on the x-axis and 200% on the y-axis. Some of the percentages are negative because the cumulative mean of the first n effect sizes doesn’t always have the same sign as the final mean of all effect sizes in the meta-analysis.
Again, this is a very crude graph. It’s lumping together meta-analyses that use different effect size measures. And you can’t tell which points come from which meta-analyses. But still, it’s clearly pretty common for estimates of mean effect size based on dozens or even hundreds of effect sizes to be quite far away from their final values. Look at the y-axis scale–it runs from -5,000% to +10,000%! That is, as more and more effect size estimates are published, it’s quite common for the cumulative mean to keep bouncing around appreciably, even after hundreds of effect sizes have been published.
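The cumulative procedure, and the percent-of-final normalization used in the graph above, can be sketched like this (again a simplification: plain inverse-variance pooling stands in for the hierarchical random-effects refits, and the data are made up):

```python
def cumulative_means(years, y, v):
    """Cumulative meta-analysis sketch: re-pool after each effect size,
    in order of publication year.  Simplification: plain inverse-variance
    pooling stands in for refitting a hierarchical random-effects model."""
    order = sorted(range(len(y)), key=lambda i: years[i])   # temporal order
    sw = swy = 0.0
    means = []
    for i in order:
        sw += 1.0 / v[i]         # running sum of weights
        swy += y[i] / v[i]       # running sum of weighted effect sizes
        means.append(swy / sw)   # pooled mean of everything published so far
    return means

# Made-up toy data: four effect sizes published in different years.
means = cumulative_means([2000, 2003, 2001, 2005], [0.8, 0.2, 0.4, 0.3], [0.1] * 4)
# Each cumulative mean as a percentage of the final mean, as in the graph:
pct_of_final = [100 * m / means[-1] for m in means]
```

The last entry of `pct_of_final` is 100% by construction; the question the graphs above ask is how wildly the earlier entries swing before getting there.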
A better way to see this is to inspect the time series of cumulative means for individual meta-analyses. So here’s a graph that just shows a small, haphazardly chosen subset of the meta-analyses in the previous graph. Now I’ve plotted the data as a line graph with a differently-colored line for each meta-analysis, so you can see how the cumulative mean effect size for each meta-analysis (expressed as a percentage of the final mean) changed as more and more effect size estimates were published:
A good way to look at the above graph is to look first at the meta-analyses that included 1500+ effect size estimates in total–the ones for which the lines extend farthest to the right on the x-axis. They’re some of the biggest meta-analyses in ecological history. Notice that they all have lengthy stretches during which it appears that the mean effect size has stabilized, only for it later to become clear that it hasn’t yet. For instance, consider the highest (brown) line. The cumulative mean effect size for that meta-analysis was pretty stable, or stable-ish, after about 600 effect sizes had been published until almost 1000 had been published. But then the cumulative mean effect size started dropping further. It appears to have asymptoted at close to its final value only after ~1700 effect sizes were published. (And who knows, maybe it’ll change further at some point in the future!)
Keeping that dynamic in mind, if you now look at the smaller meta-analyses on the graph above (i.e. the ones that only include a few hundred effect sizes at most), you see that their cumulative mean effect sizes are still changing rapidly right up to their “final” values. Which is a sign that those “final” values aren’t really final. If many more effect sizes are published in the future, those “final” values are likely to change further.
Now, you could argue that the above graph makes fluctuations in the mean effect size look bigger than they really are, in cases where the mean is fluctuating around some final value that’s close to zero. So here’s one more way to look at the data. The graph below is a bit like the last one, except that instead of plotting the cumulative mean as a percentage of the final mean, I’m plotting the width of the 95% confidence interval of the cumulative mean, as a percentage of its width after the first two studies. So now I’m asking, how does the precision of the estimated mean effect size change as more and more effect sizes are published and incorporated into the cumulative meta-analysis?
If you interpret the 95% confidence interval as defining the range of reasonably plausible (i.e. likely) estimates of the “true” population mean, well, ecologists’ estimates of what’s plausible often change appreciably, even after 10 or 50 or even 150 effect size estimates have been published. It looks to me like 95% c.i. width typically doesn’t stabilize until something like 250-500 effect size estimates have been published. And although I’m not showing it here, I can tell you that these 95% confidence intervals often are pretty wide by any reasonable standard. It’s not that the initial 95% confidence intervals are all very narrow, so that trivially-small fluctuations in their width get magnified when expressed as percentages of the initial width.
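For concreteness, the normalization in this last graph works like this (the standard errors below are made-up numbers; in the real analysis each one comes from refitting the model at a cumulative step):

```python
# Made-up standard errors of the cumulative mean after successive refits;
# in the real analysis each comes from refitting the model at that step.
se = [0.40, 0.31, 0.33, 0.22, 0.18, 0.17]

ci_width = [2 * 1.96 * s for s in se]                        # 95% c.i. widths
pct_of_initial = [100 * w / ci_width[0] for w in ci_width]   # % of first width
```

Note that the width can bounce back up (as from the second to the third entry here) because the heterogeneity variance gets re-estimated at every step.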
Bottom line: depending on exactly how you prefer to define a “stable” mean effect size, it looks to me like you need at least 250-500 effect size estimates, maybe more, before you can be reasonably comfortable that you have “enough” effect size estimates. Before you can be reasonably confident that you know what the mean effect size is, or within what range it’s likely to fall.
I buy this because it’s consistent with my general background knowledge of hierarchical random effects models. I’m admittedly not an expert. But my understanding is that, if you have a hierarchical random effects model in which there’s substantial heterogeneity–i.e. substantial variation in means among groups and subgroups (which there is: Senior et al. 2016)–then you need a lot of observations to estimate the grand mean with high precision. One way to think about it is that, because of the hierarchical structure of your data, your observations are quite non-independent of one another. Having many non-independent observations is a bit like having some much smaller number of independent observations. So I don’t think we should be surprised that you need way more effect sizes to estimate mean effect size with any precision, than you would if you were estimating the mean of a single homogeneous population by randomly sampling independent observations from that population.**
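Here's a toy calculation of that intuition, using the variance formula for the grand mean in a perfectly balanced three-level random-effects model (the parameter values are invented for illustration; real datasets are unbalanced, and metafor estimates all of these variance components from the data):

```python
import math

def grand_mean_se(n_studies, es_per_study, tau2_study, tau2_within, v):
    """SE of the grand mean in a balanced three-level random-effects model.

    Var(grand mean) = (tau2_study + (tau2_within + v) / es_per_study) / n_studies
    Toy formula for a perfectly balanced design, not metafor output.
    """
    return math.sqrt((tau2_study + (tau2_within + v) / es_per_study) / n_studies)

# Invented parameter values, chosen so study-level heterogeneity dominates:
base         = grand_mean_se(30, 2, 0.20, 0.05, 0.05)
more_es      = grand_mean_se(30, 4, 0.20, 0.05, 0.05)   # 2x effect sizes, same studies
more_studies = grand_mean_se(60, 2, 0.20, 0.05, 0.05)   # 2x studies
# Doubling effect sizes within existing studies barely narrows the SE;
# doubling the number of studies narrows it much more.
```

Under these invented numbers, the SE can never drop below sqrt(tau2_study / n_studies) no matter how many effect sizes each study contributes–which is the “non-independence” point above in formula form.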
I have many more thoughts on this. I’m already writing a follow-up post that tries to anticipate, and address, objections to everything I just said. Because based on the poll results, I’m guessing I might get some pushback! I’m not only saying that ecological meta-analyses need to be much bigger than most ecologists think they need to be. I’m saying they need to be much bigger than most of them actually are. Which I recognize is likely to be a controversial suggestion! But before I decide for sure that this is a hill I’m willing to die on, it’s probably better if I just stop here and open the floor. Looking forward to learning from your thoughts, feedback, and criticisms.
Note that I am in meetings most of the day today, so comment moderation and replies will be slow. But perhaps that’s for the best; it’ll hopefully allow a good conversation to develop without me dominating the conversation.
*Doing this took me several days on a brand-new laptop. I am retroactively trying to compete in our old “craziest thing you’ve ever done for science” contest. 🙂
**I hope somebody pushes back against this post by saying “But moderator variables!” Because I already have a response to that pushback drafted for my follow-up post. 🙂