Recently I polled y’all on how many effect size estimates an ecological meta-analysis needs to include to be “big enough”. That is, how many it needs to have for you to be reasonably confident that the estimated mean effect size won’t change too much as more studies are published in the future. (I’m summarizing; follow that last link if you want the exact wording of the poll question.)

Here are the poll responses to the question, along with an answer drawn from my pretty-comprehensive database of 476 ecological meta-analyses.

tl;dr: Ecological meta-analyses need to be waaaaaaaaaay bigger than most respondents think they need to be. They also need to be waaaaaaaaaay bigger than most of them actually are! At least, that’s sure how it looks to me, but have a look yourself and tell me if I’m wrong!

**Poll results**

We got 45 poll respondents–thank you to everyone who took the poll. Not a big sample, and surely not a random sample of ecologists, or even of our regular readers. But it seems like a big enough sample to be worth talking about, particularly since most responses fell within a pretty narrow range.

Nine poll respondents said “it depends”, so we’ll set them aside. One said (in so many words) that ecological meta-analyses are meaningless because there’s too much methodological variation among ecological studies. We’ll come back to that in a future post! Of the 35 respondents who gave a number of effect sizes that’s “enough” (or a range of numbers, in a couple of cases), 29 gave some number ≤60. 50 was the modal answer, and the minimum was 10. Five respondents gave some number from 100-150 effect sizes, and one said 500.

It’s worth noting that most of those answers are comparable to, or smaller than, the size of a typical ecological meta-analysis. The median ecological meta-analysis includes 60 effect size estimates. So as a group, most poll respondents are pretty confident in the estimated mean effect size from a typical ecological meta-analysis.

Whereas I’m pretty confident that they shouldn’t be so confident! As best I can tell from my admittedly-preliminary analyses, it looks to me like most ecological meta-analyses include *waaaaaaay* too few effect size estimates for us to be confident that the estimated mean effect size has stabilized, and wouldn’t change too much in future if further effect size estimates were published.

But maybe I’m wrong! Here are my exploratory analyses–you tell me if you buy them.

**Exploratory meta-analysis of ecological meta-analyses**

My compilation includes 476 ecological meta-analyses. For each meta-analysis, I have every effect size estimate included in the meta-analysis, the year of publication of every effect size estimate, an identifying label for the study (i.e. paper) in which the effect size estimate was published, the sampling variance of the effect size estimate, what kind of effect size measure it was (e.g., Hedges’ g, log-transformed response ratio, etc.), and a few other bits of information. The database includes over 20,000 studies and 114,000 effect size estimates in total.

For each of the 476 meta-analyses in my compilation, I used the metafor package in R to fit a meta-analysis with hierarchical random effects of study, effect size within study, and sampling error. (Note that a few of the meta-analyses couldn’t be fit.) One output is an estimate of the grand mean effect size for each meta-analysis. Here’s a graph of mean effect size for each meta-analysis vs. the number of effect size estimates in the meta-analysis:

Now, this is a pretty crude graph, because it lumps together meta-analyses that used different effect size measures (Hedges’ g, log-transformed response ratio, Fisher’s z-transformed correlation, etc.). But the pattern is clear enough: mean effect size estimates vary widely among meta-analyses with just a few hundred effect size estimates or fewer. Mean effect size estimates for larger meta-analyses cluster closer to one another (and to zero). That surely implies that, if those small meta-analyses were made bigger by publication of additional effect size estimates, their means would tend to get pulled closer to one another (and to zero).
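For the curious, here’s a rough sense of what “pooling effect sizes with a random-effects model” means computationally. This is a toy Python translation of the classic (non-hierarchical) DerSimonian-Laird estimator, not the actual metafor fits I ran; the function name is invented:

```python
import numpy as np

def dl_random_effects(y, v):
    """Toy DerSimonian-Laird random-effects pooling.

    y : effect size estimates; v : their sampling variances.
    Returns (pooled mean, its standard error, estimated
    between-study variance tau^2)."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1.0 / v                               # fixed-effect weights
    mu_fe = np.sum(w * y) / np.sum(w)         # fixed-effect pooled mean
    Q = np.sum(w * (y - mu_fe) ** 2)          # Cochran's Q statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - (len(y) - 1)) / c)   # method-of-moments tau^2
    w_re = 1.0 / (v + tau2)                   # random-effects weights
    mu_re = np.sum(w_re * y) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return mu_re, se, tau2
```

The hierarchical models I actually fit add a second random effect (effect sizes nested within studies), which is what metafor’s rma.mv() handles.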

Put another way, if you say that 10 or 50 or even 150 effect size estimates is “enough”, then what you’re saying is that you believe most of the mean effect size estimates in the graph above. Which means you think that it’s an amazing coincidence that the biggest ecological meta-analyses rarely or never report mean effect sizes all that far from zero. You think it just so happens that the phenomena for which ecologists have published the most effect size estimates are phenomena that have small effects on average.

But I’m guessing that some of you won’t find this convincing. After all, variances are hard to eyeball from graphs like the one above. On the left-hand side of the graph, there are a bunch of means far from zero–but also a bunch of means close to zero. And there are many fewer large meta-analyses than small ones. Maybe if we had more really large meta-analyses, we’d have some for which the mean effect size is very far from zero. So maybe everything I said in the previous paragraph is wrong.

So let’s do another exploratory analysis that makes more complete use of the information in the data. Let’s do 476 *cumulative* meta-analyses. That is, for each of the 476 meta-analyses, do a hierarchical random effects meta-analysis that only includes the effect sizes from the first two published studies (i.e. the two earliest studies). Then add in the effect size(s) from the third published study. And so on, until the meta-analysis incorporates all the effect sizes, in the temporal order in which they were published. This allows us to look at how the mean effect size estimate, and its 95% confidence interval, changed over time as more and more effect sizes were published.* (Again, it’s actually not quite 476 meta-analyses because a few of them couldn’t be fit.)
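In sketch form, the cumulative procedure looks something like this (a toy Python version: it pools with a simple inverse-variance mean rather than the full hierarchical model, and it groups effect sizes by publication year as a stand-in for ordering by study; the function name is invented):

```python
import numpy as np

def cumulative_means(years, y, v):
    """Cumulative meta-analytic means, adding effect sizes in
    publication-year order.

    Returns a list of (n_effect_sizes, cumulative_mean) pairs, one per
    publication year, using a simple inverse-variance pooled mean as a
    stand-in for the full hierarchical random-effects fit."""
    order = np.argsort(years)
    yrs = np.asarray(years)[order]
    y = np.asarray(y, float)[order]
    v = np.asarray(v, float)[order]
    out = []
    for cutoff in np.unique(yrs):
        mask = yrs <= cutoff
        if mask.sum() < 2:                # need at least two effect sizes
            continue
        w = 1.0 / v[mask]                 # inverse-variance weights
        out.append((int(mask.sum()),
                    float(np.sum(w * y[mask]) / np.sum(w))))
    return out
```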

Here’s a graph of every cumulative mean effect size estimate from every cumulative meta-analysis, as a function of the number of effect sizes included in the cumulation. Each cumulative mean is expressed as a percentage of the final cumulative mean for that meta-analysis. So for instance, if for meta-analysis #1 the cumulative mean effect size after the first two studies was based on a total of 5 effect sizes, and was twice as big as the final mean for meta-analysis #1, then that point would fall at a value of 5 on the x-axis and 200% on the y-axis. Some of the percentages are negative because the cumulative mean of the first n effect sizes doesn’t always have the same sign as the final mean of all effect sizes in the meta-analysis.

Again, this is a very crude graph. It’s lumping together meta-analyses that use different effect size measures. And you can’t tell which points come from which meta-analyses. But still, it’s clearly pretty common for estimates of mean effect size based on dozens or even hundreds of effect sizes to be quite far away from their final values. Look at the y-axis scale–it runs from -5,000% to +10,000%! That is, as more and more effect size estimates are published, it’s quite common for the cumulative mean to keep bouncing around appreciably, even after hundreds of effect sizes have been published.

A better way to see this is to inspect the time series of cumulative means for individual meta-analyses. So here’s a graph that just shows a small, haphazardly chosen subset of the meta-analyses in the previous graph. Now I’ve plotted the data as a line graph with a differently-colored line for each meta-analysis, so you can see how the cumulative mean effect size for each meta-analysis (expressed as a percentage of the final mean) changed as more and more effect size estimates were published:

A good way to look at the above graph is to look first at the meta-analyses that included 1500+ effect size estimates in total–the ones for which the lines extend farthest to the right on the x-axis. They’re some of the biggest meta-analyses in ecological history. Notice that they all have lengthy stretches during which it appears that the mean effect size has stabilized, only for it later to become clear that it hasn’t yet. For instance, consider the highest (brown) line. The cumulative mean effect size for that meta-analysis was pretty stable, or stable-ish, after about 600 effect sizes had been published until almost 1000 had been published. But then the cumulative mean effect size started dropping further. It appears to have asymptoted at close to its final value only after ~1700 effect sizes were published. (And who knows, maybe it’ll change further at some point in the future!)

Keeping that dynamic in mind, if you now look at the smaller meta-analyses on the graph above (i.e. the ones that only include a few hundred effect sizes at most), you see that their cumulative mean effect sizes change very fast from their initial values to their final values. Which is a sign that those “final” values aren’t really final. If many more effect sizes are published in future, those “final” values are likely to change further.

Now, you could argue that the above graph makes fluctuations in the mean effect size look bigger than they really are, in cases where the mean is fluctuating around some final value that’s close to zero. So here’s one more way to look at the data. The graph below is a bit like the last one, except that instead of plotting the cumulative mean as a percentage of the final mean, I’m plotting the width of the 95% confidence interval of the cumulative mean, as a percentage of its width after the first two studies. So now I’m asking, how does the precision of the estimated mean effect size change as more and more effect sizes are published and incorporated into the cumulative meta-analysis?

If you interpret the 95% confidence interval as defining the range of reasonably plausible (i.e. likely) estimates of the “true” population mean, well, ecologists’ estimates of what’s plausible often change appreciably, even after 10 or 50 or even 150 effect size estimates have been published. It looks to me like 95% c.i. width typically doesn’t stabilize until something like 250-500 effect size estimates have been published. And although I’m not showing it here, I can tell you that these 95% confidence intervals often are pretty wide by any reasonable standard. It’s not that the initial 95% confidence intervals are all very narrow, so that trivially-small fluctuations in their width get magnified when expressed as percentages of the initial width.

**Bottom line: depending on exactly how you prefer to define a “stable” mean effect size, it looks to me like you need at least 250-500 effect size estimates, maybe more, before you can be reasonably comfortable that you have “enough” effect size estimates.** Before you can be reasonably confident that you know what the mean effect size is, or within what range it’s likely to fall.

I buy this because it’s consistent with my general background knowledge of hierarchical random effects models. I’m admittedly not an expert. But my understanding is that, if you have a hierarchical random effects model in which there’s substantial heterogeneity–i.e. substantial variation in means among groups and subgroups (which there is: Senior et al. 2016)–then you need a *lot* of observations to estimate the grand mean with high precision. One way to think about it is that, because of the hierarchical structure of your data, your observations are quite non-independent of one another. Having many non-independent observations is a bit like having some much smaller number of independent observations. So I don’t think we should be surprised that you need way more effect sizes to estimate mean effect size with any precision, than you would if you were estimating the mean of a single homogeneous population by randomly sampling independent observations from that population.**
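That intuition can be made concrete with the textbook two-level variance decomposition. If a meta-analysis has k studies with m effect sizes each, among-study variance tau², and within-study variance sigma², then the variance of the estimated grand mean is tau²/k + sigma²/(k·m). A quick numerical illustration (all numbers made up):

```python
import numpy as np

# Standard error of the grand mean under a simple two-level random-effects
# model: k studies, m effect sizes per study, among-study variance tau2,
# within-study variance sigma2.
def se_grand_mean(k, m, tau2, sigma2):
    return np.sqrt(tau2 / k + sigma2 / (k * m))

# With substantial heterogeneity, piling up effect sizes *within* studies
# barely helps; only more studies shrink the standard error much:
print(se_grand_mean(k=20,  m=3,   tau2=0.2, sigma2=0.2))  # ~0.115
print(se_grand_mean(k=20,  m=300, tau2=0.2, sigma2=0.2))  # ~0.100
print(se_grand_mean(k=200, m=3,   tau2=0.2, sigma2=0.2))  # ~0.037
```

A hundredfold increase in effect sizes per study buys almost nothing here, while a tenfold increase in the number of studies cuts the standard error roughly threefold.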

I have many more thoughts on this. I’m already writing a follow-up post that tries to anticipate, and address, objections to everything I just said. Because based on the poll results, I’m guessing I might get some pushback! I’m not only saying that ecological meta-analyses need to be much bigger than most ecologists think they need to be. I’m saying they need to be much bigger than most of them actually are. Which I recognize is likely to be a controversial suggestion! But before I decide for sure that this is a hill I’m willing to die on, it’s probably better if I just stop here and open the floor. Looking forward to learning from your thoughts, feedback, and criticisms.**

Note that I am in meetings most of the day today, so comment moderation and replies will be slow. But perhaps that’s for the best; it’ll hopefully allow a good conversation to develop without me dominating the conversation.

*Doing this took me several days on a brand-new laptop. I am retroactively trying to compete in our old “craziest thing you’ve ever done for science” contest. 🙂

**I hope somebody pushes back against this post by saying “But moderator variables!” Because I already have a response to that pushback drafted for my follow-up post. 🙂

I am one of those answering that it depends, because it truly does. As with all other statistical analyses that we do, it depends on the possible variation. For meta-analyses, the most important dependencies are the scope of the question and how good the incoming effect sizes are. If the incoming effect sizes have a very small error, then you of course need much fewer effect sizes than if they have big errors. When it comes to scope, it is a big difference if you try to answer a global and broad research question, such as how important natural enemies generally are for biocontrol, compared to a more specific question. Even though I find the global and broad research questions often very useful, particularly to motivate further research, I must admit that I seldom trust the actual values coming out of them. The data are often too heterogeneous (both in quality and in scope) to provide quantitatively meaningful results. But these studies are still valuable in some broader sense. So for these types of studies, the number of effect sizes is quite uninteresting, as we often know little about their underlying qualities. It is like putting all your clothes into the washing machine irrespective of colors: you will likely end up with the average color on all of them, but that would provide very little information on pre-washing conditions.

The more targeted meta-analyses are different, where we actually have a possibility to quantitatively determine some parameter value of a relationship, but then the measures are often few enough so that we can evaluate the quality of each of them.

So, sorry Jeremy, I am not sure that I find your results to be that valuable, as your analysis of analyses essentially mashes up a lot of studies of varying qualities and questions. But perhaps we can anyway find interesting discussion points.

Hi Peter,

Thanks very much for taking the time to comment so thoughtfully.

“If the incoming effect sizes have a very small error, then you of course need much fewer effect sizes than if they have big errors. ”

I recorded the sampling variances for every effect size. You need that information to do the meta-analyses that I did, of course. Presumably, variation in sampling error (among meta-analyses, among studies within meta-analyses, and among effect sizes within studies) is one source of variation in the results shown. But I don’t think the mere fact that effect sizes have sampling errors, and that those sampling errors sometimes are large, undermines anything I said in the post. But perhaps I’ve misunderstood your thinking on this? (if so, my apologies)

I have done further analyses, not shown in the post, focusing in on the effects of sampling error specifically. You certainly do see signals of sampling error in these data. For instance, sampling error (as opposed to heterogeneity) tends to be a bigger proportion of the total variation in effect size for smaller meta-analyses than for larger ones. But sampling error is only rarely the bulk of the variation. I recover the Senior et al. result that, for the typical ecological meta-analysis, about 80-85% of the variation in effect sizes is attributable to heterogeneity rather than to sampling error.
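For readers who want the formula: that heterogeneity fraction is essentially Higgins’ I². Here’s a toy, non-hierarchical Python version (Senior et al. and I used multilevel versions, but the idea is the same):

```python
import numpy as np

def i_squared(y, v):
    """Higgins' I^2: the percentage of total variation in effect sizes
    attributable to heterogeneity rather than sampling error."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1.0 / v
    mu = np.sum(w * y) / np.sum(w)            # fixed-effect pooled mean
    Q = np.sum(w * (y - mu) ** 2)             # Cochran's Q
    df = len(y) - 1
    return max(0.0, (Q - df) / Q) * 100.0 if Q > 0 else 0.0
```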

“Even though I find the global and broad research questions often very useful, particularly to motivate further research, I must admit that I seldom trust the actual values coming out of them.”

I think that’s wise of you! 🙂

“The more targeted meta-analyses are different, where we actually have a possibility to quantitatively determine some parameter value of a relationship, but then the measures are often few enough so that we can evaluate the quality of each of them.”

I was expecting to get this question, so I’m glad you brought it up. The idea, as I understand it, is that most of the heterogeneity in our meta-analyses would go away if only we restricted each meta-analysis to sufficiently-comparable studies. Which of course often will mean restricting each meta-analysis to just a few studies.

Something like this idea is also the motivation for looking for moderator variables, of course. Perhaps our estimate of the grand mean effect size is very statistically noisy, and not very ecologically meaningful or interpretable, because we’re lumping together studies conducted at different levels of one or more key moderator variables. So if we incorporate those moderator variables into our meta-analysis, we’ll get a very precise estimate of the mean effect size at any given level of the moderator variable(s).

My preliminary analyses of this dataset have made me skeptical of this idea, for several reasons. Here are my reasons; I’m very curious to hear what you and others think of them.

1. The random effects meta-analyses I did break down total heterogeneity into two subcomponents: within-study heterogeneity, and among-study heterogeneity. On average, within-study heterogeneity is just as large as among-study heterogeneity. Now, those estimates of within- and among-study heterogeneity are just that–estimates. The estimates certainly vary a lot from one meta-analysis to the next–in some, the heterogeneity is estimated to be mostly within studies, in others it’s estimated to be mostly among studies, in others it’s more of a balanced mix. So as best one can tell from the available data, it is not usually the case that most heterogeneity is among studies rather than within studies. Even if you were to restrict your meta-analysis to only the effect sizes that came from a single study–which ought to be *very* comparable to one another–you’d *still* have a *lot* of heterogeneity among effect sizes in most cases.

2. On average, smaller meta-analyses don’t have more precise estimates of mean effect size than larger meta-analyses. Just the opposite in fact. And that’s even though many of these small meta-analyses are from the same meta-analysis paper. That is, the authors of the meta-analysis paper decided not to lump all their data into one unfocused meta-analysis, but instead reported several separate, focused meta-analyses in the same paper. I don’t see any sign in the data that those small, focused meta-analyses provide particularly precise estimates of mean effect size. Now, in some cases that could be because the precision that you gain by reducing heterogeneity is overwhelmed by sampling error. You’re cutting heterogeneity by doing a more focused meta-analysis, but at the cost of having only a few effect sizes and so having a lot of sampling error. It’s not clear to me why one would necessarily prefer a meta-analysis that’s noisy because of heterogeneity to one that’s noisy because of sampling error. Either way, your parameter estimates are noisy! But perhaps there might be reasons to prefer one source of imprecision to another?

3. Moderator variables in meta-analyses hardly ever seem to explain more than a tiny fraction of variation in effect size, even when that fraction is significantly >0. I didn’t compile data on this, but having looked briefly at a *lot* of meta-analyses, I don’t recall seeing *any* in which including moderator variables really did all that much to improve precision of estimated mean effect sizes. But maybe I’m forgetting some? So there’s a question for the crowd: name some meta-analyses in which lots of the variation in effect size was explained by moderator variables, so that one could say with good precision what the mean effect size would be, conditional on the values of the moderator variables.

4. A personal anecdote. Many years ago, I did a laboratory microcosm experiment on trophic cascades. It involved protist predators, their protist prey, and bacteria as the basal trophic level. The species were purchased from biological supply companies and so lacked any coevolutionary history, they were growing under constant environmental conditions in an artificial environment without any other species present, etc. Quite different in all sorts of ways from any natural system. But yet, when I calculated a mean effect size to measure the strength of trophic cascades in my experiment, it was identical (to two decimal places!) to the mean strength of trophic cascades in field studies, from the meta-analysis of Shurin et al. I obviously wouldn’t say that this means my experiment was “realistic” or “natural” in all respects. Perhaps it was unrealistic in all sorts of ways that just so happened to cancel one another out and lead to a quite “realistic” strength of trophic cascades. But I have no idea what those respects are, and I don’t think anyone else does either. And that’s the point. If effect sizes are affected by all sorts of factors, *and we mostly don’t know what they are*, then we are in no position to pick out all and only those studies that are “truly” comparable, while omitting other studies from our meta-analysis.

For those four reasons, I doubt that ecologists actually *can* identify focused subsets of studies on effect X that are “truly” comparable. You can’t restrict yourself to apples-to-apples comparisons if you have no idea how to tell apples from oranges, or from pictures of apples, or from jars of applesauce. And restricting yourself to apples-to-apples comparisons doesn’t actually gain you anything if you only have a small sample size of apples, and if the apples you do have are almost as different from one another as apples are from oranges.

Wow,

that was a longer response than I anticipated, but great. So here comes a longer re-response than I expected. And perhaps I should moderate my last statement. You certainly find some interesting discussion points within this meta-meta-analysis.

You are of course perfectly right that you account for the variation in errors, but I cannot see how that is a problem for my point. When you do meta-analyses, you always have a mixed bag of studies: those that are well executed, accounting for various types of heterogeneity and with large sample sizes, and those that are not so well executed but that you include mainly because you want a larger sample size. I am fully aware that the former weigh more heavily in the analyses, but I fail to see why the type of study added would not affect the number of studies needed. Having 5 extra well-executed studies would always carry more weight than having 5 extra of the more dubious studies. In that way, study quality will always affect the number of studies needed, and the trustworthiness of the analysis.

An advantage of meta-analyses is that you can more easily get under the hood and see what the analyses are actually based upon, and it is not always a confidence-boosting exercise. Not infrequently, I find questionable studies that probably should not have been included, for various reasons. Of course, it may be that their inclusion did not actually affect the outcome of the analyses, but then they could have been left out anyway. The number of studies included would perhaps look less impressive, but I think I would prefer quality over quantity in that case. I realize that this comment is perhaps beside your arguments in the blog post, but I would still argue that it is not only a matter of sample size but also of study quality.

On the other issue, of more targeted meta-analyses, I probably need to explain myself better. I have done quite a few meta-analyses myself, both broader and more targeted, and I am certainly happier with some than with others. One of the first I was involved in was a broader analysis of the importance of trophic cascades. We thus collected, as you do, all studies that removed predators and recorded effects on plant variables. When we did this, we were of course limited by the availability of studies. The study was published at a time when top-down effects were criticized as being unimportant, and we wanted to show that they were not. Our analysis mainly included fairly small-scale studies, and today I would perhaps argue that this type of analysis is mainly qualitatively important and the actual value matters less. But I would also say that if we had had 10 times more studies of the same type, our conclusions would not necessarily have become much stronger, because we are limited by the scope of the original studies. But checking for publication bias here would make some sense, as it is definitely more difficult to publish non-significant results. Without investigating much, I would say that most meta-analyses fall into this category.

However, I have also done more targeted meta-analyses testing specific and quantitative hypotheses, where publication bias may be negligible. For instance, we wanted to know the exact scaling of migration rates and had a quantitative hypothesis that we wanted to test. In this case, missing values are not an issue. You have the measures that you have, and they have a certain precision. There is little expected bias in publishing results that show a certain scaling, and the number of included studies is therefore less of a question. For these types of studies, I would say that your analyses are irrelevant, as confidence can be achieved through a fairly small number of studies.

That for now. I probably need to go back to some original research to feed the literature with more data for meta-analyses.

I’ve just followed the original meta-analysis authors in their choices of what studies to include or not. So I don’t have any information on study quality, unless effect size sampling variance is considered a measure of study quality. So really, I’m doing a study of ecologists as well as ecology. The results I get are shaped not just by how the world is, but also by ecologists’ choices as to how to study the world. Such as “What studies should I include in my meta-analysis?”

Re: publication bias, I do plan to look at that with funnel plots and Egger’s regressions. But my understanding is that those tools aren’t very powerful tests for publication bias. (There’s also the issue of: do you really want to look at 476 funnel plots, and how would you summarize what you found by looking at them?)

Re: your migration rate scaling example: without wanting to downplay the importance of such examples at all–it sounds like a very nice use of meta-analysis–such examples are of course rare. Most ecological meta-analyses don’t have quantitative hypotheses. One thing that I hope will come out of this project is that ecologists will be pushed to think harder about which meta-analyses are worth doing, and exactly why they’re worth doing.

Really interesting results, and I really liked the idea of using cumulative effect sizes for each study. I’m wondering if this could be used as a validation plot for individual meta-analyses, in addition to the funnel plots and other validation techniques commonly used.

But I wanted to add something to the discussion: I think it’s possible that ecological meta-analyses with many studies may in fact have smaller true effect sizes than those with fewer studies, due to the choice of the research question. I think (without having looked at any data) that meta-analyses with a smaller sample size are likely to address more specific questions. As a hypothetical example, a meta-analysis about edge effects on animal biodiversity would have more studies than one about edge effects on insect biodiversity, but there would also be more variation among these studies. Thus, a meta-analysis addressing a more specific question would on one hand have fewer studies, but on the other hand it is likely that the results of these studies would be more similar among one another. Which might be sort of related to moderator variables 🙂

“Thus, a meta-analysis addressing a more specific question would on one hand have fewer studies, but on the other hand it is likely that the results of these studies would be more similar among one another. Which might be sort of related to moderator variables”

That is indeed one issue I want to address in the follow-up post!

I realize that Pavel stated this much better than I did. This is what I intended with more targeted meta-analyses.

Interesting but worrying results! In the analysis where you gradually increase the number of effect sizes, have you considered adding them in random order instead of in order of publication? You now confound the number of effect sizes with time, and effect sizes with respect to specific hypotheses have been known to vary systematically with time.

Simon Verhulst

Good question. In other analyses, I have looked for decline effects, and whatever you’d call the opposite of a decline effect (incline effect?). Decline effects, and their opposite, turn out to be rare in ecology. They may perhaps have been a bit more common back in the 1990s and 2000s. But these days, they’re not a thing in ecology.

Came here to make a similar suggestion. This is a cool dataset and analysis – and I agree with your interpretation that it shouldn’t be *that* surprising given how hierarchical models work.

I wasn’t wondering about decline effect specifically, but more that there is a historical legacy in sampling variance. Over time we gain better methods, better tools, better models, bigger datasets which should lead to declining sampling variance estimates in individual studies (ok, so maybe a variance decline effect? or maybe that’s just what drives the decline effect?). I think you can sort of see this in the CI width fig (though caveat that effects are often originally noted in the lit with low n, exploratory studies… so may not be about methodological improvement as much as the nature of how science proceeds).

It would be cool to randomize the order for the cumulative analysis. In fact, if you really wanted to get meta-meta-meta, you could do a kind of bootstrap – calculate something like convergence time (how many studies until you’re within some margin of asymptotic approach to the “final” value). And then repeat that n times with different randomization order. Then you’d have a sampling distribution of the number of studies to “convergence” – taking the quantile of that dist for the value of the “real” order would give some sense of how much of this is historical legacy, versus just a feature of relative “power”.
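A toy sketch of that convergence-time idea (plain Python; simple inverse-variance cumulative means stand in for the full hierarchical fits, and the ±20% tolerance band is an arbitrary choice):

```python
import numpy as np

def convergence_n(y, v, tol=0.2):
    """Number of effect sizes after which the cumulative inverse-variance
    mean stays within +/- tol (as a fraction) of its final value."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1.0 / v
    cum_mean = np.cumsum(w * y) / np.cumsum(w)
    final = cum_mean[-1]
    outside = np.where(np.abs(cum_mean - final) > tol * abs(final))[0]
    return int(outside[-1]) + 2 if outside.size else 1

def shuffled_convergence(y, v, n_perm=1000, seed=0):
    """Distribution of convergence_n over random re-orderings of the
    effect sizes -- the benchmark for the observed publication order."""
    rng = np.random.default_rng(seed)
    y, v = np.asarray(y, float), np.asarray(v, float)
    idx = [rng.permutation(len(y)) for _ in range(n_perm)]
    return [convergence_n(y[i], v[i]) for i in idx]
```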

In my preliminary looks, there doesn’t seem to be a general trend for declining sampling variances over time. If it’s there, it’s *very* noisy and doesn’t explain much variation in effect size at all.

I’ve been doing permutation tests to look for decline effects. Your suggestion to look at time to “convergence” with a permutation test is interesting as well. It’s a little tricky to decide exactly what to permute. For instance, if you permute the years in which effect sizes were published, ignoring which studies those effect sizes came from, you also eliminate (in expectation) among-study variation in effect size. One could of course instead permute the years in which studies were published (though that’s more of a pain to code up…)

Another thing one can address with a permutation test is coverage probabilities. For instance, do 95% confidence intervals for mean effect size, based on (say) effect sizes from the first 5 published studies, have a 95% chance of including the mean effect size calculated from the subsequent studies? My preliminary analyses suggest not–that “95%” c.i. for the mean effect size, based on just the first few studies, only have something like a 70% coverage probability for the final mean. But what I don’t yet know is whether the same would be true if you permuted the order in which studies were published. Is it that the first few published studies of any effect size tend to be unrepresentative of subsequent studies in some way? Or is it that *any* small subset of studies on topic X is going to tend to be somewhat unrepresentative of other studies on topic X? (Or maybe it’s just that mean effect sizes aren’t normally distributed in ecology? They actually have some sort of heavy-tailed distribution or something?)
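The coverage-probability check described above might look something like this sketch. To be clear, this is not the analysis actually run for the post: the function names are hypothetical, it uses fixed-effect inverse-variance weighting and a normal-approximation CI, and it ignores among-study heterogeneity, which the real analysis would need to handle.

```python
import numpy as np

def early_ci_covers_rest(effects, variances, k=5, z=1.96):
    """Does the nominal 95% CI for the weighted mean of the first k effect
    sizes contain the weighted mean of the remaining effect sizes?
    (Fixed-effect weighting; a sketch that ignores heterogeneity.)"""
    e = np.asarray(effects, float)
    w = 1.0 / np.asarray(variances, float)
    m_early = np.sum(w[:k] * e[:k]) / np.sum(w[:k])
    se_early = np.sqrt(1.0 / np.sum(w[:k]))
    m_rest = np.sum(w[k:] * e[k:]) / np.sum(w[k:])
    return abs(m_rest - m_early) <= z * se_early

def coverage_under_permutation(effects, variances, k=5, n_perm=1000, seed=0):
    """Coverage probability of the early-studies CI when publication
    order is randomized, for comparison with the observed order."""
    rng = np.random.default_rng(seed)
    e = np.asarray(effects, float)
    v = np.asarray(variances, float)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(e))
        hits += early_ci_covers_rest(e[idx], v[idx], k)
    return hits / n_perm
```

If coverage in the observed publication order is well below `coverage_under_permutation`, that points to something special about early studies; if both are well below 95%, that points instead to heterogeneity or non-normality undermining the CI itself.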

Right, permutation test, not bootstrap – duh…

That last bit is fascinating. I would think that a standardized effect size measure would asymptotically invoke the CLT since most (all?) involve a sum – but maybe I’m off on that. Thus, if the CIs aren’t behaving as we would expect, it could be that n is too low to effectively invoke the CLT and the sampling distributions are indeed skewed. Or could it be something like a signal of publication bias? Seems like your point about permuting the order could disentangle that.

Of course the 95% probability bit of the formal interpretation of a CI assumes future samples of the same size, so that *might* be a factor as well. Though as sample sizes generally increase over time (that’s a guess), I’d be surprised to see future estimates run outside the CI boundaries more frequently than expected simply because n increased.

This whole piece actually feels quite important though – if CIs are not behaving as we’d expect in ecology studies, it implies a whole slew of inferential problems.

My tentative thought is the same as yours–that we’re in a context where the sampling distribution is heavy tailed or something, and there’s a lot of heterogeneity, so the CLT needs a pretty big sample size to really kick in. But that’s a very tentative thought, I need to do more permutation tests to really understand what’s going on.

Jeremy, I’m looking forward to the post in response to the person who dismissed meta-analyses. That response has always baffled me. It seems to me that there are only two possible beliefs that could lead to such a conclusion. One, that we should never try to synthesize in ecology. That is, every study should be viewed in isolation. So, if two researchers have studied the effects of zooplankton abundance on trout abundance in two different lakes, we should not try to bring the two studies together to draw any conclusions about the effects of zooplankton abundance on trout abundance.

Two, that a qualitative (usually opaque) synthesis of studies is better than a quantitative combining of studies.

Neither of those positions strikes me as easy to defend. But maybe I’m missing something.

Jeff

I actually have mixed feelings about this. Mixed feelings that spring from the results summarized in the post and earlier in the comment thread.

On the one hand, I agree with you. It seems nihilistic–a counsel of despair–to say that we should view every study in isolation, or to say that only qualitative verbal synthesis has value. On the other hand, most ecological meta-analyses seem to provide pretty imprecise, heterogeneous parameter estimates. And we don’t seem to be very good at explaining that heterogeneity with moderator variables.

Those two thoughts would seem to put ecology between a rock and a hard place. What should we do about that? One tentative suggestion: put more collective effort into other ways of generalizing in ecology besides meta-analysis:

https://dynamicecology.wordpress.com/2015/06/17/the-five-roads-to-generality-in-ecology/

https://dynamicecology.wordpress.com/2019/11/04/poll-results-the-many-ways-ecologists-seek-generality/

Elizabeth Borer also has some ideas about distributed experiments as an alternative to meta-analyses:

https://dynamicecology.wordpress.com/2020/11/09/the-story-and-larger-lessons-of-the-nutnet-experiment-an-interview-with-elizabeth-borer/

One other thought: it makes more sense to think of ecological meta-analyses as estimating the ‘average’ effect size rather than trying to estimate the ‘true’ effect size. We know that averages lose information, but they are also useful.

Hi Jeremy, Fascinating read. I’ll want to think about it more, but focusing particularly on the first graphs and the result “Mean effect size estimates for larger meta-analyses cluster closer to one another (and to zero).” Doesn’t this show exactly what we might expect given the way science and ecologists work? If the first studies on a question show large effect sizes all in the same direction, and an initial smaller meta-analysis, with say 50 effect sizes, confirms this, then there isn’t a lot of incentive to continue producing more studies measuring the same effect. These sorts of questions will probably never generate 100 published effect sizes, and as a result no larger meta-analysis. If on the other hand the first studies show significant effects but clustering around zero, or especially if they tip back and forth between positive and negative on an important question, there will be much more work needed (and it will be much more publishable).

The result would be the sort of pattern the graphs show, without implying that if small meta-analyses were increased in size the effect size would decrease. Instead, rather than happening by chance, we’d have an expectation that ecologists will publish the most effect size estimates on phenomena that do indeed have small effects on average. I’m therefore thinking that the answer to the original question of how many effect sizes is enough is indeed “it depends”–on how tractable the question is and how consistent the underlying phenomena are. Not an entirely satisfying answer, but I suppose both reality and ecology aren’t as neat as we might like.

“If the first studies on a question show large effect sizes all in the same direction and an initial smaller meta analysis, with say 50 effect sizes, confirms this then there isn’t a lot of incentive to continue producing more studies measuring the same effect. These sort of questions will probably never generate 100 published effect sizes and as a result no larger meta analysis.”

Our compilation contains hardly any cases of multiple meta-analyses on the same topic. Not that we excluded them–there just aren’t any. Ecological meta-analyses very rarely follow up on previous meta-analyses.

More broadly, just based on my own experience in ecology, I don’t think that the number of effect size estimates that get published on topic X really has much to do with how big the average effect size is. Ecologists’ choices about what to study depend on lots of things, not just the magnitude of the mean effect size.

Pingback: Why do so many ecologists overestimate how informative small meta-analyses are about the mean effect size? | Dynamic Ecology