As regular readers will know, my able research assistant Laura Costello and I have compiled a database of over 460 ecological meta-analyses. The database includes all the effect sizes, their sampling variances, and various other bits of information.
I’ve been sharing the fruits of my explorations of this database on the blog. Here’s my latest tidbit–a very striking feature of many (though far from all) ecological meta-analyses that I can’t make heads or tails of. Why the heck would many ecological meta-analyses have inverted funnel plots, with more variation among the most precise effect sizes than among the least precise ones?
Intrigued? Read on!
Background on funnel plots, and examples of “normal” funnel plots
You can skip this section if you know what a funnel plot is and how it’s normally supposed to look. But if you don’t know or need a reminder, here you go. A funnel plot is a scatterplot with the effect sizes (which summarize the study outcomes) on the x-axis, and some measure of the precision of those effect sizes on the y-axis. The standard error is probably the most common measure of precision, but sometimes some other measure such as sample size or sampling variance is plotted. It’s called a “funnel” plot because you’d expect the points to make a funnel shape. Effect sizes that were estimated very imprecisely (e.g., because they came from studies with small sample sizes) should differ a lot from one another–they should be widely scattered on the x-axis. Those points comprise the “mouth” of the “funnel”. But as you move along the y-axis, precision increases, so that the effect sizes should be much closer together along the x-axis. The funnel should narrow to a “point”.
Here’s an example, from the meta-analysis of selection gradients on floral traits in Caruso et al. 2019. The “observed outcomes” are the effect sizes; they’re regression slopes in this example.*
The vertical line is the grand mean effect size. The white funnel is +/- 1.96 SE around the mean; it’s a pseudo 95% confidence interval. Zero standard error is at the top of the y-axis; the y-axis runs from top to bottom. Notice that, the further down the y-axis you go, the more spread out the effect sizes are along the x-axis. That’s as you’d expect. If you conduct a bunch of imprecise studies, their results should differ a lot from one another. That’s what “imprecise” means!
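To make the expected shape concrete, here is a minimal Python sketch (not the metafor workflow used for the plots in this post) that simulates effect sizes subject to sampling error only. The true mean of 0.3, the range of standard errors, and all function names are made-up assumptions for illustration. It checks that the spread of effect sizes grows with the standard error, and that roughly 95% of points land inside the mean +/- 1.96 SE funnel.

```python
import random
import statistics


def simulate_funnel(n_studies=2000, true_mean=0.3, max_se=1.0, seed=1):
    """Simulate (effect size, SE) pairs with sampling error only (no heterogeneity)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_studies):
        se = rng.uniform(0.01, max_se)      # hypothetical spread of precisions
        effect = rng.gauss(true_mean, se)   # sampling error scales with the SE
        points.append((effect, se))
    return points


def spread_by_se_bin(points, n_bins=4, max_se=1.0):
    """Standard deviation of effect sizes within equal-width SE bins."""
    bins = [[] for _ in range(n_bins)]
    for effect, se in points:
        idx = min(int(se / max_se * n_bins), n_bins - 1)
        bins[idx].append(effect)
    return [statistics.stdev(b) for b in bins]


def inside_pseudo_ci(effect, se, mean, z=1.96):
    """True if a point falls inside the mean +/- z * SE funnel."""
    return abs(effect - mean) <= z * se


points = simulate_funnel()
spreads = spread_by_se_bin(points)  # ordered from most to least precise bin
coverage = sum(inside_pseudo_ci(e, s, 0.3) for e, s in points) / len(points)
print([round(s, 2) for s in spreads])
print(round(coverage, 3))
```

With no heterogeneity and no publication bias, the binned spreads increase from the most precise bin to the least precise one, which is the “normal” funnel.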
Of course, not all funnel plots look like that. In particular, if there’s some bias against publishing certain studies, depending on the effect sizes they report, you might expect a funnel plot that looks like the one below. This is from one of the meta-analyses in Magris and Ban 2019 GEB (who report multiple meta-analyses of different response variables):
A funnel plot with that sort of skewed shape is consistent with publication bias against studies reporting large positive effect sizes (whether because journals reject such studies, or researchers don’t submit them for publication in the first place). Although note that skewed funnel plots can occur for other reasons. For instance, there might be some “moderator” variable that varies among studies, and that affects both the mean effect size, and the precision of the effect size. (Hold onto that thought, I’ll come back to it…)
Ok, so now that you’re up to speed on funnel plots, tell me what the heck you make of the funnel plot below. Because I’m stumped!
Inverted funnel plots (?!)
This is the funnel plot for Ferreira et al. 2015 Biol Rev, a meta-analysis of the effects of nutrient enrichment on litter decomposition in streams:
The funnel is inverted. Effect sizes with high standard errors–i.e. the most imprecise effect sizes–are relatively close to the grand mean, and thus relatively close to one another along the x-axis. Whereas the precise effect sizes are scattered all over creation along the x-axis. What in the name of Hedges’ d could be going on here?
And don’t answer “Oh, it’s just a statistical fluke, you’re going to see some weird stuff eventually if you look at 460+ funnel plots.” Because, first of all, there are a lot of effect sizes in that plot. So I highly doubt that plot looks weird just because of a fluke of sampling error. And second of all, as you’ll see in a second, there are far too many meta-analyses with funnel plots that look like this for it to be a fluke.
And don’t answer “Oh, either you, or Ferreira et al., must’ve messed up somehow”. Because, again, Ferreira et al. is far from the only ecological meta-analysis with an inverted funnel plot. Now, inverted funnels aren’t the majority. But in my compilation, they’re about as common as “normal” funnels, just going by my eyeball evaluations of all 460+ funnel plots.
Here are a few more examples of inverted funnels (there are many more that I’m not showing you). Vidal and Murphy 2018 EcoLetts:
Møller and Mousseau 2015 Sci Rep:
Now, obviously one thing I’m going to do is double check to make sure that I didn’t make some repeated mistake in data entry. But assuming that I didn’t totally screw up the data entry, do you have any idea what could be going on here?
My first thought was some sort of publication bias. Maybe there are some topics for which it’s hard to publish imprecise effect sizes, unless those imprecise effect sizes are close to zero? I feel slightly ashamed just typing that last sentence, because it is such a bizarre idea. It just seems totally implausible to me. It’s the Underpants Gnome theory of inverted funnel plots.
My second thought was that it’s something to do with heterogeneity. “Heterogeneity” in this context is a technical term; it refers to variation within and among studies in the true mean effect size. In ecological meta-analyses, heterogeneity is usually quite high–a much bigger source of variation than random sampling error (Senior et al. 2016). Different estimates of the “same” ecological effect don’t report different effect sizes just because of sampling error. Rather, the effect sizes will also differ because the effect sizes were estimated (say) on different species, at different locations, using different methods, etc. Heterogeneity is one of the main reasons why effect sizes in ecological meta-analyses often fall outside the pseudo-95% confidence interval cones in the plots above. Those cones are drawn on the assumption that there’s no heterogeneity. But just saying “there’s heterogeneity” doesn’t explain why so many funnel plots would have an inverted shape–wide at the top, narrow at the bottom. After all, imprecise studies presumably exhibit within- and among-study heterogeneity too! In order for heterogeneity to produce an inverted funnel plot, I think it would have to be the case that there’s lots of heterogeneity among precise effect sizes, but not much heterogeneity among imprecise effect sizes. (Right?) I can dream up not-totally-implausible scenarios in which that might be the case. You’d need some sort of moderator variable, probably methodological, that affects both among-study heterogeneity and precision in just the right way.** But to figure out if any of those hypothetical scenarios explained any of the inverted funnel plots I’ve found, I’d have to do a deep dive into each meta-analysis with an inverted funnel plot. And probably into the studies that meta-analysis compiled. Before I start doing those deep dives, I’d like to have some idea of what I might be looking for.
My third thought was that it might have something to do with the fact that, for some measures of effect size, there’s a mathematical relationship between the mean and the standard error. But I don’t think that explains the inverted funnels, because it’s not the case that all and only those meta-analyses that use (say) log response ratio as the effect size exhibit inverted funnel plots.
So, got any ideas? Now’s your chance to look smart by pointing out that I’ve overlooked some obvious explanation! In all honesty, I will be super-happy if you do that, because I will have learned something. 🙂
*Throughout this post, I’m the one who made the funnel plots you’re seeing. For each meta-analysis, I first fit a hierarchical random effects model estimating variation in effect size among primary research papers, and among effect sizes reported in the same primary research papers (and sampling error, obviously). I did this using the metafor package in R. Then I used the funnel() command in metafor to produce a funnel plot, using the default settings.
**For instance, say there are lots of experimental studies of some effect. They’re all precise, because they all have large sample sizes, are conducted under controlled conditions facilitating precise measurements with sensitive equipment, etc. But they’re conducted by different research groups on different species under different (controlled) conditions from one experiment to the next, so there’s lots of heterogeneity. There are also some observational studies of the same effect. They’re all imprecise, because they have small sample sizes. But they’re not all that heterogeneous, because they’re all conducted by the same research group on the same species in nearby locations. I’m making that story up off the top of my head, I have no idea if that’s actually what’s going on in any real meta-analysis. But it’s the kind of thing that would have to be going on. I think? You tell me!
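The made-up story in this footnote can be sketched as a small simulation. This is only an illustration under invented numbers: the group sizes, the heterogeneity values (tau), the SE ranges, and the helper simulate_group are all hypothetical, not estimates from any real meta-analysis.

```python
import random
import statistics


def simulate_group(rng, n, mu, tau, se_low, se_high):
    """Effect sizes with among-study heterogeneity (tau) plus sampling error."""
    out = []
    for _ in range(n):
        se = rng.uniform(se_low, se_high)
        true_effect = rng.gauss(mu, tau)        # study-specific true effect
        observed = rng.gauss(true_effect, se)   # add sampling error
        out.append((observed, se))
    return out


rng = random.Random(42)
# All numbers below are made up for illustration.
# Precise but heterogeneous "experiments": small SE, large tau.
experiments = simulate_group(rng, n=300, mu=0.5, tau=1.0, se_low=0.02, se_high=0.10)
# Imprecise but homogeneous "observational studies": large SE, small tau.
observational = simulate_group(rng, n=60, mu=0.5, tau=0.05, se_low=0.30, se_high=0.50)

sd_precise = statistics.stdev([e for e, _ in experiments])
sd_imprecise = statistics.stdev([e for e, _ in observational])
print(round(sd_precise, 2), round(sd_imprecise, 2))
```

Under these assumptions the precise estimates are more spread out than the imprecise ones, i.e. an inverted funnel. Whether any real meta-analysis looks like this is exactly the open question.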
Could it simply be sampling of estimates with different levels of precision? In all of your examples, there are very few estimates with low precision. Looks to me like they may be drawn randomly from the distribution of more precise estimates. This doesn’t explain the spread of the more precise estimates (heterogeneity seems the likely explanation for those), but seems like too few estimates that are imprecise are present in these samples. A simulation might sort that out.
Yes, that’s today’s project: simulate some data and see if I can reproduce inverted funnel plots.
I can’t solve your mystery, Jeremy, but I want to record my objection to the terminology (which I realize is not yours!) People must use funnels in very, very peculiar ways; because a “normal” funnel should have the point down, and an “inverted” funnel should have the point up. So perhaps what you’ve discovered is that ecologists, unlike other scientists, know how to use funnels? 🙂
Now I want a meta-analysis of studies using Berlese funnels so we can have a funnel funnel plot. And what if we were using those Berlese funnels to extract insect pests of ship-stored funnel cakes? Then we’d have a funnel funnel funnel plot.
OK, OK, I will stop now.
Could it just be due to sample size? If you have lots of samples near standard error = 0, you are going to get a high absolute number of large deviations, even if the variance doesn’t change. I’m not sure if that would explain the entire distribution, but it might be a part of the story. You could quickly check it by categorizing the effect sizes according to their SEs and making a plot of means and quantile ranges. If the quantile ranges still show an inverse funnel, then it’s not due to sample size.
For example, the points in the figure behind this link (https://imgur.com/QsmE9n1) all come from distributions with sd=1, but visually show an inverse funnel nonetheless.
That could be part of it, yes. Good suggestion.
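The sample-size point above can be illustrated with a short Python sketch. Everything here is an assumption for illustration: the three strata, their counts, and the choice of a standard normal for every stratum. The idea is that the full range (max minus min) grows with the number of points even when the underlying sd is identical, whereas a quantile-based spread such as the interquartile range does not depend on the count in expectation.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical counts: many precise estimates, few imprecise ones.
counts = {"low SE": 2000, "mid SE": 200, "high SE": 20}

summary = {}
for label, n in counts.items():
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]   # same sd in every stratum
    q1, _, q3 = statistics.quantiles(draws, n=4)      # quartile cut points
    summary[label] = {
        "range": max(draws) - min(draws),  # sensitive to how many points there are
        "iqr": q3 - q1,                    # estimates the same quantity at any n
    }

for label, stats in summary.items():
    print(label, round(stats["range"], 2), round(stats["iqr"], 2))
```

If binned quantile ranges from a real funnel plot still widen toward the precise end, the inverted shape is not just a data-density artifact.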
I was going to make a similar point to Konsta, but also say that some of these are *barely* funnels in any sense (verb or noun Dr Heard) and that by having the expected funnel superimposed on the graph you set up an expectation as to what it *ought* to look like. So is there any objective statistical measure of *funnelness* that could be used to test these plots?
You mention simulations, but a zero-assumption check on whether this is an artifact of data-density as Konsta suggests is to throw out points to equalize sampling within different SE strata. For what it’s worth, I predict you are going to lose any inverted funnel shape if you equalize sampling, but there is still a puzzle why higher powered studies are not coming to a point as expected (sub-sampling is not going to give you the funnel you expected).
How confident are you that you have placed the point of the funnel at the correct place vertically? You mention these are not all actually SE; I could imagine a case where even your highest powered study was still a pretty poor estimate, so all points, even your best ones are still in the wide body of the funnel. I guess if the focal metric was context dependent, you could have several great studies with very low SE but widely different estimates (because true metric is different in each studied location)…
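The “throw out points to equalize sampling within SE strata” check could look something like the Python sketch below. The function equalize_strata, the stratum edges, and the toy data are all hypothetical choices made for illustration; a real analysis would need to pick strata suited to each meta-analysis.

```python
import random


def equalize_strata(points, edges, seed=0):
    """Subsample (effect, se) points so each SE stratum has the same count.

    edges: ascending SE cut points; stratum i holds edges[i] <= se < edges[i + 1].
    Points whose SE falls outside the edges are dropped.
    """
    rng = random.Random(seed)
    strata = [[] for _ in range(len(edges) - 1)]
    for effect, se in points:
        for i in range(len(edges) - 1):
            if edges[i] <= se < edges[i + 1]:
                strata[i].append((effect, se))
                break
    target = min(len(s) for s in strata)   # size of the sparsest stratum
    return [rng.sample(s, target) for s in strata]


# Toy data: many precise estimates, few imprecise ones, same sd throughout.
rng = random.Random(3)
points = []
for n, lo, hi in [(500, 0.0, 0.1), (50, 0.1, 0.2), (10, 0.2, 0.3)]:
    points += [(rng.gauss(0.0, 1.0), rng.uniform(lo, hi)) for _ in range(n)]

balanced = equalize_strata(points, edges=[0.0, 0.1, 0.2, 0.3])
print([len(s) for s in balanced])
```

After balancing, the spread of effect sizes can be compared across strata on an equal footing. Because the subsample is random, repeating it with several seeds gives a sense of how stable the resulting shape is.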
I came here to make the same point as Konsta; glad to see it was made already. One way you could test for this is to run a quantile regression of outcome effect size on standard error, to see if the 25% and 75% quantiles show the inverted funnel shape. If this is just a sampling artifact, then you should have basically straight lines (assuming that the actual variation among effect sizes is constant).
It could arise from pseudo-replication, I think.
Suppose some studies are actually pseudo-replicated, then their SD and SE will be quite small — they’d measure the same thing over and over again and hence (falsely) appear quite precise. Some such studies can find a big negative effect, some are near zero, others find a big positive effect. So, they spread out over the x-axis, but not on the y-axis.
Stuart Hurlbert might be interested in what Jeremy has dug up.
I agree with this. I would go one step further than “pseudo-replication”, to say that it could more generally be due to “pseudo-precision”. In other words, standard errors being smaller than they should be, and not converging to the true value (i.e. reflecting bias). That could be caused by more than just pseudoreplication – also caused by spatial/temporal autocorrelation, violated modeling assumptions, etc. Not sure how that explains the bottom part of the graph, though, but it seems relevant to the broad distribution at the top of these graphs.
This was my suspicion too: that the often non-independent relationships among observations have been neglected at some level. This inflates the estimated degrees of freedom and underestimates uncertainty. Such “pseudo-replication” has many guises and depends on the specifics of the questions being asked including the level of generalization sought. Though common within individual ecological studies, pseudo-replication can arise at higher levels of generalization too and thus need not be the fault of the original studies examined here: each may have been designed to address narrower questions than asked in the meta analysis.
Not a complete solution, but I expect part of the story is that people feel comfortable presenting (i) low precision but plausible estimates (close to the grand mean), or (ii) high precision estimates regardless of plausibility. Which means that low precision but implausible estimates are file-drawered out of the literature.
That’s certainly more plausible than my version of that idea. Though I’m still not sure how plausible it is in an absolute sense. No data to back this up, but my gut instinct is that the file drawer problem isn’t actually a big deal. Not that there aren’t studies that go unpublished–there are. But I just question whether they’re all *that* biased a subset of all studies with respect to their effect size estimates and standard errors. My gut’s been wrong before, of course!
I wonder if it could be an illusion caused by a few very large SE estimates. For example, it looks like if you log transform the SE values in the first plot, you might see a proper funnel. It looks like there might actually be a (correctly shaped) funnel in that first plot, but the scale of the plot is distorted by a few big outliers, making the funnel shape disappear. Hard to say, but worth looking at. As for why those large SE’s tend to occur around zero, I don’t know.