I’m guessing that most readers of this blog will be familiar with the concept of shrinkage estimation. But if not, here’s an example to give you the idea. Imagine you’re trying to estimate the true winning percentage of each team in a professional soccer league–the percentage of games each team would win if, hypothetically, it played each of the other teams many, many times. But it’s early in the season, and so each team has only played each of the other teams once. You could take each team’s observed winning percentage as an estimate of its unknown true winning percentage. But those estimates come from small samples of games, and the better team doesn’t necessarily win every game because chance events play a role. So observed winning percentages after just a few games are imprecise estimates of those unknown true winning percentages. Meaning that the variance among teams in their observed winning percentages surely overestimates the variance among teams in their unknown true winning percentages. In all likelihood, the team with the highest observed winning percentage so far is not only good, it’s also gotten lucky. It’s good, but likely not *as* good as its observed winning percentage suggests. And in all likelihood, the team with the lowest observed winning percentage so far is not only bad, it’s also gotten unlucky. It’s bad, but likely not *as* bad as its observed winning percentage suggests. Put another way, as the teams play more games, they’re likely to regress to the mean. So in the aggregate, you can improve your estimates of the teams’ true winning percentages if you shrink the observed winning percentages towards 50% (the average winning percentage). You make the bias-variance trade-off work in your favor by biasing all of your estimates towards the mean, in order to reduce their variance. There are ways to work out exactly how much shrinkage is optimal.

I think we need shrinkage estimation for mean effect sizes in ecological meta-analyses. That is, I think many ecological meta-analyses provide very imprecise estimates of the unknown “true” mean effect size. So that, in aggregate, those estimated mean effect sizes would be improved if they were shrunk towards the mean. Here, see for yourself:

The x-axis of that figure shows the mean effect sizes from every meta-analysis in my pretty-comprehensive compilation of over 460 ecological meta-analyses. The y-axis shows their standard errors.* Notice the funnel shape. Precisely estimated mean effect sizes (so, low on the y-axis) are small in magnitude; they’re clustered near zero on the x-axis. The funnel gets wider as you go up the y-axis. Imprecisely estimated mean effect sizes vary from massively negative to massively positive. And as you’d expect, the imprecisely estimated mean effect sizes come from small meta-analyses (i.e. those based on few primary research papers):

Figure 2 shows that the mean effect sizes from small meta-analyses are all over the map, whereas those from large meta-analyses are clustered much closer to zero. Which indicates that, if ecologists were to conduct additional studies on the topics of the small meta-analyses, the mean effect sizes would tend to shrink towards zero. Of course, we can’t actually go and conduct many hundreds of additional studies on every topic on which ecologists have already published a meta-analysis. At least, we can’t do so very quickly or easily. But surely there’s some statistical way to estimate what would happen if we did. Surely there’s some statistical “shrink ray” we could use to shrink all these meta-analytic means towards zero by some appropriate amount.**

But how, exactly? The most obvious way, at least to me, is via a hierarchical random effects meta-*meta-*analysis. That is, take all effect sizes from all ecological meta-analyses that use (say) the log-transformed response ratio as the effect size measure, and throw them all into a massive hierarchical random effects model estimating variation in effect size among meta-analyses, among primary studies within meta-analyses, and among effect sizes within primary studies. The hierarchical random effects structure of the model will shrink estimated meta-analytic mean effect sizes towards the grand mean.

Or rather, it would if you could actually fit the model, but you can’t. At least, I can’t. I tried to do it on my reasonably new and powerful laptop, and R gave me a hilarious error message I’ve never seen before. Something about being unable to allocate a 92 Gb vector or something. 😛 Apparently, you can’t fit a hierarchical random effects model involving many tens of thousands of effect sizes from thousands of primary research papers, at least not when the effect sizes and their sampling variances vary over such huge ranges.

My googling has not turned up any workable alternative solutions. If fitting the model with REML is computationally infeasible, it’s sure as hell not going to be feasible with MCMC. Is there some other approach I’m unaware of, that would make it computationally feasible to fit this model? (empirical Bayes? some other Bayesian approach?) But the only other computationally feasible approaches that occur to me are all quite ad hoc, and mostly involve throwing away a lot of data.

For instance, one thing I’ve done is use actual linear regressions to quantify regression to the mean in the small subset of ecological meta-analyses that include 100+ primary research papers. For those meta-analyses, you can work out the estimated mean effect size based only on the data from the first 100 primary research papers included in the meta-analysis (i.e. the first 100 to be published). Do the same thing for the first X primary research papers, where X is some integer <100. Then regress mean effect size after publication of the first 100 papers on mean effect size after publication of the first X papers. The slope of the regression will be <1 due to regression to the mean, just like in the parent-offspring regressions that led Galton to coin the term “regression”. The flatter the regression line is, the more regression to the mean there is. As you’d expect, the smaller X is, the more regression to the mean there is.*** One could use those estimated regression lines to shrink all the mean effect size estimates from all the meta-analyses. But those regression lines are estimated from only a small subset of all ecological meta-analyses. And you’d be assuming that all meta-analyses involving the same number of primary research studies should be regressed to the mean using the same regression line, which is a pretty crude assumption.

Another ad hoc approach that involves throwing away a lot of data: fit a hierarchical random effects meta-meta-analysis to only a randomly chosen subset of meta-analyses. Repeat with lots of different randomly chosen subsets. Somehow average together the answers.

If anyone has any ideas or pointers, put them in the comments or drop me a line (jefox@ucalgary.ca). I’m all ears!

*Both sets of numbers are from hierarchical random effects meta-analyses estimating variation in effect size due to variation among primary research papers, variation among effect sizes within research papers, and sampling error.

**The appropriate amount of shrinkage would of course vary among meta-analyses, for various reasons. For instance, different meta-analyses include different numbers of effect sizes from different numbers of primary research papers, and those effect sizes have different sampling variances.

***And of course, there could be further regression to the mean even beyond 100 primary research papers. But it’s hard to get at that possibility, because so few ecological meta-analyses include >100 primary research papers.

Beat bet you didn’t mention is R-INLA. Bayesian using a clever numerical approximation and sparse matrices under the hood. This is the workhorse for big spatial epi modeling. It should need a lot less than 92 GB because of sparsity. Lots of good tutorials (one close to your problem I think https://www.paulamoraga.com/book-geospatial/sec-inla.html), and the Google group is amazing. Clear questions are likely to get answered quickly by the maintainers.

A couple other tips. Always run verbose=TRUE to see that it’s doing something. Also, start with strategy=gaussian (and maybe subset data) to see if it’ll run at all. If everything works that way, you can go to strategy=simplified.laplace for best accuracy/speed trade-off. There’s a bit of a learning curve early on if you want to specify your own priors and extract non-standard pieces of the model, but INLA is a super power.

Thanks!

The other suggestion I’ve received (via email) is to just run the model on a big cluster. I do have access to computing clusters at my uni (or via my uni).

Wonder if the “lasso”, a general approach to shrinkage/regularization that differs computationally from random-effects would be useful.

Hmm, interesting suggestion. I know the basic idea of the lasso, but don’t know any details. I’d need to learn more about it and figure out if it’s ever been applied in a meta-analytical context.

Wait so how many studies/effect sizes are we talking? If I’m understanding you correctly, it’s

effect[i] ~ 1 + (1| study) + (1|study/meta)

Even for thousands of studies it feels like this model could be fit!

We’re talking many tens of thousands of effect sizes nested within thousands of primary research studies, nested within hundreds of meta-analyses.

It sounds very cool. I think this random effects model has to be fit. Surely there’s a way… a supercomputer somewhere? Surely if students get access to those to practise fitting phylogenetic trees you can find a way for a grand meta-meta-analysis like this.

Yes, I’ve gotten some promising suggestions in this thread and via email for fitting the model. Could try using a computer cluster to get around the memory limitations of my laptop, or else fit the model in some more memory-efficient way.

Update: I’m making progress without resorting to a computing cluster, thanks to a tip from a correspondent plus a bit of googling. Setting “sparse=T” in the rma.mv() function I’m using to fit the meta-meta-analysis helps a lot. And since I’m running 64-bit R on a 64-bit Windows machine, I can use memory.limit() to jack the memory allocation to R way up.

Update: it also can help to save your workspace image, then restart R. Turns out it doesn’t help quite enough in my case, though. To make further progress, I’m going to need to move to a workstation or computing cluster with more RAM. My 16 GB laptop doesn’t quite cut it.

Success! At least, for the 116 meta-analyses in my compilation that use the log-transformed response ratio as the effect size measure. Big thank you to my colleague who allowed me to borrow her Mac workstation with 64 GB of RAM.