The latest "many analysts, one dataset" project is out as an unreviewed preprint, and it has the most depressing conclusions of any I've seen. Breznau et al. gave 73 social science research teams the same dataset and asked them to estimate the effect of immigration on public support for social policies. The point estimates were centered on zero but varied massively: a substantial minority of research teams reported a statistically significant negative effect, while a different, almost equally large minority reported a statistically significant positive effect. All this is in contrast to previous "many analysts, one dataset" projects I've seen, in which most or all analysts at least agreed about the sign of the effect of interest.
The variation among analysts in the Breznau et al. study arises because a lot of analytical choices varied among research teams, no one of which had a big "main effect" on the outcome of the analysis. So it's not that, say, omitting one specific influential observation always reduces the estimated effect of immigration by some massive amount, independent of your other analytical choices.
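To make that concrete, here's a minimal simulation sketch (every number in it is invented for illustration, not taken from the paper). The idea is that many small analytical choices, none of them individually decisive, can still fan estimates out widely enough to produce "significant" results in both directions even when the true effect is zero:

```python
import numpy as np

rng = np.random.default_rng(42)

n_teams = 73       # as in Breznau et al.
n_choices = 12     # hypothetical number of binary analytical choices
true_effect = 0.0  # assume the true effect is exactly zero

# Each analytical choice nudges the estimate by a small amount;
# no single choice has a large "main effect" on its own
# (flipping one choice shifts the estimate by roughly one SE).
choice_effects = rng.normal(0.0, 0.03, size=n_choices)

# Each team independently makes each binary choice (coded -1 / +1).
choices = rng.integers(0, 2, size=(n_teams, n_choices)) * 2 - 1

# A team's point estimate = true effect + summed nudges + sampling noise.
estimates = true_effect + choices @ choice_effects + rng.normal(0.0, 0.03, n_teams)
se = 0.05  # assume a common standard error across teams, for simplicity

sig_neg = np.mean(estimates / se < -1.96)
sig_pos = np.mean(estimates / se > 1.96)
print(f"mean estimate: {estimates.mean():+.3f}")
print(f"significantly negative: {sig_neg:.0%}, significantly positive: {sig_pos:.0%}")
```

The point isn't the particular numbers, which are made up; it's that when many modest choices vary independently across teams, you can get a spread of estimates centered on zero with confident-looking results in both tails.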
On the plus side, at least you couldn't explain much of the among-analyst variation from knowledge of analysts' prior beliefs about the topic. That's a relief, because it would be really depressing to live in a world in which every analyst was consciously or subconsciously putting a thumb on the scale to reach the conclusions they already "knew" to be true.
Nor could you explain much of the among-analyst variation from knowledge of analysts' methodological expertise. I find that result both unsurprising and reassuring, but I'm curious whether any of you are either surprised or bothered by it.
Question: how exactly would you phrase the take-home message here regarding the effects of immigration on public support for social policies? Would you say that the dataset is "uninformative" about the effect of immigration? That there's "no evidence" for an effect? "Mixed evidence"? "Conflicting evidence"? "Strong evidence" against an effect of immigration? Or what? I'm torn.
Another question: I wonder if at some point we'll have enough of these "many analysts, one dataset" projects to do some comparative analyses of them? If we ever do, here's my pet hypothesis as to what we'll find: the among-analyst variability will be highest in cases where the mean estimate (averaging over all analysts) is close to zero. My other pet hypothesis is that among-analyst variance typically will be as large as or larger than among-dataset variance. That is, if you gave the same analysts two different datasets addressing the same question, I bet there'd usually be more variation in results among analysts than among datasets. (The analyst × dataset interaction term might be big too.)
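For anyone wondering what that comparison would actually involve, here's a minimal sketch of the variance decomposition I have in mind, run on simulated data. All the variance components below are invented purely to illustrate the hypothesis (analyst variance at least as big as dataset variance, with a non-trivial interaction); a real comparative analysis would fit this kind of crossed model to actual many-analysts results:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fully crossed design: every analyst analyzes every dataset.
n_analysts, n_datasets = 73, 5

# Assumed (made-up) variance components, matching the pet hypothesis.
sd_analyst, sd_dataset, sd_interaction = 0.15, 0.08, 0.10

analyst_eff = rng.normal(0, sd_analyst, n_analysts)
dataset_eff = rng.normal(0, sd_dataset, n_datasets)
interaction = rng.normal(0, sd_interaction, (n_analysts, n_datasets))

# Estimated effect from analyst i applied to dataset j.
y = analyst_eff[:, None] + dataset_eff[None, :] + interaction

# Classic two-way ANOVA decomposition. Note the interaction is
# confounded with error here, because there is only one estimate
# per analyst-dataset cell.
grand = y.mean()
row_means = y.mean(axis=1)   # per-analyst means
col_means = y.mean(axis=0)   # per-dataset means

ms_analyst = n_datasets * np.sum((row_means - grand) ** 2) / (n_analysts - 1)
ms_dataset = n_analysts * np.sum((col_means - grand) ** 2) / (n_datasets - 1)
resid = y - row_means[:, None] - col_means[None, :] + grand
ms_resid = np.sum(resid ** 2) / ((n_analysts - 1) * (n_datasets - 1))

var_analyst = (ms_analyst - ms_resid) / n_datasets
var_dataset = (ms_dataset - ms_resid) / n_analysts
print(f"estimated analyst variance:   {var_analyst:.4f} (true {sd_analyst**2:.4f})")
print(f"estimated dataset variance:   {var_dataset:.4f} (true {sd_dataset**2:.4f})")
print(f"interaction + error variance: {ms_resid:.4f} (true {sd_interaction**2:.4f})")
```

One caveat baked into the sketch: with a single estimate per analyst-dataset cell, the interaction term can't be separated from residual error, so actually testing whether the analyst × dataset interaction is big would require replicate estimates per cell or a model-based decomposition.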