What conclusions should you draw from a dataset when different analysts reach different conclusions?

The latest many analysts, one dataset project is out as an unreviewed preprint, and it has the most depressing conclusions of any I’ve seen. Breznau et al. gave 73 social science research teams the same dataset, asking them to estimate the effect of immigration on public support for social policies. The point estimates were centered on zero but varied massively. A substantial minority of research teams reported a statistically significant negative effect, and a different, almost equally large minority reported a statistically significant positive effect. All this contrasts with previous many analysts, one dataset projects I’ve seen, in which most or all analysts at least agreed on the sign of the effect of interest.

The variation among analysts in the Breznau et al. study arises because the research teams made many different analytical choices, no one of which has a big “main effect” on the outcome of the analysis. So it’s not that, say, omitting one specific influential observation always reduces the estimated effect of immigration by some massive amount, independent of your other analytical choices.

On the plus side, at least you couldn’t explain much of the among-analyst variation from knowledge of analysts’ prior beliefs about the topic. That’s a relief, because it would be really depressing to live in a world in which every analyst was consciously or subconsciously putting a thumb on the scale to reach the conclusions they already “knew” to be true.

Nor could you explain much of the among-analyst variation from knowledge of analysts’ methodological expertise. I find that result both unsurprising and reassuring. But I’m curious if any of you are either surprised or bothered by it.

Question: how exactly would you phrase the take-home message here regarding effects of immigration on public support for social policies? Would you say that the dataset is “uninformative” about the effect of immigration? That there’s “no evidence” for an effect? “Mixed evidence”? “Conflicting evidence”? Strong evidence against an effect of immigration? Or what? I’m torn.

Another question: I wonder if at some point we’ll have enough of these many analysts, one dataset projects to do some comparative analyses of them? If we ever do, here’s my pet hypothesis as to what we’ll find: the among-analyst variability will be highest in cases when the mean estimate (averaging over all analysts) is close to zero. My other pet hypothesis is that among-analyst variance typically will be as large or larger than among-dataset variance. That is, if you gave the same analysts two different datasets addressing the same question, I bet there’d usually be more variation in results among analysts than among datasets. (The analyst x dataset interaction term might be big too.)
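
To make that second pet hypothesis concrete, here’s a minimal simulation sketch in Python. Everything in it is an assumption chosen purely for illustration (the crossed analyst x dataset design, the sizes of the variance components); none of it comes from Breznau et al. It just shows the pattern you’d expect to see if among-analyst variance really does dominate:

```python
import numpy as np

rng = np.random.default_rng(42)
n_analysts, n_datasets = 73, 5

# Hypothetical standard deviations for each variance component; the analyst
# component is deliberately set largest, per the pet hypothesis.
sd_analyst, sd_dataset, sd_interaction, sd_noise = 0.30, 0.10, 0.15, 0.05

analyst_eff = rng.normal(0, sd_analyst, n_analysts)    # each analyst's "style"
dataset_eff = rng.normal(0, sd_dataset, n_datasets)    # each dataset's quirks
interaction = rng.normal(0, sd_interaction, (n_analysts, n_datasets))
noise = rng.normal(0, sd_noise, (n_analysts, n_datasets))

# Each cell is one analyst's point estimate on one dataset; true effect = 0.
estimates = analyst_eff[:, None] + dataset_eff[None, :] + interaction + noise

# Crude method-of-moments partition of the observed variation.
grand = estimates.mean()
among_analyst = estimates.mean(axis=1).var(ddof=1)  # variance of analyst means
among_dataset = estimates.mean(axis=0).var(ddof=1)  # variance of dataset means
residual = (estimates
            - estimates.mean(axis=1, keepdims=True)
            - estimates.mean(axis=0, keepdims=True)
            + grand)

print(f"among-analyst variance: {among_analyst:.3f}")
print(f"among-dataset variance: {among_dataset:.3f}")
print(f"interaction + residual: {residual.var(ddof=1):.3f}")
```

Under these made-up component sizes, the among-analyst variance comes out several times larger than the among-dataset variance, with a non-trivial interaction term. The interesting empirical question is whether real many analysts, many datasets projects would look anything like this.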

13 thoughts on “What conclusions should you draw from a dataset when different analysts reach different conclusions?”

  1. Hi Jeremy, I’ll admit up-front that I only skimmed the preprint and am not familiar with this type of research. But I have to wonder: what is the value of this type of study if we don’t actually know what the right outcome should be?

    My gut feeling is that when you have a messy dataset from any complex system with multicausality, there is always the risk of over-fitting a model to random noise or missing information. Do you know of any similar studies where researchers were asked to analyse simulated datasets, where the level of noise in the data was adjusted? My prediction would be that the amongst-analyst variation would be proportional to the random noise in the dataset as well as the number of causal variables in the system.
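
    To make my prediction concrete, here’s a minimal sketch in Python. All the numbers and the menu of “analyst” choices are hypothetical, invented just to illustrate the idea: simulate the same multicausal system at several noise levels, let each simulated analyst control for a different subset of the other causes, and watch the spread of their estimates change with the noise.

    ```python
    import itertools
    import numpy as np

    rng = np.random.default_rng(7)
    n = 200

    for noise_sd in (0.5, 1.0, 2.0):                 # adjustable noise level
        # Five correlated causes (a shared factor induces the correlation,
        # so different covariate choices give genuinely different answers).
        common = rng.normal(size=(n, 1))
        X = 0.6 * common + 0.8 * rng.normal(size=(n, 5))
        beta = np.full(5, 0.2)                       # hypothetical true effects
        y = X @ beta + rng.normal(0, noise_sd, size=n)

        # Each "analyst" = one choice of which other causes to control for
        # when estimating the effect of variable 0.
        estimates = []
        for subset in itertools.chain.from_iterable(
                itertools.combinations(range(1, 5), k) for k in range(5)):
            cols = [0] + list(subset)
            design = np.column_stack([np.ones(n), X[:, cols]])
            coef = np.linalg.lstsq(design, y, rcond=None)[0]
            estimates.append(coef[1])                # coefficient on variable 0
        print(f"noise sd {noise_sd}: among-analyst sd of estimates = "
              f"{np.std(estimates, ddof=1):.3f}")
    ```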

    So, my conclusion from the study would be that public support for social policies is multicausal and although immigration is likely one of the causes, it is not so strong that it can overcome the effects of other causes on its own. It’s a bit of a cop-out, I know…

    I hope Brian will share his views, because I was reminded of this post: https://dynamicecology.wordpress.com/2016/03/02/why-ecology-is-hard-and-fun-multicausality/

    • I was going to go with Jeremy’s option that the dataset is “uninformative about the effect of immigration”. Would you say, given what you’ve said here, that that is your opinion too? When there is strong multi-causality, all sorts of potential confounds in the data, or just noisy/messy data in general, that seems like a good reason for saying the dataset is just not up to the task of answering the question.

      • I’m not sure I would say that, no. I don’t think this study tells us much about the limits of the dataset. Instead, it probably tells us more about the intractability of the research question.

        I suppose there would be ways to collect a ‘more informative’ dataset by controlling for other confounding variables. Such a dataset might even result in a consistent statistical effect. But would we necessarily understand the system better by eliminating other causes? I’m not so sure.

        In an ecological context, it’s like setting up a controlled experiment that has a statistically significant effect, but only explains 5% of the variation. Sure, we would know about the specific treatment, but we’d still be ignorant of what causes the other 95% of the variation in the dependent variable.

    • “But I have to wonder what the value of this type of study is if we don’t actually know what the right outcome should be?”

      Can’t we ask the same question of any study not conducted on simulated data? I mean, it’s not as if we ordinarily ever know the “true” population parameters. Ok, I’m sort of kidding here–but only sort of.

      Personally, I’d say these many analysts, one dataset projects are useful because they demonstrate the importance of what Andrew Gelman calls “researcher degrees of freedom”. The demonstration is effective no matter what the (unknown) “right” answer is. Whatever the “right” answer is (or even if you insist there is no one “right” answer), the amount of variation among analysts is sobering. I bet that, if you’d polled ecologists even just a few years ago, most of them would not have believed that there’d be *that* much variation in conclusions among analysts working on the same dataset. (In retrospect, I wish I’d done that poll!)

      I think it could be interesting to do a many analysts, one dataset study with some random noise added to the data. I might go one step further and give a bunch of analysts a simulated dataset that’s *nothing but* random noise. (Obviously, you’d have to tell the analysts it’s real data, not simulated.) See how many analysts find a “significant” result. That would be a way to estimate the “background” rate of overfitting among data analysts.

      Conversely, I think it could also be interesting to give many analysts a simulated dataset (that they’re told is real) with some known signal in it, and see how many analysts recover that known signal. In fact, that’s been done in physics: https://dynamicecology.wordpress.com/2013/03/15/bonus-friday-link-a-real-use-for-fake-data/
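
      Here’s a minimal sketch in Python of both versions: the pure-noise one from the previous paragraph and the known-signal one. Everything in it is hypothetical and invented for illustration; each simulated “analyst” is just one combination of covariate subset and outlier-trimming rule, applied to the same dataset.

      ```python
      import itertools
      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(1)
      n, signal = 200, 0.0       # signal = 0.0 is pure noise; try 0.3 for a real effect

      focal = rng.normal(size=n)                 # the predictor of interest
      covariates = rng.normal(size=(n, 4))       # four irrelevant covariates
      y = signal * focal + rng.normal(size=n)    # outcome

      significant, specs = 0, 0
      # Each "analyst" = one covariate subset crossed with one trimming rule.
      for subset in itertools.chain.from_iterable(
              itertools.combinations(range(4), k) for k in range(5)):
          for trim in (None, 2.5):               # no trimming vs drop |y| > 2.5 SD
              keep = np.ones(n, bool) if trim is None else np.abs(y) < trim * y.std()
              design = sm.add_constant(np.column_stack(
                  [focal[keep]] + [covariates[keep, j] for j in subset]))
              p = sm.OLS(y[keep], design).fit().pvalues[1]  # p-value on focal
              specs += 1
              significant += p < 0.05

      print(f"{significant}/{specs} specifications found a 'significant' focal effect")
      ```

      With signal = 0.0 the printed count estimates the “background” rate of overfitting; setting signal to some nonzero value turns it into the known-signal recovery check.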

      Thanks for linking to that old post of Brian’s. That was in the back of my mind as I wrote this post! One could imagine a follow-up post, “Why data analysis is hard and fun: multicausality of analytical outcomes”.

  2. Provocative post, Jeremy! My immediate gut reactions to Breznau et al.’s findings were: a) This is social science, right?, and b) It’s not possible “to estimate the effect of immigration on public support for social policies.”

    I know from many of your prior posts, with which I agree, that you’ve cast serious doubt on the ability of social science to quantify much of anything. Although I suppose qualitative assessments (ones & zeros) are attainable under certain conditions. As such, I believe one reason there is so much variability among these analysts is the inherent flaws of social science, and not necessarily any shortcomings of the analysts.

    The research question handed to these analysts seems so nebulous that a Ouija board might render better results than any statistical set of equations. I live very near the Mexico/US border, in New Mexico. While I have never been pleased with the onslaught of unlawful immigration, drug running & sex trafficking coming into the US, I have not once linked my support (or lack thereof) for social policies to immigration.

    Perhaps I am mistaken, but in my mind I do not equate immigration with social policies. I see them as distinct entities. I say as much, because any person with a lawful status in the US can potentially access any variety of social programs, while those here illegally or on holiday cannot. Ergo, my support of child welfare, disability payments, unemployment benefits, gay rights, maternity leave & so on have nothing to do with immigration.

    Does that make sense? If so, it might explain why there is no consensus on the original research question, because the question itself seems nonsensical.

      • Fair enuf, my apologies for the distraction. Would you expect a different outcome from differing branches of science?

  3. I was hoping that Hannah Fraser would comment, as she’s among those leading a very large many analysts, one dataset project in ecology.

  4. Jeremy, I was thinking you could leverage this blog to produce a publication-quality analysis of the many analysts, one dataset problem. How many loyal readers are faculty who teach an advanced undergraduate or graduate course in ecology in which statistical data analysis is a core element? 10 or more? How many students per class? 10 to 30? So even with attrition and opt-outs, there would be a source of 100+ analyses of the same dataset.
    On reading your post, I was taken back to a take-home exam many years ago, when the professor’s instructions were simply “Analyze Fisher’s iris data,” accompanied by a dataset. I recall my amazement, when she went through the completed assignments, at the variety of decisions and paths we had taken: multivariate, univariate, transformations, correlation, hypothesis testing, which post hoc test or corrections, and so on. My lasting impression was her observation that while some choices were poorer than others for the nature of the data, few of the assignments went off the rails, even though the analyses and choices were very divergent. And we students didn’t know enough to go nearly as far afield as contemporary practicing ecologists.
    Picking a reasonable dataset (or datasets) would be key. It would be a cool comparison with Hannah Fraser’s analysis.

    • Great minds think alike. 🙂 I had already thought of doing that, though not with biostats students. https://dynamicecology.wordpress.com/2019/11/19/whos-up-for-a-study-of-researcher-degrees-of-freedom-in-ecology/ But as you noted, Hannah Fraser and her colleagues are already doing it, so I stepped aside. They’re well down the road now. IIRC they have two datasets, each with over 200 analysts? Looking forward to seeing what they come up with.

      I use a version of the same class exercise in my own graduate biostats course. I use a stripped-down version of a dataset compiled for a recent Nature paper estimating the per-dollar effectiveness of conservation spending. I have the students work in pairs to estimate that quantity. The estimates tend to vary widely, and often have the opposite sign to what you’d expect. That exercise has made me rather dubious about the conclusions of that Nature paper. The conclusions seem to be very sensitive to lots of more-or-less arbitrary analytical choices.

  5. Pingback: The death knell for the “one right way” approach to statistics? and the need for Sturdy Statistics | Dynamic Ecology

  6. Sorry, not sure what happened. I checked the spam folder and the queue of comments awaiting moderation, and it’s not in either place. Can you try again?
