Marginal Revolution points us to the latest on researcher degrees of freedom (unreviewed preprint): give 70 different teams of neuroscientists the same fMRI data and tell them to test the same scientific hypotheses, and they give you back widely varying answers because they make different choices at all stages of data processing and analysis (though not so varying that there’s no commonality among the answers). A previous study of whether soccer referees are more likely to give red cards to dark-skin-toned players also found that different analysts reached widely varying answers, though not so varying as to lack any commonalities. Both studies also did some interesting supplementary analyses to try to explain the among-researcher variation.
I’m increasingly curious what you’d find if you did a similar exercise in ecology. We know from survey data that various questionable research practices, some of which fall under the heading of “researcher degrees of freedom”, are common in ecology and evolution (Fraser et al. 2018). But we don’t know exactly how big their consequences typically are for statistical analyses. One way to find out is to give a bunch of ecologists the same dataset, ask them to address some clearly-stated scientific claim, and see how much among-ecologist variation there is in the answers. Note as well that “researcher degrees of freedom” aren’t always bad. After all, we don’t want everyone doing mindless cookbook statistics. The point of quantifying the effects of “researcher degrees of freedom” in ecology is to learn more about how ecologists exercise their professional statistical judgment, not to criticize any and all exercises of statistical judgment. Judgment calls can never be eliminated from statistics.
I’m curious enough about this that I’m seriously considering doing it! But if I’m going to do it, I want to do it well, and so I think it’d be best to do it collaboratively. So, who’s with me? If you’re interested in participating in this, drop me a line (jefox@ucalgary.ca), or leave a note in the comments.
If we’re going to do this, one of the first things we’d have to do would be to identify/compile a good dataset, and a good scientific question to be addressed with that dataset. The question would need to be an interesting/important scientific question, I think, because that makes the exercise worthwhile scientifically, and not only as a study of ecologists’ statistical decision-making. And the dataset would need to be a reasonably large, rich dataset, I think. In part because you want a dataset that’s capable of addressing the scientific question of interest reasonably well. And in part because, if you just give people a very simple dataset with, say, an X and Y variable and ask for the best estimate of the slope of Y on X, you’re not leaving much scope for researcher degrees of freedom.
It couldn’t be a dataset that’s already been published and analyzed, I don’t think. Not even one that’s already been analyzed in different ways by opposing camps of researchers. Researchers’ analytical choices would be influenced by having read others’ analyses.
It would be interesting to come up with several datasets and associated questions. Some of the questions and datasets should concern issues on which there’s considerable pre-existing controversy in the ecological literature, and others should concern issues on which there’s little or no controversy. Then you could ask: do “researcher degrees of freedom” tend to loom especially large in analyses of controversial scientific questions? Asking that question is one way this study could go beyond just showing once again that, yeah, “researcher degrees of freedom” is a thing. But if you were going to do this, you’d have to make sure you had a way to measure the variance among researchers’ answers that was comparable across questions and datasets.
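One possible way to make the among-researcher variance comparable across questions and datasets is to express it in units of the uncertainty the analysts themselves report. What follows is only a rough sketch under that assumption: the function name and all the numbers are hypothetical, and it is not the measure used in the fMRI or red card studies. It divides the standard deviation of the teams’ point estimates by the median of their reported standard errors, which gives a unitless ratio that doesn’t change if a question’s effect is measured on a different scale.

```python
import statistics


def analyst_spread_ratio(estimates, std_errors):
    """Among-analyst SD of point estimates, in units of the median reported SE.

    A ratio near or below 1 means analysts disagree with each other about as
    much as any single analysis is uncertain; a ratio well above 1 means the
    choice of analysis matters more than the reported uncertainty suggests.
    """
    if len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    if len(estimates) < 2:
        raise ValueError("need at least two analysts")
    among_sd = statistics.stdev(estimates)      # spread across analysts
    typical_se = statistics.median(std_errors)  # typical within-analysis uncertainty
    if typical_se <= 0:
        raise ValueError("standard errors must be positive")
    return among_sd / typical_se


# Hypothetical results from four teams on two different questions.
# The two questions use very different effect scales on purpose.
controversial = analyst_spread_ratio(
    estimates=[0.10, 0.25, -0.05, 0.30],
    std_errors=[0.05, 0.06, 0.04, 0.05],
)
uncontroversial = analyst_spread_ratio(
    estimates=[12.0, 13.5, 12.8, 13.1],
    std_errors=[1.0, 1.2, 0.9, 1.1],
)
print(f"controversial question:   {controversial:.2f}")
print(f"uncontroversial question: {uncontroversial:.2f}")
```

One caveat: all the teams analyze the same data, so their estimates aren’t independent, and a ratio like this is a description of disagreement rather than a formal heterogeneity test.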
I also think it would be interesting to first poll researchers on what answers they expect to obtain. Then look for a correlation between the answers they expected to obtain and the answers they actually obtained.
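As a minimal sketch of that check, assuming each researcher gives one expected effect size before the analysis and reports one estimated effect size afterwards (the data below are invented purely for illustration), the correlation could be computed like this:

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical poll: what each of six researchers expected to find,
# and what their own analysis of the shared dataset then estimated.
expected = [0.20, 0.05, 0.40, -0.10, 0.30, 0.15]
obtained = [0.18, 0.10, 0.35, 0.02, 0.22, 0.20]

print(f"r = {pearson_r(expected, obtained):.2f}")
```

A positive correlation wouldn’t by itself show that expectations drove the analytical choices. Researchers with better prior knowledge of the system might simply predict better. But it would be a starting point for asking the question.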
I could imagine that this exercise might be a very interesting add-on to the work of one or more ongoing or planned working groups.
And if you wanted to go really meta, you could let different researchers independently analyze the results of this exercise!*
Whether or not you’re interested in participating in this, feel free to share any thoughts you have about this project in the comments.
UPDATE: In the comments, Shan Kothari points out that I’m not the first ecologist to have this idea (I’m not surprised; thanks to Shan for passing on this info). So the first thing I’ll do is get in touch with the folks who already had this idea.
UPDATE #2: Updating again to confirm that Hannah Fraser, Tim Parker, and their colleagues are still moving forward with this project. They had the idea first and they’re already well down the road, so it wouldn’t make much sense for me to organize a parallel effort, I don’t think. Glad to hear somebody’s already doing this, looking forward to seeing what comes out of it.
UPDATE #3: Scroll down to the comment from Hannah Fraser for instructions on how to sign up as an analyst for this project.
*Kidding. Maybe.
Some ecologists had tweeted their intention to do what sounded like a researcher degrees of freedom study here: https://twitter.com/HannahSFraser/status/1067963607084867584
I signed up, but I haven’t heard much since.
Thanks for the tip! I was wondering if anyone had already decided to do this.
I think this would be fun to do (although it might be less shocking to ecologists than social scientists that there are researcher degrees of freedom – but still good to get it in the literature).
My thought for a dataset would be something NutNet-like. So spatially dispersed replicates, some covariates for the replicates (temperature, precip, richness?) and lots of questions about how to model the fact that these are not perfect replicates but partial replicates. Although NutNet-type data are still rare (unfortunately), this parallels the decisions ecologists face with blocking all the time.
And if you really want to get researcher degrees of freedom showing up make it multivariate (e.g. species composition as the explanatory or dependent variable).
Yeah, agree that something along those lines would be a good candidate dataset/question.
Another good candidate would be something like that soccer red card study: you want to precisely and accurately estimate the marginal effect of X on Y, but there are a lot of other variables that affect Y, some of which are likely to be confounded with X. I did one like that with my graduate biostats course this term, using a simplified version of a dataset from a Nature paper from a few years ago. The paper estimated the marginal effect of an additional dollar of conservation spending, and how that marginal effect varied with various other factors. The goal was to identify countries in which additional conservation spending would yield the greatest “bang for the buck”. Have to say that going through that lab exercise left me rather less confident in that Nature paper’s conclusions than I would’ve been had I just restricted myself to reading the paper. Even though I’m someone who was already keenly aware of the general issue of researcher degrees of freedom, it was surprising and sobering to see just how much the answer changed with even slight changes to the statistical analysis. It was much easier to come up with plausible-seeming models that estimated a marginal effect of the wrong sign (i.e. additional conservation spending worsens conservation outcomes, all else being equal) than to come up with plausible-seeming models that got the sign right!
Which raises a pretty difficult statistical decision problem. If you build a plausible-seeming statistical model, only to get back a key parameter estimate with what sure seems like the wrong sign, what do you do? Conclude that your model must be bad, and keep tweaking it until you get an estimate with the right sign? Conclude that there’s some problem with your dataset that makes it unsuited to the task? (sample size must be too small, key variables must be missing or mismeasured…) Or what? Not an easy decision, I don’t think. Especially if the dataset is too small to split into training and test sets.
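To make the sign-flipping concrete, here is a small sketch on simulated data. It is not the conservation spending dataset or the lab exercise described above: the variable names, effect sizes, and the helper functions are all made up for illustration. The focal predictor x truly has a positive effect on y, but it is confounded with z, and the sketch fits every model that includes x plus some subset of the candidate covariates, then reports the estimated coefficient of x under each one.

```python
import itertools
import random


def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns.

    Solves the normal equations by Gauss-Jordan elimination with partial
    pivoting. Fine for a toy example; it will fail on collinear predictors.
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(row[r] * row[c] for row in rows) for c in range(p)]
        + [sum(row[r] * yi for row, yi in zip(rows, y))]
        for r in range(p)
    ]
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(a[r][k]))
        a[k], a[pivot] = a[pivot], a[k]
        for r in range(p):
            if r != k:
                factor = a[r][k] / a[k][k]
                a[r] = [v - factor * w for v, w in zip(a[r], a[k])]
    return [a[r][p] / a[r][r] for r in range(p)]


def focal_coefficient_by_spec(focal, covariates, y):
    """Coefficient of the focal predictor under every subset of covariates."""
    results = {}
    names = sorted(covariates)
    for size in range(len(names) + 1):
        for subset in itertools.combinations(names, size):
            cols = [focal] + [covariates[name] for name in subset]
            results[subset] = ols(cols, y)[1]  # index 0 is the intercept
    return results


# Simulated data: x has a true effect of +1 on y, but is confounded with z,
# which has a strong negative effect. w is an irrelevant covariate.
rng = random.Random(1)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
w = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
y = [1.0 * xi - 3.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

specs = focal_coefficient_by_spec(x, {"z": z, "w": w}, y)
for subset, coef in specs.items():
    label = " + ".join(("x",) + subset)
    sign = "positive" if coef > 0 else "negative"
    print(f"y ~ {label}: coefficient of x = {coef:+.2f} ({sign})")
```

In this toy case we know the right answer because we simulated it, so the models that omit z are the ones with the wrong sign. With real data you don’t get to know that, which is exactly the decision problem.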
Thanks Jeremy,
I’m sorry to read that you will not pursue this. I would have signed up with my crew.
A simple (and fun) way to do this would be to ask the different teams involved to conduct the same experiment in their favorite nearby study system. The experiment doesn’t have to be too complex. The idea is to generate results in different contexts to build a larger dataset, but applying the same protocols.
+1 from Brian: “And if you really want to get researcher degrees of freedom showing up make it multivariate (e.g. species composition as the explanatory or dependent variable).”
It would be a lot of fun with all these issues of random effects, zeros, phylogenetic and spatial independence...
Cheers
-Raphaël
Hi Jeremy and other interested folks!
As Shan mentioned, we’ve been chipping away at this question for a while. It’s so exciting to see someone else get excited about the idea independently!!
If you’re interested, you can check out our project here: https://osf.io/e5xhm/. I’ve just updated it with the latest information.
We’ve chosen two datasets, one from ecology and another one from evolutionary biology/animal behaviour – and we’re hoping to recruit ecologists and evolutionary biologists to assess one or both of these datasets.
We’re passionate about open science, so we’ve submitted our proposal as a stage 1 registered report and have been waiting a few months for it to come back from review. So we’re a little stalled (which is why Shan and others haven’t heard much), but in good news that means we’ve definitely got the space to accept anyone who has read this blog and finds it interesting. We’re really keen to get as many people on board as possible to get a really good picture of how people analyze these types of datasets and how this can affect our understanding of these systems.
Hannah