Note from Jeremy: this is a guest post from Kevin Lafferty, Western Ecological Research Center, U.S. Geological Survey
I hereby challenge you to help me redesign the scientific paper through a process called “Collaborative Independent Review”. But if you’re already comfortable writing the traditional scientific paper, you’re probably not going to like it.
If you don’t like it, blame Andy Dobson. When Andy invited me to write a chapter for the new book Unsolved Problems in Ecology (Dobson, Holt and Tilman eds., PUP – check it out), he figured I would write about how everyone should think about parasites as much as I do. But I had been reading blog posts on Dynamic Ecology about how we do business as ecologists (which means you can blame Jeremy, Brian and Meghan too). This got me more worried about ecologists than parasites. I became convinced we could get more return on investment in Ecology through better training programs, funding distribution, synthesis, publication models, and evaluation metrics. And so I wrote a chapter on A Science Business Model for Answering Important Questions. While writing, I kept remembering a 1979 paper called Ecology: A science and a religion, where one of my heroes, Paul Dayton, predicted that ecologists’ increasing focus on conservation would begin to undermine their scientific objectivity. This led me to add a section about reproducibility, which is what Jeremy asked me to blog about. Much has been said about reproducibility in other disciplines, but I wondered if re-visioning how we write papers and how journals publish them was the key for ecology.
My nagging worry about reproducibility in my own work goes back to a paper that an undergraduate named Kimo Morris and I published in 1996 in Ecology. Kimo’s data showed a strong effect of a brain parasite on fish behavior and subsequent predation by birds. I really wanted to believe Kimo’s data were true. Audiences loved the story, which motivated me to tell it more. And, yet I harbored an insecurity that my own infatuation with his results might have led me to be less skeptical than I was trained to be. I should have repeated the study, but found convenient excuses not to (two kids, no funding, a good swell always on the horizon). Fortunately, as time passed, others looked into this host-parasite relationship in more detail, finding results largely consistent with what Kimo had found. No retraction needed.
What follows is excerpted from my above-mentioned chapter.
We don’t know the extent to which ecological results are reproducible, but concerns about reproducibility in other disciplines suggest this is a topic ecologists should think about. Whereas economics, psychology and biomedical research study humans and a few model organisms, ecologists study biodiversity in its entirety. For this reason, ecologists expect that a single study might not be general, and only after amassing many studies from many researchers on many systems do ecologists consider whether support for a hypothesis is general. Ecology is, by its nature, often not reproducible, and there is a tradeoff between ecologists replicating specific studies versus gaining insight from doing similar studies in different contexts. And that might be why progress in ecology sometimes seems more like a random walk than a stable attractor.
Although the goal for Ecology might not be reproducibility, ecologists should at least strive for transparent and unbiased data interpretation. Unfortunately, complex modern statistical analyses allow multiple interpretations, leaving it up to ecologists which results to report and emphasize. This growing ambiguity is revealed by declining R-squared values and rising P-values per paper over time. The desire to report something significant can lead authors to subconsciously report significant outcomes from multiple tests without controlling for multiple comparisons (p-hacking), and ecologists report more significant findings when they gather data with a preconceived hypothesis. On the other hand, the joy of reporting an unexpected finding leads to HARKing (hypothesizing after the results are known), which is encouraged by high-impact journals that require authors to emphasize novelty and importance. Furthermore, under the current biodiversity crisis, it is harder to remain neutral and dispassionate about the systems ecologists study. Ecologists’ personal concerns for the environment can lead them to emphasize the catastrophes, collapses and crises that attract readership, provoking calls for more careful analyses and sober interpretations. And with each Pruitt retraction, there is renewed reason to be skeptical about ecologists in general.
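The multiple-comparisons problem behind p-hacking is easy to demonstrate with a simulation. Here is a minimal sketch (plain Python on pure noise; the 20 tests, the 0.05 threshold, and the trial count are arbitrary choices of mine, not drawn from any real study) showing that running 20 independent tests on data with no real effects still yields at least one “significant” result about 64% of the time:

```python
import random

random.seed(1)

def false_positive_anywhere(n_tests=20, alpha=0.05):
    # On pure noise, each test comes out "significant" with probability alpha.
    # Return True if at least one of the n_tests does.
    return any(random.random() < alpha for _ in range(n_tests))

trials = 100_000
rate = sum(false_positive_anywhere() for _ in range(trials)) / trials
expected = 1 - (1 - 0.05) ** 20  # analytical value, about 0.64

print(f"simulated: {rate:.2f}, analytical: {expected:.2f}")
```

In other words, an ecologist who tries enough response variables will almost always find something to report, unless the significance threshold is corrected for the number of tests run.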
If scientists spend taxpayer money to generate irreproducible results, the public’s logical response should be to either withhold funds or demand a new process that emphasizes reproducibility. Ecologists increasingly acknowledge that reproducibility is important, and there is already a move among journals for transparency and openness guidelines that could help foster reproducibility by having authors adhere to citation standards, data transparency, code archiving, materials archiving, design transparency, pre-registering hypotheses and analytical methods, and replicating past studies. Some have argued that research institutions should implement Good Institutional Practices (i.e., rules, standards, documentation, transparency, blind assessment), but ecologists don’t often follow such practices even when they would be easy to implement. For instance, blind assessment helps researchers avoid bias, and is standard in clinical trials, but is not common in Ecology. Although no journal has adopted all transparency and openness guidelines, several have their own lists. For example, Nature has an 18-point checklist for Good Institutional Practices in its instructions to authors. Independent assessment could be extended into several other publication steps with the aim to reduce bias, increase specialization, and foster critical thinking. For instance, basing publication acceptance on sound hypotheses and methods rather than the significance of the findings (as per the scope of the Public Library of Science journals) makes it possible to publish negative results, which helps reduce the file-drawer problem. In addition, a few journals embrace reproducibility by inviting repeat studies (e.g., F1000Research). But I don’t think this is enough.
Collaborative Independent Review is one way that funders, journals and scientists could implement a more reproducible paper. The process is collaborative in the sense that four independent teams and an editor author a paper together. The first step is for a Principal Investigator (PI) to propose the questions, hypotheses, predictions, and methods (including proposed analyses). The pre-registered proposal includes an Introduction and Methods, and suggests a target budget for the methods and analyses. Proposals receive double-blind panel review based on expected return on investment. Competitive proposals are revised according to panel review and then put out for bid by (1) a lead technician (which can be the PI), and (2) a lead analyst (not associated with the PI). The technician receives half the funds up front to implement the methods and report the data. The analyst, who maintains independence by remaining anonymous until publication, blindly tests the a priori predictions and writes and illustrates the results, including appendices describing the analyses in detail. The analyst sends draft results to the PI and technician for review. Once the three parties agree on the results, the PI submits the Introduction, Methods and Results to an editor who sends the sections to outside referees. In response to the referee reports, the PI, technician and analyst revise the Introduction, Methods and Results. The referees then write a collaborative Discussion about the Results, at which point the funder pays the award balance to all authors (PI, technician, analyst, referees, editor). All data produced in the project become available to the public at the publication date so others can repeat the analyses.
If you’ve read this to the end, I am curious if you hate this idea as much as I expect you will. I certainly have my reservations. In particular, Collaborative Independent Review could discourage the scientific creativity that generates new ideas and hypotheses when unexpected results occur. And all that reproducibility comes at the cost of time, expense, creativity and investigator control. Would you still be motivated to put in the long hours at the expense of not controlling how your data are analyzed and interpreted?
Can we do this now? The biggest obstacle to Collaborative Independent Review is probably engaging the funding agencies. But there should be little barrier to doing collaborative independent writing. If this sounds interesting, I’m looking for fellow ecologists to give this a trial run (sans the funding). Email me and let me know if you are interested in any of the following: 1) proposing a question and providing data, 2) blindly analyzing the data, or 3) reviewing the results and writing the discussion. I’ll volunteer to act as editor and find a journal.
Any use of trade, product, or firm names in this publication is for descriptive purposes only and does not imply endorsement by the U.S. government.
Interesting idea… I think it could work for some issues that are of great social relevance; Covid-19 comes to mind. Perhaps a funding agency could embrace this idea and request projects specifically with this model in mind.
Question: much of the research is done by Masters and PhD students… How would they fit into this? I think that they wouldn’t, because I think it’s important for a student to work on all parts of a research project.
Hi Pavel, I agree it is important to consider how this fits into a training model for students, who should learn all parts of the process AND have complete ownership of their dissertations. It might be that this model actually allows students to be involved in learning the process, but piecemeal rather than holistically. Whether that is a good thing or not, I’m not sure. But these days my students often get their first taste of publishing as a role player in a multi-author paper. That contrasts with my training in the 1980s, where students often struggled to write their first paper from start to finish in a vacuum.
Actually, this makes a lot of sense. A student could take part in one piece of such a paper, while also developing his or her own work from beginning to end. I think that this model of producing science could work! I just don’t think it could (or should) fully replace the way we do science now. 🙂
I would be at ease being part of the Collaborative Independent Review (CIR) process you described (I certainly don’t hate it!). My main concern would be the timeline and expected DELAYS. I sometimes have to wait months for a colleague to read over something. I can only imagine how long it would take to go through the full CIR, from start to finish.
As you said, I think the goal for Ecology is not reproducibility (do I get the same results in the same context?), but generality (how does the context affect results?). I DO think that ecologists are already striving for transparent and unbiased data interpretation. The issue is discussed and addressed seriously in many circles.
Your concern about delays is one of my pet peeves. And it is inherent in most multi-author papers I have written. Some formal structure for response times would be helpful, as would rewards for sticking to a timeline. I have some thoughts about that in the review process that I write about more in the chapter – namely journals formally rewarding reviewers for their efforts – especially timely reviews. I can send you that PDF if you are interested.
This is an interesting idea. I find collaboration itself a double-edged sword in terms of working with/trusting friends/people you know, and all of the good and bad parts of that. I don’t know how I would feel if I couldn’t ask in person how the data are collected, or discuss assumptions made in the experimental design with the technician or analyst.
I’m also curious how theory work plays into this, as the idea seems to fit very neatly to primarily data/fieldwork oriented papers. I’m less sure how theoretical projects would fit in (often given how closely people work together on such papers). Would the benefits of this kind of approach primarily be for experimental/data work compared to theory or joint projects? I know there is a growing question of reproducibility within computational biology more generally (several recent papers/theses etc), but I think the nature of these issues is different from reproducibility in experiments.
I also imagined this mostly for empirical papers. But models are themselves subject to whims of approach and interpretation. A clever modeler can design a model to affirm a preconceived notion, pick parameter sets that lead to a particular outcome, pick from a range of assumptions to get a particular fit. And so having independent collaboration might have some use in modeling as well. At least in the sense of separating the steps of construction, conception, and interpretation.
Actually, I do think that a version of this process could be adapted for theoretical papers. As some experimented with parallel statistical analysis (having independent teams analysing the same dataset to answer the same question), we could think of parallel modelling with independent teams designing models to answer the same question.
“As some experimented with parallel statistical analysis (having independent teams analysing the same dataset to answer the same question), we could think of parallel modelling with independent teams designing models to answer the same question.”
Isn’t that less about reproducibility, though? It’s more about what Andrew Gelman calls ‘the garden of forking paths’–different analysts (or theoreticians) reach different conclusions because they make different decisions about how to process and analyze the data (or what to include in the theoretical model).
To quote Kevin: “A clever modeler can design a model to affirm a preconceived notion, pick parameter sets that lead to a particular outcome, pick from a range of assumptions to get a particular fit. And so having independent collaboration might have some use in modeling as well.”
To quote Charlotte: “we could think of parallel modelling with independent teams designing models to answer the same question.”
The thing is, that is not what theoretical research does. For theoretical research, and I believe for experimental and modelling research as well, there is a far more important role for individual creativity than is acknowledged here. My approach, your approach, and somebody else’s approach to “the same question” (what does that even mean?) will be different because you, I, and that somebody else are different, creative individuals. Isn’t that the way it’s supposed to be?
I think it’s worth noting that preregistration is not universally accepted as an actual solution in psychology, where reproducibility is more widely acknowledged as a goal in the first place (see Danielle Navarro’s blog here, with a few callouts to papers: https://djnavarro.net/post/paths-in-strange-spaces/). In particular, it doesn’t fix poor experimental design, which is a much harder beast to attack than p-hacking and other point-source evils.
I personally really appreciate Andrew Gelman’s Garden of Forking Paths (http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf) which offers alternate suggestions for how we might address replication concerns — pre- or post- publication replication (though of course, as you mention, this takes time and money that we don’t always have to spend!) and multi-level modeling approaches to address multiple comparisons issues.
As for the concept of the anonymous analyst: I personally can’t imagine how you’d have a completely anonymous analyst until publication, especially if that person is to be involved in revising (and writing) the paper itself. If we’re proposing this is someone just employed by the journal of choice who handles the code and isn’t intimately familiar with the study, I could see how this could be done, but for analyses that require more familiarity with the data under consideration I think you’d run out of eligible analysts quickly. I’d also think you’d want to be able to pre-register with a journal who would then be compelled to publish the paper, no matter the results, in order to avoid the file drawer problem — but I think all of this is so far away from the current state of the world that I’d be surprised to see anything resembling it. I like the idea and I like that I’m now thinking about this! I just don’t see how implementation could work.
I wonder about pre-registration. In my own work, hypotheses often spring up during the analysis and writing process. And some of these are unanticipated. This is part of the creative process in ecology, but it also can lead to post-hoc interpretations of the data, which are hard to resist. I don’t know if anyone would take on collaborative independent review, given that it makes things harder for journals and authors. And it reminds me of campaign finance reform. The ones theoretically in charge of fixing the system are currently the ones that have mastered the current system (myself included). Analysts would need to be sufficiently familiar with the data to analyze it. This could be done in two ways: 1) authors that provide the data would need to have their metadata sufficiently clear that someone else could use it (a generally good practice), and 2) analysts need to be able to ask questions – which could be done anonymously.
Metrologists distinguish repeatability from reproducibility. Repeatability is broadly the concept that if you keep everything as much the same as possible (same investigators, same methodology, same object of study, etc.), you get the same result. Reproducibility is broadly the concept that if you change things that shouldn’t matter (different investigators, equivalent methodology, similar objects of study, etc.), you get comparable results.
I agree that repeatability can (should?) be addressed in a single paper but I suggest reproducibility might be handled in separate papers.
Some of this is the influence of the success of double-blind medical trials (which are a small part of science, but of course massive in influence and money), which tries to convert everything else to their model.
Pre-registration works well for some kinds of tests, but makes no sense in others. I’m not in ecology but astronomy, which is often similar in method, where we do sometimes just take surveys to see what’s out there.
Similarly, open data is very easy when you have a spreadsheet of dozens or hundreds of trials, but harder when you have TBs of simulation outputs or PBs of images. Having to make it all available becomes its own project – sometimes worthwhile, but you gotta fund it and requiring it will exclude people with less resources. Really, different fields and different projects have different needs, trying to put everything in the “like a double blind medication trial” box just doesn’t make sense.
Some of it, of course, is that people want to treat papers as final answers; I think if we did that we’d need to add a new functional layer behind it to communicate provisional answers, two sigma results you need to follow up on to be confident in.
So far no one seems to have risen to your challenge to dislike this idea, so in hopes of stimulating discussion, here goes. I do not like it. Not at all. I think it is wrong on some fronts and lacking on others.
The last thing in the world we need is “a science business model.” The current trend to treat scientific (and other sorts of) creativity as a kind of business and its results as a commodity is one of the most depressing and counterproductive trends we face.
But what I really dislike is that the proposed publication process has an exclusive focus on hypothesis testing. And, based on the concern with p-values, and the idea that hypotheses are supposed to be specified in advance and carried out by some anonymous analyst, it’s pretty clear that what it’s focusing on is statistical null hypothesis testing. But statistical hypothesis testing is not all there is to ecological research. In fact, there are good arguments that it should be much less of a focus than it is.
Alternatives? Parameter estimation is far more important than null hypothesis testing. What is the relationship between X and Y? Is it linear or nonlinear? Increasing or decreasing? How much and how fast? How is the relationship between X and Y affected by Z? These kinds of questions actually tell us something important about X, and Y, and even Z. We do not want a list of “significant effects”; we want to know how things work. But parameter estimation doesn’t appear in the discussion. An interest in parameter estimation implies that a primary goal of ecological (and other kinds of) research is model selection: evaluating what the data mean as scientific evidence, not as accept/reject decision theory. This fundamental aspect of scientific activity doesn’t appear in the discussion either.
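To make the contrast concrete, here is a minimal sketch (standard Python only; the simulated slope of 0.8, the noise level, and the bootstrap settings are illustrative assumptions of mine, not from any real dataset) of reporting an effect as an estimate with an uncertainty interval, rather than as a bare accept/reject verdict:

```python
import random
import statistics

random.seed(0)

# Simulated data: Y depends linearly on X with true slope 0.8 plus noise.
xs = [i / 10 for i in range(100)]
ys = [0.8 * x + random.gauss(0, 1) for x in xs]

def ols_slope(x, y):
    # Ordinary least-squares slope: cov(x, y) / var(x).
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

slope = ols_slope(xs, ys)

# Bootstrap an approximate 95% interval by resampling the data pairs.
n = len(xs)
boot = []
for _ in range(2000):
    idx = [random.randrange(n) for _ in range(n)]
    boot.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
boot.sort()
lo, hi = boot[50], boot[1949]  # ~2.5th and 97.5th percentiles

print(f"slope estimate {slope:.2f}, approx. 95% interval ({lo:.2f}, {hi:.2f})")
```

The estimate-plus-interval answers “how much and how fast”, where a significance test would only answer “is the slope distinguishable from zero”.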
Nor does research on ecological theory (of course I would think of that). And probably lots of other kinds of ecological research as well.
So, I really do not like this proposal for changing the way research is done and scientific papers are written. I don’t think it is adequate for the diversity of ecological research, nor for the challenges that are faced by that research, which go way beyond just hypothesis testing.
Parenthetically, I also doubt the seriousness of the replicability crisis. But that’s a different issue. Some interesting thoughts about that appear in a chapter written by Robert MacArthur shortly before his death. The historical situation and the landscape of ecological questions was different then, but there are valuable insights half a century on.
MacArthur, R.H. 1972. Coexistence of species. pp. 253-259 in Behnke, J. A. ed. Challenging Biological Problems. AIBS. Oxford University Press, New York.
Hi Hal; I also do not like the idea, because I can’t see what major problem in ecological research it is designed to address. As you note ecological research is quite diverse; eg, parameter estimation and theory development have been most of my own career.
Theory development is thriving, and I can point to the 50th anniversary issue of TPB as a great historical source: the special June 2020 issue is open access and the papers have been widely downloaded; indeed, of the 25 most downloaded TPB papers over the last 90 days, 18 are from this special issue, including 10 of the top 11! The Editor-in-Chief’s overview article is here:
Hi Hal (are you in Amsterdam or WHOI?), Thanks for your comment. You have to go back a few generations before ecologists seem comfortable saying what they don’t like anymore. And I’m not saying you’re old, but you are a demographer, so you know better how to define that than I.
Although I might have emphasized hypothesis testing, that’s not my emphasis; it is an entirely separate and important topic. I don’t see why collaborative independent review could not apply to research that generates parameter estimates, or theory. The question is whether we might get more accurate and unbiased information when we partition efforts among independent observers. And, if we could, would it be worth the constraints and cumbersome nature of doing so? One could certainly ask the same questions for our current anonymous peer review process.
As for liking or not liking a science business model, I should emphasize that I don’t think science should be run as a business. I use the phrase business model to capture the norms and procedures (training, grants, publication etc) we use to do science. So, we have a business model currently (which commonly tests hypotheses). Your business model might produce better parameter estimates. In my agency, the business model aims to create accurate and unbiased information to help guide better policy. I think all of these outcomes might benefit from more blindness, and less bias. How to do that without stifling creativity is an interesting question.
BTW, your comment about parameter estimation reminded me that when I read RH Peters, A Critique of Ecology, twenty years ago, I was thinking we would change our business model to parameter estimation, but ecologists pretty much ignored that advice. I think ecologists are heavily invested in their business model, and that makes it hard to change. However, increased computing means it is easier to apply Bayesian tools for exploring parameter estimation (which I endorse), and less need to focus on p-values and hypothesis testing. Richard McElreath’s statistical rethinking book does a nice job of introducing this concept to the current generation, and he might succeed where Peters failed.
I think the single change that would most increase the value of scientific publications is for publishers and editors to require authors to separate their findings into three sections: results, conclusions and speculations, and to enforce this segmentation.
Although I am far from hating the idea, I am not sure it could have a deep impact if the current incentives of the so called “business model” of academic research do not change. If all parties involved are still strongly motivated to find novel and statistically significant results, they most probably will…
I like the thinking on this. The issue I see is that I think a person who is intimately involved with study design and data collection can often handle the analysis with more sensitivity to the limits of the data and with a much-needed reality check of knowing if the statistical outputs really are describing the study system.
“ at which point the funder pays the award balance to all authors (PI, technician, analyst, referees, editor).”
This “system” as proposed will reinforce the power of salaried/tenured scientists as gatekeepers who don’t depend on grants to pay their wages. In many countries, article page charges alone could pay for several months of salary. Ecology studies are sometimes funded at only 5-10k euros; how is that going to underwrite a system like the one proposed?
For example, as a modeler/data scientist I get told often that I can do stuff for free with R – so zero funds, because all the time in front of the computer doesn’t “count” compared to the diver or boat costs. These same costs are also often the excuse for relying on one-off observations to drive the entire narrative of articles, despite objections by analysts. The high cost of a single data point justifies its inclusion and gives it extra “weight” as proof of something. Clearly this is unacceptable.
If ecologists as a community are to value reproducible results, the age-old battles about models and mathematics in departments of biological/ecological sciences have to be dealt with. Mathematics is a necessary tool of ecological science and fundamental to reproducible studies. I believe many of the current problems in advancing ecology stem from the historical rejection of mathematics as a suitable tool for biological and ecological studies. When I did research in chemistry, no one ever talked about “too much math”, and reproducible results were the norm! Ecology, however, is still dealing with a literature and scientific practice that are largely based on reputations and opinions of well-connected individual scientists, not reproducible science. Possibly changes in publishing will help, but not without a parallel concerted effort to reproduce and re-evaluate long-held “ecological truisms”.
Interesting – but it sounds like a lengthy process, or at least one that would be highly dependent on the availability of the key players. How long would it take to write such a paper, compared to the traditional way? In fast-moving and/or highly competitive areas, could this model be an obstacle?