Sorry, I didn’t look at the intertubes this week, for…reasons. So just one link this week, which I’ll talk a bit about: Am Nat has updated its data sharing policy to give it teeth.
Am Nat (where I’m an editor) was a pioneer of data sharing. Am Nat was among the journals that founded Data Dryad. And complete data archiving (sufficient to reproduce all analyses and results) has been a condition of publication in Am Nat since 2011. But as Bob Montgomerie documents in this recent post, authors often don’t fully comply with Am Nat’s data archiving policies.
Fortunately, complete non-compliance (no posting of any data or code) is actually pretty rare among recent Am Nat papers. But partial non-compliance is pretty common: no readme file, missing variables, provision of summary statistics rather than raw data, incomprehensible and poorly-organized spreadsheets, missing or incomprehensible code, etc.
As Bob notes, part of the issue here is that it’s not always clear exactly what constitutes “compliance”. For instance, is it ok if your data or code are in some file format that’s not widely used? (In my own case, I’m thinking of some of my not-too-ancient papers that used Mathcad.) And part of the issue is that sometimes old code no longer runs, for instance because it uses deprecated R packages. But the biggest issue seems to be authors not taking their data sharing obligations as seriously as they should.
I’ll put my hand up here. I haven’t published in Am Nat lately, but I have published recently in other journals with data sharing policies. Thinking back, I’m pretty sure I’ve always provided reasonably well-organized raw data spreadsheets in a widely-accessible file format (csv or Excel), with readme files. But I haven’t always provided my R code (or in the case of student projects, insisted to my students that they provide their R code). So I’m among the authors who apparently need a bit of a nudge to get their acts together. And now Am Nat is nudging.
See Bob’s linked post for details. The highlights are that:
- Am Nat now describes best practices for data archiving.
- Authors will be encouraged to provide their data and code at the time of manuscript submission. And if a revision is invited, they’ll be required to provide their data and code at that time, prior to final acceptance.
- A small team of data editors will check the data files and code for compliance, which will be a condition of final acceptance.
- When Am Nat is made aware of post-2011 Am Nat papers with deficient archiving, the authors will be asked to correct the deficiencies. Which they’d better do, at least if the deficiencies are judged to be serious, because…
- (quoting from Bob’s post:) “[T]he American Naturalist reserves the right to publish Editorial Expressions of Concern when we are made aware of grossly deficient data archives that are not amended in a reasonable amount of time. In extreme cases, we reserve the right to retract papers that are not supported by appropriately archived data, or to hold up an author’s future submissions until past deficiencies are amended. However, we also recognize that new policies entail growing pains and that compliance is understandably imperfect as we adjust to a new culture of more rigorous and complete data sharing.”
I wasn’t involved in the development of these new policies, but I support them.
Speaking only for myself, not on behalf of Am Nat, I think one rare but important “use case” for these policies is to help prevent publication of papers based on fraudulent or otherwise anomalous data. Not completely prevent publication of such papers, of course–data editors and reviewers won’t be doing data forensics. But there are recent cases in EEB of papers being retracted for reasons of missing and/or obviously-anomalous data. It’s a good thing that those papers will no longer be publishable in the first place. Like it or not, it’s much easier for journals to decline to publish a paper than to retract a paper.