Friday link: Am Nat gives its data sharing policy teeth

Sorry, I didn’t look at the intertubes this week, for…reasons. So just one link this week, which I’ll talk a bit about: Am Nat has updated its data sharing policy to give it teeth.

Am Nat (where I’m an editor) was a pioneer of data sharing. Am Nat was among the journals that founded Data Dryad. And complete data archiving (sufficient to reproduce all analyses and results) has been a condition of publication in Am Nat since 2011. But as Bob Montgomerie documents in this recent post, authors often don’t fully comply with Am Nat’s data archiving policies.

Fortunately, complete non-compliance (no posting of any data or code) is actually pretty rare among recent Am Nat papers. But partial non-compliance is pretty common: no readme file, missing variables, provision of summary statistics rather than raw data, incomprehensible and poorly organized spreadsheets, missing or incomprehensible code, etc.
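For what it's worth, a couple of the items on that list are easy to check mechanically before you upload anything. Here's a minimal sketch in Python, assuming your archive is just a folder of files. The function name `check_archive` and the lists of file extensions are my own hypothetical choices for illustration; they are not taken from Am Nat's guidelines or from anything the data editors actually run.

```python
from pathlib import Path

# Hypothetical groupings of file extensions, chosen for illustration only.
DATA_SUFFIXES = {".csv", ".tsv", ".txt", ".xlsx", ".xls"}
CODE_SUFFIXES = {".r", ".rmd", ".py", ".jl", ".m"}


def check_archive(folder):
    """Return a list of likely problems found in a data archive folder."""
    files = [p for p in Path(folder).rglob("*") if p.is_file()]

    # Treat any file whose name starts with "readme" as a readme file.
    readmes = [p for p in files if p.name.lower().startswith("readme")]
    others = [p for p in files if p not in readmes]

    problems = []
    if not readmes:
        problems.append("no readme file")
    if not any(p.suffix.lower() in DATA_SUFFIXES for p in others):
        problems.append("no data files in a common format")
    if not any(p.suffix.lower() in CODE_SUFFIXES for p in others):
        problems.append("no analysis code")
    return problems
```

Calling `check_archive("my_archive")` returns an empty list if the folder has a readme, at least one data file, and at least one code file. Obviously this catches only the crudest problems. It can't tell you whether variables are missing, whether the data are raw or summarized, or whether anyone else can make sense of the spreadsheet or the code.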

As Bob notes, part of the issue here is that it’s not always clear exactly what constitutes “compliance”. For instance, is it ok if your data or code are in some file format that’s not widely used? (In my own case, I’m thinking of some of my not-too-ancient papers that used Mathcad.) And part of the issue is that sometimes old code no longer runs, for instance because it uses deprecated R packages. But the biggest issue seems to be authors not taking their data sharing obligations as seriously as they should.

I’ll put my hand up here. I haven’t published in Am Nat lately, but I have published recently in other journals with data sharing policies. Thinking back, I’m pretty sure I’ve always provided reasonably well-organized raw data spreadsheets in a widely-accessible file format (csv or Excel), with readme files. But I haven’t always provided my R code (or in the case of student projects, insisted to my students that they provide their R code). So I’m among the authors who apparently need a bit of a nudge to get their acts together. And now Am Nat is nudging.

See Bob’s linked post for details. The highlights are that:

  • Am Nat now describes best practices for data archiving
  • Authors will be encouraged to provide their data and code at the time of ms submission. And if a revision is invited, they’ll be required to provide their data and code at that time, prior to final acceptance.
  • A small team of data editors will check the data files and code for compliance, which will be a condition of final acceptance.
  • When Am Nat is made aware of post-2011 Am Nat papers with deficient archiving, the authors will be asked to correct the deficiencies. Which they’d better do, at least if the deficiencies are judged to be serious, because…
  • (quoting from Bob’s post:) “[T]he American Naturalist reserves the right to publish Editorial Expressions of Concern when we are made aware of grossly deficient data archives that are not amended in a reasonable amount of time. In extreme cases, we reserve the right to retract papers that are not supported by appropriately archived data, or to hold up an author’s future submissions until past deficiencies are amended. However, we also recognize that new policies entail growing pains and that compliance is understandably imperfect as we adjust to a new culture of more rigorous and complete data sharing.”

I wasn’t involved in the development of these new policies, but I support them.

Speaking only for myself, not on behalf of Am Nat, I think one rare but important “use case” for these policies is to help prevent publication of papers based on fraudulent or otherwise anomalous data. Not completely prevent publication of such papers, of course–data editors and reviewers won’t be doing data forensics. But there are recent cases in EEB of papers being retracted for reasons of missing and/or obviously-anomalous data. It’s a good thing that those papers will no longer be publishable in the first place. Like it or not, it’s much easier for journals to decline to publish a paper than to retract a paper.

5 thoughts on “Friday link: Am Nat gives its data sharing policy teeth”

  1. I wonder about the reason for the guideline: “from each file remove
    variables not analyzed”. One might guess that the rule might shorten
    the length of time to “embargo public access for a set amount of time
    to allow authors to publish related papers”. Is there some other
    reason?

    Meanwhile, how does the guideline interact with publication
    requirements imposed by funding agencies? It seems that the
    combination could prevent publication in Dryad until a publication
    analyses all the collected data.

    • I had assumed the guideline “from each file remove variables not analyzed” is just to make life easier for the data editors, and anyone else who downloads the data file. Removing extraneous variables makes the data file easier to understand. But I’m speculating, I don’t know if that’s Dan Bolnick’s reason for adopting this guideline.

      “Meanwhile, how does the guideline interact with publication
      requirements imposed by funding agencies? It seems that the
      combination could prevent publication in Dryad until a publication
      analyses all the collected data.”

      That seems unlikely to me, but I haven’t thought much about it.

  2. Pingback: Friday links: philosopher vs. baseball, following Am Nat’s lead on data sharing, and more | Dynamic Ecology

  3. Pingback: Poll on co-authorship of papers using publicly available data | Dynamic Ecology

  4. I have seriously mixed feelings about this. I agree that openly-accessible data are better for science. This allows others to check the correctness of data and confirm published results. It also allows others to use the data for new projects, as well as for syntheses and meta-analyses. So much information has been collected and then lost as scientists retire or pass away; it is a huge waste of resources. Clearly, there are major benefits to enforcing data sharing policies. On the other hand, some datasets take enormous amounts of effort and money to collect. That effort, I believe, should be rewarded with opportunities to participate in research that uses those data. I don’t think it should buy automatic co-authorship, but it should bring opportunities.

    This is particularly important for students and scientists in low and middle-income countries. Here, data are often the main bargaining tool people have to develop projects, collaborations, and advance their careers. Unlike in more developed nations, people in these countries have fewer resources to develop highly productive careers. For these scientists, making their data completely and freely available means that others, who might have more time and money, can use it to develop ideas and studies they cannot. Motivations for including these people in collaborations are reduced dramatically, just exacerbating inequalities in how different scientists can move forward in their careers across different parts of the world.
