Prioritizing manuscripts, and having data go unpublished for lack of time

A few recent things have gotten me thinking about how I (and others) prioritize working on manuscripts. First and foremost, I have a new baby who is not yet in daycare. My husband and I trade off watching him, and I need to be really efficient and make sure that I work smart during the chunks of time that I get to work. (By the way, one thing I’ve discovered is that I should schedule Skype meetings for times when I am watching the baby, as it’s relatively easy to keep him happy by bouncing on a yoga ball while Skyping. Yoga balls and Skype are apparently important components of work-life balance for me.) In short, prioritizing tasks on my to-do list is even more important right now than it normally is. Second, there has been renewed chatter on Twitter related to whether some data are not worth publishing because they are not of sufficient interest (which, for some people, apparently means they can’t aim for Nature, Science, or PNAS). DrugMonkey has been heavily involved in these discussions, and wrote this blog post in response to one of those discussions. His first point is “never let data go unpublished for lack of impact.” That seems reasonable. But it made me wonder how much I and others let data go unpublished for lack of time. And, if that is happening, is it a sign that I (or we) should change how we approach things?

To explain a bit more: every PI that I know – and many postdocs and grad students – has generated more data than they have been able to publish. This means that decisions are being made about which projects to write up and which ones not to. As a grad student and postdoc, when deciding which things to write up first, I generally focused on the studies that I thought were the most interesting. I certainly thought it was important to have publications in top journals (and by “top” I mean journals such as Ecology, AmNat, and Evolution), and so whether a manuscript had a shot at one of those journals did influence my thinking on where to place it on my priority list. At the same time, if a manuscript didn’t have a shot at a top journal, I still hoped to get it published at some point (possibly by collecting additional data, or by aiming for a lower impact journal). I just didn’t prioritize it as highly.

Now, as a faculty member, my lab generates even more data, and I have even more responsibilities (especially teaching and service). How do I prioritize manuscripts now? At this point, it is based on who is the lead author. If it is a grad student or a postdoc, that manuscript moves to the top of the priority list. (This recent post by BabyAttachMode suggests not all PIs prioritize this way.) Within that group of high priority manuscripts, it’s more or less first come, first served. In the past year or so, working on those manuscripts has basically taken up all of the time I have available for working on manuscripts. The unintended side effect of this is that manuscripts that do not include students or postdocs as coauthors are currently languishing near the bottom of my manuscript to-do list, and it makes me wonder if/when I will get back to them. In this case, results are going unpublished due to lack of time. I really wish this weren’t the case, but I’m not sure what to do about it.

This means that some of the data we’ve collected – some of the things we now know based on work done in my lab – are not available generally, which is wasteful. Is this inevitable? Does it mean that everyone who currently has a file drawer of data (or, more likely, folders on their computer containing unpublished data) should take some sort of hiatus from doing science until they’ve cleared the backlog? I don’t think that’s likely to happen, but I don’t know what the right answer is.

So, readers, my questions for you: How do you decide which manuscripts to work on first? Has that changed over time? How much data do you have sitting around waiting to be published? Do you think that amount is likely to decrease at any point? How big a problem do you think the file drawer effect is?

Related posts:
How do you decide authorship order? (by Jeremy)
How to decide where to submit your paper? (also by Jeremy)

43 thoughts on “Prioritizing manuscripts, and having data go unpublished for lack of time”

  1. That is so true – as a PhD and post-doc it was important to publish everything I did (or at least try to) – now as an extremely busy PI unless I feel very strongly about some data I tend to leave it up to the Post-doc, PhD or MSc student to make the running. Even so, some of the stuff that I really want to publish can languish for years before I get round to it and unfortunately some of it is very unlikely to see the light of day.

  2. I’ve had a very similar experience. For papers that I’ve been the primary driver on, at this point I’m usually so busy I can’t do much with them, so I either force myself to give a conference talk(s) on them, which pushes me to analyze more data and get things polished enough that non-publication becomes an embarrassment to me (and hence motivation), or else I hand them off to a new coauthor (postdoc/student/collaborator) who is more able to get them past the activation threshold for publication (and ideally they get bumped to 1st author and I get bumped from that to senior author). That is mostly working but I have at least 2 projects that are critically close, but also painfully far, from a state where publication can happen, and have languished in that state for a horrifying 10-12 yrs… but I am pushing myself, ever so slowly, to make steps to get them out. Maybe this comment will goad me into getting them done!

    • I had forgotten that people will sometimes hand off the data to a new coauthor to help get them out the door. That’s something I should consider doing.

  3. It would be good to publish as much data as possible. I think that many otherwise unpublishable datasets could get a good outlet if they are released as part of a special issue collecting a whole bunch of similarly themed datasets. Otherwise it would be good to release as much as possible with minimal effort to the net with a caveat that this may not be research quality data and extreme care should be taken. Perhaps some of the Creative Commons licenses could be used to ensure that the data are open but not misused. Alternatively a specially designed open data license could be created. (just brainstorming)

    • I suspect for a lot of people (myself included), we continue to think that we’ll get around to writing up the data some time soon, and so don’t consider simply releasing the data. One problem with publishing just the data is that it certainly would take time to curate the data to get it into a form where it’s useful for others, and so we’re back at the “too little time” problem (and, for many people, with even less of a motivation to do it, since a publication isn’t attached).

      Have other readers released their data? What prompted you to do that? Had you already written up a publication based on the data?

      • I release all my work as I go in my open lab notebook. (being primarily a theorist, I’ll just echo @stevencarlislewalker that it’s a different kind of data)

        It’s not as convenient as a data commons suggested below (what a cool idea!), but people would have to actually visit a data commons for that to work. There are several complete projects that I’ve never got around to publishing for the same priority reasons you state, but I occasionally send the links to researchers who might be interested. Of course it’s not as polished as a paper for understanding what is going on, but a specialist might make sense of my notes. Better yet, people googling obscure scientific phrases will often hit upon these posts. This has led at least to some productive email exchanges with other researchers.

        (For example, I only hope someday someone will scoop up my demonstration of the breakdown of the van Kampen approximation to demographic noise in oscillating populations and do something with it, shown in, and other posts under my #tribolium tag)

  4. “How much data do you have sitting around waiting to be published?”

    Quite a bit…and I’m primarily a methodologist/theorist. We generate data too! 8) It’s just a different kind of data.

    I sit on a lot of stuff that I don’t think will have much impact. One thing I’ve noticed about myself is that as a graduate student I used to work up any little idea I had, but now as a postdoc I find myself being more skeptical of my own ideas — so I hold back more. Perhaps this is a mistake?

    • The paper that I am probably most embarrassed to still have languishing half written up is a theoretical study (collaborative with Spencer Hall and Chris Klausmeier). We think it’s interesting and the plan is to submit it to AmNat . . . if we ever get around to finishing up writing the manuscript.

      I think your description of working up every idea as a grad student but not as a postdoc seems pretty typical to me, and probably unavoidable, given time constraints.

  5. Agreed one doesn’t get to everything. My priorities are shaped by my interest/enthusiasm of the moment, the unrelenting grant cycle and getting trainees on pubs, their 1st author pubs.

    Otoh, never give up on the data! I have some long latencies from collection to print pub ….

    • I do know someone who tried writing up data after ~10 years. One reviewer slammed him for trying to recycle old data (thinking he must be trying to publish the same data twice). So frustrating.

      We are going to add some data I collected in 2003-2004 to a manuscript my postdoc is working on. Fortunately, I annotated the Excel files well.

  6. I wonder if it would be useful to have some sort of “data commons” where you could advertise a data set you collected but have no time to publish. You could submit a short description of the background and experimental design. People who find your data set interesting and have time to analyze and write it up could contact you for permission to work with the data set.

    The incentive for you is that your data, now polished with context, analysis, and interpretation, is made available to the public. I imagine that you should probably get co-authorship as you designed the experiment/fieldwork and collected the data, but this could be negotiated.

    For the “data-user” the incentive would be that they currently are in the opposite scenario in that they either have a data:time ratio lower than yours, or they find your data set particularly interesting. I could see this being the case for new graduate students that either have projects that take a long time to generate data, or have had their first few experiments not work out. Having a data set would give the student a chance to go through the process of analyzing and writing up an experiment early on. This might help build some confidence as well as giving them a better idea of things they should be considering while designing their own experiments. Finally I would imagine if the “data commons” were large enough, it wouldn’t be all that rare that someone equally busy finds your data set much more interesting than you do. Maybe one man’s American Midland Naturalist is another’s American Naturalist.

    • I really like this idea. It solves the time problem I raised in response to Aslak above. I imagine this sort of database could be especially helpful for someone working on meta-analyses, especially if it helps counter the file drawer effect.

      Do any of our readers know of something like this?

      • It’s not currently geared up in that direction, but figshare could be helpful perhaps. It’s a repository for datasets/posters/presentations…pretty much whatever. Anything that goes on gets its own DOI so is fully citable.
        Who knows…given enough interest maybe they’d consider adding functionality…although, as datasets have DOIs, is anything extra needed?

    • That’s a really neat idea! Kind of a scaled-up version of what Terry McGlynn calls “calling in the wolf”.

      But no, I don’t know if a forum like this exists in any field, or if anyone’s tried to do it solo (e.g., by publishing a request for “data-users” on Ecolog-L or something).

      Closest I’ve ever come is using this blog to toss out “free ideas for provocative review papers”–review papers that I think are well worth writing, but that I lack the time to write myself.

      A while back, I heard from a grad student who said she’d taken up my suggestion to write that first review paper I’d suggested, and she wanted to get my comments on a draft in the next month. But that was months ago and I haven’t heard from her. Maybe she’s too busy and needs a way to hand the project off to someone else! 😉

    • I like this suggestion, and I was under the impression that figshare was specifically designed with this use case in mind. It associates a DOI with your data, which partially resolves the authorship issue. If the analyst can process and interpret the data completely without your help, then they simply cite the DOI of your figshare. In most cases, though, I suspect the analyst might want to bounce ideas off of and get extra insight from the original experimenter, in which case you have a collaboration and thus co-authorship.

      • I’ve started using Figshare recently and because it accepts lots of file formats, I’ve also uploaded chapters of my thesis (data from 2003 and 2004), and conference posters (from 2010). I’d originally intended to publish both but never got around to it.

  7. Dear Meg,

    You say that you prioritise manuscripts involving PhD students and Postdocs. In what way? Do you move them up your list of papers to write yourself or do you make more time reading and improving them?
    The reason I am asking is that on one side I more and more hear fellow Postdocs complaining of having difficulties writing papers (and tellingly the number of writing skill courses etc offered to Postdocs is steadily increasing at any University I look at) and on the other hand, I hear PIs complaining about the slowness or incapability of their students or Postdocs in writing papers. But then, often PIs don’t let their students and Postdocs write papers because they think they should be in the lab making data (data that might not get published as your post and the comments show) and because they are so slow in writing.
    I think this is a big problem and I am really concerned about it. Writing papers is a core skill of scientists and should be learned as a PhD student. I would expect any Postdoc to be able to at least produce a decent full first draft, if not more. But that only works if they get to learn to write them themselves, which is a slow, tedious and painful process. As long as PIs don’t let them do this, and this seems to be becoming increasingly common, I feel, they won’t.
    So I wonder how common this attitude among PIs really is and whether it contributes to not getting data published because PhD students and Postdocs don’t learn it and so don’t do it, and PIs are swamped with it.


    • Hello Arne. When I say I prioritize papers by students and postdocs, I mean I prioritize making time to read and edit them. Usually, there are a number of early drafts that get just big-scale feedback from me (e.g., related to ideas they are or are not including, flow of arguments, etc.). In many cases, it would be faster for me to just write or rewrite it, but I agree with you that training grad students and postdocs on how to write papers is critical, and so it’s well worth the time to me.

      I have heard colleagues lamenting poor writing skills of postdocs who they expected would be much better at writing (based on that person having several first-authored publications). It does suggest that some PIs are writing papers for their lab members. I agree that it would be interesting to know how common this practice is.

  8. If I stopped collecting data now, I’d be writing things that I’m excited about for the next four years. If I add in sorting through samples I’ve collected and have yet to sort/analyze, add on two more years.

    For me, it’s happenstance, what I’m excited about, what I need in the grant cycle, and what students need to be on papers. A lot like drugmonkey, it sounds. My latency is measured in years, not months. Sometimes > a decade.

    • Now I want to see plots of years since PhD vs. latency. I imagine it’s an accelerating relationship!

      I just thought of another thing that factors in: amount of literature reviewing I would have to do to write up the study. One set of data I haven’t published yet is on something that was a side project. It didn’t pan out (in terms of there being interesting effects there), but I think it’s worth writing up. But I don’t know the literature especially well, and would have to spend a lot of time reading to make sure I was putting it in the proper context.

  9. Pingback: How do your prioritize your manuscript writing efforts? | DrugMonkey

  10. This post scarily encapsulates my 30 year research career. In the early years I made a concerted effort to publish, where possible, every dataset that I generated and was fortunate (I use that word deliberately) not to be in a lab where it was publication in Nature, PNAS or be damned. As a consequence for the first few years the filing drawer was bare as most studies were published but that resulted in what Yoda would describe as “…publish much, responsibility you gain…”.
    Thus promotions resulted leading to post-docs, PhD students and to my horror a rapidly bulging filing cabinet let alone a filing drawer. Now, without question those manuscripts that are driven by our post-docs and PhD students take priority and get my full attention (advice/editing, etc) as these people occupy positions which are time-defined and I personally believe that I have a moral obligation to ensure that if/when they leave our lab they do so with an empty filing drawer and as many publications as possible to set them on their various career paths.
    There are days I sorely wish I could close my office door, with a notice on the door “…manuscript in prep, enter at your peril…” but I suspect that won’t happen soon.

  11. Meg, this is a great blog post. I totally agree with the sentiments by you and others here that works from students take priority, as part of the training process. I also try to prioritize manuscripts from other collaborations on which I am not first author, where I am not leading the writing process but am making a contribution. I’ve had lots of multi-author papers, and there’s nothing worse than having to wait on collaborators for feedback when you’re dying to just get it in for review.

    As a Canadian government scientist, we’re now faced with additional challenges on this collaboration front, in that (in theory) my boss needs to give “approval” on collaborative papers (those I am not first author on, e.g., led by non-government collaborators) before they are submitted, which just holds the whole process up (see a related post on this topic here). I fear it could limit “official” collaborations by government scientists up here, and may relegate us to a line in the acknowledgements even if our participation might otherwise merit co-authorship if it means a 3-month delay in manuscript submission for some middle-manager to sign off on the work. I bring this up within the context of this discussion because it makes me wonder how this might impact the prioritization on external collaboration by other government scientists. For myself, I think it will still rank highly, and I will see if and how these rule changes tangibly affect collaboration with external researchers (including our students at institutions where we hold adjunct positions).

    As for the file drawer effect, I think it can be compounded in government departments, where sometimes a lifetime of data can be collected for a program, where some papers have been generated (but there’s clearly a lot left unpublished) and is left by someone retiring who isn’t so keen on publishing anymore. This sounds like a goldmine waiting to be tapped, but depending on the documentation associated with what’s left, it can also be a terribly frustrating process to even get to the point of having something manageable to work with. Certainly makes the case for standardized data management. Probably the same case happens for a lot of PIs in academia (too much data, not enough career), but the “hand-off” of data might not always happen as it does within a government organization. Certainly strengthens the case made by BM above regarding data commons (but kind of glosses over the importance of GOOD documentation to accompany such a repository to make the data useful).

    And here I am, reading and discussing this great blog post on not having the time to generate manuscripts from languishing data, when what I REALLY should be doing is…

  12. “How do you decide which manuscripts to work on first?” First in, first out. More or less. I’ve rarely been in the position to have multiple manuscripts ready to go at the same time.

    “Has that changed over time?” Not much.

    “How much data do you have sitting around waiting to be published?” Not that much that I think is ready to go. I have several projects where I’m waiting for a little more data before I am confident about having a complete enough story to publish it. I do have a lot of things that I just can’t see being developed into full papers.

    “Do you think that amount is likely to decrease at any point?” Not in the near future.

    “How big a problem do you think the file drawer effect is?” Big as in “how much?” or big as in “how important?” I happen to think the answer to both is “pretty darn big,” but knowing which of those is the real question might help me refine an answer.

  13. Pingback: Friday Coffee Break « Nothing in Biology Makes Sense!

  14. Pingback: What we’re reading: Coelocanth genomics, barcoded pollen, and publication priorities | The Molecular Ecologist

  15. Pingback: Weekend Reading: Be Careful Out There Edition - ProfHacker - The Chronicle of Higher Education

  16. Nice post! I have two criteria: the manuscripts that come first in my to-do list are either those with a more interesting story to tell or those close to being finished. Thus, I establish my priorities based on relevance or completeness. Of course I also take into consideration the agendas of my collaborators; if someone gives me a manuscript to review or some data to analyze and proposes a concrete deadline, I try hard to respect it. When a task lacks a definite deadline, unfortunately I end up procrastinating on it.

  17. Pingback: Natural History Fridays | The Lab and Field

  18. Pingback: That was a great project–whatever happened with that? | Dynamic Ecology

  19. Pingback: Friday links: the research conveyor belt, in (modest) praise of impact factor, and more | Dynamic Ecology

  20. Pingback: Two stage peer review of manuscripts: methods review prior to data collection, full review after | Dynamic Ecology

  21. Pingback: Have you ever abandoned a big line of research? | Dynamic Ecology

  22. Pingback: Friday links: revisiting old papers, life after lab closure, and more | Dynamic Ecology

  23. Pingback: Fancy collaborating? My list of languishing projects « The Lab and Field

  24. Pingback: How I plan to use my research leave | Dynamic Ecology

  25. Pingback: Repost: How do your prioritize your manuscript writing efforts? | Drugmonkey

  26. Great post! This interests me because I am considering trying to publish a manuscript that I have been sitting on for quite some time. How long does data sit around until you consider it too old to publish? Or is there ever a time limit on good data and interesting results? I am wondering how difficult it will be to get my manuscript published given that the data is from 12 years ago. I would love your thoughts on this.

  27. Pingback: Poll results on co-authorship of papers using publicly available data | Dynamic Ecology
