A proposal for replicating published statistical analyses in ecology & evolution

Recently, I noted a major kerfuffle in economics which began when a graduate student discovered serious analytical errors in a paper by two famous Harvard economists. The student discovered the errors because he’d been assigned the task of reproducing the analyses in a published paper for a graduate course he was taking.

So here’s an idea: maybe we should do the same in ecology and evolution? It’s increasingly the case that authors are obliged to make their raw data and computer code freely available to anyone after publication. So this kind of assignment is increasingly feasible. It’s an assignment that would fit well with graduate courses in statistical methods, and maybe with other sorts of courses as well. It’s a great learning exercise for students. Teaches you to pay attention to subtle but crucial statistical and coding details. Gets you thinking about other ways one might have done the analysis. Etc. And, on the off chance that you discover a serious mistake (or fraud!) in a published paper, you might well get a paper of your own out of the exercise.

This seems like a nice way of getting around the fact that correctness and reproducibility of analyses are important, but nobody has much personal incentive to check them. Even pre-publication peer reviewers rarely go to the trouble of reproducing analyses. And even if they wanted to they mostly couldn’t, since raw data and code aren’t ordinarily made available until after the ms is accepted. But if grad students are going to be doing some sort of data-analytical assignment for a class anyway, there’s no cost (not even an opportunity cost) to having them try to reproduce the analyses in a published paper.

What do you think? Do you already do this, or know someone who does? Is there any downside to it?

31 thoughts on “A proposal for replicating published statistical analyses in ecology & evolution”

  1. That’s a fantastic idea. Don’t know if anyone is doing it, but it should be part of every curriculum from now on.

  2. They run replication / paper-critiquing as a regular exercise at Imperial College for Biology undergrads, I think. But it’s not just for teaching… If you re-use data on a day-to-day basis like I do, you discover loads of errors, quirks and discrepancies between the associated data and the peer-reviewed publications.

    If you’re lucky you can even get a publication out of it, e.g. if the original paper is *significantly* wrong. I corrected a Nature paper back in 2011: http://dx.doi.org/10.1038/nature10266 and I wasn’t the only one to spot the dodgy analysis either: http://dx.doi.org/10.1038/nature10267

    But most of the time, sadly, there’s very little incentive to actually go back and ‘correct the scientific record’. Some people view it as pedantry, but I think it’s really important. Ideally these problems would be caught during the peer-review process, but I’m told many reviewers are reluctant to look at, review, or re-run code or statistical analyses during review :/

    One thing’s for sure… the more people look the more dodgy papers we’ll find – I think we’ve barely scratched the surface of the extent of irreproducible but published science yet.

    • Yes, even with my proposal there’s still not a lot of incentive to try to correct the scientific record if you do fail to replicate a published analysis. That economics kerfuffle I linked to, where the field really did take notice of the correction to the published record, is exceptional.

  3. I think it would be a great idea. I use published data for some of the assignments in my graduate Ecological Statistics course. Sometimes the published analysis is a bit beyond what I’m asking the students to do however. It would be great as a semester project for students without data of their own.

  4. I know the economics people at my university are doing that for their masters courses (with very good experiences) and I have been thinking about doing the same.

    One doubt is that ecological analysis seems a lot more messy to repeat than econometric analysis, which is mostly standard regression models (at least this is what the economists told me).

    • If you have proper metadata, and you have properly described all details of the analysis, the “messiness” of the analysis shouldn’t matter at all for replication, I don’t think. Of course, perhaps those are big ifs! But if they are, then that in itself is a problem.

      • My experience has been that you often don’t have proper metadata, or even the proper actual data, as discussed by Chris Nadeau below. I think everyone knows that if you have a lot of data files and a complex analysis, it takes a lot of time to organize and describe it all well. Which I think is why it’s often not done unless specifically requested by the journal or granting agency.

    • Ah, now that’s probably impossible in many cases. And when it is possible, there are various reasons why you might not expect the experiment to come out the same. Of course, the latter is not a reason not to repeat the experiment.

      One reason to like multi-site projects like NutNet is that the experiment gets repeated many times in different places right from the get-go. I have an old post on NutNet.

  5. Nice idea. My only hesitation is that coding style in ecology and evolution is, well, all over the map. There’s the danger of grad students encountering horrible style and thinking that it’s the correct way to code. This isn’t a prohibitive issue, but something that should be considered.

    • That would be another good learning point that the students could discover, discuss etc. It isn’t really sufficient to get the “reproducible” badge for one to simply open up the code and data. Coding style and quality of the code are just as important for gaining an understanding of what was done to the data and how methods were applied.

    • I would suggest that students do not start with someone’s code. First, there could be bugs in it. Second, there could be various decisions that were made but not given in the methods of the paper: things like whether the data were centered, scaled, transformed, etc. So let the student write the code from a blank slate and make these decisions. Then see if the results are replicated. If not, why? Either a difference in decisions or a bug in the code. I suspect bugs in the code are common, if my own scripting is evidence!

      • A good idea. But it can be a *ton* of work. (Having done some of this myself.) Much more than you could expect for a single class, I think.
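As a small illustration of how an unreported decision such as centering can change the numbers a replicator sees, here is a hedged sketch in Python with NumPy. Everything in it is an assumption made for illustration: the data are simulated, and the function name `fit_interaction` and the "true" coefficients are hypothetical, not taken from any paper. In a regression with an interaction term, centering the predictors leaves the interaction coefficient unchanged but shifts the main-effect coefficients, so a student who centers when the original authors did not (or vice versa) will see different main effects from the same data.

```python
import numpy as np

def fit_interaction(x1, x2, y):
    """OLS fit of y on x1, x2 and their product.

    Returns coefficients in the order: intercept, x1, x2, x1*x2.
    """
    X = np.column_stack([np.ones(len(y)), x1, x2, x1 * x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simulated data (hypothetical values, fixed seed for repeatability).
rng = np.random.default_rng(0)
x1 = rng.normal(5.0, 1.0, 200)
x2 = rng.normal(10.0, 2.0, 200)
y = 1.0 + 0.5 * x1 + 0.2 * x2 + 0.3 * x1 * x2 + rng.normal(0.0, 0.1, 200)

# Same data, same model, one undocumented decision: center or not.
raw = fit_interaction(x1, x2, y)
centered = fit_interaction(x1 - x1.mean(), x2 - x2.mean(), y)

print("x1 coefficient, raw predictors:     ", raw[1])
print("x1 coefficient, centered predictors:", centered[1])
print("interaction, raw vs centered:       ", raw[3], centered[3])
```

Both fits describe the same surface; the raw x1 coefficient is the slope at x2 = 0, while the centered one is the slope at the mean of x2. Neither is wrong, but a replication will not match unless the methods say which was used.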

  6. For my graduate statistics course I was asked to analyze one of my own datasets. I don’t know how common this is, but it’s not necessarily true that assigning published datasets has no opportunity cost.

    That said, I didn’t actually use that in-class analysis when I wrote up the experiment, so maybe I would have been better off spending the time with someone else’s data.

  7. I’m 100% for it. Lip service about “reproducibility”–a putative hallmark of scientific methods–is just that until somebody actually tries to do so. And yes, I do it when I have the time. The thing, though, is that it’s typically *very* time consuming as you try to track down the correct data, make sure you know exactly what the original authors did, investigate various alternatives to what they did, etc. It can consume all your time.

    Those who’ve followed the climate change debate, particularly involving paleoclimate, and particularly dendroclimatology, know that this issue is at the very core of the bad blood (very bad blood indeed) between certain scientists and/or institutions, and those who’ve been trying to reproduce their results.

  8. I just attended a 3-week workshop on dynamic modeling in ecology where we spent 4 days reproducing the results of published papers. I think we were all surprised to find how difficult it was to reproduce published models. For example, many parameter values were missing from the papers. Moreover, more than one paper (of the 10 or so we tried to reproduce) had mistakes in the model equations or the analytical results that made reproducing the results impossible (in many cases we verified these mistakes with the authors to ensure it wasn’t just our misunderstanding). None of the mistakes had major implications for the conclusions of the paper, as in the economics example, but they were mistakes nonetheless.

    I think this was a great learning experience. I will definitely put more effort into ensuring my future papers are written in a way that is reproducible. I also think this is a great way to teach dynamic models in ecology and could promote replication of published papers with slightly different assumptions to test the robustness of the published results.

  9. I just submitted a paper with an appendix of all my R code, from data import to final plots. I used the ‘knitr’ package to make a nice, visually pleasing PDF of the code, rather than uploading a .R file that can be difficult to follow. I’m not sure if the reviewers liked this or not, or even looked at it.

  11. We have a group at UBC that has looked at the reproducibility of Structure analyses on population genetics data, and we managed to get a paper out of it – Gilbert et al (http://onlinelibrary.wiley.com/doi/10.1111/j.1365-294X.2012.05754.x/abstract). We’re also working on the related topic of how easy it is to get the data from published papers. We had a paper in FASEBJ earlier this year (http://arxiv.org/pdf/1301.3744.pdf) and we’re working on a study examining how data availability changes with time since publication (answer: almost no data survive more than 15 years).

    • It is going to be really interesting to see how data availability changes and changes our fields in the next decade or two. I wonder if we’d find the same things re-running analyses in ecology (for example, switching from SAS to R, etc). I think we’d like to think that the answers would be the same! But I wonder…

      • Well, one simple difference is that SAS defaults to Type III sums of squares (SS), whereas R defaults to sequential (Type I) SS. Which matters in GLMs if your design is unbalanced…
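To make the sums-of-squares point above concrete, here is a minimal sketch in Python with NumPy. It does not reproduce R or SAS output, and it is not anyone's published analysis. It uses a made-up unbalanced 2x2 data set and a main-effects-only model. In such a model, the SS for a factor adjusted for the other factor is what a Type III table would report. The sequential SS for the factor entered first ignores the other factor entirely. The function names `rss` and `ss_for_a` and all data values are hypothetical, chosen only for illustration.

```python
import numpy as np

def rss(columns, y):
    """Residual sum of squares from an OLS fit of y on an intercept plus columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def ss_for_a(a, b, y):
    """Return (sequential SS for A entered first, SS for A adjusted for B)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    y = np.asarray(y, dtype=float)
    sequential = rss([], y) - rss([a], y)    # A added to the intercept-only model
    adjusted = rss([b], y) - rss([a, b], y)  # A added after B is already in
    return sequential, adjusted

# Made-up unbalanced design: cell counts are 4, 1, 1, 4.
a_unbal = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
b_unbal = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
y_unbal = [10, 11, 9, 10, 15, 12, 18, 17, 19, 18]

seq_ss, adj_ss = ss_for_a(a_unbal, b_unbal, y_unbal)
print("SS for A, sequential (A first):", seq_ss)
print("SS for A, adjusted for B:      ", adj_ss)
```

With equal cell counts the two numbers agree, which is why the choice of default only bites when the design is unbalanced.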

  12. Great idea…but good point raised by Josh! I have taken theoretical ecology courses where we have reproduced modelling results of manuscripts. Similar to Chris, it was surprising to discover that many papers have small mistakes or omissions. I think it makes for a great learning experience in terms of analyses and clear presentation of results. I plan to have students work through published results in a modelling course I am teaching next winter.

  13. This is a brilliant idea and I think would be hugely valuable for both the student and the reproducibility of ecological research in general. For the student this provides a great intermediate step between textbook stats and real-world stats.
    But I think the benefit for reproducibility would be even greater. Knowing that someone is going to repeat your analysis already gets you thinking in much greater detail about exactly what information another scientist would need to reproduce your result. This might even prompt me to have a colleague try to reproduce a few main results just to avoid any problems down the line. Actually, we should already be doing this, but as you note it is hard to promote good behavior just because it is good; we also need incentives.
    The main challenge I see is that some statistical methods have become so complex that for many papers there are only a handful of people who have the breadth of knowledge to repeat the analysis in any meaningful way. You could obviously use some filter to avoid these papers, but then you have the problem that the hardest-to-replicate methods are also the least-checked methods.

  14. +1, I’ve done this and think this is excellent practice. Some profs even use published (and wrong!) analyses to show students how to do better, but I don’t have examples of published corrections emerging this way. Unfortunately, I can see at least two reasons why reproducing results in the classroom does not translate often into corrections of the scientific literature.

    First, more hours than those of the course are usually needed to reproduce published results (i.e. check sensitivity to assumptions, data formatting, etc.). It’s a very good thing that more and more computer code + data are available, but just executing code is not enough to evaluate the results: “little tweaks” or bugs still allow the program to run. It’s relatively easy to get a feeling that a statistical analysis is dodgy, but not to prove it in a systematic manner. So it requires lots of time: feasible only as a “very long-term homework”?

    And then, what to do when you find a mistake? If it has important practical consequences, then sure, you have an incentive to take credit and write a reply to the journal (as with the economics example). Otherwise, taking credit for finding the error will likely anger colleagues, with little gain for yourself. Especially if the error is minor – or the colleague is much more influential than the teacher. Some anonymous post-publication review might be healthy, if moderated by editors, but I don’t know if it exists. At PLOS ONE, for instance, you have to register to post comments on papers, which might limit the critique (though you could create a fake “scientific Zorro” account ;))

    • Thanks Meg! I just passed this paper on to the PhD students that I work with. It is nice to have all the suggestions in one simple document.
