A proposal for replicating published statistical analyses in ecology & evolution

Posted on June 24, 2013 by Jeremy Fox

Recently, I noted a major kerfuffle in economics which began when a graduate student discovered serious analytical errors in a paper by two famous Harvard economists. The student discovered the errors because he’d been assigned the task of reproducing the analyses in a published paper for a graduate course he was taking.

So here’s an idea: maybe we should do the same in ecology and evolution? It’s increasingly the case that authors are obliged to make their raw data and computer code freely available to anyone after publication. So this kind of assignment is increasingly feasible. It’s an assignment that would fit well with graduate courses in statistical methods, and maybe with other sorts of courses as well. It’s a great learning exercise for students. Teaches you to pay attention to subtle but crucial statistical and coding details. Gets you thinking about other ways one might have done the analysis. Etc. And, on the off chance that you discover a serious mistake (or fraud!) in a published paper, you might well get a paper of your own out of the exercise.

This seems like a nice way of getting around the fact that correctness and reproducibility of analyses are important, but nobody has much personal incentive to check them. Even pre-publication peer reviewers rarely go to the trouble of reproducing analyses. And even if they wanted to they mostly couldn’t, since raw data and code aren’t ordinarily made available until after the ms is accepted. But if grad students are going to be doing some sort of data-analytical assignment for a class anyway, there’s no cost (not even an opportunity cost) to having them try to reproduce the analyses in a published paper.

What do you think? Do you already do this, or know someone who does? Is there any downside to it?

31 thoughts on “A proposal for replicating published statistical analyses in ecology & evolution”

jonfwilkins on June 24, 2013 at 1:46 pm said:

That’s a fantastic idea. Don’t know if anyone is doing it, but it should be part of every curriculum from now on.

Ross Mounce (@rmounce) on June 24, 2013 at 1:56 pm said:

I think they run replication / paper-critiquing as a regular exercise for Biology undergrads at Imperial College. But it’s not just for teaching… If you re-use data on a day-to-day basis like I do, you discover loads of errors, quirks and discrepancies between the associated data and the peer-reviewed publications.

If you’re lucky you can even get a publication out of it, e.g. if the original paper is *significantly* wrong. I corrected a Nature paper back in 2011: http://dx.doi.org/10.1038/nature10266 and I wasn’t the only one to spot the dodgy analysis either: http://dx.doi.org/10.1038/nature10267

But most of the time, sadly, there’s very little incentive to actually go back and ‘correct the scientific record’. Some people view it as pedantry, but I think it’s really important. Ideally these problems would be caught during peer review, but I’m told many reviewers are reluctant to look at, review or re-run code or statistical analyses.

One thing’s for sure… the more people look, the more dodgy papers we’ll find. I think we’ve barely scratched the surface of how much published science is irreproducible.

- Jeremy Fox on June 24, 2013 at 8:48 pm said:
  
  Yes, even with my proposal there’s still not a lot of incentive to try to correct the scientific record if you do fail to replicate a published analysis. That economics kerfuffle I linked to, where the field really did take notice of the correction to the published record, is exceptional.
  
Drew Tyre on June 24, 2013 at 1:58 pm said:

I think it would be a great idea. I use published data for some of the assignments in my graduate Ecological Statistics course. Sometimes, however, the published analysis is a bit beyond what I’m asking the students to do. It would be great as a semester project for students without data of their own.

Florian Hartig on June 24, 2013 at 2:10 pm said:

I know the economics people at my university are doing that for their masters courses (with very good experiences) and I have been thinking about doing the same.

One concern is that ecological analyses seem a lot messier to repeat than econometric analyses, which are mostly standard regression models (at least, that’s what the economists told me).

- Jeremy Fox on June 24, 2013 at 8:49 pm said:
  
  If you have proper metadata, and you have properly described all details of the analysis, the “messiness” of the analysis shouldn’t matter at all for replication, I don’t think. Of course, perhaps those are big ifs! But if they are, then that in itself is a problem.
  
  - Jim Bouldin on June 25, 2013 at 3:54 pm said:
    
    My experience has been that you often don’t have proper metadata, or even the proper actual data, as discussed by Chris Nadeau below. I think everyone knows that if you have a lot of data files and a complex analysis, it takes a lot of time to organize and describe it all well, which I think is why it’s often not done unless specifically requested by the journal or granting agency.
Josh King on June 24, 2013 at 2:19 pm said:

I’d prefer starting with replicating the experiments themselves, with updated analyses. Ecology does not generally do the “re” part of research.

- Jeremy Fox on June 24, 2013 at 8:52 pm said:
  
  Ah, now that’s probably impossible in many cases. And when it is possible, there are various reasons why you might not expect the experiment to come out the same. Of course, the latter is not a reason not to repeat the experiment.
  
  One reason to like multi-site projects like NutNet is that the experiment gets repeated many times in different places right from the get-go. I have an old post on NutNet.
  
Margaret Kosmala on June 24, 2013 at 2:53 pm said:

Nice idea. My only hesitation is that coding style in ecology and evolution is, well, all over the map. There’s the danger of grad students encountering horrible style and thinking that it’s the correct way to code. This isn’t a prohibitive issue, but something that should be considered.

- ucfagls on June 24, 2013 at 3:08 pm said:
  
  That would be another good learning point that the students could discover, discuss, etc. Simply opening up the code and data isn’t really enough to earn the “reproducible” badge. Coding style and code quality are just as important for understanding what was done to the data and how methods were applied.
  
- Jeff Walker on June 24, 2013 at 4:18 pm said:
  
  I would suggest that students not start with someone’s code. First, there could be bugs in it. Second, there could be various decisions that were made but not reported in the paper’s methods, such as whether the data were centered, scaled, transformed, etc. So let the student write the code from a blank slate and make these decisions. Then see if the results are replicated. If not, why? A difference in decisions, or a bug in the code? I suspect bugs in code are common, if my own scripting is any evidence!
  
  - Margaret Kosmala on June 26, 2013 at 2:31 am said:
    
    A good idea. But it can be a *ton* of work. (Having done some of this myself.) Much more than you could expect for a single class, I think.
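Jeff Walker’s point about unreported decisions (centering, scaling) can be illustrated with a minimal Python sketch. The data and the helper `fit_line` are hypothetical, made up for illustration. Centering the predictor leaves the least-squares slope unchanged but moves the intercept to the mean response; additionally dividing by the sample standard deviation multiplies the slope by that standard deviation. A student who doesn’t know which choice the authors made could therefore “fail” to replicate a coefficient even when nothing is actually wrong.

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data: a predictor (e.g. temperature) and a response.
x = [10.0, 12.0, 15.0, 18.0, 20.0]
y = [2.1, 2.9, 3.8, 5.2, 5.9]

mx, sx = mean(x), stdev(x)                  # stdev() is the sample (n - 1) SD
x_centered = [xi - mx for xi in x]          # decision 1: center only
x_scaled = [(xi - mx) / sx for xi in x]     # decision 2: center and scale

raw = fit_line(x, y)
centered = fit_line(x_centered, y)
scaled = fit_line(x_scaled, y)

for label, (a, b) in [("raw", raw), ("centered", centered), ("scaled", scaled)]:
    print(f"{label:>9}: intercept={a:.3f} slope={b:.3f}")
```

Comparing the three printed lines against a paper’s reported coefficients is one quick way to guess which preprocessing the authors used before suspecting a bug.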
infotroph on June 24, 2013 at 2:53 pm said:

For my graduate statistics course I was asked to analyze one of my own datasets. I don’t know how common this is, but it’s not necessarily true that assigning published datasets has no opportunity cost.

That said, I didn’t actually use that in-class analysis when I wrote up the experiment, so maybe I would have been better off spending the time with someone else’s data.

- Jeremy Fox on June 24, 2013 at 8:59 pm said:
  
  Yes, it’s common for students to be asked to do this sort of thing in graduate stats courses.
  
Jim Bouldin on June 24, 2013 at 3:14 pm said:

I’m 100% for it. Lip service about “reproducibility”, a putative hallmark of scientific methods, is just that until somebody actually tries to reproduce a result. And yes, I do it when I have the time. The thing, though, is that it’s typically *very* time-consuming as you try to track down the correct data, make sure you know exactly what the original authors did, investigate various alternatives to what they did, etc. etc. It can consume all your time.

Those who’ve followed the climate change debate, particularly involving paleoclimate, and particularly dendroclimatology, know that this issue is at the very core of the bad blood (very bad blood indeed) between certain scientists and/or institutions, and those who’ve been trying to reproduce their results.

jebyrnes on June 24, 2013 at 3:24 pm said:

I think Hurlbert used to do something like this. Might be worth checking with him.

Chris Nadeau on June 24, 2013 at 3:27 pm said:

I just attended a 3-week workshop on dynamic modeling in ecology where we spent 4 days reproducing the results of a published paper. I think we were all surprised to find how difficult it was to reproduce published models. For example, many parameter values were missing from the papers. Moreover, more than one paper (of the 10 or so we tried to reproduce) had mistakes in the model equations or the analytical results that made reproducing the results impossible (in many cases we verified these mistakes with the authors to ensure it wasn’t just our misunderstanding). None of the mistakes had implications for the papers’ conclusions as major as in the economics example, but they were mistakes nonetheless.

I think this was a great learning experience. I will definitely put more effort into ensuring my future papers are written in a way that is reproducible. I also think this is a great way to teach dynamic models in ecology and could promote replication of published papers with slightly different assumptions to test the robustness of the published results.

Nathan Lemoine on June 24, 2013 at 3:52 pm said:

I just submitted a paper with an appendix of all my R code, from data import to final plots. I used the ‘knitr’ package to make a nice, visually pleasing PDF of the code, rather than uploading a .R file that can be difficult to follow. I’m not sure if the reviewers liked this or not, or even looked at it.

- Drew Tyre on June 24, 2013 at 4:14 pm said:
  
  Bravo Nathan! I look at them when I review papers!
  
- Margaret Kosmala on June 26, 2013 at 2:33 am said:
  
  I like to run the code when I review, so I would have preferred the .R file rather than the pretty PDF…
  
Pingback: ‘Round and about | Ecologically Orientated
Tim Vines on June 24, 2013 at 3:58 pm said:

We have a group at UBC that has looked at the reproducibility of Structure analyses on population genetics data, and we managed to get a paper out of it – Gilbert et al (http://onlinelibrary.wiley.com/doi/10.1111/j.1365-294X.2012.05754.x/abstract). We’re also working on the related topic of how easy it is to get the data from published papers. We had a paper in FASEB J earlier this year (http://arxiv.org/pdf/1301.3744.pdf) and we’re working on a study examining how data availability changes with time since publication (answer: almost no data survive more than 15 years).

- Amy Parachnowitsch on June 26, 2013 at 7:20 am said:
  
  It is going to be really interesting to see how data availability changes and changes our fields in the next decade or two. I wonder if we’d find the same things re-running analyses in ecology (for example, switching from SAS to R, etc). I think we’d like to think that the answers would be the same! But I wonder…
  
  - Jeremy Fox on June 26, 2013 at 12:06 pm said:
    
    Well, one simple difference is that SAS defaults to type III sums of squares (SS), whereas R’s anova() reports sequential (type I) SS. That matters in GLMs if your design is unbalanced…
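Why the design’s balance matters can be seen in a minimal, dependency-free Python sketch. It uses a hypothetical unbalanced two-factor dataset with 0/1 dummy coding, fits the additive model (no interaction) by least squares, and computes sequential (type I) SS with the factors entered in both orders. The data and the helper names `solve` and `rss` are illustrative assumptions; this is not how SAS or R compute their tables. Because A and B are correlated in this design, the SS attributed to A depends on whether it is entered first or after B. For a model with no interaction term, the “entered last” value, SS(A | B), is also what type II and type III SS would report for A.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of the least-squares fit of y on the given design columns."""
    n = len(y)
    xtx = [[sum(ci[k] * cj[k] for k in range(n)) for cj in columns] for ci in columns]
    xty = [sum(ci[k] * y[k] for k in range(n)) for ci in columns]
    beta = solve(xtx, xty)
    fitted = [sum(b * col[k] for b, col in zip(beta, columns)) for k in range(n)]
    return sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))


# Hypothetical unbalanced design: cell sizes are 4, 1, 1 and 2.
A = [0, 0, 0, 0, 0, 1, 1, 1]
B = [0, 0, 0, 0, 1, 0, 1, 1]
y = [4.0, 5.0, 6.0, 5.0, 9.0, 7.0, 11.0, 12.0]
ones = [1] * len(y)

rss_null = rss([ones], y)
rss_a = rss([ones, A], y)
rss_b = rss([ones, B], y)
rss_ab = rss([ones, A, B], y)

# Order A then B: each term is adjusted only for the terms entered before it.
ss_a_first = rss_null - rss_a
ss_b_after_a = rss_a - rss_ab
# Order B then A.
ss_b_first = rss_null - rss_b
ss_a_after_b = rss_b - rss_ab  # A adjusted for B

print(f"SS(A) entered first:   {ss_a_first:.3f}")
print(f"SS(A) entered after B: {ss_a_after_b:.3f}")
```

Both orders partition the same total (the null RSS minus the full-model RSS); only the split between A and B changes. In a balanced design, A and B would be orthogonal and the two SS(A) values would coincide.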
sjleroux on June 24, 2013 at 4:15 pm said:

Great idea…but good point raised by Josh! I have taken theoretical ecology courses where we have reproduced modelling results of manuscripts. Similar to Chris, it was surprising to discover that many papers have small mistakes or omissions. I think it makes for a great learning experience in terms of analyses and clear presentation of results. I plan to have students work through published results in a modelling course I am teaching next winter.

Benjamin Martin on June 24, 2013 at 4:18 pm said:

This is a brilliant idea and I think would be hugely valuable for both the student and the reproducibility of ecological research in general. For the student this provides a great intermediate step between textbook stats and real-world stats.
But I think the benefit for reproducibility would be even greater. Knowing that someone is going to repeat your analysis already gets you thinking in much greater detail about exactly what information another scientist would need to reproduce your result. This might even prompt me to have a colleague try to reproduce a few main results, just to avoid any problems down the line. Actually, we should already be doing this, but as you note, it is hard to promote good behavior just because it is good; we also need incentives.
The main challenge I see is that some statistical methods have become so complex that for many papers there is only a handful of people with the breadth of knowledge to repeat the analysis in any meaningful way. You could obviously use some filter to avoid these papers, but then you have the problem that the hardest-to-replicate methods are also the least-checked methods.

Fred Barraquand on June 24, 2013 at 10:00 pm said:

+1, I’ve done this and think this is excellent practice. Some profs even use published (and wrong!) analyses to show students how to do better, but I don’t have examples of published corrections emerging this way. Unfortunately, I can see at least two reasons why reproducing results in the classroom does not translate often into corrections of the scientific literature.

First, reproducing published results usually takes more hours than a course provides (i.e. checking sensitivity to assumptions, data formatting, etc.). It’s a very good thing that more and more computer code + data are available, but just executing code is not enough to evaluate the results: “little tweaks” or bugs still allow the program to run. It’s relatively easy to get a feeling that a statistical analysis is dodgy, but not to prove it in a systematic manner. So it requires lots of time: feasible only as a “very long-term homework”?

And then, what do you do when you find a mistake? If it has important practical consequences, then sure, you have an incentive to take credit and write a reply to the journal (as with the economics example). Otherwise, taking credit for finding the error will likely anger colleagues, with little gain for yourself. Especially if the error is minor – or the colleague is much more influential than the teacher. Some anonymous post-publication review might be healthy, if moderated by editors, but I don’t know whether it exists. At PLOS ONE, for instance, you have to register to post comments on papers, which might limit critique (though you could create a fake “scientific Zorro” account ;))

duffymeg on June 25, 2013 at 4:55 pm said:

I just saw this paper (“Some Simple Guidelines for Effective Data Management” by Borer et al) mentioned on twitter, which seems relevant to this discussion:
http://www.esajournals.org/doi/full/10.1890/0012-9623-90.2.205
ht: Kara Woo

- Amy Parachnowitsch on June 26, 2013 at 7:32 am said:
  
  Thanks Meg! I just passed this paper on to the PhD students that I work with. It is nice to have all the suggestions in one simple document.
  
Pingback: Friday links: the need for replication, how artists and scientists present their work, and more (much more!) | Dynamic Ecology