Should your paper include alternative analyses as “robustness checks”? If so, where? Take a quick poll!

There often are different ways of doing a statistical analysis, all of them defensible. Doing the analysis with vs. without an outlier. Doing a general linear model on transformed data vs. doing a generalized linear model on untransformed data. Addressing collinearity by dropping a collinear predictor, vs. doing some sort of formal model selection, vs. doing a PCA on the predictor variables and using the PCA axes as new predictor variables. Deciding how many terms your statistical model should include. Etc. etc.
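To make the kind of choice I have in mind concrete, here is a toy sketch (in Python with statsmodels; the data frame, column names, and effect sizes are invented purely for illustration, not taken from any real study) of the second alternative above: an ordinary linear model on log-transformed counts versus a Poisson GLM on the untransformed counts.

```python
# Two defensible analyses of the same hypothetical count data:
# an ordinary linear model on log(count + 1) vs. a Poisson GLM on raw counts.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Invented example data: a continuous predictor x and a count response.
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.uniform(0, 10, 50)})
df["count"] = rng.poisson(np.exp(0.5 + 0.2 * df["x"]))
df["log_count"] = np.log(df["count"] + 1)  # a common transformation for counts

lm = smf.ols("log_count ~ x", data=df).fit()               # general linear model
glm = smf.glm("count ~ x", data=df,                        # generalized linear model
              family=sm.families.Poisson()).fit()

# The two slopes are on somewhat different scales; the robustness question is
# whether the qualitative conclusion (sign, significance) is the same either way.
print(f"OLS on log(count+1): slope = {lm.params['x']:+.3f}, p = {lm.pvalues['x']:.3g}")
print(f"Poisson GLM:         slope = {glm.params['x']:+.3f}, p = {glm.pvalues['x']:.3g}")
```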

Sometimes, different statistical choices lead to different results. This latitude in analytical choices, and the resulting sensitivity of results to them, is known as “researcher degrees of freedom”. Researcher degrees of freedom can make it hard to choose the “right” statistical analysis, and can lead to arguments among researchers as to what the “right” analysis is.

But what about the case in which different defensible statistical choices all lead to the same results? That is, cases in which the scientific conclusion is robust to different statistical choices? It’s tempting to say that robustness makes your statistical choices easy. Just choose whatever analysis you want, because your choice doesn’t matter.

But here’s the problem: reviewers and readers won’t necessarily believe that your results are robust if you only show them one analysis. They’ll ask “Did you correct for [thing]?” “Are you sure the results aren’t driven by [small subset of data]?” “Wouldn’t it be more rigorous to do [alternative analysis]?” That’s why, in some fields (economics is one), it’s routine for papers to include “robustness checks”, also known as “alternative specifications”. You do the analysis in a bunch of different ways, and show that they all lead to the same conclusion. Robustness checks aren’t routine in ecology. Should they be?
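As a sketch of what that might look like in practice (again in Python with statsmodels, with invented data; the specification names, the extra covariate z, and the outlier rule are all hypothetical), one way is to decide on the alternative specifications up front, fit them all, and tabulate the focal coefficient from each:

```python
# A minimal "alternative specifications" table: fit the focal relationship several
# defensible ways and tabulate the coefficient of interest from each fit.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Invented example data: predictor x, a possible covariate z, and a count response.
rng = np.random.default_rng(2)
df = pd.DataFrame({"x": rng.uniform(0, 10, 50), "z": rng.normal(size=50)})
df["count"] = rng.poisson(np.exp(0.5 + 0.2 * df["x"] + 0.1 * df["z"]))
df["log_count"] = np.log(df["count"] + 1)
no_outlier = df.drop(df["count"].idxmax())  # one (hypothetical) outlier rule

specs = {
    "OLS, log(count+1)":       smf.ols("log_count ~ x", data=df),
    "OLS, + covariate z":      smf.ols("log_count ~ x + z", data=df),
    "OLS, outlier removed":    smf.ols("log_count ~ x", data=no_outlier),
    "Poisson GLM, raw counts": smf.glm("count ~ x", data=df,
                                       family=sm.families.Poisson()),
}

rows = []
for name, model in specs.items():
    res = model.fit()
    rows.append({"specification": name,
                 "slope on x": round(res.params["x"], 3),
                 "p-value": float(res.pvalues["x"])})

# If every row tells the same qualitative story (same sign, similar significance),
# that's the robustness claim; the little table could go in the text or an appendix.
print(pd.DataFrame(rows).to_string(index=False))
```

And if the rows disagree in some important way, the comments below make the case that this is a signal to dig into the data, not a result to bury.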

Maybe not. After all, one could take the view that the whole point of a scientific paper is to tell one story. A scientific paper shouldn’t be a Choose Your Own Adventure. It’s the authors’ responsibility, and privilege, to argue for their scientific conclusion however they think best. Plus, the authors can’t possibly anticipate all the alternative ways in which readers might’ve wanted them to analyze the data. So if, as a reviewer or reader, you wonder how a different story would’ve turned out, well, that’s your problem. Go download the data (which, these days, the authors are probably required to share on a public repository), and conduct your own robustness checks.

And if robustness checks should be routine, where do they belong? In economics, they go in the main text of the paper. Which some economists complain about, because it makes the paper more difficult and boring to read. Alternatively, one could put robustness checks into an online appendix. But we all know that nobody reads online appendices; often even the reviewers don’t.* A third option is to not write up the robustness checks, but instead share code that will allow any curious readers to run the robustness checks if they want to. A fourth option is just to ask readers to trust you. Your paper can describe the alternative analyses you ran, and then say “These alternative analyses (not shown) led to the same conclusions as the main analysis, indicating that the results are robust.”

Which option do you usually take as an author? And which one do you usually prefer as a reader? Take the poll!

*Heck, once in a while you can put robustness checks in the main text of the paper and some readers will still overlook them.

5 thoughts on “Should your paper include alternative analyses as “robustness checks”? If so, where? Take a quick poll!”

  1. I’m a big fan of robustness checks. One potential downside is that it can turn into p-hacking (running a bunch of analyses until some come out the way you want), but I mostly think that concern is overblown. If somebody reports a bunch of results that all point the same way, that is sort of the opposite of p-hacking in my mind (which is hunting for a unique outcome). And as to whether it is an invitation to run an analysis lots of different ways and bury some of them, at some point you have to trust the author to report what they did honestly.

    I think the bigger risk is that you conscientiously decide a priori to run a model, say, 3 different ways that all seem like good ways, and then they come out with different results: not just quantitatively slightly different, which is nearly certain to happen, but big differences (positive vs. negative slope, very significant vs. not at all significant). This is a risk because, in my mind, the author is obligated to report all of those contradictory results when they write up. First off, I would point out that this is a risk to the author but not to science or the reader, so in that sense it may not be a risk we should worry about as much. But it is pretty rare for this to happen in my experience. And when it has happened to me, it has forced me to dig into my data to figure out why, and I’ve always discovered things I didn’t know about my data, often important and interesting things. So to my mind it’s more a risk to the author of things taking longer, but not a risk of “reducing” the quality of science; on the contrary, it can increase the quality of science.

    And I voted to put them in supplemental material (although with a sentence in the main text that mentions them as robustness checks). Wish more people had a robustness check philosophy, but in my experience as a reviewer, authors are very resistant to them.

    • “Wish more people had a robustness check philosophy, but in my experience as a reviewer, authors are very resistant to them.”

      Now I’m wishing I’d asked a third poll question: as an author, do you include robustness checks only because reviewers demand them (or because you *think* reviewers will demand them)? Or do you include them of your own free will?

    • Totally agree. If you try the analysis multiple reasonable ways and get the same qualitative answer, it’s best to tell the reader (without cluttering the main text of the story). And if you get different answers, then you have learned that something is wrong in your assumptions and you have more work to do. I voted for SI, which is what I usually do, but sometimes I just say “the results are qualitatively similar if you use [other reasonable approach]”.

      • I should’ve said earlier that I agree with Andrew and Brian. If different reasonable-seeming analyses come out different ways, that’s a sign that there’s something about your data that you don’t understand. You need to figure out what it is and interpret your data accordingly. Not just pick whichever analysis gives you the answer you want, or that you think the reviewers will want.

  2. Pingback: Ecologists like robustness checks–especially if other people do them | Dynamic Ecology
