Friday links: the need for replication, how artists and scientists present their work, and more (much more!)

Lots of good stuff this week! C’mon, click all the links! Like you have anything better to do on a Friday afternoon. 🙂

From Jeremy:

John Taylor discusses the need for replication in economics. He notes that way back in 1986, Dewald et al. published a devastating article in which they found that serious mistakes were common in empirical economics articles. In response, leading journals changed their policies to oblige authors to share data and code–but the policies were never enforced, and today a few lonely economists are still campaigning for change. Wonder what someone would find if they did a study like that of Dewald et al. today. I’ll probably have to keep wondering for a while, since even if data and code are shared, nobody has much incentive to try to replicate published analyses. Hence my proposal that grad students be assigned replication projects as part of their statistical course work.

Along the same lines, Retraction Watch suggests a Reproducibility Index for scientific journals. The idea is to grade journals on the fraction of their papers that stand up to scrutiny, for instance by calculating the percentage of citations to papers published in the journal that show replication vs. inability to reproduce results. Failures to replicate are infamous for not having much impact on the literature, even when they take the form of corrections or retractions. But perhaps people would pay more attention if failures to replicate were summarized in the form of an index? See the comment thread over there for much discussion of the feasibility and advisability of such an index.

And a few years ago, John Ioannidis and Thomas Trikalinos suggested a way to statistically test for excess statistically significant findings in research on any given topic. Lots of different factors–data dredging, selective reporting, data falsification, and many more–will result in an excess of statistically-significant results being reported, compared to the number that would be expected given the study designs used and the true effect size being studied. So if you have a bunch of studies of the same question (i.e. studies that could all be combined in the same meta-analysis), you can calculate the expected fraction that would give statistically-significant results under the null hypothesis of no bias towards statistically-significant findings, and compare that to the observed fraction. Of course, since many sources of bias in statistical significance tests will also bias the estimated effect sizes, you can repeat the analysis for a range of possible true effect sizes, to get a sense of how large the true effect would have to be to justify the observed frequency of statistically-significant results. The idea is related to but distinct from familiar tests for publication bias. The authors found numerous examples of significant bias towards significant results in the medical literature. Lots of ways you could go with this in future. For instance, you could ask if the excess of significant results tends to be greater for observational studies than for randomized controlled experiments. Note however that the approach is basically exploratory, and that there are many reasons why you might find an excess of statistically-significant results, some of them completely innocent (e.g., among-study heterogeneity in the true effect size).
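To make the logic concrete, here is a minimal sketch of that kind of comparison. It is not the authors' implementation. It assumes every study is a simple two-group comparison analyzed with a two-sided z-test, it uses the mean power across studies in a one-sided binomial tail (a simplification), and the sample sizes, the observed count, and the function names are all hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is a standardized mean difference; the normal
    approximation is an assumption made to keep the sketch short.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def excess_significance(n_per_group_list, observed_significant, effect_size, alpha=0.05):
    """Compare the observed number of significant studies with the expected number.

    Returns (expected, p_value), where expected is the sum of the per-study
    powers at the assumed true effect size, and p_value is the binomial
    probability of seeing at least observed_significant significant studies
    if each study had the mean power.
    """
    powers = [two_sample_power(effect_size, n, alpha) for n in n_per_group_list]
    k = len(powers)
    expected = sum(powers)
    mean_power = expected / k
    p_value = sum(
        comb(k, x) * mean_power**x * (1 - mean_power) ** (k - x)
        for x in range(observed_significant, k + 1)
    )
    return expected, p_value


# Hypothetical literature: 8 studies, 7 of which reported p < 0.05.
sample_sizes = [20, 25, 30, 40, 50, 60, 80, 100]
observed = 7

# Repeat for a range of assumed true effect sizes, as described above.
for assumed_effect in (0.2, 0.35, 0.5, 0.8):
    expected, p = excess_significance(sample_sizes, observed, assumed_effect)
    print(f"effect={assumed_effect}: expected={expected:.2f}, observed={observed}, p={p:.4f}")
```

Looping over assumed effect sizes shows how large the true effect would have to be before the observed count stops looking excessive. As noted above, a small p-value here is only a flag for further digging, not evidence of wrongdoing.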

Britt Koskella, who has guest-posted for us in the past, has now posted the reviews her latest paper received over at her own blog. She posted her responses to the reviews as well. I would think students in particular would be interested in reading this. Especially when you’re just starting out, you don’t have much first-hand experience with what peer review is like. Plus, it’s always challenging to be objective about one’s own work, and thus about the reviews of one’s own work. Reading the reviews someone else’s work has received increases your “sample size” for what peer review is like, and lets you experience the process as an objective outsider. Terry McGlynn at Small Pond Science also posts the reviews he receives. Terry suggests numerous reasons for doing it. I’ve previously expressed mild skepticism about some of those reasons. But I don’t see that publishing reviews will do any harm (Terry rightly downplays the risks, I think), so why not? (And see below for Meg’s comments)

Morgan Ernest on how to create a diverse speaker series. Short version: cast a wide net, be systematic and thoughtful rather than just inviting the first people who pop into your head, and be prepared to put in a lot of work (which will be well worth it).

On invisible mentors.

A rare retraction in ecology.

How do you tell the difference between research that’s too specialized, and research that’s not specialized enough (“just following the crowd”)? Terry McGlynn, Zen Faulkes, and I all have posts on this. Mine is presented as a joke, but in fact I’m mostly not joking.

Amusing article on scientists eating their study organisms. I have an old post on this, with a very entertaining comment thread. Judging by that thread, most ecologists choose their study organisms by asking themselves “What system should I work in in case I get hungry in the field and want to eat one of my samples?” 🙂 (HT Ed Yong)

Awesome gifs of science demonstrations. It’s a wonder any kids ever go into anything besides chemistry after watching stuff like this. 🙂 (HT Ed Yong)

This kind of thing is why I’m a lab ecologist. 🙂 (HT Ed Yong)

And finally, sex tips from nature, courtesy of The Onion. “Be the boss and use your pincers to drag your mate into a nitrogen-rich log.” 🙂

From Meg:

For those with NSF grants, this post at DEBrief regarding the new annual reports format will be really helpful. Seriously. Read it. I wish it had been written before I battled through the new format, and it sounds like even more changes (excuse me, ‘improvements’) are in store. Filling out my reports this year was really frustrating and time-consuming. Hopefully that will get better with experience with the new system.

Also related to NSF: NSF will be moving to Alexandria, VA in 2017. (ht: Morgan Ernest)

Britt Koskella has posted reviews of her new paper in Current Biology, along with her responses. She was inspired by Rosie Redfield, and this is something others (including Terry McGlynn) have called for. Interestingly, on twitter, Terry said this is something he would only do post-tenure. Timothée Poisot has an interesting post in reply to Britt’s. Britt and Carl Boettiger have interesting comments on that post, too. Lots to think about!

Two entries in the “less-is-more for science writing” competition: Is this the best scientific abstract ever? It’s certainly to the point! (ht @JenLucPiquant via twitter) And, on a similar theme, this is a fantastic taxonomic note. As Morgan Jackson said on twitter, “The title is the data, the discussion is 1 sentence, and the acknowledgements are hilarious.”

Over at Tenure, She Wrote, there’s a post by drmsscientist on how enjoyable it’s been to be a first year faculty member. As she points out, there’s so much negative info out there about challenges for women in academia. (It does seem to be a bit of a theme for my Friday links!) But, as I’ve said in an earlier post, sometimes you need to ignore that and focus on the people who have done it (or are doing it).

That old post of mine that I just linked to (I’ll link to it again!) was motivated by some dispiriting posts on ecolog. Sadly, there’s been a bit of a renewal of that. This started after a post announcing Career-Life Balance supplements from NSF to support someone to keep a project moving along while a postdoc is on family leave. On twitter, there were lots of folks (myself included) saying that this could have really helped out X months/years ago. I think this is huge, and that it’s great that NSF is doing this. Unfortunately, the initial post to ecolog led to a reply saying this is “institutionalized discrimination” against singles. I’ve been wanting to write a post in reply but haven’t had time, so I’ll link to this post by CackleofRad instead. In my opinion, there’s been a huge difference in the responses this time. Compared to the Clara B Jones kerfuffle that motivated my earlier post, this time the responses have been much more supportive of women and parents in science. A nice change!

And, finally, here’s an article on why there are still so few women who are public intellectuals. It talks about women being more likely to decline requests to comment on something because it’s outside one’s area (which made me cringe a bit, because I’ve done this) and also talks about the potential for blogs to be platforms for women scholars. (ht: Jacquelyn Gill via twitter)

Hoisted from the comments

Meg’s post earlier this week on system envy and experimental failures has a very nice comment thread, kicked off by this wonderful comment from artist and lab tech Nancy Lowe on how scientists and artists present their results in different ways. And I make fun of Meg’s study organism, so there’s that. 🙂

In case you missed it, the comment thread on Britt Koskella’s old guest post on microcosm experiments is really great. Includes discussion of the different reasons one might do microcosm experiments (Britt and I do them for different reasons, for instance), why microcosm experiments seem to be more controversial in ecology than in evolution, what it means to “rig” an experiment and why that’s not always a bad thing, and more.

14 thoughts on “Friday links: the need for replication, how artists and scientists present their work, and more (much more!)”

  1. Lots to chew here.
    1. I said this in an earlier comment. I’m not sure how much posting code would help solve replicability. It’s really hard to go through my own old code and know what I did, much less someone else’s. Plus the way to find coding errors AND modeling errors (that are correctly coded) is to redo the work from the start. Use all the parameters from the paper but create the model equations and assumptions (maybe some/most are provided in the paper) and the code yourself. This would be insanely time consuming. Consider the two MD Anderson statisticians who say they spent 3 years replicating Potti’s work on chemotherapy in cell lines. I think this is why grad students usually find these sorts of errors. The first step on a new project is to build up some code and test it against someone else’s data.
    2. On posting reviews. While I’m sympathetic with this, it’s unethical. I cannot post manuscripts I’ve reviewed online. The confidentiality should go both ways. In order to write a strong review, I might give up some knowledge that I don’t want to be public, at least at the moment. If I knew my review was going to be posted to the public, I would write a different review. Some of these changes might be better. For example, I would stop saying, you need to cite (Walker 19xx) : )

    • Re: posting reviews, I guess my attitude has always been that authors have always talked to others about the reviews their papers got. Posting reviews online doesn’t seem to me to be qualitatively different. Yes, a review is a communication directed to the author. But just as with any communication to anyone (online or offline), unless you specifically say to someone “please don’t share this with anyone”, you have to expect that it might be shared. I think that’s always been the case. Especially since, as far as I know, most journals don’t have a policy preventing authors from sharing the reviews they receive. It’s just that people now share things online as well as in face to face conversations and via email. Again, just my own personal attitude.

      Again speaking purely personally, I wouldn’t write my reviews any differently if I thought the author might post them online. I wouldn’t be rude to an author whether or not I thought the review would be posted online! As far as signing, I sometimes sign my reviews and sometimes don’t. And of course I’m well aware that even if I don’t sign the author might guess my identity from the tone or content of my remarks. But the risk of having your identity guessed has always existed, it’s not something new that’s created by people posting reviews online. So whether I’ve signed or not, I don’t mind if an author puts my reviews online.

      But I’m sure this is one of those things on which different people are going to have different attitudes, which is absolutely fair enough. I certainly see where you’re coming from, even if I don’t come from the same place. Which is why I don’t think “unethical” is the right word to use–I think this is a context in which that word is out of place, or at least unhelpful. I think it’s more helpful for different people to talk about their own preferences and reasons for holding those preferences. Or perhaps to frame the discussion as one about what constitutes good professional practice. Framing the discussion as an ethical one, and stating flat out that some people are acting unethically, I think risks leading to a degeneration into personal attacks. It’s a very short step from calling an action “unethical” to calling the person who performed it a bad person. I’m sure that’s not an implication you meant at all, but it’s an implication that it can be hard to prevent others from drawing.

      In the comments on the posts we linked to there’s some good discussion of the range of views on this issue, and how authors ought to behave in light of the fact that there’s a range of views out there.

      • Jeremy: what is the editorial policy of the major journals and of NSF and NIH on the public nature of reviews? I never really thought about this while reviewing but I think my reviews unconsciously assume that ALL communication in the review process is private.

      • That’s a good question Jeff. As Ben notes in his comment, some journals have policies that at least seem to suggest that reviews are private communications that aren’t to be shared (some policies are kind of unclear). Other journals seem not to have any policy one way or the other. I’m not aware of any traditional journals (as opposed to new peer review/publishing reforms) that have policies explicitly allowing publishing of reviews by the authors receiving them, though I could be wrong. And I don’t know about the policies of funding bodies like NIH or NSF.

      • I wouldn’t really care if my reviews were posted online, but some journals do state that the reviews are confidential, for example, from Am Nat:
        “All the reviews, as with the manuscript itself, are to remain confidential.” Although this was in an email to me as a reviewer, not sure if it is also stated to the author. I searched my inbox for this exact phrase and it only came up for emails associated with me reviewing a paper rather than one I submitted. It could be that they just use another wording for this, or maybe authors are allowed to share reviews, but reviewers are not?
        I think I remember seeing something like this for other journals as well.

      • Telling a reviewer that the reviews are to be kept confidential is entirely different than telling the author of the article being reviewed.

        I always imagined that the reviews, as they were anonymous, were the confidential property of the authors whose paper is being reviewed. After all, the topic of the review is the paper, not the anonymous reviewer.

        I’ve never received any kind of notification, as an author submitting a paper to a journal, that the reviews I receive are to be held in confidence. And, I’ve submitted to more journals than those in which I’ve been published 🙂

  2. Interesting to see which of the many links in this post are proving most popular so far:

    1: retraction in ecology
    2-4 (all running pretty close at the moment): my silly link on why I’m a lab ecologist, the link to Britt Koskella posting the reviews her paper received, and the one on invisible mentors
    5-6: Meg’s two links on brief scientific papers or abstracts
    7: sex tips from nature

    Links to old posts of ours also appear to be drawing modest interest. But few folks are clicking most of the other ones. Well, that’s why we throw lots of stuff out there, you never know what people will want to read about.

  3. A timely paper about reproducing results. I suspect many of the issues outlined in this paper carry over to research in Ecology/Evolution.
    A particularly interesting finding is that positive findings in low power studies typically overestimate the true effect size. This means that positive results from low power studies may not represent true effects.
    I’m all for replicating studies. I also think we need to start designing and conducting experiments with a priori high power. The easiest way to do this is increase replication.
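A quick simulation illustrates the overestimation point. This is only a sketch under stated assumptions: each study is a two-group comparison with known unit variance, so the estimated difference is drawn directly from its sampling distribution, and the true effect, the sample sizes, and the function name are hypothetical choices rather than values from the paper.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def simulate_significant_estimates(true_effect, n_per_group, n_studies=5000, alpha=0.05, seed=1):
    """Return the effect estimates from simulated studies that reached p < alpha.

    Assumes a two-sided z-test on a difference in means with unit variance
    in both groups, so the estimate has standard error sqrt(2 / n_per_group).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sqrt(2 / n_per_group)
    significant = []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, standard_error)
        if abs(estimate) / standard_error > z_crit:
            significant.append(estimate)
    return significant


TRUE_EFFECT = 0.3  # hypothetical standardized effect
for n in (10, 50, 200):
    kept = simulate_significant_estimates(TRUE_EFFECT, n)
    print(
        f"n per group={n}: fraction significant={len(kept) / 5000:.2f}, "
        f"mean significant estimate={mean(kept):.2f} (true effect {TRUE_EFFECT})"
    )
```

Only estimates large enough to clear the significance threshold are kept, and at small sample sizes that threshold sits well above the true effect. The significant results from the small studies therefore average much further from the true effect than those from the large studies.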

    • Thanks for the link, skimmed it very quickly. Kind of jibes with an old post from Mike the Mad Biologist to which I’ve linked before, arguing that funding agencies should throw more money at fewer projects so that funded projects can have massive sample sizes.

      Also interested to see the authors of that piece frame the issue as having an ethical dimension, on the grounds that unreliable research is “inefficient and wasteful”. Specifically, the suggestion is that concern for animal welfare leads neuroscientists to design underpowered studies using as few animals as possible. But since those studies don’t produce reliable scientific information, those animals are effectively giving their lives for nothing. Which, yeah, seems rather unethical when put like that. I don’t know much about animal welfare rules, since no one in my lab works with animals subject to the rules. I guess I’d naively thought that, at least within the scientific community, what was ethical in terms of animal research was mostly agreed, a few unusual cases aside (like great ape research). Apparently not…

      I’ll note in passing that some of the statistical advice the authors provide sounds pretty questionable to me. The notion that it’s fine to just stop an experiment whenever the Bayes factor reaches some specified level really sounds off to me. Fortunately, most of the other advice they give about stopping rules is correct.

      • Anyone working with vertebrate animals has to get institutional approval in which all of the protocols are reviewed. An explicit part of the process is the justification for the number of animals and the only way to do this is from a power analysis. Of course to do a power analysis one needs pilot data. Obviously the estimate of the variance in a pilot study has some error but still, if all of this is done well, lack of power should not be a problem if everyone (the investigator, the animal care committee) is doing their job.

      • The paper notes this but seems to say that in practice power analytical guidelines often either aren’t followed properly or are ignored. I have no idea, of course, it’s totally not my area. I suppose one possibility is that people are following the guidelines, but are being systematically over-optimistic about how much power they need to reliably detect effects of a size they might reasonably expect to find.

      • The funding question is a tricky one. I think there is better bang for buck in funding many smaller groups as opposed to a few larger groups. You talked about this in your previous post in regards to big labs being time limited and not fund limited. This trades off against having many small studies which may have low power (likely to produce misleading results) and few big studies with high power (likely to produce reliable results). Somewhere in the mix there is a balance of providing funds so that reliable research is done by as many groups as possible.

        Animal research is full of ethical questions, some tough some easy. The general consensus is to use as few animals as possible. In my opinion the ethical thing to do is conduct experiments that are likely to produce reliable results (animals don’t die for no reason). That means using more animals in any given experiment so that reliable results are produced. Since these experiments have more replicates, it might mean that fewer experiments are conducted and lead to no change in the actual number of animals used by a given researcher. For example if a researcher is space limited and has 12 experimental units available, they can conduct two two-factor experiments with n=3 for each treatment or one two-factor experiment with n=6 for each treatment. Conducting one experiment instead of two leads to more reliable research without changing the ethical question regarding the number of animals used overall.

        I agree with you about the statistical advice for stopping experiments. In this particular field (neuroscience) I think the ethical advice is for stopping invasive experiments that cause severe distress/pain to animals (in some cases humans). In these cases the ethical thing to do is stop causing distress/pain when it becomes apparent the experiment is unlikely to yield positive results.
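The n=3 versus n=6 example above can be put in rough numbers. This is a back-of-the-envelope sketch, not a proper power analysis: it looks only at a single contrast between two treatments, uses a normal approximation (which overstates power at such small n compared with a t-test), and assumes a hypothetical standardized effect of 1.5. The function name is also made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_treatment, alpha=0.05):
    """Approximate power to detect a standardized difference between two treatments.

    Uses a two-sided z-test approximation; treat the result as an optimistic
    rough guide when n_per_treatment is very small.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_treatment / 2)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


ASSUMED_EFFECT = 1.5  # hypothetical; in practice this would come from pilot data
for n in (3, 6):
    print(f"n={n} per treatment: approximate power = {approx_power(ASSUMED_EFFECT, n):.2f}")
```

Under these assumptions, doubling the replicates per treatment raises the chance of detecting the effect substantially, which is the trade-off being described: one better-powered experiment in place of two weaker ones.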
