Don’t introduce your paper by saying that many people have long been interested in the topic (UPDATEDx2)

Scientific papers often start by noting that lots of people are interested in the topic: “Topic X is of wide interest in ecology”, or some similar phrase. Sometimes they also talk about changes over time in how many people are interested in the topic, for instance by writing “Topic X has long been of central interest in ecology” or “Much recent research in ecology considers topic Y”.

Please, please don’t do this.*

As a colleague of mine likes to say, your paper should tell the reader about biology, not biologists. That is, your paper should introduce the biological topic and explain why it’s interesting and important, not say that other people think the topic is interesting and important. No, not even if everyone since the dawn of time has thought the topic interesting and important. Science is not a popularity contest. If the topic really is interesting and important, then you should be able to explain why, in which case the fact that other people also think the topic is interesting and important is at best superfluous. And if the topic is not interesting and important, pointing out that lots of other people think it’s interesting and important just shows that lots of people care about boring and unimportant things. Or at best, that your topic is a bandwagon.

For instance, one line of research in my lab concerns spatial synchrony in population ecology. Populations of the same species separated by hundreds or even thousands of km often exhibit positively-correlated fluctuations in abundance. Which is frickin’ amazing when you think about it. (UPDATE: Judging from the comments, that last sentence is confusing readers. My bad. The important thing about synchrony is not that I personally think it’s amazing, or that many others do too. The important thing is that it’s a real phenomenon (not just noise), and that it’s unexplained.) It’s like “action at a distance” in physics–how can such widely-separated systems behave as if they’re somehow connected? Such mysterious behavior cries out for an explanation. That’s why spatial synchrony is worth studying.** Not because spatial synchrony has long been of interest in ecology, or because much recent research in ecology addresses spatial synchrony, or etc.

The difference here can be subtle. For instance, there’s ongoing disagreement over whether short-distance dispersal leading to phase-locking is a plausible explanation for the observed long-distance synchrony of population cycles in nature (as opposed to in theoretical models or tightly-controlled microcosms). Alternatively, though not mutually-exclusively, long-distance synchrony of population cycles might be due to the long-distance synchrony of weather fluctuations, known as the Moran effect. If I was writing a paper on spatial synchrony, I might refer to this ongoing disagreement and use it as motivation for my own work. But it’s important to be precise here, and cite the disagreement for the right reasons. The motivation for further work is that there’s an interesting biological question–the causes of long-distance synchrony of population cycles–that hasn’t yet been answered. Resolving disagreement among the people working on this question is not a good motivation for further work. The goal of science is to figure out how the world works, not to produce agreement among scientists as to how the world works. Those are two different things, although it can sometimes be difficult to tell the difference between them in practice (e.g., it’s hard to recognize if a question hasn’t been answered, if everyone working in the field thinks it’s been answered). So here, it would be better to say something like “There are two alternative, though not mutually exclusive, explanations for long-distance synchrony of population cycles…” Rather than “Ecologists disagree about the causes of long-distance synchrony of population cycles…” The former phrasing is better because it keeps the focus on biology, rather than on what biologists think about biology.

From my own experience, I can tell you that it’s hard to avoid slipping into talking about biologists rather than biology. You have to constantly guard against it, or at least I do. This is a good mental habit to get into. It makes you alert to bandwagons and zombie ideas, and so keeps you from jumping on them or falling for them.*** It also helps you develop the courage of your own convictions and the ability to articulate them. Writing about biologists rather than biology is a crutch. It’s something you do when you don’t really know–and I mean really know–why your topic is worth studying.

p.s. This advice applies to talks and posters too.

UPDATEx2: As noted in the comments, I’m not saying that you shouldn’t talk about the history of research on your topic. The whole comment thread is great, actually, you should read it. :-)

*Note that I’m sure I’ve done it myself, though I haven’t gone back and checked. We are all sinners.

**Well, I could and sometimes do wave my arms about the applied importance of synchronized disease or crop pest outbreaks and argue that my work will improve our ability to predict/manage/prevent those things. Which doesn’t make such arm waving a good thing. Again, we are all sinners.

***In general, I think graduate students in particular tend to overrate the importance of working on “hot” topics. At the risk of overgeneralizing from my own example, I am living proof that you don’t have to work on “hot” topics, or use popular approaches or systems, to have a career in ecology. Spatial synchrony for instance has never been an especially “hot” topic, protist microcosms have never been a popular study system (just the opposite, in fact), and hardly anybody even understands the Price equation. What’s important is that you work on a topic for good reasons that you can articulate. One of the hardest things for graduate students who want to go on in academia is to become familiar with the current state and history of their field while retaining/gaining the ability to think critically and independently. Also while gaining/retaining the confidence that thinking critically and independently, rather than following the crowd, is actually good for their academic careers rather than bad. (Note that “thinking independently” is not at all the same as “not knowing or willfully ignoring what everyone else thinks”, and that “thinking critically” is not at all the same as “thinking everybody else is wrong about everything”. The foundation of independent and critical thought is a broad and deep grasp of previous thinking.)

Friday links: surviving science, the ultimate citation, why everything is broken, and more

Also this week: depressing news on gender balance in major scientific awards, when trainees go bad, the history of the passive voice, and more. Oh, and identify any insect with this one handy picture. :-)

From Meg:

While I was glad to read that funding to support the Keeling curve measurements for three more years has been secured, I was surprised to read that it was in question in the first place.

12 (really 13) Guidelines for Surviving Science. These are great! #5 reminds me of a conversation I had with someone about choosing mentors and collaborators: Imagine a 2 x 2 grid where you have nice/not nice on one side and smart/not smart on the other. Aim for nice & smart. Avoid the quadrant of doom.

After learning that there were no women finalists for the second year in a row, two scientists resigned from the selection committee for the Canadian Science and Engineering Hall of Fame. A lack of women recipients of a prominent award is something I’ve written about before. And, just yesterday, NSF announced its newest Waterman Award winner. The streak is now at 12 consecutive male winners.

I enjoyed this post on steps towards cleaner, better-organized code. (ht: Nina Wale) Related to this, a suggestion a colleague recently gave me is to aim to go one step more elegant/refined than what you would have done on your own. That is, don’t have amazingly elegant code as your goal. But if, each time, you aim to go one step beyond where you can easily get, you’ll learn a lot and, over time, become pretty good at programming. I like that idea.

From Jeremy:

Emilio Bruna, EiC at Biotropica, seconds Brian’s view that honest mistakes happen in science, and that the important thing is to fix them rather than stigmatize anyone:

So please, if you find a mistake in one of your papers let us know. It’s ok, we can fix it.

Arjun Raj explains why everything–peer review, academia, software design, you name it–is “broken”.

Arthropod ecologist Chris Buddle is cited in the latest xkcd “What If”! There are even two jokes about him! Must…control…jealousy… :-) (In seriousness: congratulations Chris!)

Stephen Heard with the story behind his paper on whimsy, jokes, and beauty in scientific writing. Includes an interesting discussion of how the taboo on humor and beauty in scientific writing is maintained even though lots of people–maybe even most people!–disagree with the taboo. Oh, and see the comments, where Stephen answers the question, when did scientists stop writing in the first person (active voice) in favor of the third person (passive voice), and why?

Tenure, She Wrote on every PI’s nightmare (or one of them): when trainees go bad.

Simply Statistics agrees with my hypothesis on why your university has so many administrators and so much red tape: because you asked for it.

Journalist’s guide to insect identification. That’s pretty much how I do it. Definitely close enough for government work. In fact, I bet this is how entomologists do it too, because it’s not as if anyone’s ever going to look close enough to check them. :-) (ht Not Exactly Rocket Science)

Aww, penguins are so cute! Here, penguin, pengu–AAAAAHHHHH!!!11!1 :-) (ht Not Exactly Rocket Science)

What if coauthors disagree about what their ms should say? (UPDATED)

In a recent interview, Richard Lewontin talks about how he and the late Stephen Jay Gould came to write their famous polemic “The Spandrels of San Marco” (ht Small Pond Science). Basically, Lewontin says all the polemical bits were by Gould, and that he only wrote one non-polemical section. And he says Gould went too far in the polemical bits, taking unreasonably extreme positions. A few quotes from Lewontin, to give you the flavor:

Steve and I taught evolution together for years and in a sense we struggled in class constantly because Steve, in my view, was preoccupied with the desire to be considered a very original and great evolutionary theorist. So he would exaggerate and even caricature certain features, which are true but not the way you want to present them…He would fasten on a particular interesting aspect of the evolutionary process and then make it into a kind of rigid, almost vacuous rule…

Most of the Spandrels paper was written by Steve. There is a section in there, which one can easily pick out, where I discuss the various factors and forces of evolution…

This surprises me. Not for the gossip about Gould’s motivations–I’m not much interested in that–but because Lewontin is more or less admitting that he put his name on a paper that he didn’t entirely agree with. Which surprises me because my attitude is very different. I don’t let a paper go out with my name on it unless I agree with every word of it. I figure I’m an author of the whole paper, not just “my” bits of it.

To be clear, my concern here isn’t with the technical soundness of my coauthors’ work (which in some cases I couldn’t actually check even if I wanted to), or with different people writing different bits of an ms. It’s with whether my coauthors and I all agree on the interpretation and implications of our work, and what to do if we don’t.

I’ve been involved in collaborations in which we disagreed about interpretation, sometimes very seriously. But in the end every collaboration with which I’ve been involved has managed to write a paper everyone was happy with.

There are degrees of agreement and disagreement, of course. I’ve had collaborative papers that would’ve been slightly different if I’d been the sole author–there’d have been differences in emphasis, or some points would’ve been phrased differently. Perhaps that’s what’s going on in the case of “Spandrels”. Maybe Lewontin would’ve preferred different phrasing or more (i.e. any!) nuance, but he basically agreed with Gould’s main points so was happy to put his name on the paper.

One way to resolve disagreement among coauthors would be for them to lay out their disagreements in the ms. One occasionally sees papers like this, but only from “adversarial” collaborations between intellectual opponents. There’s no reason in principle why friendly collaborators who only partially disagree couldn’t do the same thing, but I’ve never seen it done. (UPDATE: They say the memory is the first thing to go. Andy Gonzalez comments to remind me that he and Andrew Hendry have a friendly disagreement about the prevalence of local adaptation. They wrote a dialectical paper about it. And see the comments for other examples of adversarial collaborations in which intellectual opponents wrote joint papers clarifying their areas of agreement and disagreement.)

The various meanings of “authorship”, and different standards for authorship, are relevant here (see this old post). If you think of an “author” just as “someone who made a substantial contribution to the work reported in the ms”, then maybe you don’t assume that every author necessarily agrees, or should agree, with everything in the ms. The author list is just a list of people who contributed in various ways to producing various bits of the ms. Not a list of people who agree with everything the ms says.

I’m guessing this is an issue on which folks have very different experiences and views. So here’s a little poll. Do you think coauthors should agree on everything their ms says?

Is it really that important to prevent and correct one-off honest errors in scientific papers?

Wanted to highlight what I think has been a very useful discussion in the comments, because I know many readers don’t read the comments.

Yesterday, Brian noted that mistakes are inevitable in science (it’s a great post, BTW – go read it if you haven’t yet). Which raises the question of how hard to work to prevent mistakes, and correct them when they occur. After all, there’s no free lunch; opportunity costs are ubiquitous. Time, money, and effort you spend checking for and correcting errors is time, money, and effort you could spend doing something else.* I asked this question in the comments, and Brian quite sensibly replied that the more serious the consequences of an error, the more important it is to prevent it:

Certainly in the software engineering world it is widely recognized that it is a lot of work to eliminate errors and that there are trade-offs. If it is the program running a pace-maker it is expected to do just about everything to eliminate errors. But for more mundane programs (e.g. OS X, Word) it is recognized that perfection is too costly.

Which raises the sobering thought that the vast majority of errors in scientific papers aren’t worth putting any effort into detecting or correcting. At least, not any more effort than we already put in. From another comment of mine:

Yes, the consequences of an error must be key here. Which raises the sobering thought that most errors in scientific papers aren’t worth checking for or eliminating! After all, a substantial fraction of papers are never cited, and only a tiny fraction have any appreciable influence even on their own subfield or contribute in any appreciable way to any policy decision or other application.

xkcd once made fun of people who are determined to correct others who are “wrong on the internet” (https://xkcd.com/386/). It’s funny not just because it’s mostly futile to correct the errors of people who are wrong on the internet, but because it’s mostly not worth the effort to do so. [Maybe] most (not all!) one-off errors in scientific papers are like people who are “wrong on the internet”…

What worries me much more are systematic errors afflicting science as a whole, that arise even when individual scientists do their jobs well–zombie ideas and all that.

Curious to hear what folks think of this. Carl Boettiger has already chimed in in the comments, suggesting that my point here is the real argument for sharing data and code. The real reason for sharing data and code is not so that we can detect and correct isolated, one-off errors.** Rather, we share data and code because:

Arguing that individual researchers do more error checking than they already do is both counter to existing incentives and can only slow science down; sharing speeds things up. I love Brian’s thesis here that we need to acknowledge that humans make mistakes. Because publishing code or data makes it easier for others to discover mistakes, it is often cited in anonymous surveys as a major reason researchers don’t share; myself included. Most of this will still be ignored, just as most open source software projects are; but it helps ensure that the really interesting and significant ideas get worked over and refined and debugged into robust pillars of our discipline, and makes it harder for an idea to be both systemic and wrong.

I’m not sure I agree that sharing data and code makes it harder for an idea to be both systemic and wrong. The zombie ideas of which I’m aware in ecology didn’t establish themselves because of lack of data and code sharing. But I like Carl’s general line of thought, I think he’s asking the right questions.

*A small example from my own lab: We count protists live in water samples under a binocular microscope. Summer students who are learning this procedure invariably are very slow at first. They spend a loooong time looking at every sample, terrified of missing any protists that might be there. Which results in them spending lots of wasted time staring at samples that are either empty, or in which they already counted all the protists. Eventually, they learn to speed up, trading off a very slightly increased possibility of missing the occasional protist (a minor error that wouldn’t substantially alter our results) for the sake of counting many more samples. This allows us to conduct experiments with many more treatments and replicates than would otherwise be possible. Which of course guards against other sorts of errors–the errors you make by overinterpreting an experiment that lacks all the treatments you’d ideally want, and the errors you make because you lack statistical power. I think people often forget this–going out of your way to guard against one sort of error often increases the likelihood of other errors. Unfortunately, the same thing is true in other contexts.

**I wonder if a lot of the current push to share your data and code so that others can catch errors in your data and code is a case of looking under the streetlight. It’s now much easier than it used to be to share data and code, so we do more of it and come to care more about what we can accomplish by doing it. Which isn’t a bad thing; it’s a good thing on balance. But like any good thing it has its downsides.

Mistakes happen in science

Meg recently wrote a post acknowledging that crying in science is pretty commonplace. It really touched a nerve and went viral. Meg’s opening syllogism was masterful: humans cry, scientists are human, therefore scientists will cry.

I want to touch on an even more sensitive syllogism: humans make mistakes, scientists are human, therefore scientists will make mistakes. And a corollary – some mistakes will make it into print.

People obsessed with preserving a united front against science deniers might try to pretend this isn’t true. But it is true. This rarely acknowledged truth about scientists is fresh in everybody’s minds because of a recent retraction of an ecology paper (due to an honest mistake). I’m not even going to link to it, since singling out one group of individuals when I’m talking about collective responsibility would be a distraction from my main point (but if it’s too distracting not to know, Jeremy linked to it on Friday).

What I am finding revealing is not the retraction itself but other people’s reactions to it. There seems to be a lot of distancing and blaming. The first commenter on Retraction Watch even went one step further and very sloppily and inaccurately started throwing around the phrase “fraud scandal” (really? is the topic of mistakes so taboo that we can’t recognize the profound difference between a mistake and fraud?).

My reactions were rather different. In order of occurrence (and probably in order of increasing profundity), they were:

  1. Ouch – I feel bad for the authors
  2. I’m impressed with the way the authors handled this – it took a lot of courage
  3. That’s science working the way it is supposed to
  4. It could have been me

There’s no need to expand on the first one (except it’s worth noting that I don’t know any of the authors personally, so this was more of a one-degree-removed, member-of-my-community form of empathy).

But I think it is worth dwelling on the second one for a moment. It must have been very tempting to bluster, deny that the mistakes were substantive enough to require a retraction, and hope it all faded away. We all know this strategy has a decent shot at working. In an infamous case in evolution (UPDATE: the link in Jeremy’s post is broken – follow this link), it worked for years until a co-author took it upon himself to self-publish and blow the whistle (nobody talks about this, but the journals have an obvious interest in not highlighting mistakes). But these authors didn’t weasel in any fashion. And they thought about the good of science before the good of their careers. Good for them!

As for the 3rd reaction – this is not a failure of science. It is a success of science! It is science working as it is supposed to. And it is exactly why science has a claim to a degree of rigor that other modes of thought don’t have. The reason my syllogism doesn’t eliminate science as a paragon of correctness is that – contrary to the popular view about lone geniuses – science is not about individuals or single papers. It is about the community and the total body of evidence. One individual can be right, wrong, a crackpot, a genius, mistaken, right for the wrong reasons, etc. But the members of the community (given time) check each other and identify wrong ideas and mistakes. The hive mind will get the important things right, given some time. If you read the details, this is exactly what happened. Good for science!

The last reaction is the touchiest of all (it could have been me*). Of course I do not knowingly have any mistakes in print. But I could have a mistake out there I don’t know about. And I’ve caught some that came close. And I could make one in the future. Should I be thinking that? Should I be admitting it on a public blog? I sure hope your answer to both of these questions is yes. If I’m not asking the first question (and admitting the possibility), how can I be putting my best effort into avoiding mistakes? The same goes for the community context. And I’m pretty sure no honest scientist can say they are 100% sure they have never made a mistake and never will make one. 95% sure – I hope so. Maybe even 99% sure. But 100% sure? I don’t trust you if that is what you claim. Every lab I’ve ever worked in or been close to (meaning dozens) has had challenges and errors with data, coding, and replicability of analyses. Most of them are discovered and fixed (or, sadly, prevent publication). But has anybody here ever run an analysis, gotten a particular t-statistic/p-value and written it up, and then run the analysis later and gotten a slightly different number, and never been able to recreate the original? Anybody have one or two sample IDs that got lost in the shuffle and you don’t know what they are? These are admittedly small mistakes that probably didn’t change the outcome. But it is only a difference of degree. And I bet most of you know of bigger mistakes that almost got out the door.

I want to speak for a minute more specifically about coding. In this day and age nearly every paper has some coding behind it. It might just be an R script to run the analyses (and probably dropping some rows with incomplete data etc. along the way). But it might be like the stuff that goes on in my lab, including 1000+ line computer simulations and 1000+ line big-data analyses. Software engineers have done a lot of formal analysis of coding errors. And to summarize a lot of literature, they are numerous and the best we can do is move asymptotically towards eliminating them. Getting rid of even 90-95% of the errors takes a lot of work. Even in highly structured anti-error environments like NASA or the medical field mistakes slip through (like the mis-transcribed formula that caused a rocket to crash). And science is anything but a highly-structured anti-error environment (and we shouldn’t be – our orientation is towards innovation). In a future post, I will go through some of the tricks I use to validate and have faith in my code. But that would be a distraction here (so you might want to save your comments on how you do it for that post too). The bottom line though is I know enough software engineering not to fool myself. I know there are errors in my code. I’ve caught a couple of one-line mistakes that totally changed the results while I was in the middle of writing up my first draft. I think and hope that the remaining errors are small. But I could be wrong. And if I am wrong and made a whopping mistake, I hope you find my mistake!
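(Just to fix ideas, and without getting into the full toolkit I’ll save for that future post, here is a sketch of the kind of cheap, automatic guard I have in mind; the file and column names are made up for illustration.)

```r
# A minimal sketch of cheap sanity checks that catch silent data problems early.
# The file name and column names here are hypothetical.
dat <- read.csv("lifetable_2014.csv")

stopifnot(
  nrow(dat) > 0,                        # the file actually read in
  !any(duplicated(dat$sample_id)),      # no sample entered twice
  all(dat$count >= 0, na.rm = TRUE),    # counts can't be negative
  all(!is.na(dat$treatment))            # every row has a treatment assigned
)
```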

The software industry’s effort at studying errors was just mentioned. But the medical and airline industries have recently devoted a lot of attention to the topic of mistakes as well (their mistakes are often fatal). The Institute of Medicine released a report entitled “To Err is Human” with this telling quote:

“…the majority of medical errors do not result from individual recklessness or the actions of a particular group–this is not a “bad apple” problem. More commonly, errors are caused by faulty systems, processes, and conditions that lead people to make mistakes or fail to prevent them.”

Broad-brushing the details, both medicine and the airlines have come to the conclusion that the best way to avoid mistakes is to 1) destroy the myth of infallibility, 2) eliminate the notion that raising the possibility of a mistake is offensive, 3) introduce a culture of regularly talking about the possibility of mistakes and analyzing mistakes made for lessons learned, and 4) make avoiding mistakes a collective group responsibility.

I think arguably science figured this all out a couple of hundred years ago. But it is worth making explicit again. And per #3 it is worth continuously re-evaluating how we’re doing. In particular we do #4 extremely well. We have peer review, post-publication review (which is stronger for prominent and surprising results), attempts at replication etc. We’re professional skeptics. We also do pretty well at #2; you expect and accept your work being criticized and picked apart (even if nobody enjoys it!). #1 is more of a mixed bag. I’ve heard a lot of “it could never happen in my lab” comments recently, which is exactly the myth of infallibility. And the same for #3 – I haven’t yet heard anybody say “I’m going to change X in my lab” in response to the recent incident. And more generally across #1-#4, I would suggest that coding is novel enough in ecology that we have not yet fully developed a robust set of community practices around preventing coding errors.

In conclusion, I am sure somebody is going to say I am glorifying mistakes in science. I’m not. Mistakes* are unfortunate and we all need to (and I think all do) put a lot of effort into avoiding them. But I sincerely believe there is no way to guarantee individual scientists do not make mistakes. At the same time, I also sincerely believe that a well-constructed scientific community is robust enough to find and correct all important mistakes over time. Which means it really matters whether we respond to mistakes by finger pointing or by examining our common culture and how to improve it. The latter is the conversation I want to have.


*Probably important to reiterate here that I’m talking about mistakes, not fraud. Whole different kettle of fish. I presume most people can see that, which is why I am not belaboring it.

FYI: rejected mss often get the same referees when resubmitted to a different journal

Just an FYI: if your ms is rejected after review by one journal, and you resubmit to a different journal, it’s fairly common for it to go to some or all of the same referees who reviewed it for the first journal. I don’t know exactly how common (how could you ever get data on that?). But it’s not rare. For instance, just in the last few months I’ve twice reviewed an ms I’d previously reviewed for another journal.

Why do journal editors acting independently nevertheless often end up choosing some of the same referees for a given paper? In part because a minority of academics do a majority of the reviewing, so the pool of potential referees isn’t as big as you might think. At least not in the eyes of editors who like to get reviews from referees whom they know from experience will agree to review and do a good job. In part because many editors use similar criteria for choosing referees. For instance, seeking reviews from leading experts on a topic–who often are few in number, even for topics that you might not think of as narrow or specialized. Many editors also like to seek reviews from people who’ve published on the topic recently, which often isn’t that many people. And in part because, if an ms heavily cites or discusses someone’s work, that someone likely will be asked to review it.*

This means that, as an author, you need to take the reviews of your rejected mss seriously and revise as needed before resubmitting to a different journal (even if only to clarify the ms and prevent misunderstandings you think the first referees had). Do not just resubmit a previously-reviewed ms to a different journal without revising, on the assumption that you got “unlucky” with your referees and that the “new” referees will like your ms. Because the “new” referees could well be the “old” referees–and nothing annoys referees more than authors who ignore their comments!

The flip side of that is that nothing pleases referees more than authors who take their comments seriously and revise appropriately. So do it!

*In case you’re wondering: no, a referee who’s reviewed your ms before for another journal is not “biased” and does not have a “conflict of interest”. Not even if the previous review was negative. See this old post for discussion. Although journals that have double-blind review may avoid referees who’ve learned the authors’ identities by reviewing the ms previously for a journal lacking double-blind review (I’m not sure on this).

Friday links: other people hate you (and that’s ok), R should be optional, RABBITS, and more

Also this week: sleep vs. you, Tony Ives vs. statistical machismo, tips for gender-balancing your seminar series, the origin of deanlets, a rare retraction in ecology, why ecologists and evolutionary biologists give good talks, and more. Lots of good stuff this week!

From Meg:

Improve the gender balance at your conference (or in your department’s seminar series) using these four simple, straightforward tips. One thing they suggest is that it can be helpful to have a list of names of women in a particular field. I have parked the domain ecoevowomen.wordpress.com for this purpose, but haven’t done anything with it. (I got the idea to park that domain based on Anne’s List, which highlights women neuroscientists.)

I am so glad to read that I’m not the only person who gets anxious when receiving vague email requests to meet. There is no surer way to get my anxiety up than to send an email saying, “Can we chat some time tomorrow?” with no indication of what the email is about. And I don’t just worry if it’s from a boss-like figure. It’s also true when I get a vague email from a collaborator, colleague, or lab member.

I also enjoyed this post by PsycGirl, who has the helpful reminder that people will hate you. This is something that is hard for me (and, based on the twitter discussion, apparently for many others, too), but that I’ve been working on. One thing that I remind myself of is that other people will disagree fundamentally with some of my values (just as I will disagree with theirs). If we didn’t disagree, something would be wrong.

From Jeremy:

Tim Poisot says reviewers shouldn’t enforce R as a standard (or even enforce open source software, he might have added). Tim’s right. And as Ethan White notes in the comments, if you disagree then what you’re really saying is that R shouldn’t exist, because everyone should’ve just stuck with adding machines – er, whatever everybody used before adding machines – er, SAS.

A rare retraction in ecology: a high-profile paper in Global Change Biology, which found that plants migrate to lower elevations in response to global warming, has been retracted because of a coding error. The authors were careful about double-checking their results but nevertheless missed the error, which happens. And as soon as the error was discovered, the authors did the right thing and retracted, for which they deserve kudos. So I’m not sure why the reviewer who discovered the error sounds snarky about it. Anybody can make an honest mistake, and it’s bad for both individual scientists and science as a whole to pretend otherwise. More on this from Brian in a forthcoming post.

Tony Ives shows that it’s fine to just log-transform your count data and use least-squares linear models for null hypothesis tests. You only sacrifice a little bit of power vs. a properly-specified generalized linear model or generalized linear mixed model, and your inferences about the null hypothesis will be much more robust to model mis-specification, preventing inflation of the type I error rate. Tony 1, statistical machismo 0. I’m curious if Tony was prompted to write this paper because he ran into reviewers insisting on generalized linear models in a context where transforming the data and then doing least squares was just fine. (ht Meg)
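(If you want to see the contrast for yourself, here’s a minimal sketch in R using simulated counts – it’s not Ives’ analysis, just the flavor of the comparison.)

```r
# Simulated overdispersed counts for two groups; illustrative only, not from Ives' paper.
set.seed(1)
trt    <- factor(rep(c("control", "treatment"), each = 20))
counts <- rnbinom(40, mu = ifelse(trt == "treatment", 10, 5), size = 1)

# Option 1: log-transform (plus a constant to handle zeros), then least squares
summary(lm(log(counts + 1) ~ trt))

# Option 2: a generalized linear model fit to the raw counts
summary(glm(counts ~ trt, family = poisson))
```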

The NSF Biological Sciences directorate has a blog now. (ht Terry McGlynn, via Twitter)

A hypothesis on why your university has so many vice-deans and other administrative layers, and so much red tape. I’ll add my own hypothesis (based on no evidence or even anecdata): faculty don’t like having to do administrative tasks, so ask for more administrative support. Which results in more administrators getting hired. Which then generates more admin work for everyone. And so the cycle repeats. This isn’t a criticism of individual administrators, the large majority of whom are competent and hardworking. It’s a hypothesis about the dynamics of the whole system.

Speaking of why universities are the way they are…How come any list of the top universities in most any western country has about the same rank ordering as it had a century ago? Whereas most of the biggest corporations from a century ago no longer exist? And what are the implications for university management and national higher education policy? Interesting discussion that I’m still mulling over.

Simply Statistics on the curse – make that blessing – of dimensionality. That is, why it can actually be useful to have many variables that were measured on only a few subjects.

I’ve always had the impression that EEB folks give pretty good talks on average, because EEB folks know to foreground the big questions and general concepts. So I was interested to read that systems biologist Arjun Raj thinks that too many cell/molecular/biochemical talks lack big questions and general concepts.

This is old but I missed it at the time: Here’s a video of Sarah Hird (of Nothing in Biology Makes Sense! fame) giving a hilarious BAHFest talk on why mammals sleep. It’s only 7 minutes, click through! :-)

Aww, isn’t that a cute bunny rab…OH GOD RUN FOR YOUR LIFE!!!11! :-) (ht Marginal Revolution)

And finally: the most 1970s thing you’ll read this week. I give you the US Forest Service on how to make a cocktail. Diagrams and all! Yes, really. :-)

Peter Abrams on ratio-dependent predation as a zombie idea

Peter Abrams has a paper in press at Biological Reviews criticizing the idea of ratio-dependent predation. Briefly, this is the idea that the feeding rate of predator individuals should be modeled as a function of the ratio of prey and predator densities, rather than as a function of prey density only, or as some other function of predator and prey densities. Peter has had a long-running, fundamental disagreement with the advocates of ratio-dependent predation, Roger Arditi and Lev Ginzburg. They laid out the points on which they agreed, and those on which they agreed to disagree, in a joint review paper (Abrams and Ginzburg 2000). But Arditi and Ginzburg’s 2012 book on ratio-dependence, to which Abrams’ paper is a response, seems to have revived the argument.
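(For concreteness, here is the contrast in the usual textbook notation, with prey density N, predator density P, attack rate a, and handling time h; this is the standard parameterization, not anything specific to Abrams’ paper.)

$$f_{\text{prey-dependent}}(N) = \frac{aN}{1 + ahN} \qquad \text{vs.} \qquad f_{\text{ratio-dependent}}(N, P) = \frac{a\,(N/P)}{1 + ah\,(N/P)}$$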

I don’t usually post on individual ecology papers. But I can hardly avoid it in this case, because Peter’s paper starts with this quote:

Ideas, once they take root, are hard to kill. …they persist not just in spite of a single inconvenient fact, but in spite of repeated theoretical refutations and whole piles of contrary facts. They are not truly alive—because they are not true—but neither are they dead. They are undead. They are zombie ideas.

Yes, you read that right: Peter’s quoting my original blog post on zombie ideas in ecology. He uses it as a framing device for the entire paper, closing with another quote from the same post, and citing the post in the references. So while I won’t comment on the dispute over ratio-dependent predation*, I do want to comment on the roles of blogs and rhetoric here.

Obviously, it’s very flattering that someone I really respect would take a blog post of mine seriously enough to cite it. I hope that in future this will become more common. Blog posts aren’t substitutes or replacements for peer reviewed papers. Peer-reviewed papers have long been and will continue to be the most rigorous form of scholarly communication in science, and rightly so. But blog posts can be substantive scholarly contributions too, and so it’s appropriate to recognize and treat them as such–including by citing them.

Of course, it’s not a conventional citation–Peter’s citing me as a source for rhetoric rather than for, say, an empirical claim. I have mixed feelings about Peter’s rhetoric here.

On the one hand, I really do think that lots of good scientists can end up believing in things they shouldn’t, and that whole fields of scientific inquiry can get stuck and fail to progress for far too long, without the field even realizing that it’s stuck. That’s a serious problem when it happens. Recognizing the problem is the first step to fixing it, and to preventing it from happening in future. But such widespread problems often go unrecognized, precisely because they are widespread. The point of the “zombie ideas” rhetoric is to use a silly slogan to call attention to a serious problem that otherwise might go unrecognized. So I’m happy to see the “zombie ideas” meme spreading; hopefully it means people are worrying about this issue.

On the other hand, I disagree with Peter that ratio-dependent predation is a zombie idea, or if it is I’m not sure it’s one that should much worry those who disagree with it. It’s true that it’s an idea that’s persisted despite repeated criticisms.* But as far as I can tell it’s not a widely-held idea. According to Web of Knowledge, there hasn’t been a paper with “ratio-dependent” or variations thereon in the title published in Eco Letts, Ecology, Ecol Monogr, Am Nat, Oikos, JAE, Proc Roy Soc B, Oecologia, Ecological Applications, Journal of Applied Ecology, Journal of Ecology, Ecosphere, or any other empirical journal beginning with “Ecol” since 2006, except for one in 2012 in Ecosphere that had Roger Arditi as a co-author. As far as I can tell, most papers on ratio-dependent predation these days come from a relatively small number of theoreticians and appear in theory journals like Ecological Modelling and Ecological Complexity. Nor could I find any talks or posters on ratio-dependence in the programs for the last three ESA meetings (I didn’t look any further back). Yes, as Peter notes, papers advocating ratio-dependence get cited more often than critiques of the idea–but I suspect that’s because theoreticians interested in the mathematical properties of ratio-dependent models don’t see any reason to cite the critiques. And yes, ratio-dependence is mentioned in a few textbooks.** But on balance, rather than being a zombie idea, ratio-dependent predation looks to me like an idea that persists thanks to the efforts of a few dedicated champions.*** Who aren’t likely to be swayed by repetition of familiar criticisms of ratio-dependent predation (any more than critics of ratio-dependent predation are likely to be swayed by familiar arguments for it).

I also worry that the use of “zombie ideas” rhetoric runs the risk of making scientific debates overheated and personal. Criticism of ideas is hard to do without it getting personal, and rhetoric probably makes it even harder to keep things from getting personal. Not that I think hurt feelings are to be avoided or minimized at all costs–I don’t; criticism of ideas is too important. And so even though I’m sure rhetoric like “zombie ideas” has and will upset some people, I think it has its place, as an entertaining way to spark debate and call attention to an important “failure mode” of science as a whole. I’m reassured that others agree. But I don’t think its place is as an aid to criticism of individual scientists. Looking back, I wasn’t sufficiently clear about this in my early zombie ideas posts. I’m sure that to some readers it looked like I was personally criticizing individual scientists who hold “zombie ideas”, which wasn’t my intent at all.**** So while I’d like to see more debate about zombie ideas in ecology, I want that debate to be productive rather than unproductive, and debates that get personal tend to become unproductive.*****

I also worry a little that peer-reviewed papers aren’t the most natural home for zombie ideas rhetoric. Not long after writing my original zombie ideas post, I submitted a lightly edited version of it to a peer-reviewed journal, zombie jokes and all. It got rejected; the referees and the editor didn’t think the rhetoric was appropriate for a peer-reviewed paper. At the time, I disagreed with them on this, while appreciating where they were coming from. But looking back, I now kind of agree. I think the version of that paper that I eventually published is a much stronger paper, and remains provocative without (hopefully!) being so provocative as to offend anyone or cause them to dismiss my substantive points out of hand. One role of blogs as a form of scholarly communication is to give a home to rhetoric and other stylistic flourishes that might be out of place in a scientific paper. Readers who might balk at certain writing styles in peer-reviewed papers are fine with them on a blog. Much as we don’t expect, say, scientific talks to necessarily be delivered in the same dry, formal style as a peer-reviewed paper. Formal Oxford-style debates can serve the same role–they’re a venue in which people are supposed to disagree as strongly and entertainingly as possible. But then again, I’m not sure of this. You can’t discover–or change–what presentation styles are acceptable to the audience except by trying out non-standard styles. Some of my favorite peer-reviewed papers use humor and rhetoric (e.g., Ellstrand 1983), and that humor and rhetoric is effective precisely because the reader isn’t expecting it.****** And there’s no pleasing everyone when it comes to presentation style, no matter what the venue (e.g., see the comments on this post), so there are always going to be tough stylistic judgement calls to be made.

In an old post, I suggested some rules of thumb about when rhetoric like “zombie ideas” is appropriate in scientific writing. I don’t have much to add to that post, except the point that rhetoric goes down easier with the audience in certain venues. But my thoughts are tentative (as I’m sure you can tell from the wishy-washy, but-on-the-other-hand nature of this post). The appropriate use of rhetoric and humor in scientific writing is something on which scientists have disagreed for centuries. It’s hard to give advice on the right thing to do when there’s disagreement on the right thing to do.

*I don’t think anyone has anything new to say on that; I certainly don’t. For instance, see Arditi and Ginzburg 2014, where they respond to Barraquand 2014 by referring readers to their book, and to Abrams and Ginzburg 2000, without elaboration.

**None of this “prevalence data” shows that ratio-dependence is flawed, of course. Or shows that it’s not flawed. Science isn’t a popularity contest.

***Ironically, I could imagine Roger Arditi and Lev Ginzburg referring to predator-prey models with prey-dependent functional responses as a zombie idea. I personally wouldn’t agree if they did so, since I think there are good empirical and theoretical reasons to study prey-dependent models. But if they did so, they’d at least be attaching the “zombie” label to an idea that’s widely-held. That both sides in the ratio-dependence debate can make a case that the other side is in the grip of a zombie idea illustrates the slipperiness of rhetoric, and the inability of rhetoric to function on its own as a substitute for substantive argument.

****It probably didn’t help that I stole the term “zombie ideas” from economists who are known for no-holds-barred rhetoric that includes attacks on the competence of their opponents, deployed in the hopes of causing others to stop taking their opponents seriously.

*****I’ve made this point before. Those who don’t care at all whether scientific debates get personal, or who think they should get personal, are making the mistake of writing for the audience they wish they had rather than the audience they actually have.

******Humor is really useful. Ellstrand (1983) basically makes the same point as Gould and Lewontin’s “Spandrels of San Marco”, but while I love Ellstrand’s paper I find “Spandrels” really annoying. I think that’s because Ellstrand’s style is funny and playful, while Gould and Lewontin’s is dead-serious. Ellstrand’s style is unexpected, but in a way that gets the reader’s attention without putting the reader off. Similarly, I think you can write about “zombie ideas” in a funny way, or a dead-serious way. Much as zombie movies can be funny and include themselves in the joke rather than being dead serious. Think Shaun of the Dead or Army of Darkness. I try to be funny about zombie ideas, but I leave it others to judge how successful I’ve been.

Collecting, storing, analyzing, and publishing lab data

Lately, I’ve been trying to figure out how I might change the way my lab collects and handles data and carries out analyses. I think we’re doing an okay job, but I know we could be doing better – I’m just not exactly sure how! A key goal of this post is to get ideas from others to try to figure out approaches that might work for my lab. Please comment if you have ideas!

First, to describe the general way in which we collect, store, analyze, and publish data at present:

Data collection: Data collection is always done in a notebook or on a data sheet with pencil. I know that officially pencil is a no-no for lab notebooks, but this is how I was trained, with the argument that, for a lab that works with lots and lots of water, pencil is a safer way of recording things than pen. (Even with this general guideline to collect data in pencil, I remove any pen lying around the lab that looks like it has ink that wouldn’t hold up to a spill.) In terms of notebooks, we used to have notebooks for individual people, but it was becoming a mess in terms of collaborative projects, where one day’s data collection would be in one notebook and the next day’s in another. So, we’ve moved to a system where there are lab notebooks for each individual project. For certain types of data (especially life tables), we record the data on data sheets. The upside is that data collection is much more efficient with data sheets. The downside is that they are so much easier to misplace, which is a source of anxiety for me. These get collected in binders or, if there are only one or two of them, taped into a lab notebook. And I emphasize that they should get scanned as soon as possible. I tend to take a photo of the data sheets at the end of the day with my phone just to be on the safe side.

Data entry: Data are then entered into Excel and proofed to make sure there were no data entry errors. This means that data end up being stored as a hard copy and in Excel files. Any files on the lab computer or my computers get automatically backed up to the cloud. Writing this post is making me realize that I don’t know how my lab folks back up their computers, and that that is something I should ask them about!
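(One cheap supplement to proofing against the hard copies – offered as a sketch, with made-up file and column names – is to run a few quick checks in R on each entered file before it gets used in an analysis:)

```r
# Quick proofing checks on a freshly entered data file.
# The file and column names here are hypothetical.
dat <- read.csv("lifetable_entered.csv")

summary(dat)                           # ranges, means, and NA counts at a glance
table(dat$treatment, useNA = "ifany")  # typos show up as unexpected extra levels
dat$individual_id[duplicated(dat$individual_id)]  # rows that were entered twice
range(dat$clutch_size, na.rm = TRUE)   # impossible values, e.g. negative counts
```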

Data analysis: For the most part, we now carry out analyses in R. We do some amount of sharing of code with each other, but different people mostly are coding on their own. We don’t have a central lab repository for code, but I did recently start an Evernote notebook where we can keep track of code for basic things that we need to do frequently (but perhaps not quite frequently enough to remember quickly!) I meet with students as they work out new analyses to talk about things like error structure, model terms, etc., but they are writing the code themselves, for the most part.

Data publishing: We eventually create a folder where we have all the data files and code for the analyses in a given paper. We used to not gather them all together as coherently as we do now. Then, we moved to a point where we would do this once the paper was accepted. Now, I do this as soon as I start drafting the manuscript. In my opinion a great benefit of the move towards publishing data along with papers has been that it provides all scientists with extra incentive to have their data files and code in a relatively easily retrievable form.

Okay, so that explains what we’re currently doing. Now, to move on to things I’m considering changing or that I think could be improved:

Data collection: I’ve been considering whether we should move to electronic notebooks (most likely Evernote, since we’ve been using that for other lab things). I think the biggest benefits would be that:

  • people would probably write out more detail about lab experiments, because typing is faster than writing (though see concerns below),
  • I could more easily keep an eye on what is going on in terms of data collection in the lab (which feels a little big brother, but we’ve all had experiences like the “WHERE IS THE NOTEBOOK AND WHY AREN’T YOU WRITING MORE THINGS IN IT?” one relayed in this comment), and
  • it would be easily searchable. This seems especially key as we have more and more projects in the lab. Right now, it can be hard for me to go back and find specific information (e.g., the temperature of a PCR reaction run in 2009 or the temperature at which we grew rotifers for a particular life table in 2010)*, especially if it is from before the point where we switched to project-based lab notebooks.

The downsides to this approach that I worry about are:

  • whether people will tend to think “Oh, I’ll just fill in these details about experiment setup when I get home, after eating some dinner” and end up not writing as much (or, worse, forgetting to come back to it entirely),
  • sometimes drawing diagrams is handy, and this would be harder (but could probably be solved by quickly uploading a cell phone photo of a sketch)
  • something about not having a hard copy of data feels weird to me. (I realize that’s not the most scientific reasoning!)

I would love to hear from people who’ve tried out electronic lab notebooks to hear their experiences!

Data entry: I’m not sure that there’s a better way to do this, though I suppose there could be some fancy way of taking data directly from an electronic lab notebook and getting it into a data file. But I don’t anticipate us moving away from the general approach of typing everything into Excel and proofing it.

Data analysis: Two things I’ve started emphasizing to my lab are the Software Carpentry mantra of “Your primary collaborator is yourself 6 months from now, and your past self doesn’t answer email” and that other people will eventually be looking through their data and code, so they need to make sure things can be understood by someone else (and that they won’t be embarrassed by the state of their files!) I think these are both really helpful.

The main things about data analysis that I want to change are:

  • to get a better culture of different people checking out other people’s data and code to look for errors, and
  • to not have everyone reinventing the wheel in terms of analyses.

I’ve heard that some labs have set scripts that everyone uses for certain tasks. This sounds great in principle, but I have no idea how to implement it in practice. I feel like every specific analysis we do is different (say, in the error structure we need or whether we care about the interactions or whatever), though I imagine that is true of pretty much everyone. I would love to get ideas from others on how they handle this!
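(To make the idea concrete, here is a minimal sketch of the kind of shared script I have in mind; the file and function names are invented for illustration. Even when the models differ between projects, small utilities like this only need to be written – and checked – once.)

```r
# Hypothetical shared lab helper, kept in one file (say, "lab_helpers.R")
# that everyone source()s rather than rewriting.

# Standard error of the mean, dropping NAs.
se_mean <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Usage in an analysis script:
# source("lab_helpers.R")
# se_mean(c(2.1, 3.4, NA, 2.8))
```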

What does your lab do?

 

What do you think would be ideal?

 

And finally:

 

Data publishing: I think I probably need to spend more time figuring out GitHub to see if it would help with the data publishing process. And I’ll continue with my plan to always publish code and data with publications. In most cases, I think this will be done as either an appendix/supplement to the paper or via something like Data Dryad or FigShare. As I said above, I really like this approach in part because it helps emphasize the importance of making sure the data and code are saved in a way that they can be accessed and understood by others well into the future. What is your preferred way of publishing data and code? Am I totally missing an aspect of data publishing that I should be considering?

As I said at the beginning, I’d love to hear how other labs handle issues related to data collection, storage, analysis, and publishing. What works? What doesn’t work? And how did you make the shift to a new system?

 

*In my experience, the easiest way to find this information is to go to the end-of-semester write-up by the undergrad who worked on the project. They are the best at including all the nitty gritty details in their write-ups!

We need a simple app that would make blog posts more citable (UPDATED)

Back when I was at the Oikos Blog, I discussed emerging standards for citing blog posts.* Of which I heartily approve. The purpose of a citation is to give credit where it’s due, and if credit is due to a blog post, that’s where it should go.

Via email, a correspondent suggests a really handy idea that would make blog posts more citable. There should be a button at the end of the post that would automatically download the citation information for the post to your reference management software. Like the buttons for sharing the post on social media.

Unfortunately, I don’t know if such a thing exists (anyone know?). And even if it does, we couldn’t implement it because our blog is hosted for free by WordPress and so isn’t infinitely customizable.

UPDATE: See the comments; there are at least a couple of systems for doing this.
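(For the do-it-yourself inclined, base R can already build the kind of record such a button would hand to your reference manager. A minimal sketch – the title, author, year, and URL below are placeholders, not a real post:)

```r
# Hypothetical example: building a citation record for a blog post in base R.
# All field values below are placeholders.
post <- bibentry(
  bibtype = "Misc",
  title   = "Title of the blog post",
  author  = person("First", "Last"),
  year    = "2014",
  url     = "https://example.wordpress.com/title-of-the-blog-post/"
)

toBibtex(post)  # produces a BibTeX entry ready to import into a reference manager
```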

*Yes, people are doing this, including in ecology. Currently working on a post discussing a striking example…