Writing a response to reviewer comments

Part of the process of publishing a manuscript is revising the manuscript in response to reviewer comments. Assuming you are resubmitting to the same journal*, you include a cover letter and a detailed response to reviewers. With these, the key message you want to send is: my coauthors and I take your feedback seriously, and we have thought carefully about the suggestions made by reviewers.

A few general points: Making snarky comments in response to a reviewer is ill-advised. ;) As Stephen Heard says in that post, you should keep in mind that the reviewers are likely to read your replies. They’ve devoted their time to reading and commenting on your manuscript; you should be gracious. As a reviewer, I try hard to be thorough, reasonable, and timely. So when the response to reviewers seems combative, that’s frustrating. A combative tone in a response to reviewers shouldn’t affect the recommendation or decision, but it sure doesn’t make the reviewer feel his/her efforts are appreciated.

Okay, now on to more specifics. Things that I generally include in the cover letter are:

  • a brief discussion of any major comments that came up;
  • a statement thanking the reviewers for their feedback (usually saying that the manuscript has been improved by incorporating that feedback, because this is almost always true);
  • a note saying that detailed responses are in the “Response to Reviewers” section;
  • a statement thanking the editor for considering the revised version.

If the revisions were pretty straightforward, the cover letter is short. However, if it seems, based on the initial decision letter, that the manuscript really needed to change in order to clear the bar for publication, the cover letter is longer, and should include a paragraph laying out the major changes that were made in response to reviewers.

I work on the “Response to Reviewers” document at the same time as I edit the manuscript. I find it helpful to have a pdf of the original submission open as well, to figure out what lines the reviewer is referring to (since those will shift around in the revision). This can make it tricky to work on my laptop! In the Response to Reviewers document, you should paste all the comments from the reviewers, and then put your responses immediately below each one.** Sometimes the responses will be short (even just one word, such as “Changed” or “Done”, when in response to a suggested wording change). Sometimes the responses will be longer.

One type of longer response is explaining how the manuscript was changed in response to a reviewer’s comment or question. For example, if a reviewer asked for clarification regarding a point, it’s good to give a brief clarification in the response to reviewers document, and then to explain how the manuscript was changed in response to the reviewer comment. Concretely, if the reviewer’s comment was “Line 193: More information is needed on how birth rates were calculated. Did you use the Paloheimo method?”, in the response, you could write “Yes, birth rates were calculated according to Paloheimo. The manuscript has been edited to make this clearer. (Note that this section is now on lines 200-205.)” As another example, a reviewer might have suggested reframing part of the introduction. In the response to reviewers, you would then explain how you had done that.

Another type of response – and one that I argue should be used sparingly – is one where you say that you did not do something a reviewer suggested and explain why. Yes, some reviews are really bad. Fortunately, though, they are rare in my experience. If you have the misfortune of receiving such a review, hopefully the editor handling the paper gives you a guide about what parts of the review to focus on. Assuming the review is one that is reasonable overall, but where you disagree with a particular suggestion: when saying that you did not make the suggested change, you obviously should explain why you are not making that change.*** (I wonder how often Brian’s post on statistical machismo gets cited in response to a reviewer’s request to add in some fancy stats.) As a few examples of where I’ve done this:

  • a reviewer suggested I remove an experiment that I thought was a key part of the story;
  • a reviewer suggested adding in a different kind of analysis, but we lacked the sample size to pull off the analysis reasonably;
  • a reviewer suggested adding in discussion related to topic X, but that felt too far afield and speculative.

In the first case, we explained why we thought it was important to include, but told the editor that we could remove it if he agreed with the reviewer that it should be removed. In the last case, we gave our speculation in the Response to Reviewers, explained that we felt adding it in would not be appropriate, and said that, if the editor disagreed, we could add it in. In all cases, in the response to reviewers, we explained why we were not making the suggested change. It’s okay to respectfully disagree. But if you find yourself fighting every suggestion the reviewer made, it’s worth considering whether you are being insufficiently open to (hopefully constructive) criticism.

Another situation that often comes up when writing these responses is what to do when reviewers disagree. Sometimes the reviewers both suggest that something needs to be changed, but have different suggestions for what to change. This is a pretty clear indication that something isn’t working with that part of the manuscript. In these cases, I choose which one I think makes the most sense and explain why in the response to reviewers. When responding to the reviewer whose suggestions I did not take, I explain that the reviewers suggested different changes, that I changed this section in accordance with the other reviewer’s suggestion, and give a brief explanation for why. In other cases, one reviewer disliked something and another reviewer liked it. (This is why it can be helpful to include a section noting strengths of the paper at the beginning of a review!) Assuming you agree with the reviewer who liked it, you can explain that in the response when saying that you did not make the suggested change.

Finally, a few very specific things that I think are important:

  1. Format your response so that it’s easy to read. (This is sometimes only possible if you can upload a pdf, since many online submission systems will remove formatting if you just paste it into a text box.) One thing that works well is to italicize all the reviewer’s comments and then not italicize your responses (or vice versa).
  2. Update the line numbers. Generally, reviewers use line numbers to refer to a section of text. Don’t change the line numbers they used, but, in your response, tell the reviewer and editor what the updated line numbers are. (I included this in one of the examples I gave above.) They should read the whole thing anyway, but it’s nice to be able to spot check a few things quickly, and line numbers help with that. And, yes, this is a major pain as you prepare the letter, since the line numbers change as you edit the manuscript. Making sure the new line numbers are all correct is generally the last thing I do before submitting the revised version.
  3. If it’s a substantive change (that is, not the sort of thing that can be addressed with a “done” or “changed” sort of response), you might want to paste the revised text into the cover letter, especially if it’s an important point. This obviously gives the editor and reviewers even more to read, so there are arguments both for and against doing it. But, as an associate editor, I find it helpful.

Coming back to what I said at the beginning: the key message you want to send is “my coauthors and I take your feedback seriously, and we have thought carefully about the suggestions made by reviewers.” The associate editor and reviewers are volunteering their time, and are trying to help strengthen the paper before it is published. Work with them.

* If you aren’t resubmitting to the same journal, you should still address the reviewers’ comments, but you wouldn’t write detailed responses when submitting to a new journal.

** Sometimes reviewers start out with a summary of the manuscript and/or a section with praise related to the manuscript. You can leave those sections out, leave them in but not have a response to them, or leave them in and just have a brief statement along the lines of “We are glad that the reviewer appreciated the study” or even just “Thank you”. The Twitter consensus seemed to be for the last of those options.

*** “I didn’t feel like it.” is not a recommended reason to give. ;)

Poll: Should grant applicants be evaluated relative to how much funding they’ve received?

When I apply for an NSERC Discovery Grant, 1/3 of my evaluation score is based on my scientific productivity over the previous six years. NSERC calls this “excellence of the researcher”. Reviewers look at the quality, impact, and importance of my papers and other contributions to science. Many funding bodies do something similar, though the details vary.

NSERC instructs reviewers not to treat funding level as an indicator of excellence. You’re not supposed to infer that someone who has lots of funding is doing great science, or that someone who has little or none is doing weak science. But of course, funding is correlated with scientific productivity. Not perfectly correlated or even linearly correlated, of course, but correlated. That’s the whole point of giving scientists money: so that they can produce more and better science! Which is exactly what any scientist will do, if given more money.

So here’s my question: when evaluating “excellence of the researcher”, should reviewers evaluate excellence relative to the amount of funding the researcher had? So that researchers who’ve been very productive—but also very well funded—would be evaluated less well than they would be if reviewers were just asked “how productive has the applicant been?”

I think there’s a strong argument that grant applicants should be evaluated relative to their previous funding level, although NSERC doesn’t provide any instruction to reviewers one way or the other.* Indeed, I know of at least one person who does this when reviewing NSERC Discovery Grants. But I don’t know how common it is. And it’s kind of difficult to do, for various reasons. For instance, how do you allow for differences in cost among different research approaches? How do you allow for the fact that probably every researcher’s productivity is some nonlinear, decelerating function of their funding, making it likely that researchers with less funding will be more productive on a per-dollar basis than researchers with more? And how do you allow for the fact that the height and shape of those nonlinear functions presumably vary among applicants? Presumably, though, such difficulties are mitigated in a system like NSERC’s, in which reviewers only make fairly coarse distinctions among applicants (scoring their “excellence” on a 6-point scale).
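To make the per-dollar point concrete, here is a toy sketch in Python. The square-root curve is purely an assumption for illustration (nobody knows the real shape, which presumably varies among applicants); any decelerating curve gives the same qualitative result, and the function names are hypothetical.

```python
import math

def output(funding):
    # Hypothetical decelerating (concave) productivity curve: output grows
    # with the square root of funding. The functional form is an assumption.
    return math.sqrt(funding)

def output_per_dollar(funding):
    # Average productivity per dollar of funding.
    return output(funding) / funding

for funding in (25_000, 100_000, 400_000):
    print(f"${funding:>7,}: output {output(funding):6.1f}, "
          f"{output_per_dollar(funding) * 1000:.2f} per $1000")
```

Under this assumed curve, quadrupling funding doubles output but halves output per dollar, so judging applicants per dollar would systematically favor the less-funded ones.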

What do you think? As a conversation starter, here’s a little poll:

Looking forward to your comments.

UPDATE: I forgot it was a US holiday when I put this post up yesterday. #amateurhour So we didn’t get many votes. But of the 78 votes we got, 56% agreed that grant applicants should be judged relative to their previous funding level, 35% disagreed, and 9% weren’t sure. So based on this admittedly small and non-random sample, there’s a lot of disagreement on this issue!

*Do other funding agencies provide explicit instructions on this?

p.s. I’m intentionally not getting into the issue of whether funding agencies should look at applicants’ track records at all, or whether they should make use of that information in some different way than NSERC does (say, only asking if the applicant has the background and experience needed to carry out the proposed research). Those are interesting questions, but I’m setting them aside for purposes of this post.

Friday links: “new” Alan Turing paper, and more

Also this week: evidence that hindsight is indeed 20-20!

From Jeremy:

Stephen Heard on how he got into blogging. His Scientist Sees Squirrel is the best new ecology/science blog in a long while. If you’re not reading it–well, why the heck not? Interesting to hear that Stephen got into blogging through writing a book. I’m hoping to go in the opposite direction…

The ESA now has a formal code of conduct for its annual meeting.

Often in science, you’ll get a surprising result. It’s a much bigger effect than you were expecting, or in the opposite direction to what standard theory would predict, or whatever. And often, it’s possible to come up with a post-hoc explanation that makes the result seem unsurprising (or at least less surprising) in retrospect. But as Andrew Gelman reminds us, on its own that is a highly unreliable procedure. Because it’s very easy to come up with a plausible-seeming post-hoc explanation for anything. Even fake data that were designed to be surprising. And I’d add that, now that the data have been revealed to be fake, a lot of people are saying that someone should’ve recognized the fraud even before it was published. Which is unfair, and which provides a second illustration of Andrew Gelman’s point. Never forget: everything is obvious once you know the answer. I’ll also add that I disagree with a lesson Andrew seems to draw (not sure if he intends to draw it, though others have). Namely that one should have blanket skepticism about anything published in the highest-impact journals (Nature, Science, PNAS). At least in the fields in which I have expertise, Nature, Science, and PNAS mostly publish good work that doesn’t ring any alarm bells. That the top general science journals publish some flawed papers, or that they (may) publish a greater fraction of flawed papers than more specialized journals (as is sometimes claimed even though it’s not demonstrated by the evidence everyone likes to cite), does not justify dismissing everything they publish out of hand. (Aside: further details about the fakery incident that prompted Andrew Gelman’s comments here, here and here. The demonstration that it is indeed fake is here [I read it, it’s devastating]. A bit of sensible commentary here. This incident is all over the intertubes, so you won’t have to look far to find further commentary and speculation of varying levels of sensibleness.)

Alan Turing has a “new” preprint up on arXiv, concerning the application of probabilistic reasoning to cryptography. I say “new” because it was written during WW II but only recently declassified. HT Andrew Gelman, who has some good comments on Turing’s reasonableness and good judgment. I agree with Gelman that good judgment is a very underrated trait in science. A lot of what Meg, Brian, and I write about is our own hopefully-good judgments about all sorts of stuff.

Why AIC appeals to ecologists’ lowest instincts

It is my sense of the field that AIC (Akaike information criterion) has moved past bandwagon status into a fundamental and still increasingly used paradigm in how ecologists do statistics. For some quick and dirty evidence, I looked at how often different core words were used at least once in an article in Ecology Letters in 2004 and 2014. Regression was used in 41% and 46% respectively. Significance was used in 40% and 35%. Richness was 41% and 33%. And competition was 46% and 49%. Perhaps a trend or two in there, but all pretty steady. AIC has gone from being in 6% of the articles in 2004 to 19% of the articles in 2014. So in summary: AIC has tripled in usage, is now found in nearly 20% of all articles, and is used more than half as often as significance testing, one of the most widely used statistical techniques.

I have a theory about why this has happened which does not reflect favorably on how AIC is used. Please note the qualification “how AIC is used”. AIC is a perfectly valid tool. And like so many tools, its original proponents made reasonable and accurate claims about it. But over time, the community takes ownership of a concept and uses it how they want, not how it was intended.

And I would suggest how people want to use AIC is in ways that appeal to two low instincts of ecologists (and all humans for that matter). First, humans love rankings. Most newspapers contain the standings of all the teams in your favorite sport every day. We pay more attention to the ranking of a journal’s impact factor than to its absolute value. Any number of newspapers produce rankings of universities. It is ridiculous to think that something as complex as journal quality or university quality can be reduced to one dimension (which is implicit in ranking – you can’t rank in two dimensions). But we force it on systems all the time. Second, we humans like to have our cake and eat it too. Statistics have multiple modalities or goals. These include: estimation of parameters, testing of hypotheses, exploration of covariation, prediction into new conditions, selecting among choices (e.g. models) etc. Conventional wisdom is you need to be clearly based in one goal for an analysis. But we hate to commit.

You can probably already see where I’m headed. The primary essence of what AIC delivers is to boil choices down to a single dimension (precisely, it provides one specific weighting of the two dimensions of likelihood and number of parameters to give a single dimension) and then rank models. And comparing AIC scores is so squishy. It manages to look like all five statistical goals at once. It certainly does selection (that is its claim to fame). But if you’ve ever assessed whether ΔAIC>2, you have done something that is mathematically close to asking whether p<0.05.
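Here is a rough sketch of why ΔAIC>2 lands near p<0.05. It assumes two nested models that differ by one parameter and relies on the usual large-sample approximation that the likelihood-ratio statistic follows a chi-square distribution with 1 degree of freedom. Since AIC = 2k − 2 ln L, the ΔAIC of the simpler model relative to the more complex one equals the likelihood-ratio statistic minus 2. The function names are just illustrative.

```python
import math

def chi2_1df_pvalue(lr_stat):
    # Upper-tail probability of a chi-square with 1 df:
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(lr_stat / 2))

def delta_aic_to_lr(delta_aic):
    # Nested models differing by ONE parameter:
    # AIC(simple) - AIC(complex) = LR - 2, where LR = 2 * (lnL_complex - lnL_simple).
    return delta_aic + 2

for delta in (0, 2):
    lr = delta_aic_to_lr(delta)
    print(f"dAIC > {delta}  <=>  LR > {lr}  <=>  p < {chi2_1df_pvalue(lr):.4f}")
```

Simply preferring the lower AIC corresponds to p < about 0.157, while requiring ΔAIC>2 corresponds to p < about 0.046, which is very close to the conventional 0.05.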

Just to be clear, likelihood also can be used towards all those goals. But they present much more divergent paths. If you’re doing hypothesis testing you’re doing likelihood ratios. If you’re doing estimation you’re maximizing. If you’re doing selection you can’t proceed unless you specify what criteria to use in addition to likelihood. You have to actually slow down and choose what mode of inference you’re doing. And you have to make more choices. With AIC you present that classic table of ΔAIC and weights and voila! You’ve sort of implied doing all five statistical goals at once.

I want to return to my qualification of “how AIC is used”. The following is a simple example to illustrate how I perceive AIC being used these days. Take the example of species richness (hereafter S). Some people think that productivity is a good predictor (hereafter prod). Some people think seasonality is a better predictor (hereafter seas). Some people suggest energy is the true cause (hereafter energ). And most people recognize that you probably need to control for area sampled (area). Now you could do full-blown variable selection, where you try all 16 models of every possible combination of the four variables and use AIC to pick the best. That would be a pretty defensible example of exploratory statistics. You could also do a similarly goaled analysis of variable importance by scaling all four variables, throwing them into one model, and comparing coefficients or doing some form of variance partitioning. These would also be true exploratory statistics. You could also use AIC to do variable importance ranking (compare AIC of S~prod, S~seas, S~energ). This is at least close to what Burnham and Anderson suggested in comparing models. You could even throw in S~area, at which point you would basically be doing hypothesis testing vs. a null, although few would acknowledge this. But my sense is that what most people do is some flavor of what Crawley and Zuur advocate, which is a fairly loose mix of model selection and variable selection. This might result in a table that looks like this*:
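For the full variable-selection route, the 16 candidate models are just every subset of the four predictors, including the intercept-only model. A minimal Python sketch that lists them as R-style formula strings; actually fitting each one and computing its AIC would happen in whatever statistics package you use and is not shown here.

```python
from itertools import combinations

predictors = ["prod", "seas", "energ", "area"]

# Every subset of the four predictors, from the intercept-only model (S~1)
# up to the full model: 2**4 = 16 candidate models in all.
models = [
    "S~" + ("+".join(subset) if subset else "1")
    for size in range(len(predictors) + 1)
    for subset in combinations(predictors, size)
]

print(len(models))
for m in models:
    print(m)
```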

Model                ΔAIC   Weight
S~prod+seas+area     0      31%
S~prod+energ+area    0.5    22%
S~prod+energ         1.1    15%
S~energ+seas         3.2    9%
S~energ              5.0    2%

There are a couple of key aspects of this approach. It seems to be blending model selection and variable selection (indeed it is not really clear that there are distinct models to select from here, but it is not a very clear-headed variable selection approach either). It’s a shame nobody ever competes genuinely distinct models with AIC, as that was one of the original claims to the benefit of AIC (e.g. Wright’s area-energy hypothesis S~energ*area vs. the more-individuals hypothesis, an SEM with two equations: S~numindiv and numindiv~prod). At least, I don’t encounter it very often. Also note that more complicated models came out ranked better (a near universal feature of AIC). And I doubt anybody could tell me how science has advanced from producing this table.

Which brings me to the nub of my complaint against AIC. AIC as practiced is appealing to base human instincts to rank and to be wishy-washy about inferential frameworks. There is NO philosophy of science that says ranking models is important. It’s barely better than useless to science. And there is no philosophy of science that says you don’t have to be clear what your goal is.

There is plenty of good debate to have about which inferential approach advances science the best (a lot of it has happened on this blog!). I am partial to Lakatos and his idea of risky predictions (e.g. here). Jeremy is partial to Mayo’s severe tests, which often favor hypothesis testing done well (e.g. here). And I’ve argued before there are times in science when exploratory statistics are really important (here). Many ecologists are enamored with Platt’s strong inference (two posts on this), where you compare models and decisively select one. Burnham and Anderson cite Platt frequently as an advantage of AIC. But it is key to note that Platt argued for decisive tests where only one theory survives. And arguably still the most mainstream view in ecology is Popperian falsification and hypothesis testing. I can have a good conversation with proponents of any of these approaches (and indeed can argue for any of these approaches as advancing science). But nowhere in any of these approaches does it say keeping all theories around but ranking them is helpful. And nowhere does it say having a muddled view of your inferential approach is helpful. That’s because these two practices are not helpful. They’re incredibly detrimental to the advance of science! Yet I believe that AIC has been adopted precisely because it ranks without going all the way to eliminating theories and because it lets you have a muddled approach to inference.

What do you think? Has AIC been good for the advance of science (and ecology)? Am I too cynical about why hordes are embracing AIC? Would the world be better off if only we went back to using AIC as intended (if so, how was it intended)?

UPDATE – just wanted to say be sure to read the comments. I know a lot of readers usually skip them. But there has been an amazing discussion with over 100 comments down below. I’ve learned a lot. Be sure to read them.

*NB this table is made up. In particular I haven’t run the ΔAIC through the formula to get weights. And the weights don’t add to 100%. I just wanted to show the type of output produced.
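For anyone who wants to see what the weights in the made-up table would actually be, here is a small Python sketch of the standard Akaike weight formula, w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), applied to its ΔAIC values. It assumes those five models are the entire candidate set; weights are only defined relative to the set of models compared, so adding or dropping a model changes all of them.

```python
import math

# Delta-AIC values from the made-up table in this post.
delta_aic = {
    "S~prod+seas+area": 0.0,
    "S~prod+energ+area": 0.5,
    "S~prod+energ": 1.1,
    "S~energ+seas": 3.2,
    "S~energ": 5.0,
}

def akaike_weights(deltas):
    # w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2)
    rel_lik = {m: math.exp(-d / 2) for m, d in deltas.items()}
    total = sum(rel_lik.values())
    return {m: r / total for m, r in rel_lik.items()}

weights = akaike_weights(delta_aic)
for model, w in weights.items():
    print(f"{model:<20} {w:6.1%}")
```

With these ΔAIC values, the weights come out to roughly 38%, 30%, 22%, 8% and 3%, which do sum to 100%.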

What ecology labs do you remember from when you were a student?

When I visited the University of Maine recently, Brian and I started talking about ecology labs, and I immediately rattled off some of the ones I remembered from when I was an undergrad. I’ve never taught a lab course myself, but clearly the ones I took as an undergrad had a big impression on me. So it was funny timing that, shortly after I got back, we were contacted by Eva Dettweiler-Robinson, a PhD candidate at the University of New Mexico. She and Jenn Rudgers are working to redesign a lab course related to ecology, evolution, and the diversity of life.* They want to know: “What lab activities from your introductory undergraduate Ecology or Evolution courses were the most memorable?” The general idea is that, if you can remember a specific lab you did 5, 10, 15, 20+ years after taking the course, it was probably an engaging lab.

So, what specific ecology and evolution labs do you still remember?

Here are the top ones for me:

Cemetery Demography: This is the lab I still remember best, and it was the first thing I brought up when talking to Brian. I think this is a standard one that is done in lots of places. (I see that there’s a TIEE article on it.) I loved it, and if I ever teach an ecology lab, I would use it.

Goldenrod galls: I don’t actually remember the specifics of what we did with this lab. But part of why I remember it is because we did it at the same field sites where Dick Root did his goldenrod research. Knowing that we were doing things that were related to actual science that was ongoing – and not something that was just a canned exercise – was really neat.

Field Ecology trips with Peter Marks: This isn’t a specific lab activity, but a highlight of my undergraduate coursework was taking Field Ecology from Peter Marks. Going outside with someone who is such an amazing natural historian was so much fun and so eye-opening. I remember specific field trips to an old growth forest plot near Ithaca, and a weekend trip to the Cary Institute. I took the course in my last semester at Cornell, since I was a bit slow to realize I wanted to be an ecologist. An advantage to this is that, since most of the other students were sophomores, they were still kind of intimidated by faculty. So, the spot in the front passenger seat of the van Peter drove was always open. I sat there on every trip, and still remember things like him pointing out Ailanthus growing on the side of the road, talking about the possibility of roadsides as habitats that facilitate invasions.

Foraging for beans: I don’t recall this very well, but I remember that we had a lab where we had to pick beans out of grass. But I don’t remember what the point was! There was a similar evolution lab activity, though, where we had to pick up beans with different “appendages” (a spoon, fork, chopsticks, etc.) There were timed foraging rounds, and then individuals could “reproduce” based on how many beans they’d collected.

Wolf reintroduction to Yellowstone: This was a discussion activity, rather than a lab, but given how well I remember it, I figured I’d include it anyway. The class was split into different stakeholder groups related to the question of whether wolves should be reintroduced to Yellowstone National Park. I don’t remember all the groups, but I know that some people were supposed to be government scientists and some ranchers. I was supposed to represent Defenders of Wildlife, which is a conservation non-profit. I found the exercise really interesting, and it definitely drove home the point that it can be really tricky to balance the needs of different interest groups when making conservation decisions.

What ecology and evolution labs do you remember from when you were a student? If you’d rather tweet your reply, use the hashtag #ecoevolabs.


* We generally don’t write posts on request, but this one was so up my alley that I was excited to run with it.

Related posts:

  1. Using Mythbusters to teach about experimental design
  2. Videos for teaching ecology

Advice for the summer conference season (UPDATED)

Summer conference season is here! Whether you’re going to the CSEE Meeting, the ESA Meeting, Evolution 2015, or somewhere else, we’ve got you covered with plenty of advice on how to prepare:

Traveling to meetings while breastfeeding

Why network at conferences?

How to network at conferences

Tips for giving a good talk or poster

How to ask tough questions

How to answer tough questions

How not to start your next ecology or evolution talk

On wandering alone at conferences

UPDATE: Perfecting the elevator pitch

Friday links: statistics vs. TED talk, #scimom, Jeremy vs. Nate Silver, and more

Also this week: fake it ’til you make it (look like you work 80 hours/week), great Canadian minds think alike, evolutionary biologists vs. ecologists, E. O. Wilson vs. the OED, Wyoming vs. data, the evidence on anonymity and openness in peer review, subtle gender biases in award nomination, and much more. Lots of good stuff this week, you might want to get comfortable first. Or skim, if you must. But whatever you do, stick with it until the end so you can read about a runaway trolley speeding towards Immanuel Kant. :-)

From Brian (!):

A neat example on the importance of nomination criteria for gender equity is buried in this post about winning Jeopardy (an American television quiz show). For a long time only 1/3 of the winners were women. This might lead Larry Summers to conclude men are just better at recalling facts (or clicking the button to answer faster). But a natural experiment (scroll down to the middle of the post to The Challenger Pool Has Gotten Bigger) shows that nomination criteria were the real problem. In 2006 Jeopardy changed how they selected the contestants. Before 2006 you had to self-fund a trip to Los Angeles to participate in try-outs to get on the show. This required a certain chutzpah/cockiness to lay out several hundred dollars with no guarantee of even being selected. And 2/3 of the winners were male because more males were making the choice to take this risk. Then they switched to an online test. And suddenly more participants were female and suddenly half the winners were female. It seems so subtle and removed from the key point (who wins the quiz show) but airline flight vs online test seems to make a huge difference. What are accidental but poorly designed nomination criteria doing in academia? Several bloggers including Meg and Morgan have commented on how the nomination process can have a big impact on equitable gender outcomes in an academic context.

From Meg:

This article on how some men (and some, though fewer, women) fake 80 hour work weeks is interesting. To me, the most interesting part was the end:

But the fact that the consultants who quietly lightened their workload did just as well in their performance reviews as those who were truly working 80 or more hours a week suggests that in normal times, heavy workloads may be more about signaling devotion to a firm than really being more productive. The person working 80 hours isn’t necessarily serving clients any better than the person working 50.

The article is based on a study in the corporate world, but definitely applies to academia, too. (ht: Chris Klausmeier)

Apparently I wasn’t the only woman to have a post appear on Monday about how it’s possible to be a #scimom and the importance of role models! I really enjoyed this piece by anthropologist and historian Carole McGranahan. (My piece from Monday is here.)

From Jeremy:

Hilda Bastian (an academic editor at PLOS) takes a deep dive into all of the comparative and experimental evidence on anonymity and openness in peer review. It’s a blog post rather than a paper and so hasn’t been reviewed itself, so I’m trusting her to have done it right (FWIW, it has all the signs of trustworthiness). I love that she’s up front about the serious design and sample size problems of many studies. That’s one of the main take-homes, actually–on several issues, the available evidence sucks, so you can’t draw conclusions on those issues. And I love that she’s looking at all the available evidence, not just focusing on whichever study (or appalling anecdote) gets talked about most or supports her views (she favors openness over anonymity). Among her conclusions:

  • Reviewers often see through author blinding
  • Revealing reviewer identities causes many reviewers to decline to review, but may make reviews somewhat better
  • Author blinding can reduce, increase (yes, increase), or have no effect on gender bias. But the evidence is pretty unreliable and hard to interpret.

Stephen Heard on why scientific grant funding should be spread fairly evenly among investigators. Echoes an old post of mine (we even independently came up with equivalent graphical models!), though Stephen goes beyond my post in considering how uncertainty in predicting PIs’ future productivity should affect funding allocation.

Caroline Tucker comments on the opposing papers deriving from the ASN meeting’s debate on ecological vs. evolutionary limits on continental-scale species richness. Haven’t read them myself yet, but judging from her comments I’m wondering if the competing hypotheses are too vaguely defined to actually be testable. Whenever people disagree on whether evidence X even counts as a test of hypothesis Y, that makes my spidey sense (or rather, my vague-hypothesis sense) tingle.

The always-thoughtful Arjun Raj muses on when to retract a paper. Not as easy a call as you might think.

This is old but I missed it at the time: great This American Life episode on the fuzzy boundary between bold science and crackpottery, as exemplified by a collaboration between an NIH-funded cancer researcher and a musician. A meditation on the importance–and frustration–of looking for evidence against your ideas (“severe tests”) rather than evidence for them. Here are my related old posts on pseudoscience and scientific lost causes. (ht Andrew Gelman)

His own recent claim to the contrary notwithstanding, no, E. O. Wilson did not coin the term “evolutionary biology”, though it’s possible that he helped to popularize it.

Dismantling the evidence behind the most-viewed TED talk ever. The first bit (before the p-curve stuff) would be a good example for an introductory stats course.

Speaking of good examples for an intro stats course, here’s Nate Silver committing the most common and serious statistical mistake made by people who should know better: letting the data tell you what hypothesis to test, and then testing it on the same data. This mistake goes by various names (circular reasoning, “double-dipping”, “Texas sharpshooter fallacy”). Here, Silver notices an unusual feature of some ice hockey data, and then calculates a very low probability that the feature would occur by chance. Which is very wrong (and no, the fact that P is way less than 0.05 here does not make it ok). Every dataset has some “unusual” features, just by chance. You can’t notice whichever feature that happens to be, and then test whether that particular feature would be expected to occur by chance alone. Because if the dataset had happened to exhibit some other “unusual” feature, you’d have done the test on that feature instead (Andrew Gelman calls this “the garden of forking paths”). It’s the equivalent of hitting a golf ball down a fairway, and then declaring that it’s a miracle that the ball landed where it did, because the odds are astronomical that the ball would land on that particular spot by chance alone (can’t recall where I read that analogy…). Nate Silver’s on record saying that frequentist statistics led science astray for a century. But ignoring its basic principles (here, predesignation of hypotheses) isn’t such a hot idea either. Come on, Nate, you’re better than this.
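For an intro stats course, here’s a minimal sketch of why testing the feature you happened to notice goes wrong. This is not Silver’s analysis and uses no hockey data: each simulated dataset is pure noise containing several candidate “features” (each feature is just a sample mean that truly has expectation zero), and we test whichever one looks most unusual after the fact. The function names and parameter values are made up for illustration.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def most_unusual_p(n_features, n_obs, rng):
    """Simulate one pure-noise dataset and return the smallest p-value.

    Each of the n_features candidate features is the mean of n_obs
    standard-normal draws, so its z-score is mean * sqrt(n_obs) and the
    null hypothesis is true for every feature. Taking the minimum p-value
    mimics noticing whichever feature looks most unusual, then testing it.
    """
    best = 1.0
    for _ in range(n_features):
        mean = sum(rng.gauss(0.0, 1.0) for _ in range(n_obs)) / n_obs
        best = min(best, two_sided_p(mean * math.sqrt(n_obs)))
    return best


def false_alarm_rate(n_features, n_obs=25, n_datasets=2000, seed=1):
    """Fraction of pure-noise datasets whose most unusual feature has p < 0.05."""
    rng = random.Random(seed)
    hits = sum(
        most_unusual_p(n_features, n_obs, rng) < 0.05 for _ in range(n_datasets)
    )
    return hits / n_datasets


if __name__ == "__main__":
    print(false_alarm_rate(1))   # one feature chosen in advance
    print(false_alarm_rate(20))  # best of 20 features chosen after the fact
```

With a single feature specified in advance, about 5% of pure-noise datasets come out “significant”, as they should. When you get to pick the most unusual of 20 features after looking, the expected rate is 1 − 0.95^20, or roughly 64%, even though nothing is going on in any of them. That’s the golf ball landing somewhere.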

In praise of linear models. From economics, but non-technical and applicable to ecology.

Wyoming just criminalized gathering environmental data if you plan to share the data with the state or federal government. IANAL, but I can’t imagine this passing constitutional muster. But in a weird way, I’m kind of impressed with Wyoming here. Go big or go home, as the saying goes–even when it comes to data suppression. (ht Economist’s View)

This is from last month but I missed it at the time: Paige Brown Jarreau summarizes her doctoral dissertation on why science bloggers blog and what they blog about. Looks like Meg, Brian, and I are typical in some ways, but atypical in other ways.

And finally: little known variants of the trolley problem:

There’s an out of control trolley speeding towards Immanuel Kant. You have the ability to pull a lever and change the trolley’s path so it hits Jeremy Bentham instead…

(ht Marginal Revolution)

In a variable world, are averages just epiphenomena? (UPDATED)

The value of any variable we measure likely is affected by, or correlated with, lots of others. And the effect of any variable on another likely depends on the values of other variables.

I tell my intro biostats students that this is why we care about averages. In a noisy world about which we never have perfect information, it’s mostly unhelpful to think in terms of certainties, because there aren’t any. But we can still think about what the world is like on average. That’s tremendously useful. For instance, a medication might not completely cure a disease in every patient, but it’s still very valuable to ask whether it improves patient health on average. On this view, variability around the average is of secondary interest to the average itself.

But there’s an alternative view. In a variable world, averages mostly are just meaningless, unimportant epiphenomena. Andrew Gelman articulates this well in a recent post:

Some trends go up, some go down. Is the average trend positive or negative? Who cares? The average trend is a mixture of + and – trends, and whether the avg is + or – for any given year depends…The key is to escape from the trap of trying to estimate a single parameter

On this view, variability is primary and averaging across some or all sources of variability is of at best secondary interest, or even harmful.

This rather abstract-sounding philosophical debate comes up often in everyday practice in ecology and evolution. Most obviously, it comes up in debates over how to interpret main effects in ANOVA-type models in which there are significant interaction terms. But the same issue comes up outside of purely statistical contexts.
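Here’s a toy sketch of Gelman’s point about averaging across + and – trends, using made-up, noise-free numbers for two hypothetical groups with opposite trends. Pooling the groups gives the “average trend” (roughly, the main effect), which here is just the average of the two group slopes because both groups are the same size and share the same x values. It describes neither group.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


xs = [0, 1, 2, 3, 4]
# Hypothetical groups with opposite trends (made-up numbers, no noise).
group_a = [2.0 * x for x in xs]   # trend goes up, slope +2.0
group_b = [-1.5 * x for x in xs]  # trend goes down, slope -1.5

slope_a = ols_slope(xs, group_a)
slope_b = ols_slope(xs, group_b)
# Pooling both groups estimates a single "average trend".
pooled_slope = ols_slope(xs + xs, group_a + group_b)

print(slope_a, slope_b, pooled_slope)
```

The pooled slope comes out at +0.25: positive, but small, and a poor description of either group. Whether that average is a meaningful quantity or an epiphenomenon is exactly the question at issue; the arithmetic alone can’t settle it.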

For instance, think of the debate over the interpretation of interspecific allometric scaling exponents. When you plot, say, metabolic rate vs. body size for a bunch of species, a power law relationship with an exponent of 0.75 explains a lot of the variation. Individual species and clades deviate from this average relationship, of course, but that’s the average. One school of thought sees this as a hugely significant biological fact (e.g., Brown et al. 2004). We can develop alternative models to try to explain this average exponent. And we can use the average metabolic rate-body size allometry as a baseline and try to explain why particular species or clades deviate from it. An opposing school of thought notes that different clades deviate from the average allometry in different ways and concludes that the average allometry is a meaningless epiphenomenon (e.g., Reich et al. 2006). There is no “universal” metabolic rate-body size allometry. Rather, the clade-specific allometries are real, and different from one another. It’s those clade-specific allometries we should seek to explain and predict. Presumably with clade-specific explanations that don’t involve deviations from some purportedly-universal baseline.

As another example, a lot of debate about the “units” or “levels” of selection in evolution comes down to the interpretation of the average fitnesses of different entities (see Charles Goodnight’s entire blog). As a third example, one of the arguments for doing macroecology is that a lot of uninterpretable, idiosyncratic variability washes out at large scales, revealing average behavior that’s worth studying.* On the other hand, such averages arguably are highly uninformative about the underlying biological processes, and so arguably aren’t very helpful, at least not if your goal is to learn something about biology. Outside of ecology and evolution, think of the debate in psychology over whether g (“general intelligence”) is a real and important human trait, or a meaningless statistical artifact. And I’m sure you can think of other examples. Indeed, there are whole fields that have a methodological commitment to focusing on variability, while others have the opposite commitment (think evolutionary biology vs. developmental biology).

For each of the examples mentioned above, there’s a big literature on the pros and cons of taking some average as primary, vs. taking that same average as an unimportant epiphenomenon. But is there anything that can be said in general about when the former approach or the latter approach makes more sense as a research strategy?**

For instance, in ecology and evolution a lot of valuable theoretical and empirical research on allometric relationships has come out of the school of thought that sees average allometries as important, universal biological phenomena. Even if much of that work turns out to be incorrect, I suspect that the hypothesis of a universal, meaningful allometric exponent was the more fruitful working hypothesis. That is, I suspect we wouldn’t have learned as much about organismal form and function if we’d instead gone with the working hypothesis that variation is primary. But on the other hand, I’m sure you, like Andrew Gelman, can think of situations in which focusing on estimating and explaining the “true” average value of some quantity was never a promising research strategy. And I’m sure you can think of cases in which one can make significant progress either by taking the average as primary, or by taking variation around the average as primary.

Anyone know if some historian or philosopher of science has done a comparative study of debates about the interpretation of averages, trying to identify the circumstances in which “average-focused” vs. “variability-focused” research strategies are most fruitful?** If so, I’d love to read it. If not, somebody should do it.

UPDATE: You should totally check out the comment thread, it’s very good. Especially this comment from Simon Hart, reminding us that in a nonlinear, nonadditive world, it’s often essential to focus on averages (and understand how said averages are affected by the nonlinearities and nonadditivities you’re averaging across). I have a series of old posts on this in the context of modern coexistence theory; starts here.
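A minimal sketch of Simon’s point (it’s Jensen’s inequality), assuming a made-up saturating response to a resource that alternates between two levels with equal frequency. The function name and numbers are hypothetical. The average response across fluctuating conditions is not the response at the average condition, so to understand the average you have to understand the nonlinearity you’re averaging across.

```python
def saturating(r, half_sat=1.0):
    """Hypothetical saturating (Monod-type) response to resource level r."""
    return r / (half_sat + r)


# Resource level alternates between two states with equal frequency (assumed).
levels = [0.5, 3.5]
mean_level = sum(levels) / len(levels)  # 2.0

# Response evaluated at the average condition: f(mean) = 2/3.
response_at_mean = saturating(mean_level)
# Average of the responses across the fluctuating conditions: mean of f = 5/9.
mean_response = sum(saturating(r) for r in levels) / len(levels)

print(response_at_mean, mean_response)
```

Because this response curve is concave, variability lowers the average response below the response at average conditions; a convex curve would raise it instead. Differences among species in this kind of nonlinearity are, roughly, what modern coexistence theory calls relative nonlinearity.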

*We have lots of old posts on this.

**Beyond the obvious point that focusing on the average probably (not always!) makes sense if there’s hardly any variation around the average.

Old school literature searches and the fun of reading classic, non-English literature

In my post last week, I pointed out that I haven’t read nearly as much in the past semester as I’d hoped to read. But I did read some things! In fact, as far as I can tell, during the course of the semester I read every paper that has been published (and one that hasn’t been) on parasites that attack developing embryos of Daphnia. This has been a lot of fun. First of all: how often can you say that you think you’ve read everything that’s been written on a topic you are studying?* Second, it’s felt like a classic, old school literature hunt, which I’ve really enjoyed.

Since I was a grad student, I’ve seen Daphnia infected with a parasite that attacks the developing embryos. As a grad student, I initially would record it as “scrambled eggs” in my lab notebook, since I tried to use names that were evocative. (This also led to parasites named “scarlet” and “Spiderman”.) Over the years, I started simply referring to it as “the brood parasite”. It was something I was interested in doing more on, but I didn’t have the time and knew I would need to collaborate with a mycologist to do the work well.

Fast forward approximately 10 years to when I arrived at Michigan. Here, I’m fortunate to have a fantastic mycologist colleague, Tim James, who was game for helping me figure out what the parasite is. We recruited a first year undergraduate, Alan Longworth, to help us work on the project. In the end, the parasite has proved to be really interesting. We have our first manuscript on it in review right now.

One of the key things we wanted to do with the initial brood parasite project was figure out what the parasite was. Microscopy and molecular analyses indicated it was an oomycete, but not particularly closely related to anything that had been sequenced previously. We started thinking about what we might name it if we decided it was a novel species (twitter had some great suggestions focusing on mythological characters that killed babies!), but I also wanted to really dig into the literature.

The first two, most obvious sources to consult were Dieter Ebert’s excellent book on parasites of Daphnia, and a classic monograph by Green on the same topic. Dieter’s book has relatively little coverage of brood parasites, though does point out that they are common and highly virulent. The Green monograph mentioned a “fungal”** parasite, Blastulidium paedophthorum. To cut to the chase: all the evidence points to our brood parasite being Blastulidium paedophthorum. That’s a lot to keep typing (or saying!), and it’s too good to pass up on the opportunity to use “Bp” as the abbreviation, as that works for both the scientific name (Blastulidium paedophthorum) and the common name we’d been calling it (brood parasite). So, we’ve declared the parasite Bp.

Backing up again, the description of Bp in Green seemed like a good fit to what we were seeing, so I wanted to read everything I could about the parasite.*** This started me down a path of reading some really old papers, nearly all of which were in foreign languages. Bp was first described by Pérez in 1903, with a follow up paper in 1905. I was kind of blown away that I could easily download these from my dining room! Chatton had a paper on Bp in 1908 (also available from my dining room table!). After that, it was featured by Jírovec in his wonderfully titled 1955 paper. (The title translates to “Parasites of Our Cladocera”. I love the possessive “our”! :) ). And then, crucially, it was the focus of ultrastructure work by Manier, reported in a paper in 1976.

All of the papers in the preceding paragraph were important to figuring out whether we were working with the same parasite. None of them are in English. That added to the fun “I’m going on an old school literature hunt” feel, but also made it more challenging to read them.**** Reading them involved a combination of trying to remember my high school French, lots of time with Google Translate, and, ultimately, seeking out translators. It was relatively easy to find translators for the French papers, thanks to a few people being really generous with their time. The Czech one, by Jírovec, took substantially longer to find a translator for, but a Czech Daphnia colleague, Adam Petrusek, was kind enough to put me in touch with someone who did a great job on the translation.

All semester, I’ve been thinking about how much fun this has been. Indeed, it’s part of why I really want to figure out how to set aside time to read more! But it especially came to mind after reading this recent ESA Bulletin piece by David Inouye on the value of older non-English literature. In that, Inouye talks about his own journeys through the older non-English literature, and concludes with this paragraph:

So my paper trail extends back to some of these early natural historians in Austria and Germany. Their work helped give me a much longer historical perspective than I would have had if I’d relied just on the English literature on ant–plant mutualisms, primarily from the 1960s on. Although as a graduate student I was able to track down the original publications from the 1880s in libraries, I see that some of this literature is now freely available on Web resources such as ReadAnyBook.com, the Biodiversity Heritage Library, or old scientific literature scanned by Google Books. And the translation from Google Translate I just tried with some of von Wettstein’s 1888 papers is certainly sufficient to follow most of the content. So perhaps the only barrier to familiarity with older non-English literature for ecologists now is the time required to find it. Time that might be well spent to broaden your perspective and make sure you’re not re-discovering insights from early natural historians.

I completely agree that the longer historical perspective – especially that provided by the non-English literature – has been essential. If not for those papers, we would think that this parasite hadn’t been described before and was in need of a name. And I clearly agree with the second-to-last sentence, which is very much in line with my post from last week (which I wrote before reading Inouye’s piece). So, here’s hoping we all find the time to really dig into the literature, and that, while doing so, we remember that there’s lots of value in digging into the classic, non-English literature.


* Okay, fine, it’s not like there are tons of papers on the topic. But it’s still fun to think I’ve read all of them.

** The parasite is an oomycete, and oomycetes are not fungi. But that wasn’t recognized in the early 1970s when Green published his monograph.

*** The references for this paragraph are: Pérez 1903, 1905, Chatton 1908, Jírovec 1955, Manier 1976; full references are given below.

**** I would absolutely love to be multilingual. Sadly, I am not.



Chatton, E. 1908. Sur la reproduction et les affinités du Blastulidium paedophtorum Ch. Pérez. Comptes Rendus Des Seances De La Societe De Biologie Et De Ses Filiales 64:34-36.

Jírovec, O. 1955. Cizopasníci našich perlooček II. Československá Parasitologie II 2:95-98.

Manier, J.-F. 1976. Cycle et ultrastructure de Blastulidium poedophthorum Pérez 1903 (Phycomycète Lagénidiale) parasite des oeufs de Simocephalus vetulus (Mull.) Schoedler (Crustacé, Cladocère). Protistologica 12:225-238.

Pérez, C. 1903. Sur un organisme nouveau, Blastulidium paedophthorum, parasite des embryons de Daphnies. Comptes Rendus Des Seances De La Societe De Biologie Et De Ses Filiales 55:715-716.

Pérez, C. 1905. Nouvelles observations sur le Blastulidium paedophthorum. Comptes Rendus Des Seances De La Societe De Biologie Et De Ses Filiales 58:1027-1029.

Being “out” as a #scimom

Something that is very important to me is to be open about being a scientist – a woman scientist, in particular – who has children. The data don’t paint a rosy picture for scientist mothers, and this is in part because of the biases we all have related to women in science (and especially regarding women in science with children). My hope is that, by being open about being a scientist mother, I can do my small part to normalize the idea of women scientists having children.

How do I try to achieve this? I mention my children in class sometimes (even though I’m sure this makes me seem less serious to some students). I have artwork by my kids in my office. I tweet about my children and the germs they all-too-frequently bring home from daycare. If a student group asks me to give a talk in the evening, I tell them that I can’t make it because I will be home with my kids. I mention my kids sometimes on this blog.

Another thing I did in the past was to have my children* in my Twitter avatar. At some point, I changed to the little red Daphnia**. I think the little red Daphnia is fun and distinctive. But, thinking about all this more recently, I’ve decided to go back to the avatar of me with my son. In some ways it feels silly, since it’s such a small gesture. But then I am reminded that sometimes those small gestures matter. I was recently told that seeing the juxtaposition of me tweeting that I got tenure with my avatar (at that time, a picture of me holding my daughter) really resonated with some younger women scientists and gave them hope that it is possible to be a woman in science and have children.

So, Happy Mother’s Day, a day late***, to all the #scimoms out there! To celebrate, I’m going back to an avatar showing me with one of my kids:



* Usually only one at a time, because it’s nearly impossible to get me and both of them all looking at the camera at the same time!

** No, it’s NOT a bird!

*** At least, for the US Mother’s Day. I know it’s not the same day in all countries.