Also this week: underwater thesis defense (yes, really), database-defeating data (yes, really), why scientific papers should be longer (yes, arguably), how penguins ruined nature documentaries, and more. Including this week’s musical guest, They Might Be Giants!
From Meg:
There are just three wolves left on Isle Royale*, meaning that the predator part of the longest running predator-prey study is likely to end soon.
(* If you want to pronounce this like a native, you should pronounce it the way you’d say Isle Royal. Ah, Michigan pronunciations.)
Maclean’s had a piece on why there are still far too few women in STEM, which featured work by Alex Bond. One of the points the piece makes is that women are “consistently passed over for recognition”. Their focus is on women in Canada, but this applies in the US, too. Related to that, I’m glad that ProfLikeSubstance is also calling attention to the poor gender ratio of NSF Waterman Awardees.
I’m really glad to hear that the terHorst Lab at Cal State-Northridge organized an event to create Wikipedia pages for women in ecology and evolution! This old post of mine has a list in the comments of women who people have proposed as needing Wikipedia pages (or improvements to existing pages).
Seminars from most of the speakers at the UMich EEB Early Career Scientist Symposium (which focused on the microbiome) are now available on YouTube! These include talks by Seth Bordenstein, Katherine Amato, Kevin Kohl, Kelly Weinersmith, Rachel Vannette, Justine Garcia, and Georgiana May.
PhD comics on how to write an email to your instructor or TA. (ht: Holly Kindsvater)
From Jeremy:
A lot of people think that grant review is a crapshoot, because review panel ratings of funded grants often don’t correlate strongly with the subsequent impact of the work funded by those grants. But that’s a silly criticism, because the whole point of grant review panels is to make (relatively) coarse distinctions so as to decide what to fund, not to make (relatively) fine distinctions among funded proposals. A natural experiment at NIH provides an opportunity to test how good grant review panels are at deciding what to fund. Back in 2009 stimulus funding led NIH to fund a bunch of proposals that wouldn’t otherwise have been funded. Compared to regular funded proposals, those stimulus-funded proposals led to fewer publications on average, and fewer high-impact publications, and the gap is larger if you look at impact per dollar. The mean differences aren’t small, at least not to my eyes, though your mileage may vary, and of course there’s substantial variation around the means. Regular proposals also had higher variance in impact than stimulus-funded proposals, which means NIH can’t be said to be risk averse in its choice of proposals to fund. And if you think that NIH is biased towards experienced investigators, think again–stimulus-funded proposals were more likely to be led by experienced PIs than were regular funded proposals. I’d be very curious to see an analogous study for NSF. (ht Retraction Watch)
p.s. to previous: And just now–late Thursday night–I see that different authors have just published a Science paper looking at a different NIH dataset and reached broadly the same conclusion even though they restricted attention to funded grants. No doubt one could debate the analysis and its interpretation, probably by focusing on the substantial variation in impact that isn’t explained by review panel scores. But together, these two studies look to me like a strike against the view that grant review is such a crapshoot, and/or so biased towards big names, as to be useless. Related old post here.
Speaking of peer review, here’s a brief and interesting history of peer review at the world’s oldest scientific journal.
How March of the Penguins ruined US nature documentaries.
How long does a scientific paper need to be? Includes some thoughtful pushback against the view, expressed in the comments here, that short papers are more readable. Also hits on something we don’t talk about enough: how online supplements are changing how we write papers. I disagree with the author that online supplements are always a good thing on balance.
One oft-repeated criticism of conventional frequentist statistical tests is that their design encourages mindless, rote use. So I was interested to read about mindless, rote use of a Bayesian approach in psychology. An illustration of how the undoubted abuses of frequentist statistics are not caused by frequentist statistics, but rather are symptoms of other issues that wouldn’t be fixed by switching to other statistical approaches. Here, the issue is the need for agreed conventions in how we construct and interpret statistical hypothesis tests, and associated default settings in statistical software.
An MSc student at the University of Victoria will defend his thesis underwater. No, he’s not a marine ecologist. I wonder what happens if someone on his committee asks him to go to the board. 🙂 (ht Marginal Revolution)
This makes me want to change my last name to NA, just to troll programmers. 🙂 (ht Brad DeLong)
And finally, the fact that I’m excited about this dates me in multiple ways: They Might Be Giants have a new album out! Here’s a sample, which I justify linking to on the grounds that the video includes a couple of jokes our readers will particularly appreciate:
TMBG has a song that mentions rotifers! 🙂 http://tmbw.net/wiki/Lyrics:My_Brother_The_Ape
Cool! Here’s the video:
The video even has a picture of a rotifer at the appropriate point. Philodina, I think.
Of course, TMBG fans will recognize My Brother The Ape as kind of a phylogenetically-expanded version of Mammal 🙂
“how online supplements are changing how we write papers.”
I tended to write quite long online supplements. They were never read by reviewers (no complaint there, time is limited for everyone), even though they often described or reported critical pieces of information that couldn’t fit in the main text due to space restrictions (I also tend to write long papers).
I have the impression that both the supplements and the code I always provide were rarely read or downloaded.
I am not really sure I will continue to write long(ish) supplements.
You’ve put your finger on a key point, I think. Insofar as supplements are reporting crucial information for evaluating and interpreting the main text, they need to be read by reviewers (and readers!). But as you say, people don’t tend to read supplements, naturally assuming that all the important material is in the main text. So rather than improving rigor or openness or whatever, supplements can actually inhibit it, by “hiding in plain sight” crucial information that actually belongs in the main text.
When we read papers in journal clubs, I’m usually the only one who’s downloaded, printed, and tried to read the supplemental info. If journals automatically included the supplemental info in downloaded PDFs I bet they’d be much better read.
“If journals automatically included the supplemental info in downloaded PDFs I bet they’d be much better read.”
Could be. Though supplements these days seem to be getting longer and more numerous. 50-100 pp. isn’t unheard of anymore. There’s no way I’m reading all that except in unusual circumstances.
Jeremy, NSF has done a study (http://www.esajournals.org/doi/abs/10.1890/13.WB.017?journalCode=fron) on the predictive power of panels.
PLS did a follow-up blog post: (http://proflikesubstance.scientopia.org/2013/10/09/proposal-peer-review-helpful-criticism-with-no-predictive-power/)
Summary: panels are not good at predicting productive projects, transformative projects, or high impact projects.
Cheers for this. A couple of quick comments:
-The Scheiner & Bouchie study is for funded NSF grants only. You wouldn’t expect panel ratings to have much if any predictive power for subsequent impact, because there’s not going to be much variation in panel ratings among the funded proposals. The surprise with the NIH study just published in Science is that panels have any predictive power at all when looking only at funded grants.
-The Scheiner & Bouchie study has a tiny sample size: 41 funded projects in 2001 and 2002. Very low power. The NIH study looked at 130,000 grants over decades. Now, I guess you could argue that if an effect is so tiny that you need a big sample size to detect it, it might as well not be there. But again–the surprise is that panel ratings would have any predictive power at all if you’re only looking at funded grants.
I’d like to see someone take the “stimulus funding as natural experiment” approach with NSF data. I think that would be more informative about how much value NSF review panels add.
There was another study just before the Scheiner & Bouchie one that looked at a single NIH panel (cardiology, if I remember correctly) and got similar results.
Overall I think the effect sizes in this latest study are not huge. One standard deviation in score (a fairly large difference in scores) corresponds to a 7% change in # of pubs and a bit more in # of citations (15% more citations, but I think that is confounded with the 7% change in publications).
Also, the stimulus-funding study has lots of confounding factors – in particular, very short notification combined with requirements to spend the money quickly, which were guaranteed to reduce the quality of results independently of the quality of the proposals.
Also, it is worth noting that NIH panels, I think, are much more narrowly defined and spend more time than NSF ones. I think it quite likely that NSF panels would be noticeably less accurate.
To me, these are not results to shout from the rooftops about how great a job we’re doing – especially when these assessments are being used to pick the 5-10% of proposals that get funded.
All fair points Brian.
Like you, I favor giving smaller grants to more people, precisely because it’s so difficult to predict which projects will work out best. Sorry if that wasn’t clear from the post.
I also think grant review panels are good at their jobs, meaning they’re as good as they could be expected to be. My comments in the post are mainly aimed at those folks who take the extreme view that review panels are *so* unable to distinguish among proposals, and/or are so biased in some way (e.g., favoring bigwigs), that they’re completely useless, except maybe for weeding out the small fraction of terrible proposals. Probably should’ve been clearer on this as well. And if you wanted to argue that people who take such extreme views are sufficiently rare and non-influential as to not be worth paying attention to, well, you might have a point there.
The other reason I wanted to give these studies a shout-out is just my general preference for data over data-free guessing. Yes, absolutely, these data are consistent with various hypotheses about the underlying situation (though as I said above, I do think they rule out some extreme views about the underlying situation). But they are the data we have, so I do think we need to make what use of these data we can. As you know, there are a lot of myths out there about how granting bodies (and hiring committees, and journal reviews) actually work. To counter those myths, it helps to have data, even if it’s subject to potential biases or is otherwise challenging to interpret. If nothing else, I think it leads to more productive conversations than would occur in the absence of data.
Re: Isle Royale, I did my undergrad honors thesis work out there in the summer of 1994. Back then the wolf population also was near extinction (just one breeding female left). If memory serves (too lazy to look up the data), I believe the wolves subsequently rebounded briefly before dropping to their current size. So without wanting to judge whether the wolves “should” have been rescued, I suspect that a rescue wouldn’t be a one-time thing. Not if the long-term goal was to give the wolves a high probability of persisting for some lengthy period. The wolf population on the island has never been that big, it’s been very small for a while, and arguably they’re lucky not to have gone extinct back in the mid-90s. Small populations are at high risk of extinction, and this wolf population is always going to be small.
One of my strongest memories of that summer is that there were a *lot* of moose on Isle Royale. Walking anywhere on the island, or paddling around it on the lake, you were guaranteed to see moose. This was right before a big disease outbreak killed off a bunch of moose, if memory serves.
NO WAY! That’s so neat that you did your honors thesis* work there. I did mine at the most polluted lake in the US. Harumph. 😉
*I didn’t formally submit an honors thesis because I graduated in January and it somehow wasn’t possible, for reasons that didn’t make sense.
“NO WAY! That’s so neat that you did your honors thesis* work there. I did mine at the most polluted lake in the US. Harumph. ;)”
If it’s any comfort, a once-in-a-decade summer storm demolished my experiment halfway through the field season (told this story in more detail here: https://dynamicecology.wordpress.com/2011/06/07/my-first-publication-revisiting-an-oikos-non-classic/). And you have since more than made up the difference in terms of working at attractive field sites. 🙂
That’s weird about Cornell not having a way for students who finish in Jan. to submit honors theses.
There are some places where it’s actually preferable to finish in January. My brother in law made sure to finish in January at Middlebury because graduating students there ski down the college ski slope in their gowns for the January graduation ceremony. 🙂
It’s possible there was a way around it, but I didn’t particularly care. My decision to graduate in January was fairly last minute, so I didn’t have a lot of time to look into it. (I graduated early so I could go work in Antarctica — meaning that I trumped you in coolness of field sites within a month of graduating 😉 )
@Jeremy – I think we largely agree. Especially about needing data to have an intelligent discussion about granting. So I’m glad you highlighted these papers (I highlighted the other two in an earlier Friday links).
At this point, in the last year we have had two published, data-driven studies saying panels suck at assessing grant outcomes, and now two saying they do OK. I think it’s probably important to start picking through the details, and to pare away the spin to see whether they disagree as much as they seem to.
@Brian:
So here’s a question: how well *should* we expect panels to be able to predict subsequent impact, in the best of all possible worlds? After all, ecologists all disagree with one another a lot on what sort of work is best. So *some* work or other is going to end up getting funded while other work doesn’t, and some work is going to be published and cited more than other work. But which work gets funded, published, and cited will be due at least in part to factors that can only be called stochastic, and that nobody could possibly predict. In other words, maybe if we wanted more predictability of impact, we’d first have to find some way to force all ecologists to think more alike!
It seems like this is the sort of question you want to ask if you’re contemplating reforms like subdividing panels into more specialized sub-panels, or giving panels more time, or etc., to try to improve the ability of panels to separate the wheat from the chaff. What if no improvement is possible? I suppose you could try to use comparative data across granting bodies using different evaluation methods to try to get at this a bit, as you hinted in an earlier comment.
Lurking under the surface here is a still-not-entirely-coherent suspicion of mine. Namely, that it doesn’t even make sense to think of grant applications as having some “true” underlying merit that panels try to estimate, and that is subsequently revealed (or at least better estimated) by publications, citations, etc. Rather, ecologists disagree with one another in their judgements about what sort of work is worth funding (or publishing, or citing). Those judgements will happen to be correlated in some fashion, which means that as a matter of mathematical necessity they’ll have some first principal component or other. But reifying that first PC axis as the “latent variable” of “true merit” is a mistake.
Ok, that last bit only barely makes sense to me and I’m the one who typed it, so clearly I need to do more thinking. 🙂
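To make the “mathematical necessity” bit above concrete, here is a toy sketch (in Python with numpy; the number of reviewers and the correlation are made-up values for illustration, not taken from any of the studies linked in the post). Positively correlated scores always produce a dominant first principal component, whether or not that axis deserves the label “true merit”:

```python
# Toy illustration: positively correlated reviewer scores necessarily yield a
# dominant first principal component, with or without any "true merit" variable.
import numpy as np

rng = np.random.default_rng(1)
n_proposals, n_reviewers, rho = 200, 5, 0.4   # hypothetical, purely illustrative

# Scores drawn from a multivariate normal whose off-diagonal correlations are all rho.
cov = np.full((n_reviewers, n_reviewers), rho) + (1 - rho) * np.eye(n_reviewers)
scores = rng.multivariate_normal(np.zeros(n_reviewers), cov, size=n_proposals)

# Eigenvalues of the observed correlation matrix; PC1's share of total variance.
eigvals = np.linalg.eigvalsh(np.corrcoef(scores, rowvar=False))[::-1]
print("proportion of variance on PC1:", round(eigvals[0] / eigvals.sum(), 2))
```

With five reviewers and a pairwise correlation of 0.4, PC1 carries roughly half the total variance. Nothing in the simulation says that axis measures merit; it only says the scores are correlated.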
I think you ask an important question – how much should we expect to be possible from grant panels.
I think the answer is probably fairly limited.
But to me that answer, if true, has large implications for how grants should work. In particular, right now in the US panels set themselves up as “playing God” – putting a razor-thin line between $1,000,000 grants and $0. If we don’t have that precision in assessment, we shouldn’t be giving grants out that way.
Also, and I think you’ve agreed with this in the past, if our prediction of project outcomes is weak, we ought to be moving more to granting based on PI track record* and less on project proposal quality. While certainly imperfect, PI track record is easier to assess than project proposal quality. And it would seem more predictive of future outcomes (although I would love to see more data on this). And as we know, Canada definitely leans more in this direction so it is certainly feasible.
In short, I do think our ability to predict probably is and always will be fairly limited. My critiques are definitely NOT critiques of the efforts of panels. But I think honesty about these limits has pretty big implications for the optimal design of how grants are given out.
*And yes – you absolutely have to set up a system to let PIs get funding through their first 5 years or so until track record is established but this is doable.
Couldn’t agree with Brian more. If the panel system is bad, or even mediocre, at predicting high-impact or “transformative” science (a stated goal of NSF), and the system is a version of an “all-or-nothing” funding mechanism, year over year, then it is worth thinking about whether the system actually serves the function it is intended to serve (funding quality basic science, broadly). The data we have so far suggest that a serious discussion about panels and funding allocations is warranted.
“This makes me want to change my last name to NA, just to troll programmers.”
Most programming languages don’t use ‘NA’. You’re just looking to troll R programmers, perhaps?
My plan is to start with R programmers and work my way up from there. But first I’d have to figure out what my last name would have to be to troll, say, Python or C++ programmers. 🙂
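In that spirit, here is a minimal sketch of the data side of the joke (in Python with pandas rather than R; the CSV and names are hypothetical). pandas’ read_csv treats the literal string “NA” as missing by default, so a surname of NA quietly vanishes unless you switch that behaviour off:

```python
# Minimal sketch: a surname of "NA" is parsed as missing data by pandas' defaults.
import io
import pandas as pd

csv_text = "first,last\nJeremy,NA\nMeg,Duffy\n"   # hypothetical roster

# Default behaviour: the string "NA" in the 'last' column becomes NaN.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default["last"].isna().tolist())             # [True, False]

# Keeping the literal name requires switching off the default NA parsing.
df_literal = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(df_literal["last"].tolist())                    # ['NA', 'Duffy']
```

(Databases have their own version of this problem, as the xkcd strip in the next comment shows.)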
As a starting point for trolling database programmers, here’s a possibility… 😉 https://xkcd.com/327/
Albin
@Albin:
As usual, xkcd got there first. 🙂