More faculty hiring practices from economics that ecologists might (not) want to consider

Following up on my recent post noting that in some social science fields, including economics, faculty hiring places heavy (though far from exclusive) weight on one “job market” paper, here are some other aspects of how faculty hiring works in economics. Tweets from @LauraEllenDee were part of my inspiration, and comments on that previous post were a big help too (have I mentioned lately how much I love our commenters?).

I find it interesting to think about which if any of these formal and informal practices could or should be adopted in ecology and other scientific fields (even though I think current practices in ecology are mostly reasonable). Learning about how things work in other fields stops you from taking things for granted* and helps you imagine how things could work in your own field. It also gives you a more realistic sense of what any reforms in your own field might achieve. Learning about how things work in other fields both helps you dream and keeps you grounded.

One challenge in thinking about this is that to some extent these alternative clusters of practices may be “package deals”. You can’t always pick and choose, at least not very easily, because any one practice might well be undesirable or unworkable in isolation from other practices.

So here are some other hiring practices in economics (follow that link for the post from which I’ve gotten much of my information. See also.) This is obviously a broad-brush picture and I’m sure I haven’t gotten all the details right; comments welcome. If all you know about is hiring practices in ecology, get ready to enter the Twilight Zone. A world like ours in many respects, but weirdly different in others… 🙂

Continue reading

Why do some bandwagons in ecology get rolling much faster than others?

In comments recently, Jeff Ollerton asked an interesting question: why does scientific interest in certain topics take off suddenly, while interest in others builds over an extended period? You might think it would have something to do with a key triggering event, such as publication of a pathbreaking paper or book, prompting a step change in interest in the topic. Sometimes that happens. But in other cases it’s not clear why some bandwagons get rolling quickly while others take time to build momentum.

(Aside: for purposes of this post, “bandwagon” just means “topic lots of people work on”. No negative connotations.)

Continue reading

Should you start a science blog? Ask yourself these questions.

Recently in the comments, we were wondering about why ecologists who would be good bloggers (meaning both that they’d enjoy it, and they’d consider it a worthwhile use of their time if they were to do it) don’t blog. One reason might be uncertainty about what it takes, and what you can expect to get out of it. So if you’re thinking about starting a blog (and if you’re not, maybe you should be!), I suggest asking yourself the following questions (warning, long-ish post ahead):

Continue reading

Hoisted from the comments: Mathew Leibold and I on variance partitioning in metacommunity ecology

As part of my #ESA100 reflections, I commented that variance partitioning was dead as a way to infer the drivers of metacommunity structure. Mathew Leibold didn’t get what I was on about at all. Understandably–my remarks were brief and in retrospect not as clear as they should’ve been.* He was kind enough to take the time to comment at length on what he sees as the main problems with variance partitioning and how it’s currently applied, which gave me the chance to clarify my own views. I agree with Mathew on many points.

Wanted to highlight this exchange because I think it addresses an important issue in ecology. Understanding metacommunities is a really important job for community ecology, and right now variance partitioning is probably the single most popular tool for the job. It’s vitally important that we use that tool in effective ways, improve it if we can, and have a good understanding of what it can and can’t do. Mathew and I agree that:

  • There’s been excessive enthusiasm for using variance partitioning as a diagnostic test for metacommunity theories. There are too many possible “kinds” of metacommunities–far more than the small number of “paradigm” special cases on which existing theory focuses–for variance partitioning to be used as a diagnostic tool.
  • Insufficient attention has been given to how to interpret variance partitioning even if various statistical issues with it are addressed. (I’d add–don’t know if Mathew would agree–that this is a common problem in ecology. When a new statistical tool is developed, subsequent work tends to focus on identifying and resolving technical statistical issues with that tool. Which is fine, but tends to have the unfortunate side effect that equally or even more important non-statistical issues of how to interpret the tool tend to get neglected. Or worse, get mistaken for technical statistical issues.)
  • Variance partitioning remains a potentially useful statistical tool, and future work needs to focus on how to interpret and use that tool most effectively. (Not as a standalone diagnostic tool, but as one line of evidence among others, I’d say.)
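For readers who haven’t used it, variance partitioning in this context decomposes variation in community data into a pure-environment fraction [a], a spatially structured environmental fraction [b], a pure-spatial fraction [c], and an unexplained fraction [d] (in the tradition of Borcard et al. 1992). Here’s a minimal sketch of the arithmetic in Python, using plain OLS R² values on a single response variable rather than the adjusted R² and RDA machinery an ecologist would actually use; the function names and data shapes are mine, for illustration only:

```python
import numpy as np

def r2(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept included)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

def variance_partition(E, S, y):
    """Partition variation in y between environmental predictors E and
    spatial predictors S: returns fractions [a] pure environment,
    [b] shared, [c] pure space, [d] unexplained (they sum to 1)."""
    r_e = r2(E, y)                            # environment alone
    r_s = r2(S, y)                            # space alone
    r_es = r2(np.column_stack([E, S]), y)     # environment + space
    a = r_es - r_s
    c = r_es - r_e
    b = r_e + r_s - r_es
    d = 1 - r_es
    return a, b, c, d
```

Note that [b] is obtained by subtraction, not fitted directly, which is one reason interpreting the fractions is trickier than computing them.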

I also think this exchange of comments was a nice example of how blogging can contribute to scientific discussion. So I wanted to highlight it for that reason as well.

*Mathew’s a friend and knows how my brain works. So if he can’t tell what the hell I’m on about, probably lots of other people couldn’t tell either. Which is my bad.

Evolution journals publish more (and different) theory papers than ecology journals

A little while back, commenter Holly K shared an interesting view on theory papers in evolution vs. ecology journals:

I am someone very interested in ecological theory in the pure data-free sense. But I think often strategic (general) theory is in the realm of evolutionary ecology, or even just evolution (behavior or speciation theory, for example). Truthfully, I read Am Nat, Evolution, JEB and Proc B much more faithfully than Ecology because I have noticed they tend to publish more insightful theory (including my own, admittedly I am biased!).

Which led to a little discussion in which a couple of people suggested that there’s greater appreciation for theory in evolutionary biology than in ecology. I’m mostly a dabbler in evolutionary biology, but I share the impression that theory in evolution isn’t seen as some separate subfield the way it often is in ecology. I share that impression in part because it seems like evolution journals publish a mix of theory papers and data papers, whereas ecology journals publish one or the other.

But rather than rely on possibly-erroneous impressions, I decided to compile a bit of data. I went through the Jan. 2015-May 2015 issues of two top evolution journals (Evolution and J. Evol. Biol.), five top ecology journals (Ecol. Lett., Ecology, JAE, J. Ecol., and Oikos), and a top journal publishing in both fields (Am Nat).* I counted up the number of theory papers. “Theory” was operationally defined as a paper based largely or entirely on one or more mathematical models. I didn’t count statistical models, things like Leslie matrices or integral projection models, or papers that manipulated empirical data to simulate some hypothetical scenario (e.g., randomly deleting species from an observed food web in order to study extinction cascades). I kept a separate count of papers that developed a theoretical model, but also tried to parameterize or test it with quantitative data from some specific system or other. Fortunately, there were only two or three papers in the whole set for which I was in any doubt about how to classify them.

Here are the results, along with some brief comments. I didn’t bother expressing the results as a proportion of all papers, because the trends are totally obvious and wouldn’t be qualitatively altered by correcting for number of papers published by each journal.

  • Evolution: 13 theory papers. Many but not all were population genetics models.
  • J. Evol. Biol.: 10 theory papers, including 2 which used data to parameterize or test the model. One of those two was kind of borderline as to whether it was a theory paper.
  • Am Nat: 14 theory papers in ecology, 11 in evolution (counting anything with evolution in it as evolution, which didn’t lead to any weird categorizations to my eye). Six of those 25 used data to parameterize or test the model.
  • Ecol. Lett.: 5 theory papers, 2 of which involved data. One of those two was actually evolutionary, despite appearing in an ecology journal.
  • Ecology: 1 theory paper. It involved data.
  • Oikos: 1 theory paper. But note that I didn’t count the Jan. 2015 issue of Oikos, a special issue devoted to a single topic, which included several theory papers.
  • JAE: 1 theory paper.
  • J. Ecol.: 1 theory paper, and it’s actually an evolution paper.

The conclusions as I see them:

  1. Evolution journals really do publish a lot more theory than ecology journals. This is striking to me. They’re sister fields with a lot in common–but they differ greatly in this respect.
  2. The difference is even starker if you restrict attention to theory papers lacking any data. Only 2/23 theory papers in evolution journals involved data; 3/9 theory papers in ecology journals involved data.
  3. The differences would be starker still if you didn’t count the two theoretical evolution papers published in ecology journals.
  4. Am Nat is an outlier in terms of its mix of papers. Much more theory than any other journal on this list, but by no means exclusively theory. A mix of ecology and evolution. And an intermediate fraction of theory papers that involved data.

It’s interesting to speculate on the underlying reasons for these differences.** It would also be interesting to go back a decade or two (or more) and do the same exercise.*** But I think it’s most interesting to speculate on the consequences and implications of the differences.

For instance, which do you think is a symptom of a healthier theory-data interface in a scientific field: leading journals in the field publish both data papers and data-free theory, or leading journals in the field publish little theory, but the few theory papers they publish often include data? I’d probably stump for the former. After all, it’s not as if the data papers in Evolution and J. Evol. Biol. omit or ignore theory–many of them test theoretical predictions. As do papers in ecology journals, of course. So ecology journals are publishing a narrower range of stuff than evolution journals. That suggests to me a greater separation between the theoretical and empirical sides of the field.

This little exercise also reminded me what a precious outlier Am Nat is. I know that some empirically-minded folks think of Am Nat as a theory journal, but I don’t think they should. I think Am Nat fulfills a pretty vital role as the one leading ecology journal where theory and data papers both show up on a consistent basis. Because like it or not, publication venues do still matter. They reflect as well as shape our choices of where to publish and (crucially) what to read. As long as Am Nat exists, it means that empirical and theoretical ecologists still have something in common, because they’re still reading and publishing in the same shared space.

*Yes, of course there are lots of other journals I could’ve looked at. You get the background research you pay for on this blog. 🙂 I didn’t consciously choose journals so as to try to skew things one way or the other, I just haphazardly picked the first few journals that occurred to me. But if you think my results are way off because I didn’t look at Functional Ecology or Ecography or whatever, by all means go compile the data and share it in the comments!

**Presumably part of the reason is the perception on the part of many ecologists that ecology journals other than Am Nat only want to publish “realistic” theory.

***But I’m too lazy to bother. 🙂

Is it really that important to prevent and correct one-off honest errors in scientific papers?

Wanted to highlight what I think has been a very useful discussion in the comments, because I know many readers don’t read the comments.

Yesterday, Brian noted that mistakes are inevitable in science (it’s a great post, BTW, go read it if you haven’t yet). Which raises the question of how hard to work to prevent mistakes, and correct them when they occur. After all, there’s no free lunch; opportunity costs are ubiquitous. Time, money, and effort you spend checking for and correcting errors is time, money, and effort you could spend doing something else.* I asked this question in the comments, and Brian quite sensibly replied that the more serious the consequences of an error, the more important it is to prevent it:

Certainly in the software engineering world it is widely recognized that it is a lot of work to eliminate errors and that there are trade-offs. If it is the program running a pace-maker it is expected to do just about everything to eliminate errors. But for more mundane programs (e.g. OS X, Word) it is recognized that perfection is too costly.

Which raises the sobering thought that the vast majority of errors in scientific papers aren’t worth putting any effort into detecting or correcting. At least, not any more effort than we already put in. From another comment of mine:

Yes, the consequences of an error must be key here. Which raises the sobering thought that most errors in scientific papers aren’t worth checking for or eliminating! After all, a substantial fraction of papers are never cited, and only a tiny fraction have any appreciable influence even on their own subfield or contribute in any appreciable way to any policy decision or other application.

xkcd once made fun of people who are determined to correct others who are “wrong on the internet”. It’s funny not just because it’s mostly futile to correct the errors of people who are wrong on the internet, but because it’s mostly not worth the effort to do so. [Maybe] most (not all!) one-off errors in scientific papers are like people who are “wrong on the internet”…

What worries me much more are systematic errors afflicting science as a whole, that arise even when individual scientists do their jobs well–zombie ideas and all that.

Curious to hear what folks think of this. Carl Boettiger has already chimed in in the comments, suggesting that my point here is the real argument for sharing data and code. The real reason for sharing data and code is not so that we can detect and correct isolated, one-off errors.** Rather, we share data and code because:

Arguing that individual researchers do more error checking than they already do is both counter to existing incentives and can only slow science down; sharing speeds things up. I love Brian’s thesis here that we need to acknowledge that humans make mistakes. Because publishing code or data makes it easier for others to discover mistakes, it is often cited in anonymous surveys as a major reason researchers don’t share; myself included. Most of this will still be ignored, just as most open source software projects are; but it helps ensure that the really interesting and significant ideas get worked over and refined and debugged into robust pillars of our discipline, and makes it harder for an idea to be both systemic and wrong.

I’m not sure I agree that sharing data and code makes it harder for an idea to be both systemic and wrong. The zombie ideas of which I’m aware in ecology didn’t establish themselves because of lack of data and code sharing. But I like Carl’s general line of thought, I think he’s asking the right questions.

*A small example from my own lab: We count protists live in water samples under a binocular microscope. Summer students who are learning this procedure invariably are very slow at first. They spend a loooong time looking at every sample, terrified of missing any protists that might be there. Which results in them spending lots of wasted time staring at samples that are either empty, or in which they already counted all the protists. Eventually, they learn to speed up, trading off a very slightly increased possibility of missing the occasional protist (a minor error that wouldn’t substantially alter our results) for the sake of counting many more samples. This allows us to conduct experiments with many more treatments and replicates than would otherwise be possible. Which of course guards against other sorts of errors–the errors you make by overinterpreting an experiment that lacks all the treatments you’d ideally want, and the errors you make because you lack statistical power. I think people often forget this–going out of your way to guard against one sort of error often increases the likelihood of other errors. Unfortunately, the same thing is true in other contexts.

**I wonder if a lot of the current push to share your data and code so that others can catch errors in your data and code is a case of looking under the streetlight. It’s now much easier than it used to be to share data and code, so we do more of it and come to care more about what we can accomplish by doing it. Which isn’t a bad thing; it’s a good thing on balance. But like any good thing it has its downsides.

Not an April Fool’s joke: PI success rates at NSF are not dropping (much) (CORRECTED and UPDATED)

If you saw this post in the first few hours after it went up, there’ve since been some major updates and corrections!


The title of this post is not a joke (I’ll cop to deliberate provocation…), but it does require some explanation.

My inspiration is this old comment from Chris Klausmeier:

[W]hat I’d really like to know is the success rate per PI, not per grant. That is, are there more people fighting for a constant amount of dollars leading to an increasing “unfunded rate”, or are there roughly the same number of people dividing up the same amount of dollars but with increasing number of proposals per PI?

That’s a great question. After all, if you’re worried about things like the ability of PIs to establish and maintain their labs, isn’t the per-PI success rate the most important one to look at? So I did what you do: googled “per PI success rate at NSF” (without the quotes).

I came up with NSF’s annual report on its merit review process from FY 2013 (probably the most recent FY available, I’m guessing). It includes data on per-PI success rates going back to 2001, along with some relevant contextual data. Thanks Google!*

The Cliff Notes** version:

  • The current per-PI success rate at NSF is 35% per 3 years. That is the percentage of PIs applying who get funded, calculated in 3-year moving windows (i.e. number of PIs who were awarded at least one grant anytime in that window, divided by the number who submitted at least one proposal anytime in that window). That percentage declined slightly from 41% in 2001-2003 to 36% in 2006-2008. It rose back to 40% in 2007-2009 and 2008-2010 thanks to stimulus funding. It then dropped back slightly to 35%, where it’s been since 2010-2012 (most recent window is 2011-2013). UPDATE: These data come from Fig. 14 in the linked report.
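To make that windowed definition concrete, here’s a toy sketch in Python (the proposal records and function name are invented for illustration; the real figures above come from the linked NSF report):

```python
# Hypothetical toy records: one (pi_id, year, funded) row per proposal.
proposals = [
    ("pi1", 2011, False), ("pi1", 2012, True),
    ("pi2", 2011, False), ("pi2", 2013, False),
    ("pi3", 2012, True),
]

def per_pi_success_rate(proposals, window_start, window_len=3):
    """Fraction of PIs who submitted at least one proposal in the window
    and were awarded at least one grant in that same window."""
    years = range(window_start, window_start + window_len)
    submitted = {pi for pi, year, funded in proposals if year in years}
    funded = {pi for pi, year, funded in proposals if year in years and funded}
    return len(funded) / len(submitted) if submitted else float("nan")
```

Note that a PI who submits five unsuccessful proposals in the window counts exactly the same as one who submits a single unsuccessful proposal, which is why the per-PI rate can hold steady even as proposals per PI climb.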

Now, to interpret that broad-brush answer, you need contextual information. The linked report has a bunch, but not everything you probably want to know***:

  • More PIs are submitting proposals. The number of PIs who submitted at least one proposal in 2011-2013 was 41% higher than in 2001-2003. UPDATE: This is from Fig. 14 in the linked report.
  • More proposals are being submitted: up 70% from 2001 to 2013 (UPDATE: Table 7 of the linked report). So the number of proposals is rising substantially faster than the number of PIs.
  • Per-proposal success rate is down from 2001, when it was 27%. But that number hasn’t really budged since 2005 (stimulus bump aside) and currently sits at 19% (UPDATE: Table 7 in the linked report).
  • UPDATE: Because this came up in the comments: data on mean and median size of research grants, in both nominal and real terms, are in Figs. 7 and 8. Both mean and median award sizes are increasing over time in nominal terms, with some ups and downs due to things like stimulus funding (e.g., the median annual award size for research grants increased from ~$85,000 in 2002 to ~$130,000 in 2013). In real terms, mean and median award sizes are either roughly steady or increasing only slowly over time, with ups and downs (e.g., median annual award size in 2005 dollars was ~$90,000 in 2002 and ~$110,000 in 2013).
  • All of these numbers are for full proposals. An appendix presents data on the fraction of pre-proposals for which full proposals were invited, for those NSF units that have binding pre-proposals (it was ~25% in both 2012 and 2013). So per-proposal success rates are higher than they would be if you included pre-proposals. And it’s possible that per-PI funding rates would drop if you included pre-proposals, since it’s possible that some unsuccessful PIs have never been invited to submit full proposals.
  • Multi-PI proposals, where the PIs are from different institutions, are counted multiple times (see here). That certainly distorts the picture of per-proposal funding rates, since in reality a multi-PI proposal is a single proposal. But it doesn’t distort per-PI funding rates. If you’re a PI on a multi-PI proposal, and the proposal gets funded, you and the other PIs all get funded (ok, you probably don’t all get as much funding as if you’d all written successful single-PI proposals, but that’s a different question). As an aside, single-PI awards continue to outnumber multiple-PI awards, but the gap is slowly closing (UPDATE: Fig. 9 in the linked report). Single-PI proposals also have slightly higher success rates than multi-PI proposals, and those rates haven’t budged much over time (UPDATE: Fig. 11 in the linked report), so presumably the increasing proportion of multi-PI awards reflects an increasing proportion of multi-PI proposals.
  • These numbers are NSF-wide, they’re not specific to the NSF divisions (DEB and IOS) to which ecologists mostly apply. Which interacts with the previous two bullets, because the divisions to which ecologists mostly apply are the ones that brought in pre-proposals for their core programs in 2012, and because I think (?) DEB and IOS tend to receive a higher proportion of multi-PI proposals than some other NSF divisions.
  • These numbers include all categories of proposals, which I believe means they include things like DDIGs, conference support, and REU supplements (again, see here). Many of those categories have higher success rates than core research programs. Now, most of those categories involve small numbers of proposals and PIs, so won’t affect the per-PI funding rate too much. But DDIGs are more numerous. CORRECTION: The per-PI success rate data in Fig. 14 in the linked report are per-PI success rates for research grants. “Research grants” is a critical term here. This includes “typical” panel and mail-reviewed grants as well as EAGER and RAPID awards. (Aside: footnote 23 of the report notes that EAGER and RAPID awards have high success rates, but are only 1.4% of all proposals) The per-PI success rate quoted above does not include DDIGs, REU supplements, conference support, fellowships, equipment grants, Small Grants for Exploratory Research, most things funded by the education directorate, or big-ticket items like NEON construction. Similarly, all of the contextual data given above for number of PIs applying, per-proposal success rates, mean/median award size, are for research grants only. (Aside: some other figures and tables in the report do include stuff besides research grants, under the broader category of “competitive awards/actions”) Thank you to a correspondent from NSF for correcting me on this, and apologies for the error. In retrospect, I was reading too quickly–I missed the bit on the top of p. 19 in the linked report where NSF explains all this, clear as day.
  • There are no data provided on how often PIs apply, what their other funding sources are, or what type of institution they’re based at. When per-PI success rates are calculated, somebody who submits one proposal in 3 years and is unsuccessful counts the same as somebody who submits several proposals in 3 years and is unsuccessful on all of them. So you can’t tell from the data if, say, some of the growth in PIs and proposals is coming from people who don’t ordinarily seek NSF funding, or haven’t in the past, but who for whatever reason have decided to take an occasional crack at it.

Some take-home thoughts:

  • I think the 35% number quoted above likely is an upper bound on the current per-PI success rate over 3 years for faculty PIs seeking grants from the core programs of DEB and IOS. But I’m not dead certain.
  • I’m very surprised that the per-PI success rate hasn’t dropped much since 2001-2003, even though the number of PIs has increased 41%. Anecdotally, it had been my impression from reading social media that on a per-PI basis NSF funding had suddenly gotten much more difficult to obtain just in the past few years (not that it was easy to obtain before in an absolute sense). But if so, that doesn’t show up in these data, and I’m sure NSF’s numbers are correct. Now, you can probably tell a story about why a recent crash in the per-PI success rate wouldn’t show up in these data–but it’s not as easy as you might think. I’ve been trying and failing to come up with one (am I just being dense?)
  • More proposals per PI seems like a problem to me, since it suggests a sort of tragedy of the commons or Red Queen phenomenon–all these people writing more proposals just to keep up with all these people writing more proposals. All that time spent writing and reviewing proposals presumably could be spent doing something else, like science. And there’s of course all the stress and pressure PIs feel, which I suspect correlates more with per-proposal success rates (though in fairness, it’s not NSF’s job to make PIs feel happy…) All of which seems like a pretty good argument for limiting the number of proposals/PI/year, as noted by Chris in another comment. Indeed, DEB and IOS now cap pre-proposals/PI/year. Since the same rules apply to everyone, a cap on proposals/PI/year won’t affect the per-PI funding rate, and so shouldn’t reduce any PI’s chances of establishing or maintaining a research program.

You tell me, US colleagues–what do you think of these per-PI numbers? Are you surprised at how little they’ve changed over time? Do you find them encouraging or depressing? Useful or too hard to interpret? And what implications, if any, do you think the per-PI data have for issues like whether NSF should reduce average grant size or limit the number of active grants PIs can hold at once?****

p.s. The report also breaks down the data by self-reported gender, ethnicity, and disability status of PI. I encourage you to click through and read the report for details, but from my admittedly-quick skim the overall picture on this front is mostly (not entirely) a mix of good news and news that’s trending in the right direction. For instance, female PIs are if anything funded at very slightly higher rates than male PIs, and are submitting an increasing fraction of proposals. Having said that, to really interpret these numbers effectively so as to identify the root causes of any disparities, I think you’d want more contextual information than NSF provides (or could reasonably be expected to provide in this sort of report). Contextual information is really important.

*Actually, first of all thanks NSF!

**My god, that reference dates me, doesn’t it?

***Understandably. The purpose of the report is to summarize NSF’s merit review process for the National Science Board, not to allow individual PIs to estimate their own odds of obtaining NSF funding with high precision.

****Re: limiting the number of active grants PIs can hold at once in order to free up money for other PIs, the linked report has relevant data on that. In particular, the large majority of PIs with at least one grant just have the one, and very few have more than two. Now, some of the PIs with single grants are holders of things like DDIGs. But still, it’s possible that, if you ran the numbers, you might find that capping the number of active grants a PI could hold wouldn’t free up much money.

Hoisted from the comments: on ecological ideas and their champions

Some big ideas in ecology are closely identified with their champions–individuals who were instrumental in developing and pursuing the idea and getting others to pay attention. In an old comment, Jim Grace suggests Grime’s CSR theory and related hump-backed model of diversity-productivity relationships, and Huston’s dynamic equilibrium model, as examples. Ratio-dependent predation is closely identified with Roger Arditi and Lev Ginzburg. R* theory is closely identified with Dave Tilman. The metabolic theory of ecology is closely identified with Jim Brown, Brian Enquist, and Geoff West. There are other examples.

Other big ideas aren’t identified with any one individual. For instance, eco-evolutionary dynamics is hot right now, but not because of the efforts of any one individual. Interest in biodiversity and ecosystem function didn’t take off because of a single dedicated champion. And some ideas are identified with one individual champion when perhaps they shouldn’t be. Neutral theory in ecology is widely identified with Steve Hubbell, who certainly has done a lot to develop and promote the idea. But Graham Bell and Hal Caswell developed very similar ideas, that for whatever reason never took off like Hubbell’s. And while the Price equation rightly bears George Price’s name, its currently-growing popularity owes little to Price himself. Price died shortly after publishing the equation that now bears his name, and widespread interest in it didn’t take off until decades later.

Some ideas outgrow and outlive their champions. Evolution by natural selection only took off because a few of Darwin’s friends pushed the idea in a coordinated way. But now there’s an entire self-sustaining field of evolutionary biology that’s far bigger than any one person. I’d say R* theory has outgrown Tilman at this point. After all, it’s in the textbooks now. But conversely, I don’t think, say, ratio-dependent predation has outgrown its champions.

Conversely, some ideas die with their champions. Immanuel Velikovsky’s ideas are a good (pseudo-scientific) example.

An intermediate case is an idea that outlives its original champion, but persists mainly through the efforts of former students and postdocs of the original champion. I suspect there’s a continuous gradient from ideas that are only ever taken seriously by a single individual (think Stephen Wolfram’s ideas about cellular automata as the foundation of all science), to ideas that start with a single individual but grow far beyond them and their academic “descendants” (think Darwinian evolution).

It’s tempting to infer that, if an idea only persists thanks to the ongoing efforts of a dedicated champion or narrow lineage of champions, then there must be something wrong with the idea. Whereas if lots of people take up the idea independently, that means the idea is a good one. There’s definitely something to that inference. But there are exceptions. Again, recall that when evolution by natural selection was first proposed, it got a foothold thanks to the tireless efforts of a small number of dedicated champions. Does that mean it was a bad idea? Or even that it would’ve been reasonable at the time to dismiss it as a bad idea? Indeed, I suspect that, in order for an idea to get an initial foothold, and thus have a chance of being more widely noticed and taken up, it often needs a dedicated champion (not always, of course). Conversely, just because lots of people take up an idea independently doesn’t necessarily mean the idea is a good one.

Hoisted from the comments: what do ecologists have Big Data on, and what don’t they?

It’s often said that we’re in, or will soon enter, the era of Big Data. We’ll have all the data we could possibly want, and so we’ll no longer be data-limited. Instead, the rate of scientific progress will be limited by other factors, like our ability to think of good questions.

But as Jeremy Yoder and David Hembry asked in the comment thread on this old post: what sorts of Big Data do ecologists (and evolutionary biologists) actually have? We certainly don’t have Big Data on everything–whatever that might mean! Rather, we have Big Data on certain things on which technological advances have made it easy to collect data. Gene sequences, for instance. Records of where and when species have been observed, thanks to things like camera trap networks, citizen science projects and smartphone apps, and digitization of museum records. Information that can be remotely sensed, like land cover. Probably other sorts of data I’m forgetting.

What don’t we have Big Data on, even though we really wish we did? What data that we would really like to have has not gotten any easier to obtain thanks to smartphones, satellites, drones, cheap PCR, citizen science, etc.? I’d say demographic data is a big one. Data on the births and deaths (and for mobile organisms, movements) of lots of individuals. Ideally along with environmental data sampled at the spatial and temporal grains and extents relevant to those individuals. And it’d sure be nice to have this information for many generations, but of course there’s no way for technology to speed that up.* And to have it for many different species, so that we could do community ecology and not just population ecology.**

Here’s another sort of Big Data we mostly don’t have: data from controlled, manipulative, randomized experiments. A lot of Big Data is observational data. Which is great. But no matter how much observational data you have, on whatever variables you have it on, inferring causality without experimental data is going to be difficult at best. The great thing about NutNet is that it’s Big Experimental Data. Not that technological advances are irrelevant for NutNet–the internet facilitates collaboration, for instance. But information technology doesn’t make it any easier to fence plots or add fertilizer or remove a species of interest or etc.

So, what do you think are the biggest and most difficult-to-close gaps in ecologists’ collective data collection efforts?

Hat tip to Peter Adler, who got me thinking about this.

*For this reason, I wonder if there will be a long-term trend for ecologists to focus more on spatial variation and less on temporal variation. Technological advances can improve the spatial extent of our sampling, but not the temporal extent.

**And as long as I’m dreaming, I’d like my free pony to be a palomino.

What sort of papers win the Mercer Award?

The deadline for submitting nominations for the Ecological Society of America’s various awards is Dec. 15. Details of the awards and how to submit nominations are here. Nominating someone for an award is a great way to honor deserving work and the people who did it, and to give a career boost to less-senior people. It’s also a way to shape the direction of the field, since the winning individuals and projects attract attention and influence. Anyone can submit a nomination and in my experience all nominations are taken seriously, even those from very junior people. For instance, back when I was a postdoc I nominated a Mercer Award winner.

Scientific awards also are interesting as a window into what sort of work is held in highest esteem by whoever is giving the award. So I thought it would be interesting to look back and see what sort of papers have won the ESA’s Mercer Award over the past 20 years.*

For reference, here’s a list of all the Mercer Award winners. You wouldn’t necessarily expect any common threads, since ecology as a whole, the makeup of the awards committee, and the identities of those submitting nominations all change over time. But I do think many of the award winners over the last 20 years share some features. I emphasize that the following impressions are mostly based on memory or in a few cases a skim of the abstract, so take them with a grain of salt and please let me know of any mistakes in the comments.

  • They’re really good papers! Which isn’t something we should take for granted. Think of how major awards in other fields (the Oscars, the Gold Glove awards in major league baseball) often go to bad picks. Or think of how you sometimes hear people say that everything published in leading journals (as Mercer Award winners invariably are) is oversold, trendy rubbish. Not so with the Mercer Award winners.
  • Remember Meg’s post on the power of combining a diversity of approaches? Well, that appears to be the best way to win the Mercer award (and Meg should know since she won it!) That’s the common thread that jumps out at me: the winners are mostly papers that combine different approaches and lines of evidence. They often have extensive field observations documenting some pattern, plus experimental data testing alternative causal hypotheses about the processes generating that pattern. Many also have a mathematical model demonstrating that all the various kinds of data are in fact quantitatively consistent with one another, meaning that the “story” actually works as opposed to merely being plausible. Not uncommonly, those mathematical models are tailored to the system and partially or completely parameterized from independent data as opposed to being curve-fitted.
  • There are a few exceptions from the last 20 years, papers that relied on one or two approaches. Jon Chase and Shahid Naeem won for mesocosm experiments. Lars Hedin won for a comparative observational study. Jean Richardson won for a phylogenetic comparative analysis. Dan Bolnick won for a review paper, which is kind of a category unto itself. And a couple of folks (Brian Enquist, Jordi Bascompte) won for papers developing a mathematical model and then comparing its predictions to observational data. In general, I think the list of Mercer Award winners reinforces the point that lots of approaches can lead to great ecology, but none are essential. Indeed, even “combining different approaches” isn’t absolutely essential.
  • The winning papers usually have strong links to big, generally-applicable ideas, but also really nail what’s going on in some specific system. You don’t win the Mercer by developing some general theory and then waving your arms about how it kinda, sorta works if you squint at the data. Conversely, you don’t often win the Mercer merely by nailing what’s going on in one particular system.
  • Winning papers usually cover all the bases–they’re mostly very convincing rather than merely being plausible or suggestive. For instance, they typically don’t just test some hypothesis or prediction. Usually, they also check whether the assumptions underpinning that hypothesis actually hold, thereby showing, rather than merely inferring, that their hypothesis holds for the right reasons. That’s why winning papers often combine different approaches: the kind of data you need to check an assumption often is totally different than the kind of data you need to test a prediction.
  • You mostly don’t win the Mercer just for developing hypotheses. Nobody in the last 20 years has won with a pure modeling paper, for instance. Bolnick et al. is the closest anyone’s come as best I can tell–it was a review paper, but it won in large part because it suggested new ideas and productive lines of research for others to pursue. On the other hand, you don’t win the Mercer without having hypotheses either–nobody in the last 20 years has won for a purely descriptive study, or for discovering some intriguing phenomenon they can’t explain. You mostly win the Mercer for testing hypotheses, including hypotheses developed by others.**
  • Nobody wins the Mercer by following a “recipe” that anyone can apply in their own system. Most of the Mercer Award winners are quite creative in terms of how they address the question asked. Nobody’s won the Mercer for, say, testing neutrality by fitting alternative models to species-abundance distributions, or for testing “habitat filtering” vs. “limiting similarity” by looking at how closely related co-occurring species are, or by plotting local vs. regional species richness to infer whether local communities are “saturated” with species (to pick three examples of popular “recipes” that many authors have followed in recent years). This is related to my old post on how techniques are less powerful than the people applying them, and to this old post on how “recipes” for inferring process from pattern hardly ever work.
  • Most winners are single individuals, members of the same lab group, or small collaborations. Though I expect that to change at some point (at the risk of jinxing them, I think the NutNet project has a good shot at the Mercer one of these years).

*I also thought this would be an easy post to write, because it requires only slightly updating a comment I wrote on an old post. You get the effort level you pay for on this blog. But I throw in all the laziness you want for free! 🙂

**Perhaps because when you test a hypothesis your paper reads as a complete “story”, even if a key part of the story–the hypothesis–was “written” by others. Whereas if you only develop a hypothesis, the “story” seems incomplete–it has a beginning, but not an ending. Of course, just because that’s the way things are doesn’t mean they should be that way. For instance, think of how the Nobel Prize in physics often is split between theoreticians who predicted something, and experimentalists who tested that prediction. Obviously, the Nobel Prize in physics is quite different than the Mercer Award in that it’s not usually given for a single paper. But even so, I think you can argue that the Mercer Award should sometimes go to papers that develop hypotheses but don’t test them, since it does sometimes go to papers that test hypotheses but don’t develop them.