About Brian McGill

I am a macroecologist at the University of Maine. I study how human-caused global change (especially global warming and land cover change) affects communities, biodiversity and our global ecology.

Detection probability survey results

Last week, I highlighted some new results from a paper on detection probabilities and placed detection probabilities in the context of estimator theory. This in turn led to a reader poll to try to get a sense of how people think about experimental design when detection is an issue.

Although I don’t want to spend too much time on it here, I wanted to briefly highlight a great paper that just came out, “Assessing the utility of statistical adjustments for imperfect detection in tropical conservation science” by Cristina Banks-Leite and colleagues. They look at several real-world scenarios focused on identifying covariates of occupancy (rather than absolute occupancy levels) and show the results are not much different with or without statistical adjustment. They draw a distinction between a priori control for covariates of detection probability in setting up a good study design vs a posteriori statistical control for detection probability, and point out that both are valid ways of dealing with detection issues. The take-home quote for me was “We do not believe that hard-won field data, often on rare specialist species, should be uniformly discarded to accord with statistical models”. Whereas my last post was very theoretical/statistical, this paper is very grounded in real-world, on-the-ground conservation, yet it makes many of the same points. It is definitely worth a read.

Turning now to the survey … at the time of analysis Wednesday morning there were 168 respondents. You can view the raw results here. There was a reasonably good cross section of career stages and organisms represented although the employment sector skewed very heavily to university. And of course “readers of a blog who chose to respond to a poll” is in no way a scientifically designed sample. If I had to speculate this particular post attracted a lot of people interested in detection probabilities, but what exact bias that would result in is hard to predict.

Recall I presented two scenarios. Scenario A was to visit 150 sites once. Scenario B was to visit 50 sites 3 times each. The goal was to estimate how occupancy varied with four collinear environmental variables.

Probably the lead result is the recommended scenario:

Figure – poll responses on which scenario people recommended

Scenario B (50 sites 3 times) was the most common recommendation but it by no means dominated. Over 10% went for scenario A outright. And 20% noted that choosing required more information – with most people saying the critical information was more knowledge about the species – well represented in this quote on what the choice would depend on: “A priori expectation of potential for detection bias, based on species biology and survey method.” It should be noted that a non-trivial fraction of those who went for B did it not to support detection probabilities but for reasons of sampling across temporal variability (a goal that is contradictory with detection probability modelling, which assumes constant conditions and even constant individuals across the repeat visits). 17% also went for B but with hesitation (either deferring to the statistical expertise of others over their own field intuition or else feeling it was necessary to publish).

There was a trend (but definitely not statistically significant) for more graduate students to recommend B and more senior career people (while still favoring B) to switch to “it depends”. Similarly there was a non-significant trend for people who worked on vertebrates to favor B and for people who worked on plants and inverts to switch a bit to scenario A (with scenario B still a majority).

Quite a few people argued for a mixed strategy. One suggestion was to visit 100 sites with 2 repeat visits to 25 of them. Another suggested visiting 25 sites 3 times, then making a decision how to proceed. And there were quite a few variations along this line.

The story for my question about whether there was pressure or political correctness to use detection probabilities was similar (not surprisingly). There was a weak trend to yes (mean score of 3.09) but not significant (p=0.24). Graduate students were the most likely to think there was PC-ness and senior career people the least likely. People working in verts and plants were more likely to see PC-ness than people working on inverts (again all non-significant).

So the overall pattern is a lean to scenario B but a lot of diversity, complexity and nuance. And not much if any perception of PC-ness around having to use detection probabilities ON AVERAGE (some individuals felt rather strongly about this in both directions).

In short, I think a majority of respondents would have agreed with this quote from one respondent:  “… the most important part of study design is…thinking. Each situation is different and needs to be addressed as a unique challenge that may or may not require approaches that differ from those used in similar studies.” Which nicely echoes the emphasis in this blog on the need to think and not just apply black and white universal rules for statistics and study design.

Detection probabilities – back to the big picture … and a poll

I have now had two posts (both rather heavily read and rather contentiously debated in the comments) on detection probabilities (first post, second post). Whether you have or haven’t read those posts, they were fairly technical (although my goal was to explain technical issues in an accessible way).

Here I want to pull way back up to 10,000 feet and think about the boots-on-the-ground implications. And for a change of pace, I’m not going to argue a viewpoint. I am just going to present a scenario (one I see every semester, and one that, from conversations when I travel, I know students all over the world face) and ask readers via a poll what they would advise this student.

So you are on the committee of a graduate student. This student’s project is to study the species Debatus interminus, which may be a candidate for threatened listing (little is really known). The primary goals are: 1) to assess overall occupancy levels of D. interminus and 2) to figure out how occupancy varies with four variables (vegetation height, canopy closure, soil moisture, and presence of its one known predator, Thinking clearus). Obviously these four variables are moderately collinear. Given resources, length of project, accessibility of sites, the fact that the student is the only person able to visit the sites, etc., you calculate the student can do exactly 150 visits. Various members of the committee have advised the student that she/he should:

  • Scenario A – identify 150 sites across the landscape and visit each site 1 time, then estimate ψ (occupancy), and do a simple logistic regression to give β, a vector of regression coefficients for how ψ varies with your four variables across 150 sites.
  • Scenario B – identify 50 sites across the landscape and visit each site 3 times, then develop a simple hierarchical model of detection probabilities so you will estimate ψ (occupancy), p (detection probability), and β, a vector of regression coefficients in a logistic regression for how ψ varies with your four variables at 50 sites. (A rough sketch of what each analysis might look like in R follows below.)
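To make the two scenarios concrete, here is roughly what each analysis would look like in R. The data frames and column names are hypothetical, and scenario B leans on the unmarked package, whose exact syntax may vary a bit by version; this is a sketch, not a prescription:

```r
# Hypothetical data: datA has 150 rows (one visit, column y1); datB has 50 rows
# (three visits, columns y1, y2, y3); both have the four site covariates.

# Scenario A: single visit per site -- simple logistic regression on observed presence
fitA <- glm(y1 ~ veg + canopy + moisture + predator,
            family = binomial, data = datA)
coef(fitA)                                  # the betas (on the logit scale)

# Scenario B: three visits per site -- single-season occupancy model with detection
library(unmarked)                           # assumed available; syntax may vary by version
umf  <- unmarkedFrameOccu(y = as.matrix(datB[, c("y1", "y2", "y3")]),
                          siteCovs = datB[, c("veg", "canopy", "moisture", "predator")])
fitB <- occu(~ 1 ~ veg + canopy + moisture + predator, data = umf)
coef(fitB, type = "state")                  # occupancy betas; type = "det" for detection
```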

Would you advise the student to follow scenario A or B? And why? Please take our poll (it should take less than 5 minutes). I am really curious what our readership will say (and I care about this poll enough that I’ve taken the time to do it in Google polls so I can cross-tab the answers with basic demographics – but don’t worry, your anonymity is ensured!)

Depending on level of interest I’ll either post the results in the comments or as a separate post after a few days.

And – as everybody knows – a poll in a blog is not a scientific sample, but it can still be interesting.

Detection probabilities, statistical machismo, and estimator theory

Detection probabilities are a statistical method using repeated sampling of the same site combined with hierarchical statistical models to estimate the true occupancy of a site*. See here for a detailed explanation including formulas.

Statistical machismo, as I define it in this blog, is the pushing of complex statistical methods (e.g. reviewers requiring the use of a method, authors claiming their paper is better solely because of the use of a complex method) when the gains are small or even occur at some cost. By the way, the opposite of statistical machismo is an inclusive approach that recognizes every method has trade-offs and there is no such thing as a best statistical method.

This post is a fairly technical statistical discussion. If you’re interested in detection probabilities but don’t want to follow the details, skip to the last section for my summary recommendations.

Background

I have claimed in the past that I think there is a lot of statistical machismo around detection probabilities these days. I cited some examples from my own experience where reviewers insisted that detection probabilities be used on data sets that had high value in their spatial and temporal coverage but for which detection probabilities were not possible (in some cases when I wasn’t even interested in occupancy). I also discussed a paper by Welsh, Lindenmayer and Donnelly (or WLD) which used simulations to show limitations of detection probability methods in estimating occupancy (clearly driven by their own frustrations of being on the receiving end of statistical machismo for their own ecological papers).

In July the detection probability proponents fired back at WLD with a rebuttal paper by Guillero-Arroita and four coauthors (hereafter GLMWM). Several people have asked me what I think about this paper, including some comments on my earlier blog post (I think usually in the same way one approaches a Red Sox fan and asks them about the Yankees – mostly hoping for an entertaining reaction).

The original WLD paper basically claimed that in a number of real-world scenarios, just ignoring detection probabilities gave a better estimator of occupancy. Three real-world scenarios they invoked were: a) when the software had a hard time finding the best-fit detection probability model, b) a scenario with moderate occupancy (Ψ=40%) and moderate detection probabilities (about p=50%), and c) a scenario where detection probabilities depend on abundance (which they obviously do). In each of these cases they showed, using Mean Squared Error (or MSE, see here for a definition), that a simple logistic regression of occupancy that ignores detection probabilities had better behavior (lower MSE).

GLMWM basically pick different scenarios (higher occupancy Ψ=80%, lower detection p=20% and a different SAD for abundances) and show that detection probability models have a lower MSE. They also argue extensively that software problems finding best fits are not that big a problem**. This is not really a deeply informative debate. It is basically, “I can find a case where your method sucks.” “Oh yeah? Well, I can find a case where your method sucks.”

Trying to make sense of the opposing views

But I do think that by stepping back, thinking a little deeper, framing this debate in the appropriate technical context – the concept of estimation theory – and pulling out a really great appendix in GLMWM that unfortunately barely got addressed in their main paper, a lot of progress can be made.

First, let’s think about the two cases where each works well. Ignoring detection worked well when detection probability, p, was high (50%). It worked poorly when p was very low (20%). This is just not surprising. When detection is good you can ignore it; when it is bad you err if you ignore it! Now WLD did go a little further – they didn’t just say that you can get away with ignoring detection probability at a high p, they actually showed you get a better result than if you don’t ignore it. That might at first glance seem a bit surprising – surely the more complex model should do better? Well, actually no. The big problem with the detection probability model is identifiability – separating out occupancy from detection. What one actually observes is Ψ*p (i.e. that fraction of sites will have an observed individual). So how do you go from observing Ψ*p to estimating Ψ (and p in the case of the detection model)? Well, ignoring p is just the same as taking Ψ*p as your estimate. I’ll return to the issues with this in a minute. But in the detection probability model you are trying to disentangle Ψ vs. p just from the observed % of sites with very little additional information (the fact that observations are repeated at a site). Without this additional information Ψ and p are completely inseparable – you cannot do better than randomly pick some combination of Ψ and p that together multiply to give the % of sites observed (and again the non-detection model essentially does this by assuming p=1, so it will be really wrong when p=0.2 but only a bit wrong when p=0.8). The problem for the detection model is that if you only have two or three repeat observations at a site and p is high, then at most sites where the species is actually present it will show up in all two or three visits (and of course not at all where it is not present). So you will end up with mostly 0/0/0 or 1/1/1 observations at a given site. This does not help differentiate (identify) Ψ from p at all. Thus it is actually completely predictable that detection models shine when p is low and ignoring detection shines when p is high.
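To see why the repeat visits carry so little extra information when p is high, here is a quick simulation sketch in base R (the numbers are purely illustrative):

```r
set.seed(1)
S <- 50; K <- 3; psi <- 0.4                      # sites, repeat visits, true occupancy

sim_histories <- function(p) {
  z <- rbinom(S, 1, psi)                         # true presence/absence at each site
  Y <- matrix(rbinom(S * K, 1, rep(z, K) * p),   # detections only happen where present
              nrow = S)
  table(apply(Y, 1, paste, collapse = "/"))      # tally the detection histories
}

sim_histories(p = 0.8)  # mostly 0/0/0 or 1/1/1 -- repeats add little to separate psi from p
sim_histories(p = 0.2)  # many mixed histories (e.g. 1/0/0) -- the information the
                        # hierarchical model uses to pull p apart from psi
```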

Now what to make of the fact, something that GLMWM make much of, that just using Ψ*p as an estimate for Ψ is always wrong anytime p<1? Well, they are correct about it always being wrong. In fact using the observed % of sites present (Ψ*p) as an estimator for Ψ is wrong in a specific way known as bias. Ψ*p is a biased estimator of Ψ. Recall that bias is when the estimate consistently overshoots or undershoots the true answer. Here Ψ*p consistently undershoots the real answer by a very precise amount, Ψ*(1-p) (so by 0.2 when Ψ=40% and p=50%). Surely it must be a fatal flaw to intentionally choose an approach that you know on average is always wrong? Actually, no – it is well known in statistics that sometimes a biased estimator is the best estimator (by criteria like MSE).
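To spell that arithmetic out, using the same numbers as above:

```latex
\mathrm{E}\!\left[\hat{\Psi}_{\text{naive}}\right] = \Psi p,
\qquad
\mathrm{Bias} = \mathrm{E}\!\left[\hat{\Psi}_{\text{naive}}\right] - \Psi
             = -\Psi(1-p)
             = -0.4 \times (1 - 0.5) = -0.2 .
```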

Estimation theory

Pay attention here – this is the pivotal point – a good estimator has two properties: it is on average close to right (low bias), and the spread of its guesses (i.e. the variance of the estimate over many different samples of the data) is small (low variance). And in most real-world examples there is a tradeoff between bias and variance! More accurate on average (less bias) means more spread in the guesses (more variance)! In a few special cases you can pick an estimator that has both the lowest bias and the lowest variance. But anytime there is a trade-off you have to look at the nature of the trade-off to minimize MSE (the best overall estimator by at least one criterion). (Since Mean Squared Error or MSE=Bias^2+Variance, one can actually minimize MSE if one knows the trade-off between bias and variance.) This is the bias/variance trade-off to a statistician (Jeremy has given Friday links to posts on this topic by Gelman).
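In symbols, this is the standard decomposition for any estimator of a quantity:

```latex
\mathrm{MSE}(\hat{\theta})
  = \mathrm{E}\!\left[(\hat{\theta} - \theta)^2\right]
  = \underbrace{\left(\mathrm{E}[\hat{\theta}] - \theta\right)^{2}}_{\text{Bias}^2}
  \; + \;
  \underbrace{\mathrm{Var}(\hat{\theta})}_{\text{Variance}}
```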


Figure 1 – Bias and Variance - Here estimator A is biased (average guess is off the true value) but it has low variance. Estimator B has zero bias (average guess is exactly on the true value) but the variance is larger. In such a case Estimator B can (and in this example does) have a larger Mean Squared Error (MSE) – a metric of overall goodness of an estimator. This can happen because MSE depends on both bias and variance – specifically MSE=Bias^2+Variance.

This is exactly why the WLD ignore-detection-probabilities method (which GLMWM somewhat disparagingly call the naive method) can have a lower Mean Squared Error (MSE) than using detection probabilities despite always being biased (starting from behind if you will). Detection methods have zero bias and non-detection methods have bias, but in some scenarios the non-detection methods have so much lower variance than the detection methods that the overall MSE is lower when detection is ignored. Not so naive after all! Or in other words, being unbiased isn’t everything. Having low variance (known in statistics as an efficient estimator) is also important. Both the bias of ignoring detection probabilities (labelled “naive” by GLMWM) and the higher variances of the detection methods can easily be seen in Figures 2 and 3 of GLMWM.
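If you want to watch the bias/variance trade-off play out yourself, here is a self-contained simulation sketch in base R. To keep it package-free it fits the two-parameter occupancy model by maximizing its marginal likelihood directly with optim; the parameter values are illustrative choices of mine, not a reproduction of either paper's scenarios:

```r
set.seed(42)
S <- 55; K <- 2                                   # sites and repeat visits
nsim <- 1000                                      # number of simulated data sets

sim_once <- function(psi, p) {
  z <- rbinom(S, 1, psi)                          # true occupancy state of each site
  d <- rbinom(S, K, z * p)                        # number of detections at each site

  naive <- mean(d > 0)                            # ignore detection: proportion of sites
                                                  # where the species was ever seen
  # detection model: maximize the marginal likelihood over (psi, p) on the logit scale
  nll <- function(par) {
    ps <- plogis(par[1]); pd <- plogis(par[2])
    lik <- ifelse(d > 0,
                  ps * dbinom(d, K, pd),          # detected at least once
                  ps * (1 - pd)^K + (1 - ps))     # never detected: missed or truly absent
    -sum(log(lik))
  }
  occ <- plogis(optim(c(0, 0), nll)$par[1])

  c(naive = naive, occmod = occ)
}

psi <- 0.4; p <- 0.5                              # try psi = 0.8, p = 0.2 as well
est  <- replicate(nsim, sim_once(psi, p))
bias <- rowMeans(est) - psi
vari <- apply(est, 1, var)
round(rbind(bias = bias, variance = vari, mse = bias^2 + vari), 4)
# The naive estimator trades guaranteed bias for lower variance; which row wins on MSE
# depends on psi, p, S and K -- which is exactly the WLD vs GLMWM disagreement.
```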

When does ignoring detection probabilities give a lower MSE than using them?

OK – so we dove into enough estimation theory to understand that both WLD and GLMWM are correct in the scenarios they chose (and the authors of both papers were probably smart enough to pick in advance a scenario that would make their side look good). Where does this leave the question most readers care about – “should I use detection probabilities or not?” Well, the appendix to GLMWM is actually exceptionally useful (although it would have been more useful if they had bothered to discuss it!) – specifically supplemental material tables S2.1 and S2.2.

Let’s start with S2.1. This shows the MSE (remember low is good) of the ignore-detection model in the top half and the MSE of the use-detection model in the bottom half for different sample sizes S, repeat visits K, and values of Ψ and p. They color code the cases red when ignore beats use detection, and green when detection beats ignore (and no color when they are too close to call). Many of the differences are small, but some are gigantic in either direction (e.g. for Ψ=0.2, p=0.2, ignoring detection has an MSE of 0.025 – a really accurate estimator – while using detection probabilities has an MSE of 0.536 – a really bad estimate given Ψ ranges only from 0-1; but similar discrepancies can be found in the opposite direction too). The first thing to note is that at smaller sample sizes the red, green and no-color regions are all pretty equal! I.e. ignoring or using detection probabilities is a tossup! Flip a coin! But we can do better than that. When Ψ (occupancy) is <50% ignore wins, when Ψ>50% use detection wins, and when p (detection rate) is high, say >60%, then it doesn’t matter. In short, the contrasting results between WLD and GLMWM are general! Going a little further, we can see that when sample sizes (S but especially number of repeat visits K) creep up, then using detection probabilities starts to win much more often, which also makes sense – more complicated models always win when you have enough data, but don’t necessarily (and here don’t) win when you don’t have enough data.

Bias, Variance and Confidence Intervals

Figure 2 – Figure 1 with confidence intervals added

Now let’s look at table S2.2. This is looking at something that we haven’t talked about yet. Namely, most estimators report, for a given set of data, an estimate of how much variance they have. This is basically the confidence interval in Figure 2. In Figure 2, Estimator A is a better estimator of the true value (it is biased, but the variance is low so MSE is much lower), but Estimator A is overconfident – it reports a confidence interval (estimate of variance) that is much smaller than reality. Estimator B is a worse estimator, but it is at least honest – it has really large variance and it reports a really large confidence interval. Table S2.2 in GLMWM shows that ignoring detection probabilities is often too cocky – the reported confidence intervals are too small (which has nothing to do with and in no way changes the fact that ignoring detection probabilities is in many cases still a better or equally good estimator of the mean – the conclusion from table S2.1). But using detection probabilities is just right – not too cocky, not too pessimistic – its confidence intervals are very accurate – when there’s a lot of variance, it knows it! In short, Figure 2 is a good representation of reality over a large chunk of parameter space, where method A is ignore detection (lower MSE on the estimate for Ψ but overconfident confidence intervals) and method B is use detection-based methods (worse MSE for the estimation of Ψ but very accurate confidence intervals).
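The overconfidence of the ignore-detection approach is easy to check by simulation. Here is a rough sketch in base R of the coverage of its nominal 95% Wald interval when treated as an interval for Ψ (again, the numbers are just illustrative):

```r
coverage <- function(psi, p, S = 55, K = 2, nsim = 2000) {
  hits <- replicate(nsim, {
    z    <- rbinom(S, 1, psi)                       # true occupancy
    seen <- rbinom(S, 1, z * (1 - (1 - p)^K))       # seen at least once over K visits
    phat <- mean(seen)                              # naive occupancy estimate
    se   <- sqrt(phat * (1 - phat) / S)             # its reported standard error
    (psi >= phat - 1.96 * se) & (psi <= phat + 1.96 * se)
  })
  mean(hits)                                        # should be ~0.95 if honest
}

coverage(psi = 0.4, p = 0.5)   # well below 0.95 -- the "too cocky" intervals
coverage(psi = 0.4, p = 0.9)   # close to 0.95 once detection is nearly perfect
```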

(As a side-note, this closely parallels the situation for ignoring vs statistically treating spatial, temporal and phylogenetic autocorrelation. In that case both estimators are unbiased. In principle the variance of the methods treating autocorrelation should be lower, although in practice they can have larger variance when bad estimates of autocorrelation occur, so they are both roughly equally good estimators of the regression coefficients. But the methods ignoring autocorrelation are always overconfident – their reported confidence intervals are too small.)

So which is better – a low MSE (a metric of how good the estimator is at guessing the mean) or an honest, not cocky estimator that tells you when it’s got big error bars? Well, in some regions you don’t have to choose – using detection probabilities is a better estimator of the mean by MSE and you get good confidence intervals. But in other regions – especially when Ψ and p are low – you have to pick – there is a tradeoff – more honesty gets you worse estimates of the occupancy. Ouch! That’s statistics for you. No easy obvious choice. You have to think! You have to reject statistical machismo!

Summary and recommendations

Let me summarize three facts that emerge across the WLD and GLMWM papers:

  1. Ignoring detection probabilities (sensu WLD) can give an estimate of occupancy that is better than (1/3 of parameter space), as good as (1/3 of parameter space) or worse than (1/3 of parameter space) estimates using hierarchical detection probability models in terms of estimating the actual occupancy. Specifically, ignoring detection guarantees bias, but may result in sufficiently reduced variance to give an improved MSE. These results come from well-known proponents of using detection probabilities using a well-known package (unmarked in R), so they’re hard to argue with. More precisely, ignoring detection works best when Ψ is low (<50%) and p is low, using detection works best when Ψ is high (>50%) and p is low, and both work very well (and roughly equally well) when p is high (roughly when p>50% and certainly when p>80%) regardless of Ψ.
  2. Ignoring detection probabilities leads to overconfidence (reported confidence intervals that are too small) except when p is high (say >70%). This is a statement about confidence intervals. It does not affect the actual point estimate of occupancy which is described by #1 above.
  3. As data size gets very large (e.g. 4-5 repeat visits of 165 sites) detection probability models generally get noticeably better – the results in #1 mostly apply at smaller, but in my opinion more typically found, sample sizes (55 sites, 2 repeat visits).

And one thing talked about a lot which we don’t really know yet:

  1. Both WLD and GLMWM talk about whether working with detection probabilities requires larger samples than ignoring detection probabilities. Ignoring detection probabilities allows Ψ to be estimated with only single visits to a site, while hierarchical detection probabilities require a minimum of 2 and, as GLMWM show, really shine with 3 or 4 repeat visits. To keep a level playing field both WLD and GLMWM report results where the non-detection approach uses the repeat visits too (it just makes less use of the information by collapsing all visits into either species seen at least once or never seen). Otherwise you would be comparing a model with more data to a model with less data, which isn’t fair. However, nobody has really fully evaluated the real trade-off – 50 sites visited 3 times with detection probabilities vs 150 sites visited once with no detection probabilities. And in particular nobody has really visited this in a general way across the whole parameter space for the real-world case where the interest is not in estimating Ψ, the occupancy, but the β’s or coefficients in a logistic regression of how Ψ varies with environmental covariates (like vegetation height, food abundance, predator abundance, degree of human impact, etc). My intuition tells me that with 4-5 covariates that are realistically covarying (e.g. correlations of 0.3-0.7) getting 150 independent measures of the covariates will outweigh the benefits of 3 replicates of 50 sites (again especially for accurate estimation of the β’s) but to my knowledge this has never been measured. The question of whether estimating detection probabilities requires more data (site visits) remains unanswered by WLD and GLMWM but badly needs to be answered (hint: free paper idea here – a rough sketch of how such a simulation might be set up follows below).
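To be clear about what I have in mind, here is one way such a comparison might be set up. Every number in it (the β’s, the per-visit detection probability, the covariate correlations) is an illustrative assumption, and design B leans on the unmarked package (whose exact syntax may vary by version):

```r
library(MASS)        # for mvrnorm (correlated covariates)
library(unmarked)    # assumed available for the occupancy fit
set.seed(7)

beta <- c(-0.5, 1, -0.5, 0.5, -1)                  # intercept + 4 covariate effects (assumed)
p    <- 0.5                                        # per-visit detection probability (assumed)
Sig  <- matrix(0.5, 4, 4); diag(Sig) <- 1          # moderately collinear covariates

sim_design <- function(S, K) {
  X   <- MASS::mvrnorm(S, rep(0, 4), Sig)
  psi <- plogis(cbind(1, X) %*% beta)              # true occupancy probabilities
  z   <- rbinom(S, 1, psi)                         # true presence/absence
  Y   <- matrix(rbinom(S * K, 1, rep(z, K) * p), nrow = S)   # detection histories
  list(X = as.data.frame(X), Y = Y)
}

# Design A: 150 sites visited once, plain logistic regression on the single detection
dA   <- sim_design(150, 1)
datA <- data.frame(y = dA$Y[, 1], dA$X)
fitA <- glm(y ~ ., family = binomial, data = datA)
coef(fitA)

# Design B: 50 sites visited 3 times, single-season occupancy model
dB   <- sim_design(50, 3)
umf  <- unmarkedFrameOccu(y = dB$Y, siteCovs = dB$X)
fitB <- occu(~ 1 ~ V1 + V2 + V3 + V4, data = umf)
coef(fitB, type = "state")

# Wrap the above in a loop over many simulated data sets and compare bias, variance and
# MSE of the estimated betas against the true values to see which design wins for a
# given psi, p and covariate correlation structure.
```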

So with these 3 facts and one fact remaining unknown, what can we say?

  1. Detection probabilities are not an uber-method that strictly dominates ignoring them. As first found by WLD and now clearly shown to be general in the appendices of GLMWM, there are fairly large regions of parameter space where the primary focus – the estimate of Ψ – is more accurate if one ignores detection probabilities! This is news the detection probability machismo-ists probably don’t want you to know (which could be an explanation for why it is never discussed in GLMWM).
  2. Detection probabilities clearly give better estimates of their certainty (or in a lot of cases uncertainty) – i.e. the variance of the estimates.
  3. If you’re designing data collection (i.e. deciding on # of sites vs # visits/site before you’ve taken measurements – e.g. visit 150 sites once or 50 sites 3 times), I would recommend something like the following decision tree:
    1. Do you care more about the estimate of error (confidence intervals) than the error of the estimate (accuracy of Ψ)? If yes then use detection probabilities (unless p is high).
    2. If you care more about accuracy of Ψ, do you have a pretty good guess that Ψ is much less or much greater than 50%, or that p is much greater than 70%? If so then you should use detection probabilities if Ψ is much greater than 50% and p is less than or equal to 50-60%, but ignore them if Ψ is much less than 50% or p is clearly greater than 50-60%.
    3. If you care more about accuracy of Ψ and don’t have a good idea in advance of roughly what Ψ or p will be, then you have really entered a zone of judgement call where you have to weigh the benefits of more sites visited vs. more repeat visits (or hope somebody answers my question #4 above soon!).
    4. And always, always, if you’re interested in abundance or species richness, don’t let somebody bully you into switching over to occupancy because of the “superiority” of detection models (which as we’ve seen are not even always superior at occupancy). Both the abundance and species richness fields have other well-established methods (e.g. indices of abundance, rarefaction and extrapolation) for dealing with non-detection.
    5. Similarly, if you have a fantastic dataset (e.g. a long-term monitoring dataset) set up before detection probabilities became fashionable (i.e. no repeat visits), don’t let the enormous benefits of long-term (and perhaps large spatial scale) data get lost just because you can’t use detection probabilities. As we’ve seen, detection probabilities are a good method, but also a flawed method which is clearly outperformed in some cases, just like every other method in statistics. They are not so perfect that they mandate throwing away good data.

The debate over detection probabilities has generated a lot more heat and smoke than light, and there are clearly some very machismo types out there, but I feel like if you read carefully between the lines and into the appendices, we have learned some things about when to use detection probabilities and when not to. Question #4 still remains a major open question just begging for a truly balanced, even-handed assessment. What do you think? Do you use detection probabilities in your work? Do you use them because you think they’re a good idea or because you fear you can’t get your paper published without them? Has your opinion changed with this post?

 


*I’m aware there are other kinds of detection probabilities (e.g. distance based) and that what I’m really talking about here are hierarchical detection probabilities – I’m just trying to keep the terminology from getting too thick.

**Although I have to say I found it very ironic that the software code GLMWM provided in an appendix, which uses the R package unmarked, arguably the dominant detection probability estimation software, apparently had enough problems finding optima that they reran each estimation problem 10 times from different starting points – a pretty sure sign that optima are not easy to find.

25 years of ecology – what’s changed?

I am giving/gave a talk this morning at a Festschrift celebrating the 25th anniversary of the Graduate Ecology program at the Universidade Federal de Minas Gerais (UFMG), the large state university in one of the larger states/cities in Brazil. So first, congratulations to the program and many thanks to the organizers (especially Marco Mello, Adriano Paglia and Geraldo Fernandes) for inviting and hosting me.

I was invited to give the talk based on my blogging, which is sort of a new trendy thing in ecology. So I foolishly offered to give a perspective on the past 25 years of ecology and what the next 25 years of ecology will contain, because I like to think about such things. But as I prepared my slides I increasingly got nervous because these are topics no one person should claim expertise on!

However, I did come up with a couple of data-driven graphics that I thought readers might find interesting.

Publication trends

First I did some statistics on rates of publishing by country (using Web of Science, so biased to English journals). I picked out the US, several European countries, Brazil and China. What would you guess the trends are? First, the total # of papers published per decade is increasing at a phenomenal rate, so everybody is publishing more. But as a percent of published papers, most European countries are holding steady (although some countries like Germany started to publish in English later than other countries like Sweden, so they show a big increase in the 1980s or 1990s), the US is slowly declining, and China and Brazil are increasing rapidly.

Total ecology papers published per decade

 

According to Web of Science which is English journal-biased. RoW is rest of world.


 

Research topic trends

Secondly, and more interesting to me, I did a Wordle on the titles of the top 200 cited papers in 1989 and the top 200 cited papers in 2012 (yes, it is 2014, but I found I had to go back to 2012 to get citation counts that had settled down enough to identify the papers that were truly the top, rather than just the ones published in January).
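In case anybody wants to repeat the exercise in R rather than in Wordle, a rough equivalent might look like the sketch below (it assumes a character vector titles holding the 200 paper titles, e.g. exported from Web of Science, and that the wordcloud package is installed; both are assumptions for illustration):

```r
# Rough R equivalent of the Wordle step (assumes 'titles' is a character vector of
# the 200 paper titles).
library(wordcloud)

words <- tolower(unlist(strsplit(titles, "[^A-Za-z]+")))
stop  <- c("the", "of", "a", "an", "and", "in", "on", "for", "to", "with", "from", "by")
words <- words[nchar(words) > 2 & !(words %in% stop)]
freq  <- sort(table(words), decreasing = TRUE)

wordcloud(names(freq), as.numeric(freq), max.words = 100)
```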

The two Wordles are below. For 1989:

 

Word cloud for titles of top 200 cited papers in 1989 (click for a full size image)


And 2012:


Top 200 for 2012 (click for full size image)

There are some obvious differences. But before I comment, I am curious to see what you all see (that is the point of a word cloud after all). I hope you all will share your thoughts on what has or has not changed in 25 years (OK, 23). I’ll try to add my thoughts in the comments after others have had a chance at analysis.

 

PS – if you’re curious you can download my slides for the talk from figshare. The first 1/3 matches what you read above. The last 2/3 mostly matches themes I’ve hit on before here in my posts on DE. Although students might enjoy the next to last slide on advice to students.

 

Poll: What should a community ecology class cover?

This fall I will be teaching a graduate-level community ecology class for the first time. Most people would say that community ecology is one of the five or so main subdisciplines of ecology along with physiological ecology, population ecology, ecosystem ecology and maybe behavioral ecology.

In the 1970s community ecology was an “in” field. Then in the 1980s and 1990s my perspective is that community ecology was passe. I started graduate school in 1997 and I well remember how all my graduate student peers would say things like “I study species interactions” rather than use the phrase “community ecology”. Now community ecology feels very much like a reinvigorated, “cool” field again, but in part because the lines have blurred with topics like macroecology and global change ecology.

So it has been an interesting exercise for me to think through what exactly should be covered in a community ecology class. It’s a bit of a definitional exercise in defining what I think community ecology is today. There is definitely more than enough material to fill a semester these days, so choices must be made. There are two great textbooks on community ecology by Mittelbach and Morin (both reviewed by Jeremy). So I can look at the tables of contents there, but there are some noticeable differences from the choices I will make.

So I thought it would be fun to take a reader survey to see what topics people think belong in an early graduate (e.g. first-year graduate student) community ecology class. There are 30+ topics. Each topic could easily take 1 week to cover (in fact could easily be an entire semester seminar), and here at Maine we typically have a 15-week semester, so assuming we’ll squeeze a few topics together, you can pick up to 20 topics (it would be no fun if you could check everything!). I’m sure there are other ways to organize/slice&dice these topics, but this is a reasonable approximation. What would you prioritize in a community ecology class? What are your top 20 priorities for an introductory graduate-level community ecology class? Take our poll (NB: I have NOT randomized the order presented, to keep related topics close to each other, but please make sure you read to the end and don’t just bias towards the first things you see):

 

Four rules for long distance collaborations

One trend in ecology, science, and life generally is that we increasingly do work with people who are not physically in the same location.

Some examples of collaborating remotely that are part of my academic life include:

  1. Students in different locations – some of these have been my fault (i.e. I moved to a new university and left behind students I was supervising and needed to find a way to continue supervising). Some have been driven by the student’s circumstances (often involving spousal or SO constraints). (See Yi Han’s post on Yvonne Buckley’s website for another discussion of remotely advising students)
  2. Working groups – although the whole point of a working group is to get people together in one place, working groups invariably demand working remotely a good chunk of the time too. I have a post planned for the near future on how to make a successful working group, but one piece is certainly just the generic problem of collaborating remotely.
  3. Collaborations assembled for reasons of complementary expertise among people in different locations to do research. One of my best and most productive collaborations right now is with two people in Scotland and one in Vermont. Aside from student/adviser type papers, it is getting increasingly rare these days to see multi-author papers where all the authors are at the same university or in the same city.

It is claimed that technology makes us “one world”. I’m pretty sure this is overhype on the part of the technologists :-) But it is true that Skype and equivalents, Dropbox and equivalents, Google Docs and equivalents, etc. have made things possible that weren’t possible even in the days of telephone and email. Although even there, I remember a project 20 years ago where a co-worker and I were porting a complex (1 million lines of code) product to a new operating system (Windows NT, to date myself). I was in London, he was in Boston, but it was extremely efficient. Just as I was finishing my day I would email him where I was at, and he could pick up just as his day was starting, and all we needed was email and web-based source code management (and extremely rarely a telephone). But he was a close friend that I had worked with for years – we could practically anticipate each other’s next move.

Which brings me to what I think is the most important aspect of long distance collaborations. The technology has changed. But the social challenges have not changed and remain huge. Indeed, if I had to boil down my rules for long distance collaborations to just one sentence it is “Humans are still primates”. The social dynamics are extremely important and should not be ignored under the illusion that a collaboration relies merely on intellectual exchange of ideas which is easily solved by passive technology. Making a long-distance collaboration work requires a VERY ACTIVE attention to social maintenance. I can guarantee you things will sour quickly if this is ignored.

So although the following four rules are just elaborations on this point, here are my four rules of long distance collaboration:

  1. They have to start with a significant component of face-to-face time. I don’t think I’ve ever had a successful collaboration that began and remained primarily on Skype. Beginnings are delicate, critical times, and face-to-face meetings are the key to success in these delicate beginnings. This is built into working groups – indeed the raison d’être of places like NCEAS, sDiv, etc is to make the quality face-to-face time at the beginning of a collaboration happen. This also applies to working with students remotely – I refuse to do it if we can’t find a way to overlap in the same place for an extended period at the beginning (usually 1-2 years for PhD students, 3 months for postdocs). My successful collaborations on papers also involve people I already know in person from working groups, repeated discussions at ESA, etc. While talking science during this early face-to-face time is useful, what is really important is establishing a rapport. Eating together, socializing together, cracking jokes together. Taking an adventure (be it to a scenic vista or a restaurant in a strange town). All these social trust-building functions are what is most important. Rationally minded people will scoff at this, but ignore this at your peril! Just be glad we’re hairless primates and don’t need to groom each other for lice to build social bonds!
  2. Schedule unstructured time – Beyond building social bonds and trust, another important feature of being in the same place is the occurrence of chance meetings that involve conversations that are not directed at a purpose. An obvious part of working remotely is talking by phone/skype/email to move the project forward. But if you only have these goal-oriented discussions, things will not go as well. Thus it is important to schedule time where you are “just talking” and conversations can meander and go to new (and hopefully exciting and innovative) places. Such unstructured time also leaves room for the occasional joke, how are the kids?, etc per #1. Being overly goal oriented on the phone/skype can kill a collaboration.
  3. Continue to make face-to-face meetings happen – Although #1 and #2 are the core ingredients, it is important in long-lasting collaborations to make sure that additional face-to-face, same-location time happens even after #1 and #2 are in place. With remote students I try to make sure they spend at least a week per semester in the same building with me (and a month is better). With collaborators I try to get together at least once per year, sometimes only over dinner at ESA but often via multiple sessions of a working group or even travelling to meet (I just spent 3 days in a random hotel in the middle of generic suburban Connecticut as it was the best way to get four of us together).
  4. Make sure everybody has a quality work environment – This applies mostly to working with students or postdocs, but if they are not going to be in my lab, it is important that they have a productive work environment wherever they’re located. Planning to work from home or from Starbucks is not a good idea. Students not in my lab all need to find a lab in a university where they are located so they have a desk and a weekly meeting with live, in-the-flesh people.

Those are my four core rules. I want to be clear that successful remote collaborations are relatively rare and hard. There are lots of studies that show that being next door is a lot better than being downstairs which is a lot better than being a couple of buildings over which is a lot better than being cross campus which is a lot better than being remote. As a physical setting, remote is at the bottom of the list. But there are times and circumstances when it can pay off (or where a collaboration you already invested in has to turn into a remote one). But in those times don’t kid yourself – you are starting the race behind and need to put extra energy into overcoming that deficit. Being highly proactive about #1-#4 in collaborations I care about is the formula I have learned over some successes and many failures (going all the way back into my business days).

What is your formula? Do you even participate in remote collaborations? If so what are the keys to making it work for you?

On the differences between natural resource and biology departments

Six weeks ago, in my post on research funding, several people noted in the comments that funding for TAs and RAs is different in natural resource departments than in ecology and evolutionary biology or biology departments. A reader, Steven Byrd, emailed me asking me to expand on the perceived differences since he was about to make the switch, moving from his masters in a biology department to a PhD in a natural resource department. I myself have jumped this divide nearly every move I’ve made – PhD in an EEB department, postdoc in Fish and Wildlife, tenure track at McGill in Biology, tenure track at Arizona in the School of Natural Resources. Since many people like myself and Steven cross this divide, or at least contemplate crossing it, at some point in their career, I thought it would be interesting to comment on the cultural differences I have observed and see what others think.

First a bit of background. This is specific to the US, but I know it is similar in Canada and believe it has parallels in Europe and Australia as well. Definitely curious to hear from our international readers. Most universities are organized into departments nested inside of colleges nested inside the university. Ecology is typically found in two locations. One is in an EEB or Biology department inside of a College of Science (or on a smaller campus a College of Liberal Arts and Sciences). This college also has chemistry, physics and often some of the atmospheric sciences, oceanography, geology, etc and is focused on pure research without focus on applications. The other is in the College of Agriculture where there are usually departments like Wildlife and Fisheries, Forestry, often Soils, Crop Science, Range Management, Hydrology and some others that overlap with ecology, as well as things like plant sciences (plant breeding and pathology), animal husbandry, etc. The college of Ag is focused on applied questions, and in the US in land grant universities the college of Ag is naturally where the agricultural extension agents are homed. The college of Ag is also where federal cooperative units with the USGS (which has a mission of biological inventory and survey) and the US Department of Agriculture are homed – the scientists in these units are employees of their respective federal agencies and are forbidden from teaching undergraduate classes but otherwise are rather regular members of departments doing research and having graduate students. On many campuses the forestry, wildlife, etc departments have been shrinking and have been merged into unified “natural resource” departments. These departments have also been undergoing a major transformation in recent decades from an emphasis on “hook and bullet” management of game animals for hunting and fishing to conservation of endangered species.

OK – so enough background. These departments all do ecology, but if you’re contemplating a switch, what should you know about the differences between the Biology/Ecology and Evolutionary Biology/College of Science world and the Fish and Wildlife/Forestry/Natural Resources/College of Agriculture world? (From here on I will abbreviate these two contrasts as EEB vs NatRes). The following are my own observations. They are general stereotypes based on the many departments I have visited and certainly do not apply to 100% of institutions, and in fact none of them apply to every place I’ve worked (and most of them don’t apply to my current place at U Maine, which has several unique features with respect to this divide). But broadly speaking:

  • Research funding – EEB goes after NSF and maybe NASA or NIH. NatRes goes after USDA and an occasional NSF grant, but the majority comes from contract work for state and federal agencies (e.g. monitoring endangered species). As a result I think EEB tends to be a bit more boom-bust (and also divides people into haves and have-nots) while NatRes tends to be a bit more slow and steady.
  • Research topics – both sides are doing good ecology which is probably the most important point. But there are subtle differences. NatRes is more focused on collecting data and using sophisticated quantitative methods to make sense of the data. In EEB there is more of a split between pure field work and pure mathematical ecologists. In EEB there is also more of a focus on questions rather than information. Sometimes when I sit on NatRes committees I have to push students to ask questions that tie to theory (but many NatRes faculty are doing the same push), but sometimes when I sit on EEB committees I get bemused by how much handwaving there is about incorporating the latest trendy question (can you say phylo-spatial-functional trait coexistence?) without really thinking through the value of the work.
  • Reputational basis – evaluation for tenure and more generally for reputation is more multidimensional in NatRes. Papers and grants are still vitally important, but relationships with state and federal agencies, making a difference on the ground, outreach and education are all also important. EEB tends to be very one-dimensional on papers and grants. For these reasons the pressure levels might be slightly lower in NatRes (although no tenure track job on the planet is absent of stress). Certainly I think people in EEB are more likely to know and talk about their h-index.
  • Relationships between departments – in general EEB tends to think they do better science and look down on NatRes. NatRes tends to think EEBers have their heads in the clouds and are irrelevant. For the record, I’ve seen places where from an objective outside view, NatRes is clearly the superior department and places where EEB is clearly the superior department and places where they’re both good, but they all still tend to adopt this attitude towards each other. Which is unfortunate, because despite the fact that in my opinion both groups are doing exactly what their mission mandates and there are enormous synergies, on most campuses these judgmental attitudes prevail and there is very little interaction between the two groups (and they are often physically separated by large distances).
  • Undergraduate curriculum – NatRes are training undergrads to get jobs in state and federal agencies. For students to be hired by these agencies, they must have taken a very specific set of courses, so the whole curriculum is built around these national requirements. EEB tends to teach a lot of service courses (e.g. introductory biology, neurobiology, plant taxonomy) taken by people all over campus. The majority of undergrads majoring in Biology want to go into medicine/health sciences.
  • Graduate trajectory – in NatRes most students stop after a masters (again targeting jobs in state and federal agencies or maybe an NGO). If you want to get a PhD you usually need a masters first, preferably from another institution. In EEB – most students are doing a PhD, often without having gotten a masters first. Traditionally EEB departments see their graduate program as primarily for creating new professors, although I do think they are increasingly embracing the role of training people for conservation work as well.
  • Graduate funding – in EEB it is a mix of RAships from NSF grants and lots of TAships (coming from the service courses). In NatRes TAships are few and hard to come by so it is mostly work on the contracts with state agencies and any USDA grants. The TAships in EEB help to counter the boom-bust nature of pursuing NSF funding (i.e. provide backups when funding goes dry), so it can be very hard to have students in a NatRes department if you primarily pursue federal funding and don’t have a steady stream of state/federal contracts.
  • Internal departmental culture – EEB is much more bottom-up governed while NatRes is much more top-down governed. Both groups have regular faculty meetings and votes. But the opinion of the department chair (and in NatRes often an executive committee of 4-5 senior faculty) counts a lot more heavily, and I’ve seen people have heavy consequences from getting on the bad side of a department chair much more in NatRes – EEB is the stereotypical herding cats where everybody just shrugs their shoulders and expects some people to be prima donnas. Also I think it might be fair to say that the proportion of old white males is slightly higher in NatRes than EEB (although this is changing and nowhere in ecology does particularly well on race). I don’t know a nicer way to say this but some (and only some) NatRes departments still have more of a “good-old-boy club” feel. Some EEB departments might have more of an elitist attitude.
  • Relationships between the colleges – almost invariably the College of Agriculture is the second richest and most powerful college on campus (after the college of medicine if such exists). They always have new buildings, money floating around for various initiatives, etc. Within the college of agriculture, NatRes is usually near the bottom of the ladder. In contrast, while colleges of science are usually less powerful, EEB/Biology is often the biggest and richest department within the college (especially when it’s a joint Biology department with EEB and molecular/cellular biology). So NatRes tends to be the little fish in the big pond, while EEB tends to be the big fish in the small pond. There are advantages to both – mostly depending on whether resources are being allocated at the university level (e.g. buildings, which favors the college of ag) or at the within-college level (e.g. various travel awards to students, which can tend to favor EEB).
  • Interior decorating – by far the most important distinction is what the hallways look like! EEB departments tend to be in generic university drab with perhaps a glass display case of books by the faculty or maybe something out of the collections. NatRes departments have large stuffed mammals, often a bear, mounted upright in the wildlife half, and gorgeous solid wood paneling in the forestry half.

Those are the differences that jump most immediately to my mind. As already stated they are sweeping stereotypes and the landscape will differ in individual units. My only goal here is to provide a “quick reference” for people contemplating the switch. Overall, I find it highly regrettable that these cultural differences exist and that people don’t work together better between these units. We are all doing ecology after all. And it makes me really appreciate the structure here at U Maine where all of the biological sciences (from EEB to nursing and food sciences to forestry) are in one college – effectively a college of biology. More universities should move in this direction. Maine is also a place where people aren’t very hung up on the basic-applied distinction – something else I wish more universities would foster.

I fear that somebody will get annoyed by my putting this down in black and white, but my intention is to help people new to the issues. Keep in mind that these are only approximately true, and that I love – repeat love – my time spent in both types of units on multiple campuses, and I nearly always end up finding a way to have cross appointments or whatnot so that I effectively end up in the middle between the two, which is where I am happiest.

What are your observations about the similarities and differences across the “divide” (which shouldn’t be as big a divide as it is)? How does this generalize in other countries? What about people at private universities or undergraduate education-focused universities in the US – which culture matches better to what you experience?

How to write a great journal article – act like a fiction author

There are a number of good posts out there on how to write a good journal article and even a whole blog devoted to the topic; many of them are linked to in the comments section of my post on writing style.

Here I want to elevate above the nuts-and-bolts sentence-level-detail of my post on writing style* and even elevate above the aforementioned posts that break down different sections of a paper and zoom out to 100,000 feet and think really strategically about writing a paper.

In my experience as a student committee member and as an associate editor for three journals I must have seen many 100s if not at this point 1000s of pre-publication articles. And they are varied. But many of them are by already good writers, in the sense of clear, fluid English from authors who understand well the purpose of each of the four sections (intro, methods, results, discussion). But many (most?) of these are still missing something. Something which I think is the hardest thing to learn: to think about the paper as a cohesive unit.

Think about an artistic painting. For the artist, it is made up of 100s or 1000s of individual brush strokes, each one of which requires skill and artistry. And of course a painting typically has a few key objects – a building, a lake, a person – and the strokes have to make those up convincingly. But the reason an artist makes a painting, and the reason we hang paintings in the Louvre and visit them by the millions, is none of those reasons. It is the overall gestalt effect – the message, the emotional impact. The whole is MUCH greater than the sum of the parts in a great piece of art.

It is no different with a paper. A day after reading it, you don’t remember well-crafted sentences or a really clear introduction – you just have an overall gestalt. With an academic paper this gestalt usually includes a one sentence summary of the factual content of the paper (and yes it is really only one sentence). But it also includes the emotions and judgments hanging on that one sentence. Is it convincing or weak? Is it elegant? Clever? Surprising? Ultimately, much of the emotional gestalt we take from a paper comes down to: was it convincing? do I trust the author? It is my experience that first-time writers and even many more experienced writers are so caught up in the mechanics (the sentences and sections, in analogy to the brush strokes and objects in the painting) that they never think about the overall gestalt. And as a result the gestalt is rather poor. Which, fairly or not, reflects on the results of the paper. This of course is what distinguishes an art school student (working on mastering the details) from a great artist. And it is what distinguishes a publishable paper from a great paper, one that is remembered, one that has impact, and, dare we dream, a paper that will achieve the analog of being hung in the Louvre (whatever that might be – and no, it’s not getting published in Science or Nature).

My main piece of advice will sound like it is tongue-in-cheek but it is in fact straight-up serious advice. Think and work like a fiction author! Wikipedia says that the main ingredients of fiction writing are: Character, Plot, Setting, Theme and Style. I’m sure there is debate, but these sound a lot like what I learned in high school, and I’m going to go with these. Notice that these are all unifying elements – they are things that cut across the introduction, middle, and ending/resolution of a fiction story. In short they are what give the gestalt.

Let me address each of these in a little more detail as they relate to non-fiction, scholarly article writing:

  • Character – in fiction the characters need to be richly drawn to pull you into the story, make you care enough to keep reading, and make you remember them. The characters in a journal article are the questions you are asking. Introduce us to them. Spend a little time fleshing out their nuances. This is not achieved by a dump of literature citations, although that is a piece of it. You need to sound excited by your questions (which means you need to know what they are!). And you need to make them 3-D. And you need to dwell on them lovingly. None of this, by the way, means that you should write a long introduction any more than you should spend half your book introducing the characters. Just as in the best fiction, the characters (questions) should be introduced deftly and crisply, which requires work.
  • Theme – the take home message. In fiction it is a moral, or perhaps an emotion. In a journal article it is the one sentence take home message. You may think I’m joking, but most people really will take away only a single sentence summary of the paper, so you better know what you want it to be before you start writing. “Figuring it out as you write” is a terrible approach. Your paper will sound disjointed and like you didn’t know what your theme was before you started. So figure out your one sentence BEFORE you start writing. I am known in my lab group for mercilessly asking a student who is at the writing stage of a paper “what is your one sentence?”. I ask them before they start the presentation. I ask them immediately at the end of the presentation. And I ask them several more times during the discussion with the lab. It might seem impossible, but it is actually very achievable – it just requires setting this as an explicit task and spending some time (usually interactive with other people) to achieve it. It is a sine qua non for a paper that has a good gestalt. How can a fiction writer construct plot/story arc, characters, setting to all build towards a powerful theme if they don’t know what it is? No different in non-fiction.
  • Plot – a good piece of fiction has a clear sense of movement. It starts one place, gives a sense of motion at any point you are reading, and then you end up somewhere new. It’s a big part of why people keep reading to the end. I call this the story arc. And the story arc is the thing that I find most often missing in journal articles. You need to take the reader along a very clear trajectory from question to conclusion. Just having the standard four sections is nowhere near enough. So many papers organized by the four sections still sound like a dump of everything you ever thought or did in connection to the paper. You need to work hard on story arc to make sure everything in the paper is pulling towards that one arc. This is why figuring out your one sentence before you write is so important. It lets you spot what is superfluous and unnecessary and trim it away (most good writers will tell you that half the battle is knowing what to delete).
  • Setting – the place and culture in which things happen. In field experiments or observations this is pretty simple. Just as I cannot begin to fully understand or relate to a character unless I know their context, I won’t really care if p<0.05** unless I can visualize the whole experiment in my mind. Almost everybody tells me that they used a 1m x 1m quadrat (or whatever their sample unit was), but many fail to tell me whether their replicates are 5m apart or 1km apart, whether they’re on the same topography or randomized, surrounded by the same vegetation, etc. A well-drawn, information-packed diagram of the layout is something I often find myself requesting as a reviewer or editor.
  • Style – this is a broad category that covers everything from writing dialogue to what voice is used – but it is ultimately the techniques. The brush strokes. And it is the clear writing I posted on last year in a non-fiction article.

My bottom line is this. Every word, every sentence, every paragraph, every section of the paper should be working together, like a well-synchronized team of rowers all pulling towards one common goal. The introduction should introduce the questions in a way that gives them emotional pull and leaves us desperate to know the answer. The methods and results should be a page-turning path towards the answer. And the discussion should be your chance to remind the reader of the story arc you have taken them on and draw sweeping conclusions from it. Any freeloading sentence or paragraph that pulls in a different direction should be mercilessly jettisoned (or at least pushed to supplemental material). Does this sound like a novel you would want to read? Yes, it does, and it probably sounds like a journal article you would want to read too.

I wish more people saw themselves as needing to use the skills of a story teller when they write a journal article. I of course don’t mean the connotations of dissembling or making things up that the word “story” carries. But I do mean the art of story-telling that knows where it is going and gets there crisply, so that it sucks us in and carries us along with just the right amount of time spent on details of character and setting. Where the characters (questions), the plot (story arc), the setting, and the theme (the one sentence take home message) all work together to make a cohesive whole that is greater than the sum of the parts. Like anything in writing, you can do it if you work at it, but you do have to work at it (writing is not a gift handed to you by the gods)***. So go ahead, turn your next manuscript into a cohesive whole with great characters and a compelling story arc that leaves us deeply moved.

UPDATE, 22 June 2014: Comments on this post are now closed. This post was highlighted on “Freshly Pressed”. Which is flattering, but has led to dozens of people who wouldn’t otherwise have seen our blog trying to make non-substantive comments in order to promote their own blogs. We may or may not reopen comments on this post in the future.


* (which I badly violated in this sentence by stringing 5 nouns and more connective words in a row with no verb in sight and then running on for 45+ words in one sentence! – do as I say, not as I do :) )

**I probably won’t care about p<0.05 for a whole other set of statistical/philosophical reasons, but I leave that for another day!

*** just as an example of the messy, iterative process that writing is – a process that depends as much on the bas-relief of what is removed as on what is added – I had a clear vision for this post: science writing should be more like fiction writing, with the same elements as a compelling story. That immediately led to a title and intro. Then when I started writing, I ended up with an outline that looked like

I – you have to know your main point

II – you should be like a fiction writer

IIa – character

IIb – plot

IIc – theme

etc

Well – I clearly had lost my way. While nothing I said was untrue or unimportant, I had bifurcated and complexified off my main theme. This is something I am very prone to do (as I think are most academics). So I deleted two whole paragraphs on I – you have to know what you want to write about – and then worked a much reduced version of it into the IIc theme section. Boom – back to a single story arc, a single sentence to remember, and a tighter, stronger piece. Not every edit is this easy, and this post could certainly benefit from more, but I hope it at least makes my point that you have to edit with a mentality of “does this add or distract from my main point” and be merciless if the latter.

Frogs jump? researcher consensus on solutions for NSF declining accept rates

Dynamic Ecology’s readers have spoken in a clear voice! There is a clear consensus around what changes people favor to address the hopelessly declining grant award rates at NSF. In a post on Monday I described what I see as the long-term exogenous trends in our society (US specifically, but as commenters noted probably largely applicable globally) that affect NSF – trends that are putting NSF in a tight squeeze, leading to a current acceptance rate of 7.3% with every expectation it will go still much lower. Basically, flat funding and many pressures on researchers to apply for more grants (both more applications from old hands and pressure on many others to begin applying) lead to a trade-off in which the only variables NSF controls are the number of applications and the grant size in $.

I had a reader poll on what choices readers would like to see NSF adopt. To be clear, this poll is entirely unscientific in sample design – it’s whoever reads the blog and answers. It is presumably mostly academic ecologists, and our readership skews early career and male (although how much more it does so than academic ecology in general is unknown), but beyond that I couldn’t say what biases there are. There were 450 votes – I don’t know how many voters, since each voter could vote up to 3 times and polldaddy doesn’t give me the details unless I pay them $200 (I know there are other choices – I’ll probably use them next time, but polldaddy is so convenient from inside WordPress). But to a first approximation there were presumably about 160-175 voters (some voters likely voted for only 1 or 2 choices). The results as of 11:30AM EST Wednesday (in my experience the vast majority of people who will read the post have read it by now) are:

Results of survey on solutions for declining accept rates at NSF. Note: since users could pick three choices, the 450 votes correspond to roughly 160-175 voters, each picking 1-3 choices.

Basically there are three groups of answers. In the first group, nearly everybody who voted was in favor of two changes:

  1. Reduce the average grant size from the current $500K to something more modest ($200K was the example in the poll). This would immediately increase accept rates by 2.5x (last year’s 7.3% would have been 18.25% – see the quick arithmetic sketch below). That’s a pretty big difference. Several people noted cutting grant size would negatively affect graduate students (fewer RAships), faculty at institutions/departments without TAships, and postdocs. Presumably the choice for only a modest cut was partly driven by this. Personally I would take some of the money saved and put it directly into NSF predoc and postdoc fellowships (this money doesn’t come with indirects and so is more efficient, and it also tips the balance of power to the students, which is desirable in my opinion).
  2. Limit the number of proposals by restricting how many grants one researcher can hold in a fixed period. The example given in the main text was at most one grant per five year period (once you’ve been awarded you cannot apply again). There are of course devilish details – do co-PIs count, do SKP count (senior key personnel = people whose CV is submitted but who draw no salary from the grant), etc.? And is it 5 years from the award date or the end of the grant? etc. And while there is no perfect solution – nearly every solution will unfairly penalize some deserving person – there are certainly multiple good solutions, and this is not a reason to avoid implementing it.

Again it is remarkable that nearly everybody who voted, voted for both of these options. These options together effectively amount to a vote to spread current funding around more widely. Also note that implementing #1 almost requires some version (possibly weaker than I proposed) of #2, or you will just compound the problem of more people submitting more applications to chase fewer dollars.

Three other choices were about evenly split. To a first approximation, almost everybody voted for the two choices above, and then split evenly among the following 3 choices with their 3rd vote. To wit:

  3. Reduce grant sizes even further to $50K (not the $200K from above). This would have allowed an acceptance rate of 73% (again, see the sketch below). It would have also severely limited funding (after overhead it is about $35K, so roughly 3 months of summer salary or 1 year of PhD or 1/2 year of postdoc). My guess is that the thinking here is that these grants would not mostly be used for such things and would instead just cover the basics of fieldwork, travel to conferences, publishing papers, etc. In short, not so different from the Canadian NSERC Discovery grant. To me it is striking that choices #1 and #3 together got a combined 47% (recall 33% = everybody voted for it, if everybody used all 3 votes) – presumably a non-trivial number of people felt so strongly about this that they used 2 of their 3 choices to vote for reducing grant size.
  4. Limit number of proposals by only allowing “productive researchers” to submit – this of course raises the question of how you define a productive researcher. I threw out the example in the main text of 15 papers published in the last 5 years. Like #2 above, this will require an arbitrary definition that hurts some deserving people, but that alone is not a reason to avoid it – especially since once the rules are clear people can manage their lives around the rules (and one could imagine exemptions for early career researchers, special circumstances, etc). One reason to like this option is that studies have shown that past research success is one of the best predictors of future research success (better, for example, than panel evaluations of projects).
  5. Limit number of proposals by a lottery – Again, there are many details about how this would work. Is there a lottery to apply? Or just a lottery for the awards among those who applied? Or just a lottery among qualified scientists, however that is defined? Although the lottery seems absurd on the face of it, two recent studies cited in salient fact #1 of my original post suggest that, at least among those proposals ranked moderately high (the top 30% in the DEB case), panel scores were not that different from a lottery in predicting research outcomes. Presumably this is also true for some of those just below the 30% cutoff and not true for the bottom 10-15%, with the line somewhere in between. Thus the lottery has the great virtue of calling a spade a spade and removing the stigma from losing in what is already largely a lottery cloaked in the trappings of assessment.
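To make the grant-size arithmetic above concrete, here is a minimal sketch (in Python). The 7.3% base accept rate and the ~$500K median grant come from the post; everything else is just the identity TotalBudget$ = NumberProposals * Accept% * GrantSize$ with the total budget and the number of proposals held fixed:

```python
# The budget identity from the post:
#   TotalBudget$ = NumberProposals * Accept% * GrantSize$
# Holding total budget and the number of proposals fixed, the accept rate
# scales inversely with grant size.

BASE_ACCEPT = 0.073        # last year's DEB accept rate (from the post)
BASE_GRANT = 500_000       # current median grant size (approximate, from the post)

def accept_rate(new_grant_size, base_accept=BASE_ACCEPT, base_grant=BASE_GRANT):
    """Accept rate after changing grant size, with budget and proposal count held fixed."""
    return base_accept * base_grant / new_grant_size

for size in (200_000, 100_000, 50_000):
    print(f"${size:>7,} grants -> {accept_rate(size):.2%} accept rate")

# Output:
#   $200,000 grants -> 18.25% accept rate
#   $100,000 grants -> 36.50% accept rate   (the post quotes ~36.8%)
#   $ 50,000 grants -> 73.00% accept rate
```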

Then there were two “no-hopers” – essentially nobody favored these choices:

  6. Business as usual – live with the low accept rates – this got only about 2% of votes (perhaps 5-6% of voters), meaning about 95% of voters oppose business as usual with ever declining accept rates. In the metaphor of the original post, researchers are not frogs!!  In the original post and comments a number of problems with very low accept rates (beyond the fact it makes life tough for researchers) were identified, including how it distorts the selection process (more conservative, more clique-driven and of course more random), the waste of time writing 15-page proposals (at least 1 month of researcher time) for 5% success, etc.
  7. Limit proposals to certain career stages – this was the absolute least favorite choice. We academics are an egalitarian bunch. It also is not obvious that any one stage is inherently more productive.

I said in my original post I would wait to share my opinions until the poll results were in to avoid driving the results. I’m sure my biases bled through in the last post and this one anyway, but hopefully not terribly. But personally, I agree with everybody else – I would be in favor of some combination of #1-#5 and opposed to #6 and #7. On cutting grant size, I of course presented arbitrary discrete options of $50K or $200K, but to me the optimum would probably be about $100K*. Over 3 years that gives $22K of direct costs per year. That’s enough for field work (or computers, equipment or whatnot in lieu of field work), travel to conferences, publishing fees and some consumables each year, with enough left over to give a bridge year to a student, a year to a postdoc, a year of a technician, etc. To make this viable, I would not put all of the savings into more grants (my $100K size gives an accept rate of 36.8% – I would aim for a 20-25% accept rate and put the rest into more fellowships given directly to PhD students and postdocs). The sublinear response of productivity/research outcomes to dollars input strongly argues we need to move down that curve to fewer dollars per researcher, where the slope of the curve – and hence the marginal research productivity bought per dollar spent – increases. By the same token, I think many feel, including me, that research dollars have gotten too concentrated in a few researchers’ hands (but I know of no data on this). There are good arguments for concentrating (see my post on Shockley and lognormal productivity), but really, is a superstar lab with 18 students going to get more marginal value out of one more student than a very good lab that currently has 2-3 students? I doubt it.

I personally think #4 (limit by researcher quality) and #5 (limit by lottery) have more merit than people gave them credit for too, but they are more radical changes to the system.

It is worth noting that there is enormous consensus (at least among poll respondents) to reduce grant size non-trivially and put caps on the number of grants per researcher. And these are things that NSF could, if it wanted to, implement immediately. No Congress, no lengthy reform processes, etc. would be needed. A year or two of appropriate advance notice to researchers would be good. But beyond that, adjusting budgets is already within the purview of program officers, and recall, as a commenter did, that a cap of at most 2 proposals per PI was put in place when the pre-proposals were introduced. It would probably require consensus across a unit to make the cap global and across multiple years, but that should be achievable. Finally, note that a single unit (say DEB, just for example…) could implement these as an experiment while the rest of NSF watched to see how it worked (this already happened/is happening with the pre-proposal process too). Presumably the main dynamics opposing these changes are just innate conservatism/keep-it-like-it-is and lobbying by the few but powerful who are getting large chunks of money under the current system (although I would be curious to know how many of them really think the current system is optimal).

I think more meta-research is needed too. Just what can panels successfully assess or not? Although Sam Scheiner disagreed with me in the comments on my last post, I know of very little evidence that panels can do much more than distinguish the very worst proposals from the rest (please give me citations if you think I’m wrong). If that is true we need to be scientists and deal with it, not avoid doing the research to find out because the current system is comfortable. Kudos to Sam and Lynnette for their paper. Similarly, the question of exactly how sublinear research productivity is with grant dollars is vitally important but not yet well resolved.

I have no idea what the next step is, but it seems to me that the long term trends and outlook are so extreme that something has to be done (only 5% favor business as usual). And there is such a strong consensus (nearly 100%, certainly *way* over 50%) on several concrete changes that would have big impacts without requiring major restructuring that I would be disappointed to see nothing change over the next 3 years.

Here’s hoping the community can work together to find a way to turn down the heat on the pot we’re all in!


* I am not unaware that different subdisciplines cost different amounts to do research ($100K goes less far in ecosystem science work in the tropics than it does in simple trapping or counting experiments at a site close to home). The implications of this are a whole other topic, which I am not touching here. For this post, if current DEB grants across all subprograms have a median of $500K, then that median can change to $100K while leaving the differences in funding between fields untouched.


Are US researchers slowly boiled frogs? – or thinking out of the box about the future of NSF

There is a belief that dropping a frog into hot water will cause it to react and immediately jump out, while putting it in a pan of cool water and slowly warming it will cause the frog to never notice until it is boiled. Here in Maine you hear the same debate about how to cook a lobster. Whether the frog myth is true or not is debatable (although testing it is clearly sadistic). But it has become a common metaphor for failing to notice or respond to small incremental changes which, taken in the aggregate, are terrible (fatal in the case of the frog). We seem to have a bit of the same thing happening with the primary basic science funding agency in the US (the National Science Foundation or NSF). In this piece I want to a) argue that due to macro trends that are not the fault of NSF, the agency and its researchers are in a frog-boiling scenario, and b) attempt to kick-start an out-of-the-box, big picture discussion about what should be done about it (akin to the frog realizing it needs to take bold action and jump out of the pot).

But first, I’ve already said it, but let me repeat it to be abundantly clear. This is NOT a criticism of NSF. Every single program officer I’ve ever dealt with has been a highly dedicated and helpful professional (not to mention they are also researchers and one of us), and NSF regularly gets rated by government auditors as one of the most efficient and well run branches of the government. Instead, the squeeze is being driven by macro trends beyond the control of NSF (or of us researchers). I’m sure NSF is just as aware of and unhappy about these trends as I am. I expect they also are having discussions about what to do about it. I have not been privy to those discussions and have no idea whether NSF would welcome the discussion I am promoting here or not, but I feel like this blog, with its tradition of civility and rational thinking, might be a useful forum.

Why researchers at NSF are like frogs being slowly boiled – the macro trends

I am going to focus just on the Division of Environmental Biology (DEB), although I don’t think the story differs much anywhere else. I haven’t always been able to obtain the data I would like to have, but I’m pretty confident that the big picture trends I am about to present are quite accurate even if details are slightly off. The core graph, which I’ve seen in various versions of NSF presentations for a while (including those used to justify the switch to the preproposal process), is this:

Trends in # of proposals submitted (green), # of proposals funded (blue), and success rate (red). This data is approximate (eyeball scanned from http://nsfdeb.wordpress.com/2013/03/11/deb-numbers-revisiting-performance-of-pi-demographic-groups-part-1/ provided by NSF). Linear trend lines were then added.


This graph confirms what NSF has been saying – the number of proposals submitted keeps going up without any sign of stopping, while the number of proposals actually funded is flat (a function of NSF funding being flat – see below). The result is that the success rate (% of proposals funded) is dropping. But adding trend lines and extending them to 2020 is my own contribution. The trend in success rate here is actually an overestimate because the stimulus year of 2009 was left in. According to a naive, straight-line trend, success rate will reach 0% somewhere between 2019 and 2020! Of course nobody believes it will reach 0%. And the alternative approach of combining the other two trend lines gives roughly 200 proposals funded out of 2000, for 10% in 2020. But the trend line is not doing a terrible job; when I plug in the 2013 number from DEB of 7.3%*, it is not that far from the trend line (and is already below the 10% number). Nobody knows what the exact number will be, but I think you can make a pretty good case that 7.3% last year was on trend and the trend is going to continue going down. A few percent (2%?) by 2020 seems realistic. All of this is the result of inexorable logic. The core formula here is: TotalBudget$ = NumberProposals * Accept% * GrantSize$

NumberProposals is increasing rapidly. Although data are harder to come by, my sense is that GrantSize$ is roughly constant (at least after adjusting for inflation), with good spread but a median and mode right around $500,000. Maybe there is a saving grace in TotalBudget$? Nope:


Trends in NSF funding in constant 2012 dollars (data from http://dellweb.bfa.nsf.gov/NSFHist_constant.htm). Also see NSF’s own plot of the data at http://dellweb.bfa.nsf.gov/nsffundhist_files/frame.htm.

NSF appears to have had four phases – exponential growth in the early days (1950-1963), flat from 1963-1980, strong growth from 1980 to about 2003, and then close to flat (actually 1.7%/year over inflation) from 2003-2013 (again with a stimulus peak in 2009). Note that the growth periods were both bipartisan (as was the flat period from 1963-1980). Positive growth rates aren’t terrible, and congratulations to NSF for achieving this in the current political climate. But when pitted against the doubling in NumberProposals, it might as well be zero growth for our purposes. It is a mug’s game to try to guess what will happen next, but most close observers of US politics – noting that the debate has shifted to a partisan divide about whether to spend money at all, and resigned to the sequester being here to stay – are not looking for big changes in research funding to come out of Congress anytime soon (see this editorial in Nature). So I am going to treat TotalBudget$ as a flat line that is beyond the control of NSF and researchers.

The number that probably deserves the most attention is NumberProposals. Why is this going up so quickly? I don’t know of hard data on this. There is obviously a self-reinforcing trend – if reject rates are high, I will submit more grant applications to be sure of getting a grant. But this only explains why the slope accelerates – it is not an explanation for why the initial trend is up. And there is certainly a red-queen effect. But in the end I suspect this is some combination of two factors: 1) the ever tighter job market (see this for a frightening graph on the ever widening gap between academic jobs and PhDs), which has led to ever higher expectations for tenure – to put it bluntly, places that 20 years ago didn’t/couldn’t expect grants from junior faculty for tenure now can place that expectation because of the competition; and 2) as states bow out of funding their universities (and as private universities are still recovering from the stock crash), indirect money looks increasingly like a path out of financial difficulties. Obviously #1 (supply) and #2 (demand) for grant-writing faculty reinforce each other.

So to summarize: TotalBudget$ = NumberProposals * Accept% * GrantSize$. TotalBudget$ is more or less flat for the last decade and the foreseeable future. NumberProposals is trending up at a good clip due to exogenous forces for the foreseeable future (barring some limits placed by NSF on the number of proposals). So far GrantSize$ has been constant. This has meant Accept% is the only variable left to counterbalance the increasing NumberProposals. But Accept% is going to get ridiculously low in the very near future (if we’re not there already!). Part of the point of this post is that maybe we need to put GrantSize$ and NumberProposals on the table too.
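To see how fast that squeeze plays out, here is a rough sketch of the identity in action. This is a minimal Python sketch under stated assumptions of mine – the number of funded awards, the 7% annual growth in submissions, and the starting proposal count backed out of the 7.3% rate are placeholders, not NSF figures:

```python
# Rough projection of Accept% under the identity in the post:
#   TotalBudget$ = NumberProposals * Accept% * GrantSize$
# With TotalBudget$ and GrantSize$ flat, the number of funded proposals is
# fixed, so Accept% simply erodes as NumberProposals grows.
# All numbers below are illustrative placeholders, not NSF data.

FUNDED_PER_YEAR = 150                       # assumed flat (flat budget / flat grant size)
PROPOSALS_2013 = FUNDED_PER_YEAR / 0.073    # back out ~2050 proposals from the 7.3% rate
GROWTH = 0.07                               # assumed annual growth in submissions

proposals = PROPOSALS_2013
for year in range(2013, 2021):
    rate = FUNDED_PER_YEAR / proposals
    print(f"{year}: ~{proposals:4.0f} proposals submitted -> {rate:.1%} accepted")
    proposals *= 1 + GROWTH

# With these placeholder numbers the accept rate slides from 7.3% in 2013 to
# about 4-5% by 2020; faster growth in submissions pushes it toward the ~2%
# the trend lines above hint at.
```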

Some salient facts for a discussion of what to do

In the next section I will list some possible solutions, and hopefully readers will contribute more, but first I want to highlight two very salient results of metaresearch (research about research).

  1. Review panels are not very good at predicting which proposals will lead to the most successful outcomes. Some claim that review panels are at least good at separating good from bad at a coarse grain, although I am not even convinced of that. But two recent studies showed that panel rankings effectively have no predictive power for variables like number of papers, number of citations, and citations of the best paper! One study was done in the NIH cardiovascular panel and the other was done in our very own DEB Population and Evolutionary Processes panel by NSF program officers Sam Scheiner and Lynnette Bouchie. They found that the r2 between panel rank and various outcomes was between 0.01 and 0.10 (1-10% of variance explained) and was not significantly different from zero (and got worse when budget size, which was an outcome of ranking, was controlled for). UPDATE: as noted by author Sam Scheiner below in the comments – this applies only to the 30% of projects that were funded. Now traditional bibliometrics are not perfect, but given that they looked at 3 metrics and impact factor was not one of them, I think the results are pretty robust.
  2. Research outcomes are sublinear with award size. Production does increase with award size, but the best available (though still not conclusive) evidence from Fortin and Currie 2013 suggests that there are decreasing returns (a plot of research production vs. award size is an increasing, decelerating curve, e.g. like a Type II functional response). This means giving an extra $100,000 to somebody with $1,000,000 buys less productivity increase than giving an extra $100,000 to somebody with $200,000 (or obviously to somebody with $0) – see the toy sketch after this list.
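To see why this matters for how money is divided, here is a toy sketch of a saturating (Type II-like) output curve. The functional form and parameters are invented for illustration only – they are not taken from Fortin and Currie 2013:

```python
# Toy model of sublinear research output vs. award size, shaped like a
# Type II functional response: output = MAX_OUTPUT * award / (HALF_SAT + award).
# Curve shape and parameters are made up for illustration only.

MAX_OUTPUT = 10.0        # arbitrary units of "research output" at saturation
HALF_SAT = 300_000       # award size ($) at which output reaches half its maximum

def output(award_dollars):
    """Hypothetical research output from a given total award (saturating curve)."""
    return MAX_OUTPUT * award_dollars / (HALF_SAT + award_dollars)

EXTRA = 100_000
for current in (0, 200_000, 1_000_000):
    gain = output(current + EXTRA) - output(current)
    print(f"extra $100K on top of ${current:>9,}: +{gain:.2f} units of output")

# Output:
#   extra $100K on top of $        0: +2.50 units of output
#   extra $100K on top of $  200,000: +1.00 units of output
#   extra $100K on top of $1,000,000: +0.16 units of output
# With these made-up parameters the same $100K buys roughly 15x more marginal
# output at the bottom of the curve than on top of a $1M award.
```

The exact numbers are meaningless; the point is just that any saturating curve makes the marginal value of a dollar highest for the least-funded labs.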

Possible solutions

Just to repeat: this is not a criticism of NSF. The exogenous drivers are beyond anybody’s control and simple budgetary math drives the rest. There is no simple or obvious answer. I certainly don’t have the answer. I just want to enumerate possibilities.

  1.  Do nothing – low Accept% is OK – This is the business as usual scenario. Don’t make any drastic changes and just let the acceptance rate continue to drop to very close to zero. I actually think this might be the worst choice. Very low acceptance rates greatly increase the amount of randomness involved. They also ironically bias the panels to be conservative and select safe research (maybe even mediocre research) that won’t waste one of the precious awards, which is not good for the future of science. I recall being part of a discussion for an editorial board for a major journal where we all agreed the optimal accept rate was around 25-30%. Anything higher and you’re not selective. Anything lower and you start falling into traps of randomness and excessive caution. I think this is probably about the right number for grants too. Note that we are at about 1/4 of this rate. I personally don’t even consider the current acceptance rate of 7% acceptable but I cannot imagine anybody considers the rates of 1-2% that we’re headed towards to be acceptable. The other approaches all have problems too, but most of them are not as big as this one in my opinion.
  2. Drive down NumberProposals via applicant restrictions on career stage – You could only allow associate and full professors to apply on the basis they have the experience to make best use of the money. Alternatively you could only allow assistant professors to apply on the argument they are most cutting edge and most in need of establishing research programs. Arguably there is already a bias towards more senior researchers (although DEB numbers suggest not). But I don’t think this is a viable choice. You cannot tell an entire career stage they cannot get grants.
  3. Drive down NumberProposals via applicant restrictions on prior results – A number of studies have shown that nations that award grants based on the personal record of the researcher do better than nations that award grants based on projects. You could limit those allowed to apply to those who have been productive in the recent past (15 papers in the last 5 years?). This of course biases against junior scientists, although it places them all on an equal footing and gives them the power to become grant eligible. It probably also lops off the pressure from administrators in less research-intensive schools to start dreaming of a slice of the NSF indirect pie (while still allowing individual productive researchers at those institutions to apply).
  4. Drive down NumberProposals via lottery – Why not let the outcome be driven by random chance? This has the virtue of honesty (see fact #1 above). It also has the virtue of removing the stigma from not having a grant if people can’t be blamed for it. This would especially apply to tenure committees evaluating faculty by whether they have won the current, less acknowledged, NSF lottery.
  5. Drive down NumberProposals via limitations on the number of awarded grants (“sharing principles”) – You could also say that if you’ve had a grant in the last 5 years, you cannot apply again. This would lead to a more even distribution of funding across researchers.
  6. Decrease GrantSize$ – The one nobody wants to touch: maybe it’s time to stop giving out average grants of $500,000. Fact #2 strongly argues for this approach. Giving $50,000 to 10 people is almost guaranteed to go further than $500,000 to one person. It gets everyone over that basic hump of having enough money to get into the field. It doesn’t leave much room for summer salaries (or postdocs – postdoc funding would have to be addressed in a different fashion), but it would rapidly pump up the accept rate to reasonable levels and almost certainly buy more total research (and get universities to break their addiction to indirects). Note that this probably wouldn’t work without some other restriction on the number of grants one person can apply for, or everybody will just apply for 10x as many grants, which would waste everybody’s time.

What do you think NSF should do? Vote by choosing up to three choices of how you think NSF should deal with the declining acceptance rates (and feel free to add more ideas in the comments):

I am really curious to see which approach(es) people prefer. I will save my own opinions for a comment after most votes have come in. But I definitely think it is time for the frogs (us) to jump out of the pot and take a different direction!


* Note that 7.3% is across all proposals to DEB. The blog post implies that the rates are lower on the core grants and higher on the non-core grants like OPUS, RCN, etc. They don’t give enough data to figure this out, but if I had to guess the core grants are funded a bit below 5% and the non-core grants are closer to 10%.