Detection probabilities are a statistical method using repeated sampling of the same site combined with hierarchical statistical models to estimate the true occupancy of a site*. See here for a detailed explanation including formulas.
Statistical machismo, as I define it in this blog, is the pushing of complex statistical methods (e.g. reviewers requiring the use of a method, authors claiming their paper is better solely because of the use of a complex method) when the gains are small or even occur at some cost. By the way, the opposite of statistical machismo is an inclusive approach that recognizes every method has trade-offs and there is no such thing as a best statistical method.
This post is a fairly technical statistical discussion .If you’re interested in detection probabilities but don’t want to follow the details, skip to the last section for my summary recommendations.
I have claimed in the past that I think there is a lot of statistical machismo around detection probabilities these days. I cited some examples from my own experience where reviewers insisted that detection probabilities be used on data sets that had high value in their spatial and temporal coverage but for which detection probabilities were not possible (and even in some cases when I wasn’t even interested in occupancy). I also discussed a paper by Welsh, Lindenmayer and Donnelly (or WLD) which used simulations to show limitations of detection probability methods in estimating occupancy (clearly driven by their own frustrations of being on the receiving end of statistical machismo for their own ecological papers).
In July the detection probability proponents fired back at WLD with a rebuttal paper By Guillero-Arroita and four coauthors (hereafter GLMWM). Several people have asked me what I think about this paper including some comments on my earlier blog post (I think usually in the same way one approaches a Red Sox fan and asks them about the Yankees – mostly hoping for an entertaining reaction).
The original WLD paper basically claimed that in a number of real world scenarios, just ignoring detection probabilities gave a better estimator of occupancy. Three real-world scenarios they invoked were: a) when the software had a hard time finding the best fit detection probability model, b) a scenario with moderate occupancy (Ψ=40%) and moderate detection probabilities (about p=50%), and c) a scenario where detection probabilities depend on abundance (which they obviously do). In each of these cases they showed, using Mean Squared Error (or MSE, see here for a definition), that using simple logistic regression only of occupancy ignoring detection probabilities had better behavior (lower MSE).
GLMWM basically pick different scenarios (higher occupancy Ψ=80%, lower detection p=20% and a different SAD for abundances) and show that detection probability models have a lower MSE. They also argue extensively that software problems finding best fits are not that big a problem**. This is not really a deeply informative debate. It is basically,” I can find a case where your method sucks. Oh yeah, well, I can find a case where your method sucks.”
Trying to make sense of the opposing views
But I do think stepping back, thinking a little deeper, framing this debate in the appropriate technical context – the concept of estimation theory, and pulling out a really great appendix in GLMWM that unfortunately barely got addressed in their main paper, a lot of progress can be made.
First, lets think about the two cases where each works well. Ignoring detection worked well when detection probability, p, was high (50%). It worked poorly when p was very low (20%). This is just not surprising. When detection is good you can ignore it, when it is bad you err to ignore it! Now WLD did go a little further, they didn’t just say that you can get away with ignoring detection probability at a high p – they actually showed you get a better result than if you don’t ignore it. That might at first glance seem a bit surprising – surely the more complex model should do better? Well, actually no. The big problem with the detection probability model is identifability – separating out occupancy from detection. What one actually observes is Ψ*p (i.e. that % of sites will have an observed individual). So how do you go from observing Ψ*p to estimating Ψ (and p in the case of the detection model)? Well ignoring p is just the same as taking Ψ*p as your estimate. I’ll return to the issues with this in a minute. But in the detection probability model you are trying to disentangle Ψ vs. p just from observed % of sites with very little additional information (the fact that observations are repeated on a site). Without this additional information Ψ*p are completely unseparable – you cannot do better than randomly pick some combination of Ψ and p and that together multiple to give the % of sites observed (and again the non-detection model essentially does this by assuming p=1 so it will be really wrong when p=0.2 but only a bit wrong p=0.8). The problem for the detection model is that if you only have two or three repeat observations at a site and p is high, then most sites where the species is actually present it will show up at all two or three observations (and of course not at all when it is not present). So you will end up with observations of mostly 0/0/0 or 1/1/1 at a given site. This does not help differentiate (identify) Ψ from p at all. Thus it is actually completely predictable that detection models shine when p is low and ignoring detection shines when p is high.
Now what to make of the fact, something that GLMWM make much of, that just using Ψ*p as an estimate for Ψ is always wrong anytime p<1. Well, they are correct about it always being wrong. In fact using the observed % of sites present (Ψ*p) as an estimator for Ψ is wrong in a specific way known as bias. Ψ*p is a biased estimator of Ψ. Recall that bias is when the estimate consistently overshoots or undershoots the true answer. Here Ψ*p consistently undershoots the real answer by a very precise amount Ψ*(1-p) (so by 0.2 when Ψ=40% and p=50%). Surely this must be a fatal flaw to intentionally choose an approach that you know on average is always wrong? Actually, no, it is well known in statistics that sometimes biased estimator are the best estimator (by criteria like MSE).
Pay attention here – this is the pivotal point – a good estimator has two properties – it’s on average close to right (low bias), and the spread of its guesses (i.e. the variance of the estimate over many different samples of the data) is small (low variance). And in most real world examples there is a tradeoff between bias and variance! More accurate on average (less bias) means more variance in the guesses (more variance)! In a few special cases you can pick an estimator that has both the lowest bias and the lowest variance. But anytime there is a trade-off you have to look at the nature of the trade-off to minimize MSE (best overall estimator by at least one criteria). (Since Mean Squared Error or MSE=Bias^2+Variance one can actually minimize MSE if one knows the trade-off between bias and variance).This is the bias/variance trade-off to a statistician (Jeremy has given Friday links to posts on this topic by Gelman).
Figure 1 – Bias and Variance - Here estimator A is biased (average guess is off the true value) but it has low variance. Estimator B has zero bias (average guess is exactly on the true value) but the variance is larger. In such a case Estimator B can (and in this example does) have a larger Mean Squared Error (MSE) – a metric of overall goodness of an estimator. This can happen because MSE depends on both bias and variance – specifically MSE=Bias^2+Variance.
This is exactly why the WLD ignore detection probabilities method (which GLMWM somewhat disparagingly call the naive method) can have a lower Mean Square Error (MSE) than using detection probabilities despite always being biased (starting from behind if you will). Detection probabilities have zero bias and non-detection methods have bias, but in some scenarios, non-detection methods have so much lower variance than detection methods that the overall MSE is better to ignore the detection method. Not so naive after all! Or in other words, being unbiased isn’t everything. Having low variance (known in statistics as an efficient estimator) is also important. Both the bias of ignoring detection probabilities (labelled “naive” by GLMWM) and the higher variances of the detection methods can easily be seen in Figures 2 and 3 of GLMWM.
When does ignoring detection probabilities give a lower MSE than using them?
OK – so we dove into enough estimation theory to understand that both WLD and GLMWM are correct in the scenarios they chose (and that the authors of both papers were probably smart enough to pick in advance a scenario that would make their side look good). Where does this leave the question most readers will care about most – “should I use detection probabilities or not?” Well the appendix to GLMWM is actually exceptionally useful (although it would have been more useful if they bothered to discuss it!) – specifically supplemental material tables S2.1 and S2.2.
Let’s start with S2.1. This shows the MSE (remember low is good) of the ignore detection model in the top half and the MSE of the use the deteciton model in the bottom half for different samples sizes S, repeat visits K, and values of Ψ and p. They color code the cases red when ignore beats use detection, and green when detection beats ignore (and no color when they are too close to call). Many of the differences are small, but some are gigantic in either direction (e.g. for Ψ=0.2, p=0.2, ignoring detection has an MSE of 0.025 – a really accurate estimator – while using detection probabilities has an MSE of 0.536 – a really bad estimate given Ψ ranges only from 0-1, but similar discrepancies can be found in the opposite direction too). The first thing to note is that at smaller sample sizes the red, green and no color regions are all pretty equal! IE ignoring or using detection probabilities is a tossup! Flip a coin! But we can do better than that. When Ψ (occupancy) is < 50% ignore wins, when Ψ>50%, use detection wins, and when p (detection rate) is high, say>60% then it doesn’t matter. In short, the contrasting results between WLD and GLMWM are general! Going a little further, we can see that when sample sizes (S but especially number of repeat visits K) creep up, then using detection probabilities starts to win much more often which also makes sense – more complicated models always win when you have enough data, but don’t necessarily (and here don’t) win when you don’t have enough data.
Figure 2 – Figure 1 with confidence intervals added
Now lets look at table S2.2. This is looking at something that we haven’t talked about yet. Namely, most estimators have, for a given set of data, a guess about how much variance they have. This is basically the confidence interval in Figure 2. In Figure 2, Estimator A is a better estimator of the true value (it is biased, but the variance is low so MSE is much lower), but Estimator A is over confident – it reports a confidence interval (estimate of variance) that is much smaller than reality. Estimator B is a worse estimator, but it is at least honest – it has really large variance and it reports a really large confidence interval. Table S2.2 in GLMWM shows that ignoring detection probabilities is often too cocky – the reported confidence intervals are too small (which has nothing to do with and in no way changes that ignoring detection probabilities is in many case still a better or equally good estimator of the mean – the conclusion from table S2.1). But using detection probabilities is just right – not too cocky, not too pessimistic – it’s confidence intervals are very accurate – when there’s a lot of variance, it knows it! In short Figure 2 is a good representation of reality over a large chunk of parameter space where method A is ignore detection (and has lower MSE on the estimate for Ψ but over-confident confidence intervals) and method B is use detection-based methods (and has worse MSE for the estimation of Ψ but has very accurate confidence intervals)..
(As a side-note, this closely parallels the situation for ignoring vs statistically treating spatial, temporal and phylogenetic autocorrelation. In that case both estimators are unbiased . In principal the variance of the methods treating autocorrelation should be lower, although in practice they can have larger variance when bad estimates of autocorrrelation occur so they are both roughly equally good estimators of the regression coefficients. But the methods ignoring autocorrelation are always over-confident – their reported confidence intervals are too small.)
So which is better – a low MSE (metric of how good at guessing the mean) or an honest, not cocky estimator that tells you when its got big error bars? Well in some regions you don’t have to choose using detection probabilities is a better estimator of the mean by MSE and you get good confidence intervals. But in other regions – especially when Ψ and p are low you have to pick – there is a tradeoff – more honesty gets you worse estimates of the occupancy. Ouch! That’s statistics for you. No easy obvious choice. You have to think! You have to reject statistical machismo!
Summary and recommendations
Let me summarize four facts that emerge across the WLD and GLMWM papers:
- Ignoring detection probabilities (sensu WLD) can give an estimate of occupancy that is better (1/3 of parameter space), as good as (1/3 of parameter space) or worse than (1/3 of parameter space) estimates using hierarchical detection probability models in terms of estimating the actual occupancy. Specifically, ignoring detection guarantees bias, but may result in sufficiently reduced variance to give an improved MSE.These results come from well-known proponents of using detection probabilities using a well-known package (unmarked in R), so they’re hard to argue with. More precisely, ignoring detection works best when Ψ is low (<50%) and p is low, using detection works best when Ψ is high (>50%) and p is low, and both work very well (and roughly equally well) when p is high (roughly when p>50% and certainly when p>80%) rgardless of Ψ.
- Ignoring detection probabilities leads to overconfidence (reported confidence intervals that are too small) except when p is high (say >70%). This is a statement about confidence intervals. It does not affect the actual point estimate of occupancy which is described by #1 above.
- As data size gets very large (e.g. 4-5 repeat visits of 165 sites) detection probability models general get noticeably better – the results in #1 mostly apply at smaller, but in my opinion more typically found, sample sizes (55 sites, 2 repeat visits).
And one thing talked about a lot which we don’t really know yet:
- Both WLD and GLMWM talk about whether working with detection probabilities requires larger samples than ignoring detection probabilities. Ignoring detection probabilities allows Ψ to be estimated with only single visits to a site, while hierarchical detection probabilities requires a minimum of 2 and as GLMWM shows really shines most with 3 or 4 repeat visits. To keep a level playing field both WLD and GLMWM reports results where the non-detection approach uses the repeat visits too (it just makes less use of the information by collapsing all visits into either species seen at least once or never seen). Otherwise you would be comparing a model with more data to a model with less data which isn’t fair. However, nobody has really full evaluated the real trade-off – 50 sites visited 3 times with detection probabilities vs 150 sites visited once with no detection probabilities. And in particular nobody has really visited this in a general way across the whole parameter space for the real-world case where the interest is not in estimating Ψ, the occupancy, but the β’s or coefficients in a logistic regression of how Ψ varies with environmental covariates (like vegetation height, food abundance, predator abundance, degree of human impact, etc). My intuition tells me that with 4-5 covariates that are realistically covarying (e.g. correlations of 0.3-0.7) getting 150 independent measures of the covariates will outweigh the benefits of 3 replicates of 50 sites (again especially for accurate estimation of the β’s) but to my knowledge this has never been measured. The question of whether estimating detection probabilities requires more data (site visits) remains unaswered by WLD and GLMWM but badly needs to be answered (hint: free paper idea here).
So with these 3 facts and one fact remaining unknown, what can we say?
- Detection probabilities are not an uber method that strictly dominates ignoring them. As first found by WLD and now clearly shown to be general in the appendices of GLMWM, there are fairly large regions of parameter space where the primary focus – the estimate of Ψ – is more accurate if one ignores detection probabilities! This is news the detection probably machismo-ists probably don’t want you to know (which could be an explanation for why it is never discussed in GLMWM).
- Detection probabilities clearly give better estimates of their certainty (or in a lot of cases uncertainty) – i.e. the variance of the estimates.
- If you’re designing data collection (i.e. estimating # of sites vs # visits/site before you’ve taken measurements – e.g. visit 150 sites once or 50 sites 3 times), I would recommend something like the following decision tree:
- Do you care more about the estimate of error (confidence intervals) than the error the estimate (accuracy of Ψ)? If yes then use detection probabilities (unless p is high).
- If you care more about accuracy of Ψ, do you have a pretty good guess that Ψ much less or much greater than 50% or that p is much greater than 70%? If so then you should use detection probabilities if Ψ is much greater than 50% and p less than or equal to 50-60%, but ignore them if Ψ much less than 50% or p clearly greater than 50-60%.
- If you care more about accuracy of Ψ and don’t have a good idea in advance of roughly what Ψ or p will be, then you have really entered a zone of judgement call where you have to weigh the benefits of more sites visited vs. more repeat visits (or hope somebody answers my question #4 above soon!).
- And always, always if you’re interested in abundance or species richness, don’t let somebody bully you into switching over to occupancy because of the “superiority” of detection models (which as we’ve seen is not even always superior at occupancy). Both the abundance and species richness fields have other well established methods (e.g. indices of abundance, rarefaction and extrapolation) for dealing with non-detection.
- Similarly, if you have a fantastic dataset (e.g. a long term monitoring dataset) set up before detection probabilities became fashionable (i.e. no repeat visits) don’t let the enormous benefits of long term (and perhaps large spatial scale) data get lost just because you can’t use detection probabilities. As we’ve seen detection probabilities are (a good method, but also a flawed method which is clearly outperformed in some cases just like every other method in statistics. They are not so perfect that they mandate throwing away good data.
The debate over detection probabilities have generated a lot more heat and smoke than light, and there are clearly some very machismo types out there, but I feel like if you read carefully between the lines and into the appendices, we have learned some things about when to use detection probabilities and when not to. The question #4 still remains a major open question just begging for a truly balanced, even-handed assessment. What do you think? Do you use detection probabilities in your work? Do you use them because you think they’re a good idea or because you fear you can’t get your paper published without them? Has your opinion changed with this blog?
*I’m aware there are other kinds of detection probabilities (e.g. distance based) and that what I’m really talking about here are hierarchical detection probabilities – I’m just trying to keep the terminology from getting too thick.
**Although I have to say I found it very ironic that the software code GLMWM provided in an appendix, which uses the R package unmarked, arguably the dominant detection probability estimation software, apparently had enough problems finding optima that they rerun each estimation problem 10 times from different starting points – a pretty sure sign that optima are not easy to find.