Is using detection probabilities a case of statistical machismo?

Back in the fall I wrote a post on statistical machismo in ecology, arguing that ecology is prone to use increasingly complex statistics without necessarily stopping to weigh the costs and the benefits. I singled out four specific techniques: phylogenetic regression, spatial regression, Bayesian methods and detection probabilities. I at no point said these techniques were bad or should never be used. But I did say that we had in many cases reached a point where the techniques had become a sine qua non of publishing – reviewers wouldn’t let papers pass if these techniques weren’t applied, even if applying them was very costly and unlikely to change the results. Most of the comments were on the Bayesian methods (which I do NOT want to reignite here) and the two GLS (phylogenetic and spatial) regressions, which led to this follow-on post.

I got only one comment on detection probabilities. However, a new paper published today in PLOS One, “Fitting and interpreting occupancy models” by Alan Welsh, David Lindenmayer and Christine Donnelly, got me excited enough to revisit detection probabilities.

Now, if you are based in a wildlife department you already know what detection probabilities are. Indeed, most of the committees of students I sit on in the wildlife department mention detection probabilities with a groan and a roll of their eyes but then go ahead and modify their design, at great cost – namely halving or more the amount of data they can collect – to address detection probabilities. You see, in wildlife journals detection probabilities have become a no publish line – you can’t publish without detection probabilities.

Although many in basic biology and EEB departments remain blissfully unaware of detection probabilities, the expectation is starting to creep into reviews on papers in basic research as well. As somebody who frequently publishes papers using the North American Breeding Bird Survey, I have now had three papers rejected a total of six times for the “sin” of not using detection probabilities (never mind that I couldn’t, didn’t need to, and it wouldn’t change the answer). So beware, this issue is coming to your population biology papers soon!

Detection probabilities are a statistical model/method designed to deal with one simple obvious fact. When you are censusing mobile organisms like birds or mammals or butterflies or … (really almost anything except plants and maybe snails), you miss organisms. You are not censusing the whole population. This has long been recognized by reporting such counts as an index of abundance rather than abundance per se (and if you need total abundance you have to use a method like mark-recapture). Detection probabilities got their start when people reported occupancy (presence/absence rates) instead of abundance. The idea was that a claim of absence is pretty shaky when you’re not counting all the individuals, yet the data was presented as a binary and hence large difference (presence or not). So far reasonable enough.

This paragraph has the heavy math – try to read it – but do keep going on to the paragraphs after and don’t just give up! The proposed solution was a two step model: let Oi be the true occupancy (1 if a species is present, 0 if absent at site i). We can assume the true underlying occupancy rate is Ψi (a simple way of saying let Oi be distributed Bernoulli(Ψi)). So far this is just a probability model of occupancy (in the simplest case where Ψ is constant across sites the occupancy rate is just Ψ). Now comes the fancy part. Let Di,j be what is actually observed at site i for observation repetition j (again D=1 if present, 0 if absent). Now Di,j can be different from Oi and we have added detection to our model (although not yet fully specified it). To fully specify we need a few things:

  1. Multiple observations of D giving the subscript j
  2. Assume Oi doesn’t change across the multiple observations j (i.e. a site doesn’t flip from really occupied to really unoccupied or vice versa between visits)
  3. Assume P(Di,j=1|Oi=0)=0 (we never mistakenly observe something at a site when it is not really there)
  4. Define pi,j=P(Di,j=1|Oi=1) (i.e. the probability it is detected if it is there) – aka the detection probability
  5. Assume pi,j is constant across observations (so we can drop the subscript j giving pi)

Now you don’t need me to tell you that assumptions #2, #3, and #5 are all whoppers. But let’s give the method a chance. Under these conditions it is not too hard to write down the maximum likelihood estimates (MLE) and solve for pi and Ψi, which are both unobservable, using the observations Di,j. It is also quite common to let pi and Ψi be functions of covariates like land cover type, elevation, etc (using logistic regression) – this more advanced model is also fairly directly solvable.
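
To make the two-step model concrete, here is a minimal sketch in Python of the likelihood for the simplest case, where Ψ and p are constant across sites. Everything named here (the function names, the simulated data, the seed, and the coarse grid search standing in for a real optimizer) is an illustrative assumption of mine, not the procedure of any particular occupancy package or of Welsh et al.

```python
import math
import random
from collections import Counter

def site_lik(psi, p, detections, visits):
    """Likelihood of one site's history under assumptions #2, #3 and #5."""
    if detections > 0:
        # Seen at least once, so the site must be occupied (no false positives).
        return psi * p**detections * (1 - p)**(visits - detections)
    # Never seen: either occupied and missed on every visit, or truly empty.
    return psi * (1 - p)**visits + (1 - psi)

def log_lik(psi, p, histogram, visits):
    """Sum of site log-likelihoods; histogram maps detections -> number of sites."""
    return sum(n * math.log(site_lik(psi, p, d, visits)) for d, n in histogram.items())

def simulate(n_sites, visits, psi, p, rng):
    """Draw O_i ~ Bernoulli(psi), then count detections over the repeat visits."""
    counts = []
    for _ in range(n_sites):
        occupied = rng.random() < psi
        counts.append(sum(occupied and rng.random() < p for _ in range(visits)))
    return counts

def fit_grid(counts, visits):
    """Crude MLE: evaluate the likelihood on a 0.01 grid and keep the best point."""
    histogram = Counter(counts)
    grid = [k / 100 for k in range(1, 100)]
    _, psi_hat, p_hat = max(
        (log_lik(psi, p, histogram, visits), psi, p) for psi in grid for p in grid
    )
    return psi_hat, p_hat

rng = random.Random(1)  # fixed seed so the sketch is repeatable
counts = simulate(n_sites=200, visits=3, psi=0.6, p=0.4, rng=rng)
psi_hat, p_hat = fit_grid(counts, visits=3)
naive = sum(d > 0 for d in counts) / len(counts)  # share of sites with any detection
print(psi_hat, p_hat, naive)
```

The naive figure is simply the share of sites with at least one detection. Under this model its expected value is Ψ(1 − (1 − p)^J) for J visits, so it sits below the true Ψ, which is the gap the detection model is trying to close.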

If you think about it, it should be obvious that you cannot estimate a detection probability and an occupancy separately if you only have one observation of the site. Thus #1 (repeated observations of the same site) is critical. Right here is the nub of detection probabilities – you can only use them if you make REPEATED observations of the same data point. If you make three repeated observations, then you will in a fixed amount of time only be able to observe one third as many sites as you otherwise would have been able. This is why wildlife ecologists hate having to address detection probabilities. It has a very real cost – it is not just more computations – it is more data collection (or equivalently fewer independent points for all ensuing analyses). But wildlife ecologists have buckled down and done the extra observations, accepting the loss of power/df inherent in detection probabilities, because they can’t get their paper published any other way. Now you understand the eye rolling. This also is a serious problem for people like me using historical datasets like the Breeding Bird Survey that were never designed with detection probabilities in mind. They “only” have one observation per point in space and time. There is no way to go back and add repeated observations, so demanding detection probabilities is tantamount to throwing away historical monitoring datasets – ouch!
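
A small numerical check of that point, as a sketch in Python (the function name and the two parameter pairs are made up for illustration). With a single visit, a detection has probability Ψ·p and a non-detection has probability 1 − Ψ·p, so any two pairs with the same product are indistinguishable. With three visits they no longer are.

```python
import math

def site_lik(psi, p, detections, visits):
    """Likelihood of one site's detection history (no false positives, constant p)."""
    if detections > 0:
        return psi * p**detections * (1 - p)**(visits - detections)
    return psi * (1 - p)**visits + (1 - psi)

# Two hypothetical parameter pairs with the same product psi * p = 0.2
pair_a = (0.8, 0.25)  # common but hard to detect
pair_b = (0.4, 0.50)  # rarer but easier to detect

# One visit: both possible outcomes get identical probabilities under either pair.
same_with_one_visit = all(
    math.isclose(site_lik(*pair_a, d, 1), site_lik(*pair_b, d, 1)) for d in (0, 1)
)

# Three visits: the probability of never detecting the species now differs.
never_seen_a = site_lik(*pair_a, 0, 3)
never_seen_b = site_lik(*pair_b, 0, 3)

print(same_with_one_visit, never_seen_a, never_seen_b)
```

Only the repeat visits carry information that separates the two explanations, which is why the method cannot be bolted onto single-visit data after the fact.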

Well, a couple of Aussies (the aforementioned Lindenmayer and Donnelly) were doing a nice study on the effect of monoculture pine plantations on bird communities (abundances). I don’t know the full story but judging by the paper I linked to above they must have gotten told by reviewers at least once that their paper was unpublishable and that they should: a) abandon abundances and only do occupancy so that they could b) use detection probability methods. The whole idea of throwing out abundance information and reducing it to occupancy just because “it’s more statistically proper” makes my stomach turn and apparently it did theirs too. They went out and got a clever statistician to work with them, Alan Welsh, resulting in the paper I am discussing.

It is quite a technical read, so let me boil down the main findings:

  1. In the version of detection probabilities where pi and Ψi have covariates modelled through logistic regression, the solving of the MLE equations is a lot harder than people have given it credit for. They found many cases where there were multiple solutions to the MLE (i.e. the answer depended on the initial guess you gave the solver) or where the solutions converged on the boundary (where pi and/or Ψi are either 0 or 1 which are theoretically impossible). The core issue is that there is a lot of freedom to move the solution back and forth between high pi and low Ψi vs low pi and high Ψi, both of which would give the same observed result. When you throw in logistic regression, which can have its own convergence problems when the data is very noisy, you get a mess. To be more exact, you get a lot of real-world wrong answers spit out by the computer, even to the point of estimating slopes of the wrong sign.
  2. This problem is compounded when the data is sparse – i.e. either pi or Ψi is low (by which I mean say 10% which is not at all uncommon to have an occupancy of 10%).
  3. It has been fashionable recently to notice that detection probability depends on abundance (gee really – a species is more likely to be noticed=detected when there are 30 of them than when there is one of them?). But this is a major violation of #5 above (detection probability constant across sites under the very likely scenario that abundance varies across sites). There are ways to try and deal with this, but as the Welsh et al paper shows, all of them have problems, leaving the detection model nearly always in violation of a core assumption of constant detection probability other than for modelled covariates.
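
One way to picture how abundance feeds into detection, as a sketch in Python. It assumes each individual at a site is detected independently with the same per-individual probability r; that assumption, the function name and the value r = 0.1 are mine for illustration and are not taken from the post or from Welsh et al.

```python
def site_detection_prob(n_individuals, r):
    """P(at least one of n individuals is detected), each seen independently with prob r."""
    return 1 - (1 - r) ** n_individuals

# Hypothetical per-individual detection probability of 0.1
for n in (1, 5, 30):
    print(n, round(site_detection_prob(n, r=0.1), 3))
```

Under these assumed numbers the site-level detection probability climbs from 0.1 with one individual to roughly 0.41 with five and roughly 0.96 with thirty, so a single constant p cannot describe sites that differ much in abundance.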

So where does this leave us? Every model is imperfect and has assumptions violated. What are the consequences of #1, #2 and #3 for detection models? Welsh et al found (in an extremely rigorous paper where every point was supported by analysis of real-world data, analysis of simulated data and analytical results) that:

  1. Frequently the estimated relationship of detection and occupancy to covariates is very wrong. So for example in the original study which looked at how maturation of the pines influenced detection probability (old bigger forests should have lower detection probabilities) it was often estimated that detection *increased* with size/age of forest.
  2. The estimates of occupancy and detection are biased and have high variances. In fact they have the same amount of bias and high variance as if you just ignored the detection probabilities and went back to the old way of doing things!!! (and this was on simulation data where detection issues were built into the data).

Bottom line – ignoring detection issues often gives misleading/wrong answers. But at exactly the same rate as if you were modelling detection which also often gives misleading/wrong answers.  When you combine this with the real world fact that often times only half or one third of the data (by which I mean independent observations) is collected that would have been collected if we ignored detection probabilities, one really starts to question the appropriateness of demanding detection probabilities.

I claimed at the start of my post that I wasn’t saying any technique was inherently bad or should never be used. And I’m not saying that about detection probabilities either.

One of the most sensible thinkers on detection probabilities I know is Steve Buckland, who has been a leader in the development of detection probabilities. In chapter 3 of the edited book by myself and Anne Magurran (sorry, shameless self promotion), Buckland says “Ignoring detectability might not be a major problem if the bias is consistent across time or space.” But he then goes on to demonstrate quite clearly that results can be misleading if detection is ignored in other scenarios. He clearly is not black-and-white about the need to use detection probabilities. Buckland also developed a nice method where instead of repeated measures of the same site, one only needs to estimate the distance to observed individuals, which can calibrate a detection decay curve. Estimating distances is not cost-free compared to just counting, but it is much less costly than repeated visits to sites and thus is a great benefit to wildlife ecologists who have to worry about detection probabilities. The distance-based detection method seems not to have made it over “the pond” to the US as well as it should have.

Here are my recommendations.

  • In light of Welsh et al’s findings it is flat out wrong for reviewers to insist that detection analysis is a requirement for publication.
  • It is more important to address detection if you are actually studying occupancy (presence absence) and less important when you are studying other factors like community structure, abundance etc.
  • It is more important to address detection if detection probabilities are likely to vary across species (e.g. different detectabilities by species which is common enough) or space (e.g. varying amounts of brush) or survey points (e.g. varying effort levels) and that comparison (across species or sites) is what is important to you but it is less important to address detection when things are fairly constant across your axis of comparison – e.g. looking at just one species (so no issue of differing detectabilities between species) across space when there is not a reason to expect habitat to vary much (so no reason to expect varying detectabilities across space)
  • If you do have to address detection probabilities (because of your question and experimental context, hopefully, not because of reviewers), then: a) consider using Buckland et al’s distance methods, and b) consider getting serious and doing more than just two or three repetitions of each site – if you really are interested in occupancy and detection then you need real replication along that dimension just like for any other variable of interest.

I think the main theme of my post on statistical machismo is that there is no such thing as a cookbook or one right way to do things in statistics. You have to know what you’re doing and think things out. Sometimes one way is appropriate. Sometimes an alternative way is appropriate. And these have to be weighed against real-world costs in data collection and loss to science of interesting studies. Detection probabilities are no exception. So if you’re a reviewer or editor, please stop telling poor authors you “have” to do detection probabilities because “it’s the only right way” or “gold standard” for how to do it. It’s not – it very likely introduces as much error as it fixes and whether you should do it depends on the question and the data and requires thinking.

109 thoughts on “Is using detection probabilities a case of statistical machismo?”

  1. Thanks for highlighting this paper, Brian, which I had not yet seen. I think this is really important stuff in so many ways — there is a real risk that our computational expertise keeps rising while our genuine field ecology expertise keeps shriveling away. I always thought it would be interesting to look at some datasets and see how often the real-world “so what” would change by using different levels of modelling sophistication. In the MacArthur era, things were pretty simple — but how many insights were plain wrong? Probably no more or fewer than now.

    The paper you highlighted makes the important point that there are trade-offs associated with increasing complexity; and that unless you really know what those trade-offs are, you don’t know if you’re better off than before. A key principle in statistics is parsimony — no more complicated than needed. What is “needed” depends — but it seems that general *methodological* parsimony is no longer desired by reviewers (that’s most of us, right?); the basic assumption seems to be that “more complicated is better”. In fact, I have had a paper rejected for being “not complicated enough” …:

    Anyway, thanks for highlighting this, and at least this should give us food for thought that — depending on the circumstances — simpler can be better.

  2. Outstanding once again Brian, many thanks for this. These kinds of posts are exactly why I frequent this site.

    I’ve long wondered whether distance-based sampling was applied in any way in estimating animal abundances. It’s a very important methodology in plant ecology of course–especially forest ecology–the subject of many analyses, both empirical and simulated. It’s been so long since I got my wildlife degree that I’ve completely lost touch with the methodological (and technological) advancements in the field. I consider this a chance to jump-start a revisit to same, and I’d be especially interested in reading the work you reference by Steve Buckland. Will check his website for posted papers.

    One question. In your specification #5 (“Assume pi,j is constant across observations (so we can drop the subscript j giving pi)”), this basic idea applies across sites (i) as well as repeated observations (j) at a single site, right? It seems that you interpreted it that way later in “main findings” #3. Or did I misinterpret something?

    Anyway, it seems to me that this is a great example of how researchers frequently attempt to make a black box out of some statistical method or another, which translates sooner or later into reviewers requiring the method unequivocally, without really understanding the types of problems you discuss here. Which is counter-productive, for sure. And aggravating.

    • Good question. Some detection models assume p (detection probability) is constant across sites. But more often they put in covariates for detection probability (e.g. amount of forest). This then means you keep the i (site) but not the j (revisit) subscript. The abundance varying across sites issue that I raised in “main finding #3” is potentially managed by letting abundance be another covariate across sites i to explain detection. But it’s kind of circular – if you truly have a measure of abundance why are you estimating occupancy anyway? And it doesn’t work that well (in part because the relationship between detection and abundance is a non-linear saturating relationship but also for other reasons). If you’re really interested in why there is no good way to incorporate abundance in detection models I refer you to the Welsh paper, which is rather thorough on this point. And as I mentioned in answer to your next question, part of the reason the models don’t converge nicely is that the same variables predict occupancy and detection.

  3. Also, would Ψi basically constitute a wildlife-habitat relations model? Or at least a presence/absence version of such? When I think of modeling occupancy as a function of various possible covariates, that’s pretty much what I picture.

    • Hi Jim – yes you have it. Ψi(variables) is a habitat relations model in the traditional wildlife sense. It lets you have cover, nesting sites, predation risk, whatever as predictors. The challenge is many of these are also predictors of detection, and that I suspect is what is ultimately driving Welsh et al’s results. Especially if you only have 2 or 3 repeat visits, there is not enough information for the statistical optimization to decide whether to attribute the effect of say, brushy cover, to explaining detection or occupancy.

      • That would seem to argue strongly for the use of wildlife cameras and similar types of remote sensing devices wouldn’t it, at least for those species that can be imaged that way. Would seem like a good compromise between the need for more data points and the expense and intrusiveness of mark-recapture?

      • Interesting point that I don’t really know the answer to. I would think cameras would have lower detection probability than humans and the Welsh paper clearly shows sparsity=low detection is a big problem. But as you say it has a clear advantage of more data. Would probably have to do a simulation to see how the two weigh out.

    • Hi Jim – A good place to look for an introduction to this would be MacKenzie et al.’s book “Occupancy Estimation and Modeling”, cited almost 1,000 times according to Google Scholar. This is very commonly used in wildlife biology; as Brian notes, almost a requirement to publish someplace like the Journal of Wildlife Management. You could find many examples of different implementations of occupancy estimation accounting for detection probability in that journal (and occasionally in Ecology, Ecological Applications, etc.).

      With respect to the abundance issue: I always had the impression (at least in the US) that this grew out of the Amphibian Research and Monitoring Initiative (ARMI), and that people sampling for amphibians had reservations about using abundance owing to explosive boom and bust population dynamics – hence a preference for occupancy over (relative) abundance as a measure of trend. I also thought that advocates of this approach generally urged against its use in situations where the species was either very common (greater than 80% occupancy) or very rare (less than 20% occupancy) – so I think a lot of people who use this would agree with Brian that it’s problematic when data is sparse.

      I’ve always been resistant to using occupancy estimation with detection probability for one of the reasons outlined by Brian: the sacrifice in power required by reducing the number of sites sampled to repeat samples at the same sites. Given the choice between getting to 100 sites vs 25 or 33 w/detection probability, I’ll opt for the 100 – at least when detection probability isn’t the question of interest. But given the popularity and ubiquity of this approach, I wouldn’t be surprised if someone showed up to defend it here (or via comments or responses to the Welsh et al. paper).

      • Thanks for mentioning the central reference for most people in day-to-day usage of detection probabilities (MacKenzie).

        As you say there are good reasons to choose to study occupancy instead of abundance (and in other circumstances good reasons to choose to study abundance instead of occupancy). There is not a one-size fits all approach. And your rule of thumb of not studying organisms with extreme occupancies is a good one.

        I know enough of the history of the Welsh paper to know that it got major push back from occupancy folks during the review process even though they could identify no errors in it. I don’t know if those folks will consider this blog important enough to attack. But I am sure the Welsh paper will be attacked with blazing guns. That’s what vested interests do.

      • My only experiences with this are out of a graduate seminar almost seven years ago, so I’m sure I’m not representing the state of science well – lots of methodological papers have followed MacKenzie et al.’s book. I don’t doubt that Welsh et al. had a “fun” review process, which could be why the paper ended up at Plos One. I view one service of that journal as being a good outlet for response or rebuttal papers that are (hopefully) technically sound, but may not get a fair review at specialist or society journals owing to entrenched interests or cultural drift within particular fields.

      • That’s a good point about Plos One. It has the advantages as well as drawbacks of a “last resort” outlet. They publish a lot of very boring stuff. But they also publish some very interesting, provocative stuff that likely had a rough ride at selective journals for being too unconventional.

      • And with no space limits, you can really rebut what you want to rebut at length. In Welsh et al.’s case: 21 pages, 8 figures, 5 tables! Beats the 0.5 to 1 pages that a Science/Nature/Frontiers in Ecology & Envi/etc. might permit you. At a considerable cost to visibility, but still probably useful for a detailed conversation with specialists (that isn’t buried in a supplement).

      • Not to derail this into a debate on Plos One, and I agree that it serves this role, but Plos One is one of the journals that rejected some of my BBS papers because they were “fundamentally flawed” for not using detection probabilities – so while their decision criteria are unique, they are still subject to groupthink among reviewers too.

      • Fair enough – my perception of Plos One as a good venue for rebuttals or controversial papers is secondhand, and several acquaintances who used it for such still had a hard time publishing b/c of the first draw of Plos One reviewers (although that work ultimately was accepted).

      • Oh, I’ve been rejected from Plos One, and I have colleagues who have as well. Interesting to see how much variation there is among one’s colleagues in terms of what they consider “technically sound”. And those weren’t even controversial papers or rebuttals, they were just regular research papers.

      • Well that seems to be a good point, more localities visited should be better… but it is not always so wise. Suppose you have a mean occupancy (Ψ) of 0.5 and your methods assure you a mean detectability (p) of 0.33 in each visit. If you visit 99 sites once, you’ll get about 16-17 detections, and 82-83 non-detections, but you have uncertainty about which non-detections are real absences.

        If you visit 33 sites three times, then you will get the same amount of detections and non-detections, but now you have improved your overall detection probability, which means that you can almost be certain of the true state of the site because the probability of missing a species after three visits is very low and the probability of observing it at least once is very high. So now you could use an occupancy model… or simply use a logistic regression with “presence” and “almost certainly absent”, and both should have (almost) the same outcome.

        In fact with any detection probability higher than 0.33 you can be pretty sure of detecting the species after three visits, or with p>0.5, two visits would be enough. So multiple visits are not only about using occupancy models, they are also about improving the quality of your data. If you have a reasonable estimate of the detection probability of your species/method you could simply calculate how much sampling effort you need to ensure detection.

        — JR

      • JR, that last part is incorrect. The probability of detecting a species in any of your surveys is 1-(1-p)^j, where p is the probability of detection during a single survey and j is the number of surveys.

        So, with p=0.33 and j=3, your probability of detecting a species is 0.70, while with p=0.5 and j=2, the probability is 0.75. Neither would allow for you to say with certainty that you correctly attributed presence/absence to all sites.
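
        The formula above is easy to check, and it also gives the number of visits needed to reach a chosen cumulative detection probability. A minimal sketch in Python; the function names and the 0.95 target are illustrative assumptions, and visits are assumed independent with constant p.

```python
import math

def cumulative_detection(p, visits):
    """Probability of detecting a present species at least once in `visits` surveys."""
    return 1 - (1 - p) ** visits

def visits_needed(p, target):
    """Smallest number of surveys j with 1 - (1 - p)^j >= target."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))

print(round(cumulative_detection(0.33, 3), 2))  # 0.7
print(round(cumulative_detection(0.5, 2), 2))   # 0.75
print(visits_needed(0.33, 0.95))                # 8
print(visits_needed(0.5, 0.95))                 # 5
```

        So under these assumptions, being 95% sure of detecting a present species takes 8 visits at p=0.33 and 5 visits at p=0.5, well beyond the two or three visits discussed above.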

  4. With regards to using camera traps, they give you a lot of information (maybe too much?) but they do have detectability issues. Some animals don’t reliably trigger the camera – e.g. birds can be so well insulated by their feathers that there is no heat signal for the camera to pick up on. There is a delay in triggering the camera and taking a picture during which time an animal may move out of the frame. There are also all kinds of issues about some individual cameras being more sensitive and not knowing the exact area that will trigger the shot (or knowing what that area is but having it differ between individual cameras). I believe that now researchers are putting out multiple cameras focused at one site in order to better estimate detectability (wasting resources?)

    I really enjoyed this post. I have to admit that I’m very pro-detectability correction but that’s probably my wildlife science bias. For one thing, just watching the field acknowledge that there is possible variation in detectability has been great. I’m working with biologists in other fields and trying to get them to understand the very idea of varying detectability (‘oh, we know where the animals are – we go out and find them every time’) is a struggle. I don’t think it’s quite time to decrease our emphasis on detectability when it seems like more than half the working wildlife biologists haven’t even heard of it.

    In terms of wasting resources, I’m not sure I entirely agree. Sampling in order to correct for detectability seems about on the level of running replicates in an experiment. Sure, it’d be great if you could run it once and use that result but that’s not really scientifically justified. Mind you, most of my experience is in doing and re-doing point counts and the time difference between the counts is something on the order of minutes – count for 10 minutes, wait some set time, recount for 10 minutes, wait again, count again. Not too expensive in terms of resources.

    • Thanks ATM. Good to hear on the cameras from somebody who knows. One thing I agree with you on is that if you’re interested in detectability you ought to be serious about it and sample accordingly. I guess the question is what if you’re not interested in detectability (or even occupancy). I see a lot of people who pay lip service by having the bare minimum two visits so they can fit a model, but they’re not really interested in it and they’ve just lost half of their independent points and they only did it because they’re scared they won’t be able to get their paper published. I don’t think reviewers should be telling people they *have* to be interested in and address detectability. The art of science (and statistics) is judicious simplification – you can’t address all issues all of the time. What’s your take on this? I also do agree that a lot of basic ecologists could/should be more knowledgeable about detection issues.

      • “you can’t address all issues all the time.”

        Is that what a lot of “statistical machismo” is due to, do you think? People forgetting that? That is, simple classical procedure X has some known limitation or problem, someone invents some more complicated procedure to address that limitation or problem–and then forgets or deemphasizes the new problems that their more complicated procedure creates?

        Now, if the only cost to the new procedure is computational, then in many circumstances I’m on board with that. Computation is getting cheaper all the time. For instance, right now I’m working on a project to fit continuous time population dynamic models with demographic stochasticity to data sampled with error. The approach I’m using (due to mathematical ecologist Aaron King and collaborators) is very computationally intensive, but that’s the only cost. It very much makes sense in my case to pay that cost rather than, say, pretend that there’s no demographic stochasticity. Not that computational costs are always and everywhere worth paying, of course!

        Of course, as this post points out, there are many new procedures that have serious non-computational costs. It really is incumbent on all of us to recognize that there’s no free lunch in statistics, you always have to pick your poison.

      • Jeremy – I do think a lot of statistical machismo is “you can’t address all issues all the time” or tunnel vision – you get so focused on fixing one problem you forget all the others you introduce (haven’t seen the perfect statistical method yet – everything introduces issues).

        And I agree with your distinction between relatively costly vs cheap complications. To my mind phylogenetic regression and detection probabilities are highly costly (building a tree or doing repeat visits is measured in months not hours). They should be avoided probably more often than not.

        Spatial regression is more in the cheap category. You don’t need more data. And there are tools that automate spatial regression. To me the issue here is more the “monkeys with razor blades” (or I think you call it giving neighborhood kids keys to the Ferrari) problem. Fitting a spatial model is non-trivial and requires expertise. And if you do it wrong your answers can be worse than if you ignored it. But in principle this is a solvable problem with adequate training. I don’t know Aaron King’s method but I wouldn’t be surprised if it falls in this category.

  5. Brian, I definitely agree that having people half-ass a statistical method just so they can tick off a box and get published is frustrating but I’m not sure of the solution. This seems like a common problem for a lot of different reasons – people not being particularly interested in the problem or stats or being more interested in another aspect of the research, laziness, being intimidated (and so only doing the minimum in hopes that no one will call them on it) – and I have no idea how to address it.

    Maybe having a section of papers discussing what statistical methods you considered and why you didn’t end up using them. Although that might turn into #overlyhonestmethods – We don’t know anything about detectability but we heard you need to address it before getting published and it looks like 2 is the minimum number of visits needed.

    • Re: explaining why you didn’t use certain methods, I often do that in my own papers, and not just for statistical methods, also for things like study design and choice of variables on which to collect data. If I anticipate that reviewers are likely to question something that I’ve done, I always preempt the question. To the point where I occasionally get reviewers saying “the paper is written in an overly defensive way, the author doesn’t need to justify his methods at such length”! I’d much rather have reviewers giving me a positive review, with a suggestion to drop some methodological justification, than giving me a negative review because I omitted methodological justification.

      • Yes, I hate the advice to avoid writing anything defending yourself or admitting the flaws in your research. Seems counterproductive and makes the paper less useful to others.

  6. I admit in the last statistical machismo section, I ignored the detection/true presence issue, since I’ve never really read anything about this issue. This post got me thinking about it, though, so I hope you’ll tolerate some ignorant questioning/comments. 🙂

    1. You talked about the assumption that p_{i,j} is constant, conditional on site characteristics. Has there been any attempt to use something like a hierarchical model framework, where p_{i,j} is a random effect, with residual variance even after covariates are included? It seems to me that that could allow for unknowns like abundance to increase detection probabilities.

    2. With regards to the confounding issue, are there any simulations/models to test how using two different methods (human surveys + camera traps for instance) could improve on this? Without running any sims myself, it seems that for a given site, occupancy is the same, but detection probabilities between the two methods would differ, so any consistent patterns in differences in detection should be attributable to differences in detectability.

    3. It seems strange that you’d have to re-sample all your sites multiple times, especially if you’re going to treat detection probability as a constant. Shouldn’t it be possible to sample, say, 90 sites, then go back and re-sample 10 of those sites an additional 3 times (or 15 sites an extra 2 times)? If the resampled sites are selected by appropriate stratification of site environmental characteristics, you might be able to at least get some sort of estimate of the plausible ranges of detection probability, without giving up all your power by sampling 40 sites three times each.

    • All good questions Eric (and that’s what the blog is for)
      1) Yes – there is a follow on book by Royle and others (the detection crowd in the US) that uses random effects and hierarchical models (see here). It is a well written book and I recommend it not just for detection probabilities but in general as a good introduction to hands on work with hierarchical models beyond the simple.

      2) There are a number of approaches where people try and cross check their detection probabilities, many of which involve parking at one site and comparing different methods. I’m not really up on the literature.

      3) I understand your point. I don’t know if there is any literature where anybody has done this or looked at the consequences of this. Anybody else out there know?

      In short – your questions have catapulted you to the frontiers of detection modelling! (at least as I understand the field – it’s a large, rapidly growing literature and I admit I don’t keep up on it in detail).

      I think it is great that people are getting really serious about detection probabilities and will share their findings with me. I genuinely am glad people are tackling this problem. And your questions have penetrated to the heart of what I think are some interesting questions. I just resent people who tell me I have to stop studying population dynamics of birds and start studying detection probabilities!

    • There really are hundreds (?) of papers doing different formulations of incorporating detection probability into estimating occupancy. I don’t doubt you can find cases where detection probability is estimated by comparing different sampling techniques, or where re-sampling is incomplete or only conducted for a subsample (although detection probability usually gets a universal cookbook treatment: revisit every site 2-4 times in the same season). I agree with ATM above that this work is a genuine service in forcing people to think about detection probability, but also agree with Brian that detection probability is probably regularly used as a rote rejection for papers where detection probability isn’t important or of interest.

  7. Great post. I think there’s something to be said here for the use of the robust design sampling advocated by Rota et al. in their 2009 paper (DOI: 10.1111/j.1365-2664.2009.01734.x). In it, they discuss the use of multiple surveys per site visit – a technique that was originally used in mark-recapture studies. To minimize violations of independence among same-day-same-site surveys, they use a removal protocol (though this could be relaxed for larger spatial extents IMO). This allows for two things: 1) a species’ detection probability can be estimated using fewer site visits throughout the year and 2) it allows for a formal test of the closure assumption (#2 above) by effectively fitting multiseason extinction-colonization models to a single year’s worth of data and comparing them to models where p(extinction) and p(colonization) are exactly zero.

    Your assumption #3 relating to false positives has also been addressed in a few papers, but most recently by Miller et al. In it, they generalize the MacKenzie and Royle-Link models to account for false positives. A benefit of their model is that it is also able to utilize multiple types of data (e.g., visual and audio), as well as information on the “reliability” of an observation (e.g., “definite” versus “questionable”). Since there’s no free lunch, of course, these models can suffer from some particularly heinous nonidentifiability issues.

  8. Having contributed to the literature on estimating detection probabilities, I’m going to jump in here – it’s a great post. I’m really on the same page as Steve Buckland – estimating detection rates might not be necessary if the bias is constant in space and time – unfortunately it often isn’t. That said, I think detection probabilities will go the same way previous trendy ideas have. First there’s a resistance to the idea itself (getting my first paper on the idea published!), then the bandwagon forms and everyone simply MUST do it (where we are now), and eventually cooler heads prevail and people use it when appropriate and don’t when not. Like many sophisticated analyses, it is not well understood by the masses yet, hence the knee jerk reaction by reviewers.

    At risk of self-aggrandizement, I’d like to address the comments that several have made about the effect of estimating detectability on power. I’m presuming that by “power” everyone here is referring to the ability to detect some sort of change in occupancy (or abundance, see below) as a function of a covariate. In a 2003 paper (Tyre et al. Ecol. Apps 13:1790-1801) we showed using simulations that ignoring false negatives reduced power, by reducing the magnitude of the effect. In later work my colleagues and I looked at how to optimize the tradeoff between sites and visits for a fixed budget when the objective was to maximize the power to detect a change across years (Field et al. 2005 J. Wildlife Mngmt 69:473-482). The answer is almost always to use >= 2 visits per site with fewer visits per site, although the exact answer depends on the occupancy probability and the detection rate, as well as how you calculate the cost of repeat visits vs. new sites. In all those cases, we weren’t looking at the tricky case of what happens when the covariate of interest for occupancy is the same as the covariate for detection. That’s just bad news, all around.
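As a rough illustration of the sites-versus-visits tradeoff described above, the sketch below tabulates, for a fixed budget of total visits, how many sites can be surveyed and the chance of detecting the species at least once at an occupied site. It does not compute power and is not the analysis in Field et al. (2005). The function name `design_table`, the budget of 120 visits, the detection probability of 0.5, and the assumption that a repeat visit costs the same as a visit to a new site are all illustrative assumptions.

```python
def design_table(budget, p, max_visits=5):
    """For each number of visits per site, return
    (visits per site, sites affordable, P(at least one detection | occupied)).

    Assumes every visit costs one unit of budget, whether it is a
    repeat visit or a first visit to a new site (an assumption).
    """
    rows = []
    for k in range(1, max_visits + 1):
        n_sites = budget // k              # more visits per site means fewer sites
        p_star = 1 - (1 - p) ** k          # cumulative detection over k visits
        rows.append((k, n_sites, p_star))
    return rows


# Hypothetical budget and per-visit detection probability.
table = design_table(budget=120, p=0.5)
for k, n_sites, p_star in table:
    print(f"visits/site={k}  sites={n_sites}  P(detect | occupied)={p_star:.3f}")
```

The table makes the cost visible: each extra visit raises cumulative detection by a shrinking amount while the number of sites falls steadily. Where the best balance lies depends on the objective, the occupancy and detection rates, and the relative cost of repeat visits, as the comment above notes.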

    The final thing I’d like to bring up is that if you have abundance data, you don’t have to cut it down to presence/absence data in order to estimate detectability from repeated samples. You can use Andy Royle’s N-mixture models, which replace the binomial distribution for occupancy with a Poisson or Negative Binomial distribution for abundance, and estimate changes in abundance that way.
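A minimal sketch of the N-mixture idea for a single site, assuming Poisson abundance with mean `lam`, a constant per-individual detection probability `p`, closure across the repeat visits, and a truncation point `n_max` for the sum over the unknown abundance. The function names and the example counts are illustrative and are not taken from Royle's papers or from any package.

```python
import math


def poisson_pmf(n, lam):
    """Poisson probability of abundance n given mean lam."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def binom_pmf(y, n, p):
    """Binomial probability of counting y of n individuals present."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)


def site_likelihood(counts, lam, p, n_max=200):
    """Marginal likelihood of repeated counts at one closed site.

    The unknown abundance N is summed out, from the largest observed
    count up to n_max (a truncation chosen here for illustration).
    """
    total = 0.0
    for n in range(max(counts), n_max + 1):
        detect = 1.0
        for y in counts:
            detect *= binom_pmf(y, n, p)   # each visit is a binomial draw from N
        total += poisson_pmf(n, lam) * detect
    return total


# Hypothetical counts from three visits to one site.
counts = [3, 5, 2]
for lam in (5.0, 10.0, 20.0):
    print(lam, site_likelihood(counts, lam, p=0.4))
```

With a single visit the sum collapses to a Poisson with mean `lam * p`, which is one way to see why `lam` and `p` cannot be separated without repeat counts.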

    Statistical Machismo? No. Crappy reviewing? Yes! Surprising … not really.

    • Glad to have an expert speaking up! Again thanks for the references. I am on the committee of a student who is using Royle’s N-mixture models and he and I are working through together really understanding them (including probably some simulations of our own). Can’t say I’m far enough along to offer an opinion of how they’ll work in practice, but intellectually they’re appealing. Thanks for the references (you know sometimes the best references on a topic really unavoidably are the ones you wrote). They sound very practically useful.

      I really enjoyed your final sentence – a good summary!

      • The N-mixture models are sensitive to variation in detection rates. The estimates of abundance can be wildly wrong if they are not accommodated in the analysis (and even then, I suspect it depends on how they are accommodated, which might not be obvious a priori).

      • Thanks for the thought-provoking post, Brian. Over the past couple years I’ve been a fairly heavy user of hierarchical models of detection. I’m far from an expert or a developer but I’ve gained some practical experience, especially with abundance models (N-mixture and the open N-mixture Dail-Madsen model). These models seem extremely sensitive to low and variable detection rates, unless there are very good covariates to explain most of the variation. They also require many sites more than many reps at each site. In many systems, it seems very difficult to have that many sites with sufficiently thorough sampling to have moderately high detection.

        The other challenge with the abundance models is overdispersion in the Poisson. Adding a random overdispersion term seems to be a decent solution, making it a Poisson-lognormal distribution. Many ecologists seem to like the negative binomial solution to overdispersion but beyond philosophical problems, I’ve found that if counts are high (even just over ~5) and there is moderate overdispersion, the huge tail of the NB results in huge abundance estimates with upper 95% CI that are FAR beyond realistic. We have counts of up to 25 (single site-observation) and using a NB gives an upper CI of ~400. The problem is people use the NB but never check the individual site CI to realize how ridiculous it is if the mean/median estimate is somewhat believable. I think people need to be very careful with how they use these models.
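To illustrate just the tail behaviour mentioned above (not the fitted abundance intervals from a real N-mixture model), the sketch below compares the upper 97.5% quantile of a Poisson and a negative binomial distribution with the same mean. The mean of 10 and the dispersion value `size = 1.0` are arbitrary assumptions, and the function names are hypothetical.

```python
import math


def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def negbin_pmf(y, mu, size):
    """Negative binomial probability of count y, parameterised by
    mean mu and dispersion size (variance = mu + mu**2 / size)."""
    log_p = (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
             + size * math.log(size / (size + mu))
             + y * math.log(mu / (size + mu)))
    return math.exp(log_p)


def quantile(pmf, q):
    """Smallest count whose cumulative probability reaches q."""
    cum, y = 0.0, 0
    while True:
        cum += pmf(y)
        if cum >= q:
            return y
        y += 1


mu = 10.0  # assumed common mean
pois_upper = quantile(lambda y: poisson_pmf(y, mu), 0.975)
nb_upper = quantile(lambda y: negbin_pmf(y, mu, size=1.0), 0.975)
print("Poisson upper 97.5% quantile:", pois_upper)
print("Negative binomial upper 97.5% quantile:", nb_upper)
```

The negative binomial places far more probability on large counts than a Poisson with the same mean. That is the mechanism behind the implausibly high upper limits described in the comment, although how it plays out in a fitted model also depends on the detection component.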

        Having had manuscripts rejected for not using these methods, I am in full agreement that these can be useful models but just like any other modelling or statistical tool, they are just tools to help us better describe a system. No tool is right for all jobs. Hopefully Jeremy is correct and that reviewers get over the initial excitement of these models to realize that sometimes they are the right tool, but not always.

      • Thanks for sharing your experience in the real world! It makes a lot of sense to me. The extension of detection probabilities into analysis of abundance seems especially likely to be problematic to me, which you sort of confirm.

        Yeah, people are quick to grab the negative binomial because it is easy to work with but it is almost always a bad fit to ecological data. I also have found the Poisson lognormal to be in general a great fit.

    • Drew I’m having a hard time following this statement: “The answer is almost always to use >= 2 visits per site with fewer visits per site”. You’re saying stay as close to the bare minimum of 2 visits per site as possible?

      • Hi Jim,

        Yes, if
        1) you want to maximize power to detect a change in p(occ) over time at alpha = 0.05 AND
        2) you have a fixed budget, so more sites means fewer repeats and vice versa.

        Of course, the exact optimum depends on what p(occ) and p(detect) are. Generally the lower p(detect) the more repeats to hit maximum power, but the increase is very slow. A larger budget means more power, as always.

        If your goal is different, e.g. minimize the variance in p(occ) at a point in time, then the answer is different. Darryl MacKenzie has a paper (can’t remember exact citation) where he showed the answer is then >=3.

  9. Hi Brian,

    Nice post. I’m on the same page as Drew.

    It is important to account for detection probabilities in some cases (as you point out), such as when there is variation among sites and the analysis is about site-to-site variation, or when there is variation among species and the analysis is about species-to-species variation (in occupancy or abundance).

    To be sure that we can ignore detection probabilities, we need to know how much they are likely to vary. That is a bit of a catch-22, because we rarely have that insight a priori. Because of this, there is value in studying how different factors (e.g., plant traits) influence detection, such as in this paper:

    In the absence of prior understanding and if multiple visits are too onerous, I agree that Buckland’s distance sampling methods are valuable.

    Finally, I’m going to get defensive. Your comment “It has been fashionable recently to notice that detection probability depends on abundance (gee really – a species is more likely to be noticed=detected when there are 30 of them then when there is one of them?)” undersells the intended point of my paper to which you link. In that, I showed that under specific assumptions of random encounters, there is a particular functional relationship between detection probability and abundance. Allowing clustered (rather than randomly-dispersed) detections can be accommodated by adding one extra parameter to the model. We go beyond simply saying “detectability varies with abundance” – the paper develops a model for the relationship, and evaluates that model. If people want a copy, there is a link in the following post that should auto-create an email request for the paper:



    • Hi Michael, as with Drew – nice to have some experts stopping by.

      One possible point of disagreement – If I read your comment correctly, you seem to feel that if we don’t know what the impact of detection probabilities are, then we need to measure them before moving on. I have the opposite, glass half full view, that if I’m studying something only tangentially related to detection probabilities, then the onus is on somebody who wants me to do a whole lot more work to include detection probabilities to argue why I would need to do so. It’s a question of default assumptions. Defaulting to if we don’t know it could bite us and we have to stop what we’re really trying to study and study the assumption violation first is to me a formula for letting statistics drive ecology and slowing down progress. Again if something egregious is being done make that argument. But otherwise, it is not obvious to me that the default is you always have to dive into the complexity in directions you’re not interested in. If I am misrepresenting your thoughts then, apologies.

      As for your article, I can totally see why you feel defensive because I was in fact being sarcastic (which I should try to rise above in a blog post) in close proximity to my mention of your paper, but I assure you the sarcasm wasn’t directed at you or your paper. I actually read and enjoyed your paper when it came out. As you say you did a nice job of building a model that is useful. My sarcasm was really directed at those who can ignore whopping big false assumptions like detection probabilities varying between sites only as modelled by covariates (or not varying at all in some models), or more specifically the idea that varying abundances across sites might not be a big issue that was swept under the rug with many applications of detection probability models for years while sanctimoniously taking other people to task for ignoring detection probabilities in circumstances where it is probably not the most important thing going on. Kind of the don’t throw stones when you live in a glass house principle. Yes – my bitterness at certain anonymous reviewers is showing through, but it really wasn’t directed at your paper which is quite nice and is a constructive step towards improving this. Apologies for not being clear on this.

      • No, I don’t think I’m suggesting we need to measure detection probabilities before doing anything – but we do need to think about them. The hard bit is that thinking about them clearly is difficult without some related experience to guide us.

        More specifically, I think ignoring detection probabilities can sometimes be useful in habitat models of various sorts by focusing on reporting rates – the frequency of times one expects to see a species at a site. Reporting rates, I suspect, reflect species-level use of a site, and therefore correlate somewhat with importance. For example, compare a site at which a species is seen once in 10 visits to one at which a species is seen 10 times in 10 visits. Occupancy models would tend to suggest those sites are the same (they are both occupied, but they might differ in detection probabilities). However, they probably differ in important ways (e.g., either temporal use or abundance are likely to be different). Collapsing differences in these sites into the detection component of a model doesn’t seem quite right.

        While on the topic of possible disagreement, models are available to account for differences in detection rates over time, so your 5th assumption can be relaxed (admittedly by modelling it, so the model might lead the analysis astray; as Joern points out, we probably need to find a parsimonious model for the data at hand, not a prescription of do this/don’t do that in all circumstances).

        Thanks for the clarification about the sarcasm.

      • Agree with your basic point. Detection probabilities are interesting in their own right – e.g. for monitoring programs – and indirectly as a statistical challenge. Therefore I am very glad people are studying and helping us develop intuition about them. That will help us know when we need to worry about them and when we don’t which is really the main point of my post.

  10. Reply to McGill blog post
    A basic fact of a biologist’s life, very well-known to anybody who has ever been in the field, is that typically outside of the lab, studied quantities such as abundance or occurrence can only ever be measured with a systematic error. The rate of that error is called detection probability and often abbreviated p: the probability of missing something that is really there. Such negative detection error is the rule even for sessile organisms such as plants in very well-designed studies (see e.g., Chen et al., JAPPL, 2013).
    Being a “p-ignorant”, i.e., ignoring detection probability when p<1 has several undesirable consequences:
    (1) underestimation of “totals”, i.e., abundance and the extent of a distribution whenever p<1,
    (2) biased estimates of rates of change (e.g., survival, recruitment/colonisation, turnover) whenever p<1,
    (3) biased estimates of environmental covariate relationships whenever p<1 and especially when p varies along an environmental gradient.
    The last 20 years or so have seen huge advances in statistical theory and the computational implementation of population models (for distribution, abundance and rate parameters such as survival) which can accommodate detection probability and therefore mitigate these effects. However, there is no free lunch, and so these exciting benefits come at a price: for instance, replicate observations are often required during a period of closure, when the system is assumed to be static. This is costly in terms of the design of a study and may preclude application of these modern methods to historical data sets and ongoing studies (e.g., monitoring programs). In addition, the new models are more complex and so may be harder to understand at first and in addition, *may* make additional assumptions, which, when violated, can lead to biased estimates, thus defeating the very purpose of adopting them in the first place.
    Dr. McGill rides a passionate attack against some excesses in the field of population analyses that do account for imperfect detection. While I sympathise with some of what he says, I feel he throws out the baby with the bathwater. In addition, there are a number of errors, half-truths and misleading points in his original blog post, so here are some comments. I will usually focus on distribution studies, whose focus is occupancy, for the sake of keeping the argument simpler. The case for other ecological quantities such as abundance as well as rate parameters is quite analogous.
    I agree that one can overdo it when always asking authors to account for imperfect detection in analyses of abundance or distribution. It is true that for historical data (e.g., as produced by the North American BBS up to now) this may not be possible (though see the papers by Dail&Madsen, Biometrics, 2011 and the work of Lele et al on estimation in the absence of replicates). Thus, only indices to occupancy (or apparent occupancy) will be available; they are the product of true occupancy and some average p.
    Not using such index data simply because one cannot formally deal with imperfect detection or outright rejection of papers that use these data seems too extreme for my taste. I feel it would be wrong to throw out such data. Rather, one should always be aware of the potential of patterns in p to distort observed patterns in the occupancy index. One should try to carefully rule out p as a possible basis for the observed patterns and state one’s conclusions with caution, knowing that they are all conditional on the assumption that there are no patterns in p along the dimensions of comparison one is interested in. That is, reporting that the occupancy index of some species is higher in habitat A than in B is based on the assumption that the species is not simply more detectable in A than in B.
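The caveat can be made concrete with a small sketch. Assuming a constant per-visit detection probability `p` at occupied sites and no false positives, the expected occupancy index is true occupancy `psi` times the chance of at least one detection. The function name and the numbers for habitats A and B are hypothetical.

```python
def occupancy_index(psi, p, n_visits=1):
    """Expected fraction of sites with at least one detection.

    psi is true occupancy and p is the per-visit detection probability
    at an occupied site (assumed constant, with no false positives).
    """
    p_star = 1.0 - (1.0 - p) ** n_visits   # P(detected at least once | occupied)
    return psi * p_star


# Same true occupancy in both habitats, but the species is assumed
# to be easier to detect in habitat A than in habitat B.
index_a = occupancy_index(psi=0.6, p=0.8)
index_b = occupancy_index(psi=0.6, p=0.4)
print(f"index in A: {index_a:.2f}, index in B: {index_b:.2f}")
```

Here the index is twice as high in A as in B even though true occupancy is identical, which is exactly the pattern that has to be ruled out before interpreting a difference in the index.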
    Further, the relevance of adopting models that incorporate a component for imperfect detection may depend on the spatial scale of an analysis. I would agree that it is perhaps hard to believe that very large-scale (e.g., continental), macroecological patterns in an occupancy index are simply due to distorting effects of spatial patterns in detection error. So perhaps for very large scale studies incorporating detection errors is less important than for more local studies? On the other hand, we simply don’t know and there can be distinct patterns in detection probability over the scales of an entire country (see the Chen paper; with apologies for self-citation).
    However, here is a list of things where I strongly disagree with McGill or where I would claim he is wrong:
    (1) His definition of detection probability (start of 4th para) is completely wrong: “Detection probabilities are a statistic model/method designed to deal with one simple obvious fact”. In truth, it’s the probability to observe a thing, e.g., an individual or a species at a site and place, given that it is there. We can understand what he means, but the fact that the basics are wrong is a bad start already.
    (2) It is also wrong that p is only a problem with mobile organisms as he claims (para 4). It has been shown to be a challenge for plants many times now.
    (3) Further, it is wrong to say (para 4) that thinking about p started with the increased use of occupancy as a state variable over abundance. The problem of imperfect observability of abundance is exactly analogous and actually more acute in many ways. Whether we think about p (and decide to do something about it or not) has nothing to do with the state variable of a study, e.g., occupancy or abundance.
    (4) In para 5, he gives a decent description of the simplest, basic occupancy model of MacKenzie et al. (2002) and Tyre et al. (2003). This time, he gets the definition of p right 😉 In the following para, he ridicules the assumptions of the model: closure (#2), absence of false positives (#3) and homogeneous p (#5). I would argue that first, violations of some of these (e.g., closure) are not disastrous (they simply result in a different interpretation of the model parameters), they can be relaxed in more advanced field designs or models and especially, any analysis of the distribution of a species is implicitly based at least on closure and absence of false positives. Thus, McGill illustrates well the fact that if an assumption is not stated explicitly then that does not mean it is not there.
    (7) We can understand what McGill means (para 7) when he says “losing power/df inherent in detection probabilities”, but it is nevertheless wrong as it is written. Detection probability underlies *any* observation made in the field and is thus there, whether one likes it or not and whether we account for it in our models or not. It’s the study design and/or analysis accommodating p which loses power (but gains robustness).
    (8) Also, in the same para, McGill confuses again the relationships between abundance, occupancy and p: p has nothing to do with the choice of state variable to analyse, it is relevant for both abundance and occupancy. I agree that reducing the information content in one’s data by downgrading counts to occupancy data is usually a bad idea. However, sometimes this may be OK, because the assumptions of occupancy models that accommodate p are easier to meet than those for abundance models (e.g., the closure assumption is easier to meet for occupancy than for abundance).
    (9) In para 8, he summarises some of the findings of the Welsh et al. paper (which I haven’t read yet, but will do soon). I believe it is wrong to single out the fitting of occupancy models as having numerical problems. Similar things happen to state-space models, multistate models, and no doubt a host of other, advanced models where boundary estimates and local minima may be a very real challenge. Thus, yes, fitting occupancy models may require some care. In this respect it is useful to remember that much can be learnt about the quality of the estimates of a model for one’s data set by conducting even very simple simulations. With software such as R, this is often a trivial exercise.
    (10) In point 3 of para 8, he pokes fun at people who observe that p depends on abundance. But this criticism falls back immediately on the p-ignorants, because the resulting bias (e.g., occurrence more often observed in the centre of a range, where abundance may be greater) is built into their analyses. With respect to occupancy models, there are ways to account for detection heterogeneity (e.g., Royle-Nichols, Ecology, 2003; Royle, Biometrics, 2006); though, admittedly, they may not be without problems (c.f. Link, Biometrics, 2003).
    (11) In para 10 (his bottom line), McGill seems to generalise the results from a single study to all of ecology etc. In addition, in spite of what he claims, McGill effectively seems to suggest that methods (field design and analysis) that estimate p should not be used because they are a waste of time and effort. I don’t think that either is wise.
    (12) Para 11 simply shows that McGill has not followed the population estimation literature of the past 20-30 years at all. Distance sampling has been one of the major protocols/methods for estimating abundance, with an easy-to-use, free and Windows-implemented software called Distance, a huge user group, books, workshops etc.
    (13) Finally, about his recommendations at the end:
    (a) I agree with the first, though I would qualify it that if you have the data to estimate p and you don’t, then I would reject a paper. Also, if you start a new survey and don’t at least spend some resources into the assessment of the measurement error (i.e., p), then that is irresponsible. I quite disagree with the 2nd recommendation, that correction for p is more important for studies of occupancy than for studies of functions of occupancy such as community structure.
    (b) I strongly disagree with the claim that p is less important for studies of abundance: if anything, accommodating p is more important for abundance than for occupancy. It is important but far from widely understood that occupancy is simply an information-reduced summary of abundance: occupancy is simply the probability that local abundance is greater than 0. Hence, it is wrong to think of abundance and occupancy as separate things.
    (c) I partly agree with his 3rd suggestion, but would simply point out that often you don’t know along which dimension of comparison in a study p may vary. Hence, as an insurance, I would try to estimate it whenever I can.
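The point in (b), that occupancy is the probability that local abundance exceeds zero, can be written down directly. The sketch assumes Poisson-distributed local abundance with mean `lam`, which is an assumption made here for illustration; other abundance distributions give a different curve. The function names are hypothetical.

```python
import math


def occupancy_from_mean_abundance(lam):
    """P(N > 0) when local abundance N is Poisson with mean lam."""
    return 1.0 - math.exp(-lam)


def counts_to_presence(counts):
    """Reduce true abundances to presence/absence (1 if N > 0, else 0)."""
    return [1 if n > 0 else 0 for n in counts]


for lam in (0.1, 0.5, 1.0, 3.0):
    print(f"mean abundance {lam}: occupancy {occupancy_from_mean_abundance(lam):.3f}")

print(counts_to_presence([0, 4, 1, 0, 12]))
```

Occupancy saturates quickly as mean abundance rises, so large differences in abundance among occupied sites become invisible once counts are reduced to presence/absence.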

    My own summary about the issue of p would be this:
    (1) An observation process underlies all ecological field data (abundance, occupancy, rates) and one of its key components is imperfect detection, quantified by detection probability p.
    (2) p<1 can seriously distort your inferences from observed data under p-ignorant analyses.
    (3) p can be estimated using additional information, e.g., using distance sampling, replicated counts or capture-recapture, and your inferences about abundance/occupancy/community structure can be robustified.
    (4) If you need absolute quantities (abundance, occurrence), there is no way around estimating p. You cannot then simply use an index that bears no known relationship with these. Simply using the index, even if at first it may be acknowledged to be an index and not the real thing, but later interpreting it as if it were the real thing, is wrong and dishonest. I have seen many people doing this.
    (5) If you are really and truly only concerned about patterns, then a p-ignorant analysis may do, unless you can do better. I do not find it acceptable to present an analysis without p when you could have done otherwise (e.g., when you have replicate observations). Also, I find it irresponsible to design new studies or monitoring programs that do not at least use some resources to keep an eye on p.
    (6) If you have old historical data, go ahead and analyse them, but carefully discuss the potential biases incurred by observation errors. And keep in mind that Tukey quote: The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
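Points (2) and (3) can be illustrated with a very simple simulation of the kind mentioned earlier in this comment. The sketch below assumes the basic single-season occupancy model with constant occupancy `psi`, constant per-visit detection `p`, closure and no false positives, and fits it by a crude grid search. The function names, the seed, the parameter values and the grid step are all illustrative assumptions; this is not code from any occupancy package.

```python
import math
import random
from collections import Counter


def simulate_detections(n_sites, n_visits, psi, p, rng):
    """Return, for each site, the number of visits with a detection."""
    detections = []
    for _ in range(n_sites):
        occupied = rng.random() < psi
        d = sum(1 for _ in range(n_visits) if occupied and rng.random() < p)
        detections.append(d)
    return detections


def fit_occupancy(detections, n_visits, step=0.01):
    """Grid-search maximum likelihood estimates of (psi, p)."""
    tally = Counter(detections)            # sites grouped by number of detections
    grid = [i * step for i in range(1, round(1 / step))]
    best = (-math.inf, None, None)
    for psi in grid:
        for p in grid:
            ll = 0.0
            for d, n in tally.items():
                if d > 0:                  # detected at least once, so occupied
                    site = psi * p**d * (1 - p) ** (n_visits - d)
                else:                      # occupied but missed, or truly empty
                    site = psi * (1 - p) ** n_visits + (1 - psi)
                ll += n * math.log(site)
            if ll > best[0]:
                best = (ll, psi, p)
    return best[1], best[2]


rng = random.Random(1)
true_psi, true_p, n_visits = 0.6, 0.5, 4
detections = simulate_detections(2000, n_visits, true_psi, true_p, rng)
naive_psi = sum(d > 0 for d in detections) / len(detections)
psi_hat, p_hat = fit_occupancy(detections, n_visits)
print(f"naive occupancy: {naive_psi:.3f}")
print(f"estimated psi: {psi_hat:.2f}, estimated p: {p_hat:.2f}")
```

The naive proportion of sites with a detection falls below the occupancy used to simulate the data, while the fitted `psi` should land near it. Rerunning with lower `p`, fewer visits or fewer sites is a cheap way to see how quickly the estimates degrade for a given design.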
    I personally believe that it is true that excessive worries about the observation process underlying all our data, and p, may lead one to lose sight of the scientific big picture. Usually we should only put so much lipstick on a pig as is required by the scientific question or the management objective at hand. However, at the same time, I am afraid that there are still way too many scientists in ecology and evolution who want to write poems without getting the grammar right.

    Marc Kery, 15 January 2013

    • A rather different tone than Michael and Drew brought to the blog.

      Let me see if I can summarize and understand the main points.

      1. You will ignore the fact my post is space limited and an informal blog focused on popularizing ideas to many non-statisticians and be very pedantic about definitions and citations to prove I am ignorant (and more broadly attack the “p-ignorants”)
      2. You will have hidden points #4-#6 to exemplify the importance of detection probabilities 😉
      3. You have not read the paper that was the center point of my blog but disagree with it
      4. You basically agree with many of my main summary points albeit wanting a different spin on them but still need to use the word p-ignorant four times (presumably about me as well as my readers?)
      5. You are profoundly concerned about people who make the assumption of no detection error, but completely comfortable with making major assumptions to enable the study of detection error. And they are major assumptions. You bring up closure. This can indeed be, as I called it, a “whopper”. For example, studying stop-over points on bird migration routes is so unclosed as to completely invalidate anything that invokes it. That’s an extreme example, but it’s a slippery slope and there are shades of grey in other systems. Butterflies visited every two weeks? Amphibians in the period they are maturing and leaving their natal pond?
      6. You completely blow by my main point that in a finite world there are trade-offs and a fundamental trade-off in science is between effort to collect data and accuracy of data, and specifically that there is a cost to address detection probabilities which you refuse to weigh on a balance by asserting everything that ignores detection probabilities is wrong, without ever getting into a discussion about “how wrong” things are to weigh against “how costly” additional data is. Your obsession about detection probabilities in plants is a case in point. While I acknowledge detection probabilities are a real issue with tiny rare plants, the amount of error due to detection probabilities in trees is trivial compared to other types of error like misidentification and clerical error. And I’m not going to repeat my whole post, but there are many scenarios outside trees where the cost of detection probabilities can outweigh the improvements in accuracy.

      The overall sense I get is that you are very comfortable invoking the mantra “it is wrong” if you ignore detection probabilities and are unwilling to address the more nuanced, complex world I was addressing. I also get the sense that you believe there is only one right way to do science and that anybody who disagrees with you is “p-ignorant”.

      Myself, I believe all of science is a carefully weighed set of trade-offs and that the field as whole benefits from a diversity of approaches.

      You are welcome to engage the issues I raised in a more substantive, less ad hominem way, but I cannot guarantee free airspace to another vent.

  11. Hi Brian, nice summary. The paper itself raises valid concerns, but I don’t think the world is that black and white. Thinking has been happening regardless of the flaws of the review process, as shown by some papers which often propose solutions to some of the problems (e.g. using single visit data to correct for detection error, using penalized or conditional likelihood approaches to minimize bias and stabilize numerical results). Here are some, but there are certainly others I missed:

    Johnson 2008, In Defense of Indices: The Case of Bird Surveys. The Journal of Wildlife Management 72:857–868.

    Moreno & Lele 2010, Improved estimation of site occupancy using penalized likelihood. Ecology 91:341–346.

    Efford & Dawson 2012, Occupancy in continuous habitat. Ecosphere 3(4):32.

    Lele, Moreno & Bayne 2012, Dealing with detection error in site occupancy surveys: What can we do with a single survey? Journal of Plant Ecology 5:22–31.

    Sólymos, Lele & Bayne 2012, Conditional likelihood approach for analyzing single visit abundance survey data in the presence of zero inflation and detection error. Environmetrics 23:197–205.

    • Thanks for the great references!

      As your references show (and Drew and Michael’s earlier), I think that many people at the frontiers of the field are aware of the limitations and thinking hard about it, and doing great work. I know a couple of the references you suggested and look forward to reading the others. In particular I know some of Lele et al’s work about using covariates to estimate detection probabilities rather than repeated visits, but unless I’m missing something it doesn’t work in the case where the same covariates are driving detection and occupancy?

      I agree with the world not-being black-and-white. My main point – which I think is orthogonal to agreeing with you about thinking people doing good work – is sometimes telling people to do a lot of extra sampling work and/or complexifying their analyses to estimate detection probabilities is not the right thing to tell people. Indeed it is deeply detrimental to the progress of science. There is no black-and-white, one-size-fits-all, always must do approach in statistics. And that emphatically includes detection probabilities. That is true whether it is telling people during the peer review process or during their experimental design.

      It is an interesting question whether the guilty party of “always telling people they *have* to do detection probabilities” are more a subset of the highly knowledgeable people pushing the topic forward or instead, as Drew suggested, the people who don’t really understand and are just riding the wave of a trend. In my experience you see some of both.

      Since you cited a paper that uses ZIP (or more generally zero-inflation models), I find these much more intuitive and also easier to use (and teach) than detection models. To me they are doing something subtly different though – saying the processes behind occupancy and abundance are distinct. Do you see it differently? Definitely a paper I will have to read.


      • Your comment on ZIP models highlights the biological vs. statistical nature of occupancy and abundance modeling. Occupancy, as Marc also pointed out, can be seen as censored count/abundance data and as such it is just a less informative approximation to the same underlying process. Coming from a community ecological or macroecological perspective, occupancy aka 0/1 data has a different interpretation, often in relation to species ranges. ZIP models can to some extent differentiate between these processes, but the scales are not implicitly considered by them; that must be introduced, e.g., by covariates.

        Our Binomial-ZIP model in Environmetrics exploits the fact that detection error is easiest to quantify based on the non-zero counts (e.g. if you observe 4 out of 6, you know that true abundance is >0; this is step 1). Once you have figured out the abundance and detection components, you can deal with zero inflation in abundance much more easily in step 2.

        You were also correct in pointing out that single visit methods have their specific conditions (covariates exist for abundance and detection, which do not completely overlap), but are free from the more prohibitive closed population assumption.

      • Thanks for the quick answers to my questions – very helpful. And I quite agree with you about abundance vs. occupancy. Occupancy has never been my favorite macroecological variable. But I agree with you that it has been heavily studied. I’ve focused a lot on variation in relative abundance across space with 0 just being a number fairly close to 1. Probably one of the reasons I’ve never invested huge effort in detection probabilities (being OK with relative instead of absolute abundance is also part of the reason). A great point.

  12. Dear Brian,

    first of all, I admit that the “p-ignorants” in my original post were rude and must have made me come across as unnecessarily confrontational and engaging in a personal, rather than in a substantive way. That was not my intention, so please accept my apologies for this.

    Nevertheless, I would like to emphasize that I do believe there is a lot of substantive content in my first post. You can’t simply put down as “ad hominem” attacks the pointing out of inaccuracies and errors.

    Then, here are my replies to the points raised by you.
    (A) I would argue that it is *exactly* in a blog directed to many non-statisticians where one has to be very careful about basic concepts and definitions. I don’t think your way of defining the very focus of your original post (detection probability), the relationships between occurrence/distribution and abundance, and the modeling and estimation of abundance by distance sampling versus other estimation approaches, to name but a few, is at all adequate for your audience. Informality does not relieve one from some minimal standards of technical accuracy.
    (B) Of course, I am a little embarrassed that my points 4 to 6 got lost :).
    (C) Clearly, your blog entry is directed at many people who have not yet read the Welsh et al. paper. So your original post is stand-alone enough to be criticised without reading the Welsh et al. paper first.
    (D) Yes and no: I agree with a couple of things you say. In particular, that I would definitely not sink a manuscript as a referee if it reports analyses that don’t estimate parameters for detection probability if that cannot be done using standard methods, e.g., in BBS analyses (though see some of the papers cited by Peter Solymos). However, I read your original blog as claiming that p estimation is a waste of time and clearly, I strongly disagree with that.
    (E) You hit on the assumption of closure which is often made in a formal estimation framework. Closure can be an inadequate assumption and its violation can lead to bad bias, true, but closure is always relative: it depends on the length of the observation window relative to the dynamics of a system. Whether it’s OK to assume it or not depends entirely on the system, the sampling protocol and the model. No black and white answers, as Peter reminds us. And no standard analyses either, ever.
    (F) Of course there are trade-offs all around us. But one of my main points is that if you *can* keep an eye on p and simply choose not to, then I find that unacceptable, even if you are mostly interested in relative patterns and not in absolute totals (in which case you must always estimate p, except when the RMSE for a model with p is greater than for a model without; I have to read that Welsh et al paper).
    (F1) Then, you call me obsessed with p in plants. Come on, I’ve (co-)authored about one plant paper on p every 2 years, so this claim is clearly a little exaggerated. However, I like the plant examples because in my former life as a botanist I was always surprised at how confident we plant ecologists were about our belief that p was no problem for us. And, like you, many ecologists believe that p is only a problem for mobile organisms or perhaps for “tiny rare plants”. This belief is simply not borne out by the data. The Chen et al. paper (JAPPL 2013) contains the largest and most rigorous assessment of the p=1 hypothesis for plants ever, and, moreover, for a taxonomic random sample of 25 species for each of the following groups: grasses, forbs, shrubs and trees. The average p of trees was 0.88 and the median over all 100 studied species was about 0.8. And this must be a best-case scenario, because the survey is highly standardised and conducted by trained and paid botanists. So you simply can’t go on claiming that p is only a problem (I translate, <1) for mobile organisms and “tiny rare plants”.
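
To put the quoted tree figure (p = 0.88) in concrete terms, here is a minimal sketch of the arithmetic. It assumes the textbook occupancy setup (closed sites, independent visits, no false positives), and the true occupancy of 0.6 is a purely hypothetical value chosen for illustration, not a number from Chen et al.

```python
def prob_detected_at_least_once(p, n_visits):
    """Chance that an occupied site yields at least one detection,
    assuming independent visits with the same per-visit detection p."""
    return 1.0 - (1.0 - p) ** n_visits


def expected_naive_occupancy(psi, p, n_visits):
    """Expected share of sites with at least one detection when the
    true occupancy is psi and detection is ignored."""
    return psi * prob_detected_at_least_once(p, n_visits)


true_psi = 0.6   # hypothetical true occupancy, for illustration only
p_tree = 0.88    # average per-visit detection quoted above for trees

for visits in (1, 2, 3):
    naive = expected_naive_occupancy(true_psi, p_tree, visits)
    print(f"{visits} visit(s): expected naive occupancy = {naive:.4f}")
```

Under these assumptions a single visit understates occupancy by the factor p, and the shortfall shrinks quickly with repeat visits. Whether a gap of that size matters for a given question is exactly the judgement call being debated here.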

    I agree with your views on trade-offs and a diversity of approaches in general. But I feel that there frequently is too much complacency towards the possibility of detection errors (and much more so in ecology than in wildlife departments). Nevertheless, I quite like Drew Tyre’s point about the way that many novelties go: from rejection to excessive embrace and then ultimately towards a balanced view about when they matter and when they don’t.

    Perhaps we need a kind of model-selection view of detection probability: we should treat it similarly to a potentially important covariate: include it in the model when it matters. In ecology, this would mean that we state our objectives first (e.g., are we interested in totals or in mere patterns ?). However, to do such a model selection, we need at least some information about p. And that means that we always ought to collect some information on it, if we have the chance. Only then are we able to test our potentially crucial assumptions about p, and, in cases where they are not met (e.g., p<1), we still have the ability to test our original hypotheses.

    Finally, I would like to point out that there are plenty of training courses in a large array of population analysis settings, see, e.g.,

    Kind regards — Marc Kéry

    • Marc – thanks for toning it down.

      I’m just not sure there is much more useful to say. Where we differ is on opinions, and I rather doubt either of us is going to convince the other!

      You are much more comfortable with the assumptions that underlie detection probability models than the assumptions that let you ignore it. I am more often than not the opposite. This is mostly a judgement call as the answer is so context dependent it is hard to have general results.

      You believe that any new data collected that is counting or looking at the distribution of organisms must look at detection probabilities even though this comes at a cost. I consider this an unproven and not well-supported position. And I have a hard time finding in your arguments more than a “just because”. If you really wanted to convince me, give me the long list of things we thought we knew ecologically (i.e. major patterns and mechanisms – not just a refinement in the estimate of a population size) that then turned out to be wrong once we went back and looked at them with detection probabilities.

  13. Benedikt Schmidt has been having trouble posting a comment (if anybody else is having this trouble please let Jeremy or me know). So I am posting this for him here:

    Marc wrote “Perhaps we need a kind of model-selection view of detection probability: we should treat it similarly to a potentially important covariate: include it in the model when it matters.”

    There’s a nice paper that deals with this issue:

    Title: How should detection probability be incorporated into estimates of relative abundance?
    Author(s): MacKenzie, DI; Kendall, WL
    Source: ECOLOGY Volume: 83 Issue: 9 Pages: 2387-2393 DOI: 10.2307/3071800 Published: SEP 2002

    Brian wrote “If you really wanted to convince me, give me the long list of things we thought we knew ecologically (i.e. major patterns and mechanisms – not just a refinement in the estimate of a population size) that then turned out to be wrong once we went back and looked at them with detection probabilities.”

    There are two papers that I know of that address this issue:

    Title: Landscape characteristics influence pond occupancy by frogs after accounting for detectability
    Author(s): Mazerolle, MJ; Desrochers, A; Rochefort, L
    Source: ECOLOGICAL APPLICATIONS Volume: 15 Issue: 3 Pages: 824-834 DOI: 10.1890/04-0502 Published: JUN 2005

    Title: Absent or undetected? Effects of non-detection of species occurrence on wildlife-habitat models
    Author(s): Gu, WD; Swihart, RK
    Source: BIOLOGICAL CONSERVATION Volume: 116 Issue: 2 Pages: 195-203 DOI: 10.1016/S0006-3207(03)00190-3 Published: APR 2004

    Best wishes,
    (I also publish “fashionable” papers on detection probability:

    • Hi Benedikt – thanks for the references! I will have to look at the Mackenzie & Kendall paper. Of course the whole problem with a model selection approach to detection probabilities is that you still have to collect the extra data which is my concern to begin with …

      I took a look at the two papers you provide in response to my challenge. Both papers show that if you have a logistic regression model of occupancy vs a covariate, the estimate (i.e. slope) of the regression is biased and less efficient (more variance). This all makes sense. The Gu & Swihart paper has some nice simulations that measure this effect. My read is that the bias is not large when the detection error is random relative to the covariate of the model – it only gets really big when the detection errors are correlated with the covariate you are trying to use to explain occupancy, which seems intuitive. My understanding is that most detection models also cannot handle the scenario where one covariate explains both occupancy and detection (aka an identifiability problem). Am I wrong on this?

      In any case, these effects are real. Anybody in this scenario needs to worry about it. But it hardly makes me think we’ve gotten basic patterns or mechanisms wrong by ignoring detection to such a degree that we now must always pay the cost to collect detection data – especially when not working on occupancy.

      And as I already mentioned to Michael, my apologies for the sarcasm in proximity to the mention of your papers. It was not intended to be directed at your papers which are both very nice and constructive in helping us understand how and where detection probabilities vary (which lets us know when we need to worry about them or not).

      • And a response from Benedikt who is still having trouble with WordPress (and I’ll just add thanks for the quick and informative response):

        Yes, with the MacKenzie & Kendall approach you must estimate detection probabilities. I think they start from the viewpoint that there is observation error which can cause bias. It would then be the duty of researchers to show that their particular data set is unaffected by observation error.

        In occupancy and Nmix models, a covariate can affect both detection and occupancy/abundance. Marc Kery did some simulations to show that:

        Title: Estimating abundance from bird counts: Binomial mixture models uncover complex covariate relationships

        Author(s): Kery, Marc

        Source: AUK Volume: 125 Issue: 2 Pages: 336-345 DOI: 10.1525/auk.2008.06185 Published: APR 2008

        There are similar simulation results in Marc Kery’s WinBUGS book.

        My “long list of things we thought we knew ecologically” was rather short. There’s another area where some (but not all) folks have argued that observation error matters: density dependence in population time series. For example,

        Title: Sampling-variance effects on detecting density dependence from temporal trends in natural populations
        Author(s): Shenk, TM; White, GC; Burnham, KP
        Source: ECOLOGICAL MONOGRAPHS Volume: 68 Issue: 3 Pages: 445-463
        DOI: 10.1890/0012-9615(1998)068[0445:SVEODD]2.0.CO;2 Published: AUG 1998

        It’s probably fair to say that others have written that observation error does not always matter, e.g.
        Title: Density estimation in wildlife surveys
        Author(s): Bart, J; Droege, S; Geissler, P; et al.
        Source: WILDLIFE SOCIETY BULLETIN Volume: 32 Issue: 4 Pages: 1242-1247 DOI: 10.2193/0091-7648(2004)032[1242:DEIWS]2.0.CO;2
        Published: WIN 2004
        Bart et al argue that observation error is not a big problem as long as there is no trend in detection probability.

  14. A nice example where interpretations are supported/modified by a re-analysis which incorporates detection error can be found here:

    Dorazio, R. M., N. J. Gotelli, and A. M. Ellison. 2011. Modern methods of estimating biodiversity from presence-absence surveys. Pages 277-302 in G. Venora, O. Grillo, and J. Lopez-Pujol, editors. Biodiversity Loss in a Changing Planet. InTech, Rijeka, Croatia.
    ISBN: 978-953-307-1427-1

    This book chapter is freely available with an internet search.

    The original Ecology paper (Gotelli & Ellison 2002) examined species richness of ants in bogs and forests of New England, and used linear regression to estimate associations between richness and environmental covariates. The re-analysis accounts for detection probability while treating species as a random effect in a multilevel model. Nevertheless, they find some interpretations from the original analysis held up while others did not.

    I think this can serve both Brian’s and Marc’s arguments: detection may or may not be important and blindly incorporating/ignoring it without understanding the associated costs and assumptions is a poor choice.

  15. Pingback: Productively stupid | Ecologically Orientated

  16. Hi All,
    Someone sent me a link to this discussion and after a quick skim read of the comments there’s a couple of points I think I can contribute to.

    Closure: People often get hung up on this and claim it’s not realistic, but importantly it’s not just a statistical assumption, it’s vital for the biological interpretation of what you really mean when you say a species is present at a location (similarly if you’re saying there are X individuals there when talking in terms of abundance). People are always making implicit assumptions about closure, whether they are worried about detection or not. Without it, you’ve got no basis for extrapolating about whether a species is present beyond the period of a survey. For example, if you conduct a 5-minute bird point count and detect a certain species and want to interpret that as evidence the species is present at that location during the breeding season (or other time period), you are assuming the location is closed to changes in occupancy for the remainder of the season. If you are unwilling to assume closure, then you have no basis for saying whether the species is present or not outside of the 5-minute survey; by the time you get back to your truck the species may or may not be present at the location, in which case what value was there in doing the survey in the first place? Like Marc said, when the full closure assumption isn’t being met (and often it won’t be), then that just changes the interpretation of the parameters. Again, people have been doing this for years without thinking about detection probabilities, e.g., species is always present at a location vs sometimes present (ie use).

    Loss of information on occupancy due to repeat surveys: Quantity doesn’t trump quality. As soon as you admit you have p<1, you actually get a more precise estimate (ie smaller standard error) by going to fewer places with more surveys. This has been shown a few times in the literature now with simulation and analytic results (e.g., Drew mentioned the Field et al. paper, but also MacKenzie and Royle. 2005. Designing occupancy studies: general advice and allocation of effort. J. Appl. Ecol., and we covered it in our book that someone mentioned earlier). This doesn't only hold for none vs some repeat surveys, but for the number of repeat surveys. For example, a number of years ago I was corresponding with some folks working on tigers in SE Asia and they were surveying 200 places twice and getting a standard error of 0.11 for occupancy and I was pointing out to them that because their detection probability was so low they would actually get a much more precise answer (SE=0.07) by sampling 80 places 5 times. The same holds if you're interested in covariate relationships. You can think of detection as a form of measurement error; you're not going to be able to reliably identify a signal if the measurement error is too great. One way of reducing that error is by having repeat surveys and the tradeoff of then being able to go to fewer places isn't a bad one because you've got more quality data for the same amount of resources. I'd also note that poor quality, or insufficient, data is what often leads to some of the issues noted by Welsh et al. I like to remind people that these methods are statistical, not magical.
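
The 200 x 2 versus 80 x 5 comparison above can be sketched with the asymptotic variance formula for the constant-psi, constant-p occupancy model, written here as I recall it from MacKenzie and Royle (2005); please check it against the paper before relying on it. The values psi = 0.5 and p = 0.3 are assumptions picked for illustration and are not the values from the tiger study, which the comment does not report.

```python
import math


def occupancy_se(psi, p, n_sites, n_visits):
    """Asymptotic standard error of the occupancy estimate when psi and p
    are constant across sites and visits (formula as recalled from
    MacKenzie and Royle 2005; treat as a sketch)."""
    if n_visits < 2:
        # With one visit the detection term below divides by zero,
        # reflecting that psi and p cannot be separated.
        raise ValueError("need at least two visits per site")
    p_star = 1.0 - (1.0 - p) ** n_visits  # P(detected at least once | occupied)
    detection_term = (1.0 - p_star) / (
        p_star - n_visits * p * (1.0 - p) ** (n_visits - 1)
    )
    variance = (psi / n_sites) * ((1.0 - psi) + detection_term)
    return math.sqrt(variance)


psi, p = 0.5, 0.3  # hypothetical values, not taken from the tiger surveys

se_wide = occupancy_se(psi, p, n_sites=200, n_visits=2)  # 400 surveys in total
se_deep = occupancy_se(psi, p, n_sites=80, n_visits=5)   # 400 surveys in total
print(f"200 sites x 2 visits: SE = {se_wide:.3f}")
print(f" 80 sites x 5 visits: SE = {se_deep:.3f}")
```

With these assumed inputs the two designs use the same total number of surveys, and the fewer-sites design comes out with the smaller standard error (roughly 0.12 versus 0.07). Raising p in the sketch shrinks the detection term, so the advantage of extra visits depends on how low detection actually is.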

    Brian, I also want to come back to your original #5, you don't have to assume that detection is constant in time. You could use a covariate to model some variation, or estimate it directly if you have suitable data. We showed this early on with some of our original papers on the topic.

    Do I think we always need to account for detection probabilities in an analysis? No. If we've got good quality data then there probably won't be a lot of difference in our conclusions whether we account for it or not. Should we consider it during the design phase of a study? Most definitely. Given it's a practicality of the field methods that things are not always going to be found, and it's known that it can result in misleading conclusions if ignored, you're going to need some pretty strong justifications for ignoring it completely and not trying to deal with it via collecting appropriate data and/or methods of analysis. By ignoring detection your inferences are actually more assumption laden than by using statistical methods that explicitly incorporate it because typically you are making many of the same underlying assumptions, only implicitly, with the additional (untestable) assumption that there's no systematic detection issue.


    • Hi Darryl – thanks for stopping by. At this point we’ve had a who’s-who of detection probabilities stop by. And for the most part a very civil tone despite a very challenging post – I have to conclude detection probabilists are very polite!

      On your two points.

      #1 and closure – but what about the scenario where I really am most interested in one point in time (e.g. the classic definition of a community is all individuals present at a point in time). While I do have a detection problem of some undetermined size, I don’t have a closure assumption in this case, until I start doing repeated surveys to measure detection, at which point your logic kicks in.

      #2 and data quality – most importantly, what if I’m interested in multiple explanatory variables predicting occupancy (or abundance)? Then cutting the number of observations in half greatly reduces my degrees of freedom, and covering a smaller range of conditions can indeed be costly.

      #2 and data quality – even in the simple case where, for example, I am only interested in estimating occupancy, my error in estimation goes down with increasing number of points, n, and also goes down as the error in each observation, σ, goes down. This latter is what detection error addresses, but it comes at a cost in n. Not remembering the formula for binomial off the top of my head, assuming we are continuous, the standard error is σ/√n. Thus every time I cut n in half, σ has to improve by at least a factor of 1.4 to actually realize an improvement in accuracy of estimate. I’m sure (or at least I hope) there are more sophisticated and exact models for detection probability trade-offs (and would welcome a reference), but to a first approximation this should be pretty close. Thus it’s not true that improving my quality of estimate at the expense of number of points is always a win. It would require a careful analysis to know which side of the trade-off you were on.
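
The factor of 1.4 follows directly from the σ/√n approximation. A minimal sketch of that arithmetic, using the continuous approximation from the comment rather than an occupancy-specific model; the function names and the example sample sizes are illustrative only:

```python
import math


def standard_error(sigma, n):
    """Standard error of a mean under the continuous approximation sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def breakeven_sigma_factor(n_before, n_after):
    """Factor by which sigma must shrink so that the standard error
    is unchanged when the sample size drops from n_before to n_after."""
    return math.sqrt(n_before / n_after)


factor = breakeven_sigma_factor(n_before=200, n_after=100)
print(f"Halving n requires sigma to improve by a factor of {factor:.3f}")

# Check: 100 points with sigma reduced by that factor match 200 points at sigma = 1.
print(standard_error(1.0, 200), standard_error(1.0 / factor, 100))
```

Halving n breaks even only if σ shrinks by a factor of √2, about 1.41; anything less is a net loss in precision under this approximation.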

      To my mind this is exactly what the Welsh et al paper delivered. And the result was that the trade-off really didn’t look particularly worth it (i.e. no improvement in estimation accuracy of slopes in a logistic regression). There IS a tradeoff that needs to be rigorously addressed. One cannot just blow by a carefully done analysis that shows detection probabilities coming out on the wrong side of the tradeoff just by assertion that it’s obvious it’s better with detection probabilities – this is an empirical question. I would really appreciate it if the experts that drop by could address the Welsh paper that inspired my post.

      • We like to think of ourselves as very civil people. 😉

        Please define your point in time. How long is that? Do you really mean it’s instantaneous across the entire region of interest? Can you really sample that quickly? Does the result you come up with only apply to that instantaneous point, or are you going to assume that’s indicative of what the community is like over a longer period, like a week? As soon as you say that you’re assuming your results are indicative over some period of time that is longer than your actual survey, I’d argue very strongly that you’re assuming the system is (relatively) static, ie making an assumption of closure. This is regardless of how many surveys you’re actually doing.

        Doesn’t matter if you’re interested in multiple explanatory variables, going to lots of places is not inherently better because you then have a greater chance of false absences, leading to less reliable results.

        The number of sampling units ultimately determines precision of the occupancy estimate, but the variance term involves a component associated with imperfect detection. If that component is too large it swamps the benefits of a larger sample size. Check out the MacKenzie and Royle paper or our book for more details. Again, quantity doesn’t always trump quality.

        Haven’t had a chance to read Welsh et al, yet, but will endeavour to do so soon. The comments I’m making though are based on first principles and experience from thinking and chatting with folks about a lot of this stuff for a large portion of the last 12 years. I’m simply responding to the comments that have been made here, comments that are similar to ones I’ve been hearing for a number of years.

      • As in my other reply, yes, sampling at one point in time is a fiction too, but often times not a bad approximation. As for the rest, I’m not sure we’re disagreeing there is a trade-off between quality of points (size of error) and number of points. Thanks for the reference for a more detection probability-specific model.

    • And I wonder if you can expand on this quote at the end:

      By ignoring detection your inferences are actually more assumption laden than by using statistical methods that explicitly incorporate it because typically you are making many of the same underlying assumptions, only implicitly, with the additional (untestable) assumption that there’s no systematic detection issue.

      It sounds good. But if I parse it you have either: a) multiple assumptions that are explicit with detection probabilities incorporated; or b) you have one assumption (detection error is small enough or constant enough to ignore) when you ignore them (which in this modern context of having to justify ignoring detection probabilities is also an explicit assumption). How exactly is it obvious that a is better than b in any general scenario? I think we can all agree that the assumptions behind a and b are both wrong (as statistical assumptions usually are). It is a question of the degree of wrongness and the consequences of wrongness vs the costs and weighing them out. I repeat this is an empirical question. Blow me away with the awe inspiring evidence of wrongness under b vs awe inspiring rightness under a. It seems to me Welsh et al did address this empirically and very rigorously (at least for one specific but extremely common scenario) and they found that the errors induced by a roughly equal the errors induced by b (and humblingly, the errors were pretty large under either a or b). When you add in that (a) is much more costly than (b), then the balance seems to tip to (b)?

      What am I missing? Or what did Welsh et al miss?

      This is a statistical question. I’m really looking for the data, the numbers and the calculations here …

      And again, I’m not trying to turn this into black and white that detection probabilities are always wrong and useless – just that like any other statistical technique they have limits and are not universally applicable – i.e. trying to put an appropriate level of gray back into what I think has gone too much in one direction.

      • You miss my point. By taking approach B you will also often be making many, if not all, of the same assumptions required by A, just that they will be implicit and probably wouldn’t have even occurred to the analyst.

        I’ll take a read of Welsh et al and get back to you.

      • Ok – I understand your point, although as I argued closure is not really an assumption a lot of times under b (notwithstanding the practical reality that one cannot sample simultaneously). A also contains the assumption that detection is by far the biggest source of error to manage and thus is most deserving of singling out.

  17. Ok, curiosity got the better of my work ethic and I’ve taken a look at Welsh et al. While they raise a number of points I agree with wholeheartedly, I wouldn’t at all be surprised that this paper had a hard time in the review process as I think there are a number of concrete issues with it despite what you commented earlier. In the interests of openness, in case you’re wondering, I don’t recall ever reviewing this paper (I may have, but forgotten) and if I had I would have made the same comments I’m making below to the authors. There’s also a number of points in the paper that you don’t appear to have picked up on based upon your comments as well. I’d also note that I’m not familiar with the vglm routine in R and how it compares to software that has been specifically coded for these types of models (eg PRESENCE and MARK).

    Aside from the glaringly obvious problem that the figures don’t match up correctly, here’s some highlights to consider.
    1. At no point in our book do we suggest occupancy should replace abundance. We see them as different state variables each with various pros and cons.
    2. The variability in the relationships noted for the real data may be spurious; they don’t appear to have tried models without the time since planting covariate. If there really is no relationship, of course you’re going to get variable relationships.
    3. In the simulations they do exactly that. Generate data where occupancy is constant, but fit models with the covariate. Of course they are going to get highly variable results, there’s no consistent signal to model. I bet you get very similar results if you did the same sort of thing for simple linear regression. This is a very major issue as it is the basis for many of their arguments.
    4. When they ignore detection, they collapse the results of the 2 simulated surveys into a single observation and use normal logistic regression. In doing so, they are still assuming closure across the period of the 2 surveys, even though they now still have a single survey for the analysis.
    5. It’s not a fair comparison between the variability in occupancy relationships with and without detection because in the former case there’s no signal in the data (as there is no relationship), but in the latter there is, but it comes from the detection process.
    6. While more variable, the estimated relationship between occupancy and the covariate is unbiased when accounting for detection, but it’s biased when the data is collapsed and detection is ignored.

    Apologies for any spelling errors, Firefox has recently updated itself and now isn’t automatically spell checking…


    • Hi Darryl, thanks for taking time out to take the paper seriously. My thoughts:

      1) I am sure you never have recommended abandoning abundance data for occupancy. But the fact remains that the authors of the Welsh paper were clearly told they must do it to publish. And I have been told this on two different papers. So while it is clearly unfair to hold the sensible leaders of the detection movement culpable for the nonsensical actions of their followers, the detection movement as a whole is culpable of doing some pretty bad reviewing and blocking good science on totally unfounded grounds.
      4) Yes they are assuming closure since the visits were I think 2 weeks apart. But given the site fidelity and low mortality of birds during breeding season, I’m pretty sure this is not a harmful assumption. But your general point that ignoring detection probabilities often makes a closure assumption is true, but of course my point that they are only doing repeat visits and invoking closure is because they felt they had to deal with detection is also probably true. Bit of a circle on this argument, and not really sure there is more to say.
      2,3,5) Not sure I understand these arguments. Yes they have β (the coefficient of occupancy vs stand age) as zero in the simulations. Not sure how this changes anything. Estimators of the slope are unbiased meaning loosely they should estimate the true value on average (in this case zero) and the variance of the estimators is not a function of the true β. I agree that putting the slope at zero makes it easier for the sign to flip between estimates which makes things look worse and this fact should be ignored (but it makes it convenient for the authors to calculate bias which they do). The Welsh et al paper’s main claim is that ignoring detection probabilities is slightly more biased but lower variance than when detection probabilities are incorporated. Under traditional GLS both bias and variance of estimates are independent of the true slope so choosing zero is arbitrary and irrelevant to the result. Unless there is something really subtle with detection models introducing dependence on true slopes that I am missing?
      6) See the end of my last answer. The authors acknowledge that there is slightly more bias when ignoring detection errors, but point out there is less variance. This is back in the realm of trade-offs: there is no absolute best estimator method in this case. But minimizing variance at the cost of an increase in bias (i.e. ignoring detection probabilities), especially if it reduces RMSE (which I am thinking I recall they do) is absolutely a statistically reasonable thing to do on its own merits, even leaving aside the sampling/field costs.
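
      For readers who want the trade-off in 6) spelled out, the standard textbook decomposition of mean squared error is given here. It is a general identity, not something taken from Welsh et al, and the symbol \hat\beta (an estimator of the true slope \beta) is notation assumed for this illustration:

$$
\mathrm{MSE}(\hat\beta) = \mathbb{E}\big[(\hat\beta - \beta)^2\big] = \big(\mathbb{E}[\hat\beta] - \beta\big)^2 + \mathrm{Var}(\hat\beta), \qquad \mathrm{RMSE}(\hat\beta) = \sqrt{\mathrm{MSE}(\hat\beta)}
$$

      Squared bias and variance enter on the same scale, so a slightly biased estimator with a much smaller variance can have the lower RMSE. Whether that happens for a given design has to be checked from the simulated estimates themselves.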

      • Re 2, 3 and 5. Sorry, you’re absolutely right. Apologies to Welsh et al. That’s what happens when you try to do things in a rush too late in the day. The variation should be the same regardless of the size of the effect. My bad.

        Re 6: The tradeoff isn’t only in terms of bias and precision, there’s also the subtle difference in terms of what you’re actually making inference about. With the combined data and simple logistic regression inferences are on where the species has been found, which is a combination of occupancy and detection, while with occupancy modelling we’re attempting to separate out the biological from the sampling processes to make better inferences about where the species is (or was). Whether you can live with that tradeoff is going to be a personal choice in some regards, but if someone is professing that they really want to understand where the species is, then I’d argue that using simple logistic regression isn’t going to get you there unless detection is very high, and unaffected by any of the same factors that might affect occupancy. If you need to have greater sample sizes to get an acceptable level of precision in estimates using occupancy models, then so be it. I make no apologies for that, it’s a consequence of trying to be more rigorous in what you’re doing. There’s no such thing as a free lunch.

        And this is where the importance of study design comes in. It’s tough when you’re dealing with data that has already been collected (as in Welsh et al), but before people even step into the field, they need to be very clear on what it is they are trying to achieve and why. They should also have a fair idea of how they are going to analyse their data, and realistic expectations about the level of precision they are going to be able to achieve. If the precision is going to be unacceptable given their budget, they need to find a bigger budget, consider alternative methods of analysis or ask a different question. By ignoring detection, I’d argue you’re in the camp of asking a different question.

  18. One of Darryl’s points is that the “closure assumption” is not unique to occupancy modeling and really it applies to any ecological field study where some system state (e.g., species presence, abundance) is being quantified at multiple sites that are sampled within some time frame. You absolutely have to assume that the state is static across that time frame if you want to compare the variation in states with variation in site attributes (e.g., habitat), unless you explicitly incorporate time as a covariate (which requires additional assumptions regarding additive or interacting effects, and reduces power). Is there something I’m missing?

    Brian, you talked about site fidelity and low mortality during the breeding season, which supports the assumption that your state will remain static for some period of time. If your sampling spanned part of the spring migration and part of the breeding season, it would be inappropriate to compare sites that were sampled early with those that were sampled late in the context of habitat attributes – the early sites did not yet contain the migrants. So the early sampled sites do not contain information regarding the species response to habitat attributes because the species was not yet present to “respond”.

    Like Darryl said, the idea of closure is typically an implicit assumption that can dictate the study design and depends on the inferences the study intends to make. If you relax this assumption it changes your inference (e.g., probability of presence vs. use).

    FWIW, I read the paper and found it to be quite tedious, especially with the errors regarding the figure captions. Those figures could be presented in a much more coherent manner with some modification to their design. Regardless, this certainly isn’t the first paper to examine sensitivity analyses regarding estimation of occupancy models. It is definitely the first I’ve read that violently fluctuates between objectives for over 20 pages! I suppose that is why sensitivity analyses can be so difficult to present. Either way, their main analysis examines 55 sites sampled 2 times each – this is a pretty poor effort and not as “common” a situation as they claim. Also, sparse detection data are known to be a problem – the information in a data matrix with only a few non-zero observations is very low. You cannot expect a statistical model to extract much of anything useful from it, regardless of whether you incorporate detection. If your detection probability is really low you need to improve upon your sampling scheme, rare species or not.

    Brian, I apologize in advance for what I’m about to say regarding Bayesian statistics. 🙂 But… I’ve personally found that sampling the joint probabilities using MCMC within a Bayesian framework (e.g., BUGS) can help with the boundary issues from which maximum likelihood suffers in logistic regression and occupancy modeling under certain scenarios. This will be obvious to many people because the priors act as a constraint (e.g., no beta estimates >20 for standardized coefficients). But at least the resulting estimates are reasonable and the posterior distributions will help indicate what might be going on (for instance, a bimodal distribution).

    • Hi Dan

      I don’t think I really have anything new to say about closure. Readers that are interested can read the discussion to date on it.

      I agree the Welsh paper is a hard read. Whether it is well-written or not is not really material to the quality of its science. I do know that it is one of those papers where the review process twisted the paper from how the authors wanted to present it. For me the main results center on the simulations (with non-sparse results presented) where we know the answers. And these results to me are striking and highly relevant to the debate of how mandatory it is to incorporate detection probability. Also not sure what “pretty poor effort” means. In my experience 2 visits is very typical (although I agree inadequate to give a great estimate of occupancy).

      I’m not going to rise to the debate on Bayesian in this particular post 😉

  19. If each visit consisted of some intensive sampling, or even replicated sampling (multiple counts spaced apart in time), then I would not necessarily consider 2 visits at 55 sites to be a poor effort. But anyone that has conducted a point count for birds understands how potentially useless a single 5 or 10min count can be, depending on the species targeted and other conditions.

    This is why I agree with you that the backlash you have experienced regarding BBS data is unwarranted. The count for a species on a given BBS route comes from 50(!) stops, which is an impressive effort, never mind the temporal and spatial extents. Yes there are various detectability issues that come into play, but with that much replication for each sample (i.e., route), many of those issues should wash out for the most part. Even if they do not wash out, your inferences are simply conditional on the assumption that there are no systematic effects that could alter the conclusions. That assumption is easier to make with strong study designs.

  20. Thanks Brian for pointing out the Welsh et al paper and for your post, which has obviously prompted lots of interest. To start with, I should say that I am of the school of thought that it is very important to address the detection issue, by explicitly modeling the detection process or, at least, by very carefully considering/assessing whether the results obtained disregarding it are indeed still meaningful for our objectives.

    I’d like to share some ideas that came to my mind when I read your post. Some have been directly or indirectly touched by others, as I noticed while going through the comments, but let me try to further contribute to the discussion by highlighting some of the points I think are particularly relevant:

    A1) The utility of occupancy as a state variable and the benefit of modeling the detection process are two different issues: a) There is no question that occupancy should only be used as a state variable as long as it is useful to answer the questions relevant to the study; b) Modelling the detection process is also important in the study of other state variables such as abundance (eg ‘distance sampling’), demographic parameters (eg ‘mark-recapture’), etc…

    A2) Repeat visits are not the only way to go: it is not essential to carry out separate repeat visits to model the detection process when modelling species occupancy (e.g. MacKenzie et al 2006 point out that simultaneous independent observers can be used; Guillera-Arroita et al 2011 in JABES model detections along a one visit survey route as a 1-dim point process; Lele et al 2012, which has already been cited above, explain how under certain circumstances detectability can be modeled from single detection/non-detection survey visits). Of course, there can be benefits of carrying out repeat visits separate in time, e.g. to mitigate temporal heterogeneity in detection or to model habitat use rather than an occupancy “snapshot” when the species can randomly move in or out of the site. My point is that, once again, it is important not to mix concepts: accounting for detectability does not necessarily equate to repeat visits, different sampling protocols can potentially be used, and some will be more or less appropriate for each particular scenario.

    A3) Optimum survey effort allocation: As Drew and Darryl explained there is a trade-off in terms of how to split survey effort into number of sampling sites and survey effort per site (via multiple visits, a longer single visit…). Increasing survey effort per site at the expense of decreasing number of survey sites can be beneficial, rather than a waste. Hence the importance of a good survey design. Drew and Darryl cited some good papers that address this trade-off. We also dealt with this design issue in Guillera-Arroita et al (2010, MEE), and Guillera-Arroita & Lahoz-Monfort (2012, MEE). Bottom line is that, if there is a problem of imperfect detection, accounting for it, rather than coming at a cost, can lead to more meaningful results for the same total survey effort!

    I have also read the Welsh paper, although not in as much detail as I’d like (but I’ll definitely do!). A few preliminary observations I made are:

    B1) My first impression is that they are somehow falling into the same type of mistake they complain about. It seems to me that they are giving a quite biased message against the use of occupancy models that account for detectability, rather than clearly identifying when they are or are not useful. Worryingly, I find the abstract to be too alarmist, including statements that are subjective (e.g. “making accurate inference is difficult”), equally applicable to many other models (e.g. the model can lead to boundary estimates when data is scarce), or in fact quite misleading (e.g. “the estimating equations often have multiple solutions”, see comment B3 below). This is dangerous, as the abstract is the only message many people might read. The final message (“ignoring non-detection can actually be better than trying to adjust for it”), might hold for some specific scenarios, but it is not as general as the abstract conveys.

    B2) I found it surprising that they “find surprising” that the issues related to the performance of the occupancy model under small sample size, including the issue of boundary estimates, “are not mentioned… in the methodological literature, where we would expect … (them)… to be discussed”. Without claiming that there is no scope for further assessing these or other related issues, it is important to note that there has already been thought on this (as P.Solymos pointed out above, with some useful citations). More examples include MacKenzie et al (2002) who note they obtained some estimates psi=1 in their simulations when p was small, and Guillera-Arroita et al (2010, MEE) who explore in detail the performance of the constant single-season occupancy model under small sample size and derive the exact conditions that lead to boundary estimates psi=1 (eq3 in the paper).

    B3) In particular, I have gone through the theoretical results in page 6 of Welsh et al. and I find that, as presented, their argumentation is not particularly strong. That section reads as if boundary estimates (and multiple “solutions”) are a common problem in these models. In fact, as written, this section might lead one to believe that psi=1 solutions are always a problem when maximizing the likelihood. However, the fact that psi_i=1 solves eq 4 does not imply that there is a maximum of the function for psi=1 (not even a local one). First of all we need to find a point that is at the same time a solution for eq 5, and even then, we need to assess that it is in fact a maximum, and not a minimum or a saddle point. For instance, for simplicity, let’s take the case without covariates. It is true that there is a solution to the system of equations (eq4&5) for psi=1 and p=dT/(nK) (where dT is the total number of detections, n is the number of sites, and K the number of replicate visits). However, in most cases, this boundary point is not a maximum but a “saddle point” in the function, with a value of the likelihood function far from the maximum… putting it in simple words, there won’t be any problem locating the MLE in these cases! The point psi=1 p=dT/(nK) is only a maximum of the function (in fact the MLE) when the condition identified in Guillera-Arroita et al. (2010) holds. This arises sometimes in situations where the sample size is small and the underlying occupancy/detection probabilities are low, not always as the text might imply.
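
    The no-covariate case in B3 can be checked numerically. The following is a minimal sketch, not code from Welsh et al or from any occupancy package: the detection counts are made up, `log_lik` is a hypothetical helper name, and a crude grid search stands in for a proper optimizer. It compares the log-likelihood at the boundary point psi=1, p=dT/(nK) with the best interior point on the grid.

```python
import math

def log_lik(psi, p, detections, K):
    """Log-likelihood of the constant single-season occupancy model
    (binomial coefficients dropped; they do not depend on psi or p)."""
    total = 0.0
    for d in detections:
        if d > 0:
            # Site is certainly occupied: psi * p^d * (1-p)^(K-d)
            total += math.log(psi) + d * math.log(p) + (K - d) * math.log(1 - p)
        else:
            # Either occupied but never detected, or unoccupied
            total += math.log(psi * (1 - p) ** K + (1 - psi))
    return total

# Made-up data (an assumption for illustration): detections per site, K = 3 visits
K = 3
detections = [0] * 50 + [1] * 20 + [2] * 20 + [3] * 10
n = len(detections)
dT = sum(detections)

# Boundary point discussed in the comment: psi = 1, p = dT / (n K)
p_boundary = dT / (n * K)
ll_boundary = log_lik(1.0, p_boundary, detections, K)

# Crude grid search over interior values of (psi, p) for comparison
grid = [i / 100 for i in range(1, 100)]
ll_best, psi_best, p_best = max(
    (log_lik(psi, p, detections, K), psi, p) for psi in grid for p in grid
)

print(f"boundary: psi=1.00, p={p_boundary:.2f}, logL={ll_boundary:.2f}")
print(f"interior: psi={psi_best:.2f}, p={p_best:.2f}, logL={ll_best:.2f}")
```

    With these made-up counts the interior grid point has a clearly higher log-likelihood than the boundary point, which is the behaviour described above for data sets that are not too small or too sparse. Other data could of course behave differently.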

    In the general case with covariates we can expect the same kind of behavior. Given enough data the MLE will be obvious to locate, with the likelihood becoming flatter and more prone to boundary estimates as sample size decreases. Of course, the more parameters (i.e. covariates) the more data are needed to obtain precise estimates. Regarding the statements on “multiple solutions”, the MLE is consistent and has a unique solution (only when psi is a boundary estimate, the betas can be “confounded”, but then either we have too little data or psi is close to 1 so covariates are not relevant). This type of issue is by no means exclusive to this technique. Mark-recapture can also produce boundary estimates… shall we claim that this technique is also problematic in general? Or is it rather that we should not expect much from our data when the sample size is small?

    I will try to find time to read the rest of the paper in detail soon. I don’t doubt there can be interesting results in it however, as I said, so far I have found the general tone to be too alarmist.


    PS hmm, apologies for a couple of self-citations, but I thought some might find them useful …and for the long post!

    • Thank you for stopping by.

      Re your comments:
      A1) I’m not the one who has confused these. Reviewers of both the Lindenmayer et al paper and some of my papers are the confused ones. When a reviewer reads a paper about abundances and says “you should use occupancy instead of abundances so that you can model detection” they are confused. And they say this – I don’t know how often, but not infrequently.
      A2) To me there is not a lot of difference between n-observers visiting once or one observer visiting n-times – it is the same number of person hours of work, which is the limiting factor. The point is ALL methods dealing with detection require extra, usually a lot extra work. Distance based methods seem the most efficient. This, I think is the point I am most frustrated with the experts of detection probability over (my A1 and B are frustrations with non-experts who end up as reviewers) – dealing with detection comes at a very substantial labor cost and there is just no way around that. This means the argument has to be on the grounds of is the benefit worth the cost.

      B) Again, you have to read Welsh et al in the context of what reviewers are saying (i.e. “you have to model detection probabilities”). It’s not really relevant if somebody has seen and acknowledged challenges in using detection probabilities before if reviewers are saying what they’re saying. Although it is a long and careful paper with analytical, simulation and empirical tests of multiple points, the main point is really important to this debate. Namely that the error in the variable of interest – the slopes of occupancy vs covariate variables – is as large (and almost as biased) when you model detection probabilities as when you just ignore them. Variance and bias (mean) are the only two things statisticians use to evaluate an estimator. The detection model is basically no better and no worse an estimator than ignoring detection. This completely undermines the argument that you *have* to model detection probabilities when trying to estimate these slopes. QED. That is all I (or I think Welsh) am trying to say.

      • Hi Brian,
        I’m going to pull you up on your bias comment above. There’s no evidence of a bias in the effect size when accounting for detection in Welsh et al (figs 4 & 5) except when they included an abundance effect on detection (fig 6), but there is when you ignore detection and use simple logistic regression (if you want to interpret the results in terms of occupancy that is, fig 8). Welsh et al demonstrated that the precision of the estimated covariate effect was less when accounting for detection than not. So on one hand you have a less precise, unbiased result (unless detection is a function of ‘abundance’; though note while they’ve conceptualised it as abundance, it’s really a non-linear relationship with years since planting with random noise), and on the other a more precise biased result.

  21. Apologies if this is a bit off topic… I have been involved in a couple of studies recently using multi-species occupancy models to estimate species richness.
    From what I understand, occupancy and detection probability should be independent. So if one was to plot detection vs occupancy for a range of species, there would be no relationship. However what I keep finding is that the estimates of occupancy are positively correlated to detection. And often quite strongly. This to me suggests detection is related to abundance (as pointed out in the blog post).
    In my understanding, one use of occupancy models is to use occupancy as a proxy for abundance when abundance cannot be measured, whilst explicitly accounting for imperfect detection. However perhaps estimates of detection are really in some way a better proxy for abundance? It is all a bit tangled..
    Or is it only really an issue when local abundance is very low. i.e. it is probably much easier to detect at least one individual from a local abundance of 10 than a pop of 1, but the difference in detectability of at least one when abundance is 50 is probably similar to if it was 100…
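
    The intuition in the last paragraph can be put into numbers with a simple sketch. It assumes (this is not stated in the comment) that every individual at a site is detected independently with the same probability `r`, so that the chance of detecting at least one of N individuals is 1-(1-r)^N. The function name and the value r = 0.1 are made up for illustration.

```python
def p_detect_at_least_one(n_individuals, r):
    """P(at least one of n individuals is detected), assuming independent
    detections with the same per-individual probability r."""
    return 1 - (1 - r) ** n_individuals

r = 0.1  # hypothetical per-individual detection probability
for n in (1, 10, 50, 100):
    print(n, round(p_detect_at_least_one(n, r), 4))
```

    Under that assumption, site-level detectability rises steeply between 1 and 10 individuals and is nearly flat between 50 and 100, which fits the idea that the abundance effect matters mostly when local abundance is low.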

    • Hi AndrewG – good questions. I don’t know all the answers so would welcome other comments.

      But as you say occupancy and abundance are well known to be very strongly correlated. And it is very rare to have enough data to truly separate occupancy and abundance. With only one visit you have a perfect identifiability issue – the model is completely unable to tell whether to have an occupancy of 0.6 and detectability of 1.0 or an occupancy of 1.0 and a detectability of 0.6 or anywhere in between. This identifiability issue is one of the things going on in the Welsh et al paper. With two visits (the most common scenario in my experience) the identifiability improves only slightly. This identifiability issue would I think lead to a strong positive correlation across, say many species, but it could be spurious (i.e. statistical artefact). Only with a devoted effort to detectability – e.g. six repeat visits – are you going to start to get good separation of the two. And I am not sure of what the studies show in this case. But again as you note, the studies clearly show detectability and abundance to be positively correlated (pretty strongly – and as discussed not a surprise) and abundance and occupancy are pretty strongly positively correlated. So just by this argument detectability and occupancy must have a fairly decent positive correlation.
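
      The one-visit identifiability problem can be seen directly in the likelihood. With a single visit, a site gives a detection with probability occupancy × detectability and a non-detection otherwise, so only the product enters. A minimal sketch, with made-up counts and a hypothetical function name:

```python
import math

def log_lik_one_visit(psi, p, n_detected, n_sites):
    """Log-likelihood with a single visit per site: a site yields a detection
    with probability psi * p and a non-detection with probability 1 - psi * p."""
    q = psi * p
    return n_detected * math.log(q) + (n_sites - n_detected) * math.log(1 - q)

n_sites, n_detected = 100, 60  # made-up survey outcome

ll_a = log_lik_one_visit(0.6, 1.0, n_detected, n_sites)   # occupancy 0.6, detectability 1.0
ll_b = log_lik_one_visit(1.0, 0.6, n_detected, n_sites)   # occupancy 1.0, detectability 0.6
ll_c = log_lik_one_visit(0.8, 0.75, n_detected, n_sites)  # in between, same product
print(ll_a, ll_b, ll_c)
```

      All three parameter pairs give the same log-likelihood (up to floating-point rounding), so one-visit data cannot distinguish between them.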

      Not sure where that leads or what the implication is, but I thank you for pointing out the correlation. I’m going to have to think on this more.

      • Sorry – I should’ve said, for our studies we had 5 replicate samples for each site, and were recording presence of any bird species. So the idea was indeed to estimate occupancy for each species whilst explicitly accounting for detection probability of each species. One thought I have had is whether the replicates were truly independent. So instead of our intended 5 replicates, if they are not independent, then the effective number of replicates might be closer to 1 or 2 reps. – which may cause that pattern of positive correlation. Although the species with the highest estimates of occupancy and detection prob are also the ones with the greatest abundance (as determined from an independent set of data that was using distance sampling to estimate density…).

      • Well that is pretty strong evidence for a real (biological) correlation between detection and occupancy as well as a statistical artifact. This could still be indirect abundance driving both occupancy and detection. But to me this just kind of confirms the whole point of the post – detection probabilities, at least as practiced today, are not some magic method that makes everything come out perfectly but indeed, like any other statistical technique, are rather mired and messy dealing with collinear data.

        Not sure if I’ve answered your question though …?

  22. Hi Darryl
    Thank you for reading and addressing the Welsh et al paper! (in your Feb 11 comment). Since we had reached the limit of 3 levels deep, and this is a summative comment, I am putting it down here.

    There is a lot going on. For each model there are estimates of occupancy, detection, and slopes vs year. And there are 4 models (ideal, sparse, abundance correlated – and then the no detection/occupancy only model). So right there we have 4×3=12 things (actually 11 because there’s no detection probabilities for the last model) that can be biased or unbiased. A little bit of everything happens somewhere out of 12 cases.

    What I said in the comment was about the slopes only. You say “there is no evidence of bias” in Figures 4 & 5 (ideal & sparse simulations). I’m not sure I see it that way (the dashed line is not in the center of the histogram in the bottom left of figure 4 or figure 5). For figure 4, the authors say it is “symmetric” but never say it is unbiased, but they do say almost 53% of the slopes are on one side of the true value. With 5000 simulations and a distribution that looks very close to normal and in any case is asserted to be symmetric, that is almost certainly biased. For the sparse model it is much more clearly biased (bottom left figure of Figure 5). And the abundance correlated case is very clearly biased (Figure 6 – looks like 70-80% is on one side of the line). Now I am arguing from the median because that is what I can see in the figure. Bias is admittedly defined by the mean, and the tails could change to cause the mean to tell a different story than the median, but as best I could tell the authors don’t actually report the means. In short, by my read, the slopes are biased for all 3 models with detection probabilities. If you think the authors say the slopes for any of the 3 models w/ detection are unbiased or report mean estimates, please do correct me and let me know where. But that still wouldn’t undermine my next two points (and my claim you are responding to was really only about the 3rd point). And conversely I never made a big deal about there being bias in the first two models (ideal, sparse=Fig 4 & 5), because I agree they are not huge if they exist, and I don’t think they’re the most relevant scenario.
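
    The “almost certainly” in the Figure 4 argument can be made concrete with a back-of-the-envelope sign test. This is a sketch under stated assumptions: the share is rounded to exactly 53% for illustration, the simulations are treated as independent, and a normal approximation to the binomial is used. It only speaks to whether the median sits at the true slope, not to bias in the mean.

```python
import math

n_sims = 5000          # number of simulations cited in the comment
share_one_side = 0.53  # "almost 53%", rounded up here for illustration

# If the estimates were symmetric about the true slope, the count on one side
# would be Binomial(n_sims, 0.5); use the normal approximation for a z-score.
expected = 0.5 * n_sims
sd = math.sqrt(n_sims * 0.5 * 0.5)
z = (share_one_side * n_sims - expected) / sd
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.1e}")
```

    A split of that size in 5000 draws is several standard deviations away from 50/50, so under these assumptions it is very unlikely to be sampling noise, even though the shift itself is small.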

    All this does not even start to address the other estimates (of detection and occupancy). Notably – even in the “ideal” case (no sparsity, no correlation with abundance), the estimates of detection probabilities themselves are biased! And both occupancy and detection are biased in some other cases with detection probabilities (e.g. middle row Figure 5).

    And finally – and the point I was making before that you responded to – for me the money case is when detection is correlated with abundance, which is known to vary strongly between species and sites and time. This is so obviously the most likely scenario. And that is the scenario I referred to in my comments. BOTH models are biased on slopes (albeit detection is less biased) but the variance is bigger when detection probabilities are INCLUDED. The authors never report RMSE, which is what I know as the best way to put bias and variance on one scale, but from what I can eyeball, the detection model would have a worse RMSE at a guess. Certainly not a better one, and given the labor costs of detection probability estimation, it really ought to have a much better RMSE to justify it in the most real-world-like scenario.

    And that is my bottom line:
    1) in the real world (i.e. abundance correlated)
    2) when looking at overall estimation quality (i.e. bias & variance combined, such as in RMSE)
    3) the model including detection probabilities does no better and possibly worse than the model that ignores them
    4) and this equal or worse fit is obtained at great cost of labor
    In this context, there is no way one can say that one MUST do detection probabilities

    • Hi Brian,
      Don’t have time to make point-by-point responses at this stage, but my bottom line(s) is this.
      1. If you ignore detection, your final inferences are about a convoluted mix of biological and sampling processes.
      2. If you want to state those results are relevant to the real biology, you have to make some assumptions that the sampling process isn’t screwing things up for you. Often those assumptions won’t be explicitly stated (or even realised).
      3. If you want to be more rigorous and attempt to separate out the sampling from the biological processes, you have to do more which may mean greater costs (but not necessarily as Guru pointed out earlier). No apologies or excuses for that; there’s no such thing as a free lunch.
      4. Instead of rushing out into the field to start data collection, you need to spend the time carefully designing your study and getting a realistic expectation of what can be achieved given your resources. In my experience, this is often poorly done.
      5. Rather than relying on models to fix things for you, try to get better quality data. If you think there’s going to be variation in abundance between sites, try to come up with some ways of mitigating that effect while collecting the data, and have a clear idea of how you’re going to analyse the data.
      6. Know what analysis you’re going to use to answer your questions of interest. Be aware of the required assumptions and design the study to meet those assumptions as closely as possible. ALL methods of analysis (including those that don’t account for detection) involve assumptions, and not only statistical assumptions (e.g., independence) but also assumptions of interpretation. How does the manner in which the data was collected and the analysis used mean I should interpret the results of that analysis?
      7. Occupancy models are statistical, not magical.
      8. “An approximate answer to the right question is worth a great deal more than a precise answer to the wrong question.” – John Tukey
      9. MUST you always account for detection 100% of the time? No. But given it’s well known that it can potentially provide misleading conclusions about the real biology if you don’t account for the sampling process then you certainly need to be very aware of the detection issues while sampling and be able to provide some very strong justification for why it can’t be leading you astray in your final inferences about the species. Or, be honest and admit that the results of the logistic regression are potentially contaminated by the sampling process so should be interpreted on where you can FIND the species, not necessarily where the species IS.
      10 (and finally). Occupancy models are not infallible and can also give misleading results. They are, however, an attempt to account for the sampling process and get us one step closer to what we’re really interested in. They do, however, have to be applied correctly and have certain requirements and limitations, just like any form of analysis. They’re not always going to be the perfect tool from the toolbox of methods that could be used, but ignoring detection when it’s known to be an issue isn’t a useful solution either.
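
A minimal sketch of the "mix of biological and sampling processes" in point 1, under the simplest textbook assumptions: every site is occupied with the same probability `psi`, and on each visit the species is detected at an occupied site with the same probability `p`, independently across visits. The function names and parameter values are hypothetical.

```python
import random

def expected_naive_occupancy(psi, p, visits):
    """Expected fraction of sites with at least one detection: psi * (1 - (1 - p)^visits)."""
    return psi * (1.0 - (1.0 - p) ** visits)

def simulate_naive_occupancy(psi, p, visits, n_sites, seed=0):
    """Simulate surveys and return the observed fraction of sites with a detection."""
    rng = random.Random(seed)
    detected_sites = 0
    for _ in range(n_sites):
        occupied = rng.random() < psi
        # A site only counts as "present" if it is occupied AND detected at least once.
        if occupied and any(rng.random() < p for _ in range(visits)):
            detected_sites += 1
    return detected_sites / n_sites

print(expected_naive_occupancy(psi=0.6, p=0.5, visits=2))
print(simulate_naive_occupancy(psi=0.6, p=0.5, visits=2, n_sites=20000))
```

The observed fraction tracks the product of occupancy and cumulative detection, not occupancy alone. Whether separating the two is worth the cost is the question the rest of this thread argues about.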


      • Hi Darryl – thanks.

        I’m kind of disappointed to see you blow by the issues about the Welsh paper that we were discussing. To me this is data, everything else is opinion. I had been appreciating your being willing to get into that paper with me.

        Most of your points are general points about statistics. More data is good. Good data lets you extract more information. All models are flawed. Think about your design before rushing out to collect it (Amen to that one!), etc. Nothing special arguing for detection probability.

        To me, though, your #8 (the Tukey quote) is exactly an argument for why one might not use detection models. If my question has nothing to do with detection, why is it so outrageously beyond the pale of acceptable behavior to let detection be one of the known sources of error I will ignore, the better to answer my question? (We ignore many sources of error and violations of assumptions every time we do statistics of any kind, which was Tukey’s point.)

        All of statistics exists in a world of trade-offs and that includes detection probabilities. Sometimes it’s a good trade-off. Sometimes it’s bad.

  23. Pingback: No Statistical Panacea, Hierarchical or Otherwise | Daniel Hocking, Ecologist

  24. Hi Brian,

    thanks for this interesting post (and the original post on statistical machismo in general). I strongly agree with you that to request a particular statistical method for all cases can do harm, and that nothing can replace careful thinking about the data, the questions to be answered, and potential problems linking the two.

    I just wanted to point out that in your post and the following discussion, two things do not seem to be sufficiently differentiated: the problem of undetected species and the estimation of detection probability in occupancy models as a solution. I think that the problem is an important one that all ecologists analyzing field data should acknowledge and think about. I have the feeling that some of the disagreement is actually about confusion between the problem and the solution.

    Incomplete observations are pervasive in many fields of community ecology, and I feel that this is still not adequately considered by many ecologists. In my personal area of experience, I think it may often be the most important source of potential bias, more important than e.g. the uncertainty in estimating degrees of freedom that statisticians are so concerned about. You were asking about which major ecological patterns have been shown to be wrong by acknowledging detection probabilities. In my view, many of the things we thought we knew ecologically may be true, but we won’t know before researchers do a better job at understanding the effect of incomplete observations (sampling biases) on recurrent patterns. I don’t think this will be achieved by occupancy models alone, but I am looking forward to what they can contribute.

    Food webs and other interaction networks are an example. Occupancy models haven’t arrived in this field yet, but incomplete observations can be a major problem (Bluethgen et al. 2008 Ecology, Chacoff et al. 2012 J Animal Ecology). For example, classic discussions about the relationship between connectance and food web size haven’t been resolved so far (but the patterns still get cited). Many studies describe patterns in the distribution of links etc., but in large networks and with typical sampling methods, the probability of detecting a link between two rare generalist species is extremely low. This is similar for many community datasets and descriptors such as dissimilarity and species richness (I know this is not a new insight, but it is still often ignored).

    Some of your comments seem to suggest that detection probabilities are mostly important if you are interested in them, and less so if you are interested in broader-scale patterns. In contrast, I think that missing observations may become even more of a problem when you include many species and look at the big picture. In the worst cases (e.g. many singleton species), it might not even be possible to estimate detection probabilities with occupancy models. Your suggestion (in the original post) to trust very low p-values is also not useful in this case, because pervasive biases can produce highly significant patterns (undetected species often don’t just add noise). There is no alternative to thinking critically about one’s own results and being cautious with interpretation; there is no statistical method that is always correct (although that would make life easier), and you’re right that this is sometimes forgotten by reviewers. Sometimes I find null model simulations helpful to facilitate this critical thinking, and I agree that it is often a bad idea in such sparse data cases to discard the additional bit of information that you have in abundance data.

    I guess you agree with most of what I said, but I felt it important enough to warn against using this blog as an excuse for not considering sampling biases in community data.

    • Thanks for your comments Jochen. They’re well-said. But like Brian, I have to say I find almost all of the pushback Brian has received, from your gentle pushback to the firmer pushback from others, to be disappointingly stuck at the level of generalities with which no one could disagree. Brian’s post isn’t just about generalities–he cited and described in detail results of a specific paper showing that, in a specific set of circumstances, trying to estimate detection probabilities often is *worse* than not trying to do so. My disappointment here isn’t directed at you specifically, indeed you say you mostly agree with Brian. But speaking as a curious outsider (and as someone who disagrees with Brian strongly on various things and who has no particular inclination to defend him; he can take care of himself), my overall impression is that many commenters have basically ended up paying lip service to Brian’s broad claims while declining to really engage on specifics.

      So, since many folks on this thread seem to be unwilling to engage on specifics, let me instead push back against detection probabilities in terms of broad brush generalities, by highlighting a post Dan Hocking made recently over on his blog (link in one of the comments above). Dan asks a good question: if you think detection probabilities should always, or usually, be estimated, or that estimating them should be a “default” approach to which there should be only rare (and strongly argued-for) exceptions, then presumably you also think ecologists should redo pretty much every empirical study of abundance or occupancy conducted before about 2001 or 2002. Right? That’s the implication of your stance, right? Because that’s about as far back as the statistical methods used to estimate detection probabilities go. So, unfortunately, if we want to be able to draw reliable empirical conclusions about anything having to do with abundance, we have to start from scratch? I’m not asking this as a rhetorical question, I’m completely serious. For instance, food web ecologists collectively decided that the food web data compiled from diagrams in the literature by Joel Cohen in the late 1970s was so unreliable as to be literally worthless, and so went out and started from scratch by painstakingly collecting massive amounts of gut content data and other hard-to-get data. Are you arguing for the same thing? If so, I think it would be very helpful for you to come out and say so explicitly. If not, then I’m afraid what you’re admitting, implicitly, is that abundance and occupancy data collected without estimating detection probabilities is not without value and can be used to draw reliable ecological conclusions in a non-trivial fraction of cases. And you’re further admitting, implicitly, that whether or not detection probabilities need to be estimated must be decided on a case by case basis.

      I really don’t see a principled middle ground here. Either you think that it’s very unusual, perhaps even impossible, to be able to draw reliable conclusions without estimating detection probabilities, in which case you think pretty much every study conducted before about 2001 is worthless and needs redoing. Or you think that it’s not so unusual, in which case you’re admitting that it’s a case-by-case judgment call on which one should not have a “default” stance one way or the other, since “default” stances are surely only appropriate in cases where there are at most only very rare, unusual exceptions to the “default”.

      Or, you can admit that, in arguing for the very broad importance of estimating detection probabilities, you’re overgeneralizing from the specific cases in which estimating detection probabilities is well-justified for case-specific reasons.

      • I wouldn’t include estimating abundance in here. Mark-recapture methods have been commonplace for many years. I guess distance sampling, for cases where individuals cannot be identified, is more recent, but it still goes as far back as the early 1990s.

      • I fully agree to case-by-case decisions. A short mention in the methods why detection probabilities were not considered won’t hurt, right? I think that editors in particular could play a role here, by encouraging authors to explain their approach instead of incorporating every re-analysis suggested by reviewers.

        Regarding the old studies: maybe reconsider the more important ones. That means, (1) check whether there is doubt about patterns being potentially caused by sampling biases, (2) try to find other data available about them (e.g. abundance data or subsamples), do some simulations or null models, and (3) then select a few to be repeated with new data collection using modern methods. A little bit of replication of “already known results” would suit ecology very well anyway. I’d be excited about the outcomes.

    • Hi Jochen

      So I agree that using detection probabilities to look at occupancy or abundance within a species is a different issue from assessing the number of species across species. Not sure where I ever said otherwise. The Buckland et al chapter I cited in my original post looks at the latter, but almost all of the post and discussion has been on the former. There are certainly issues of non-detectability when, for example, studying species richness. But people are not at all unaware of this and use methods like rarefaction or Chao estimators (one of the top 10 cited papers in ecology is Gotelli & Colwell’s paper on these methods in Ecology Letters). And as Jeremy has already pointed out, food web and network people have looked seriously at the consequences of this effect. Not at all sure it’s fair to say it’s a forgotten and ignored topic. But the solutions often look quite different from the direct modelling of detection that is the subject of this post (but do often look like the null modelling you mention).

      • Hi Brian,

        sorry for writing slightly off-topic, but I thought this isn’t just about the details of specific models. I had the feeling that some of the disagreements in this post seemed to be actually about whether to acknowledge incomplete sampling. As far as I understood, people argue for detection probabilities mostly due to the long-known biases of undetected species, and the paper you started with seems to suggest that occupancy models may not be the universal weapon against it that some of the protagonists claim. I am not sure whether I see a difference in principle between detection issues in within-species and across-species studies; and multi-species occupancy models seem to address the latter.

        Of course, correction methods for species richness are widely used. I am still surprised how often they are not considered. And when it gets away from species richness (e.g. dissimilarity), it seems to be less common to consider these effects. It’s also true that sampling issues have been discussed in food web studies, but it seems that many studies using the term ‘network’ still assume the data are complete. I think that every study that uses incomplete (field) sampling of frequency or presence/absence should somehow consider (at least verbally) whether this could cause biases throughout their results.

        Back to the issue of detection probability estimation in occupancy models (about which I know much less than most contributing to this discussion): has anyone done kind-of a meta-analysis of a bunch of studies that use occupancy models and compared the results to re-analyses of the same data with simpler methods? That would seem helpful for resolving the issue here.

      • All scientific data collected has multiple problems, assumptions, and known sources of error. The question is what to do about it. At one end are those who would say do nothing until we have perfect data or perfect methods for analysis. At the other end are those who ignore every issue. At the first end lies the death of science. At the other end lies wrong science. Everybody is somewhere in the middle, but has to think about and then justify to their peers where a particular piece of work falls on this spectrum. That’s just science. And it works pretty well, especially the part about justifying to their peers, at least when their peers are open minded and thoughtful.

        What drives me crazy, what this post is about, and what breaks science, is when peer reviewers pick out only one particular error (often because they developed a method to deal with it or because such a method is trendy) and then insist everybody has to be just as concerned about this – and only this – one particular error as they are, and then are willing to stop all science by anybody who is not as concerned about that one source of error as they are.

        I’m rather more sympathetic with the just plow-ahead end. At least they’re trying to move science forward and putting something out there for others to criticize (and in some cases show why their work is fatally flawed but that is a victory for the scientific discourse). At the other end lies data and manuscripts sitting in drawers (hard drives?) and never entering the scientific discourse. The latter is far more dangerous to the progress of science.

        To your comments, if all you’re saying is a good paper should mention and discuss the limitations and their possible impact, then yes. That is good scientific writing and I tell my graduate students better to be ahead of the reviewers than behind. But so often this slips into “you cannot publish your paper until you do X” which is really dangerous. Sentences that say that “every paper that has flaw X must do Y” are wrongheaded (unless Y is just “discuss the limitations and your thinking on it” which is what you said, so I guess I agree, but I still hate the word “every” in this context).

        I would LOVE to see a meta-analysis like you describe. The closest thing I’ve seen is the Welsh et al paper, but it is only one dataset, one application of detection probabilities etc. I expect a couple of years from now somebody will do one and some sanity will return.

        Thanks for some thoughtful comments.

  25. Sorry to disappoint you earlier Brian, but I’m too busy at present to really get into details of the Welsh et al paper with you; the curse of not having a salaried position is not always having the ability to follow up on interesting things.

    You’ve clearly missed the intent of me throwing in the Tukey quote. By ignoring detection the ‘question’ you’re trying to answer is a confounded combination of biological and sampling processes, which is the wrong question if you’re proclaiming to be interested in just the biological processes. Occupancy models are one way of trying to separate those two to try and make better inferences about the underlying biology. No claims that they’re perfect and without their own issues, but they are a step in the right direction. I agree that you always have to make some tradeoff in terms of assumptions, but assuming that detection issues are going to come out in the wash is a pretty big one to make, particularly when detection will likely often covary with the same type of variables that are of interest for species occurrence or distribution.

    I’m not saying that I think reviewers should always be insisting these types of methods are used, as sometimes people may not have the appropriate data to even do that. I’ve also reviewed papers where people have used these methods, but have done so inappropriately such that they may as well have just used logistic regression in the first place. What I am saying is that if people really want to understand biological systems, they need to be very mindful of how the act of sampling could lead to misleading inferences if not suitably addressed during the collection and/or analysis of the data. Science may still progress by ignoring such fundamental issues, but it is likely to progress much faster by not doing so. If that requires more resources to do so, then so be it, that’s the cost of setting the bar higher.

    In terms of comparisons of occupancy vs logistic regression, check out the pronghorn analysis in our book or MacKenzie (2006) Modeling the probability of use: the effect of, and dealing with, detecting a species imperfectly. JWM 70: 367-374


    • Thanks Darryl

      Not sure the line between biological and sampling processes is as black-and-white as it is made out to be, especially when detection is strongly tied to abundance. Is it not then effectively measuring an important biological variable – just not occupancy?

      Thanks for the reference MacKenzie 2006. It’s a slightly different context but certainly an important one for wildlife management. You show that estimates are biased without modelling detection probability (although to my read only badly biased – e.g. changes rank of habitats – in the probably least likely scenario of detection probability negatively correlated with habitat usage). The Welsh et al paper also showed this. Unless I missed it though, you don’t address the issues of sparsity, detection correlated with abundance nor most importantly the variance in the estimates (aka the efficiency) that were the key points of the Welsh et al paper. Did I miss that? Do you know a paper besides Welsh et al that addresses these issues and shows how detection probabilities make the answers much better?

      I’ve appreciated the time you’ve taken to comment here!

      • Brian,
        While abundance might be an interesting biological variable, if you’re considering methods such as occupancy models (or logistic regression) then you’re clearly focusing on presence/absence of species. Therefore, abundance is more of a sampling issue; how many are at a place will have some effect on detection, but so will other factors aside from abundance. If you want to say interesting things about abundance, then really you should be using different methods. This is where it’s really important that people are clear about exactly what they do want.

        You seem to have missed the point in that paper where your final inferences on what factors appear to be important for occupancy are quite different depending on whether you account for detection or not. No, the other issues you raise weren’t covered in that paper.

        You also seem to be missing the point that the quantity you’re making inferences about is different depending on whether you account for detection or not. Again, if you ignore detection, inferences are about a confounded combination of sampling and biological processes. When you account for detection you are trying to separate those two. In comparing methods, it’s not only about bias and precision, but also interpretation. If the interpretation is different (as here) it’s an apples and oranges comparison so the other issues are potentially a moot point. If the interpretation is the same (as in lots of other situations when you’re comparing estimators), then bias and precision considerations are much more relevant. So to me it’s a first principles argument, and the key in whether you should account for detection or not (both in the field and the office) is all about how you want to interpret your results. IF you want to separate out those 2 processes then you probably need to do more (or at least do things differently) than if you don’t.
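
One way to make the abundance and detection link in this exchange concrete is a sketch under a strong, labeled assumption: each of the N individuals at a site is detected independently with the same per-individual probability `r`, so the chance of detecting the species at all is 1 - (1 - r)^N. The function name and the value of `r` are hypothetical, and real surveys will depart from this in many ways.

```python
def site_detection_probability(n_individuals, r):
    """P(species detected at a site) if each individual is detected independently with prob r."""
    return 1.0 - (1.0 - r) ** n_individuals

# Hypothetical per-individual detection probability of 0.1 on a single visit.
for n in (1, 5, 20):
    print(n, round(site_detection_probability(n, r=0.1), 3))
```

Under this assumption, site-level detection rises with local abundance, so a detection rate carries information about abundance as well as about the sampling process. That is the overlap both sides are describing from different directions.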


      • Thanks Darryl,

        “If you want to say interesting things about abundance, then really you should be using different methods.”

        Agreed! I just wish people would stop telling me I should use occupancy models when I’m interested in abundance.

        As for the rest, I am not missing the points. I’m just not a believer that there is a perfect statistical test or experiment. Good science involves living in a messy world. All approaches involve compromises and trade-offs, and I’m making different compromises than you are (for some cases – there absolutely are cases where I would say analyzing detection is required as I’ve said from the beginning). I personally think such diversity of approaches is a good thing for science.

  26. Pingback: Ecologists need to do a better job of prediction – Part IV – quantifying prediction quality | Dynamic Ecology

  27. Pingback: We need more “short selling” of scientific ideas | Dynamic Ecology

  28. Pingback: Friday links: Science Cafe at the ESA meeting, Peter Medawar > EO Wilson as a source of advice, and more | Dynamic Ecology

  29. Pingback: Happy Birthday to us! | Dynamic Ecology

  30. Pingback: Why advanced machine learning methods badly overfit niche models – is this statistical machismo? | Dynamic Ecology

  31. I hate analytical trends becoming a requirement for publications as much as the next person, so I was really looking forward to liking your article about the trendiness of this approach and the need to apply it willy-nilly whether it’s appropriate or not. But your summary of what detection probabilities are and how they work is fundamentally wrong: the intent of the approach, what challenges it solves or minimizes, and its utility have nothing to do with the challenge of censusing mobile organisms or the problem of claiming presence when you’re not actually censusing (counting every individual). It has nothing to do with censusing, and nothing to do with mobility per se (you can use these tools on plants). And your analysis of when they should be considered useful and when not is way off-base given the kinds of questions many wildlife researchers are trying to answer. I have worked with these kinds of analysis and have found them to be tremendously powerful tools for tackling some poor understandings of species distributions and habitat associations, some wrong ideas about species patterns of commonness or uncommonness due to limitations in survey effort, and way out of line inferences drawn from rather poorly conceived study designs.

    • Care to provide some citations to back up the claim that Brian doesn’t know what he’s talking about, and that illustrate what you believe to be appropriate uses of these techniques? Because if you read the (admittedly-lengthy) comment thread, you’ll find Brian engaging in great detail with a number of researchers who have developed and used these techniques. You may not agree with Brian’s views on when they are or aren’t appropriate to use, or the trade-offs involved in using them (others in this thread don’t, though of course that doesn’t mean they’re right to disagree). But I don’t think you’ll find that Brian is simply ignorant or confused, as you seem to be suggesting.

  32. Really appreciate the blog post, Brian. I’m currently working on my Master’s and am employing detection probabilities. Given my circumstances and research question I feel they’re important for me. Although I, like you, am in the pool of people who agree that they are not always necessary, I do feel they should be an important consideration prior to concluding their necessity or lack thereof.

    I’ve enjoyed this thread, both the people who disagree and those who agree, as I think this type of dialogue is exactly the point of a forum like this, and I think there is validity to a lot of what is said…though it boils down to balancing your data collection, methods, analyses, with the questions you want answered, and that’s a case-specific debate. As pertains to detection probabilities, a recurring concern from you seems to be the astronomical added costs.

    I was looking through Farnsworth et al.’s paper (A REMOVAL MODEL FOR ESTIMATING DETECTION PROBABILITIES FROM POINT-COUNT SURVEYS, 2002. The Auk), and feel that this helps address the cost issues in some capacity. The problem with your argument about re-visiting a site being expensive (true), is that there is no definition of what constitutes a second visit. If three visits can simply be new 5 minute time intervals in a 15 minute survey then surely the costs would not compound so prohibitively that detection probabilities could be rejected outright. Curious on thoughts, since this is how I will be addressing detection probability through my study (as I need to balance number of sites with time…the never-ending struggle), and would be interested in the known/perceived limitations/concerns with the methodology. It appears to be reasonably well cited, and I feel it’s a reasonable paper, but my experience in these things is at an introductory level.
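
A minimal sketch of the removal idea with three equal intervals in one point count. This is a deliberately simplified constant-rate version, not the model fitted in Farnsworth et al.: it assumes every individual has the same per-interval detection probability `p`, that intervals are of equal length, and that each individual is recorded only in the interval where it is first detected. The function names and the example counts are hypothetical.

```python
import math

def removal_log_likelihood(p, first_detection_counts):
    """Conditional multinomial log-likelihood for counts of first detections per interval."""
    n_intervals = len(first_detection_counts)
    p_any = 1.0 - (1.0 - p) ** n_intervals  # P(detected at some point during the count)
    log_lik = 0.0
    for j, n_j in enumerate(first_detection_counts):
        # P(first detected in interval j | detected at all), with j starting at 0.
        cell = p * (1.0 - p) ** j / p_any
        log_lik += n_j * math.log(cell)
    return log_lik

def fit_removal_model(first_detection_counts, grid_size=999):
    """Grid-search estimate of p, overall detectability, and corrected abundance."""
    candidates = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    p_hat = max(candidates,
                key=lambda p: removal_log_likelihood(p, first_detection_counts))
    p_any = 1.0 - (1.0 - p_hat) ** len(first_detection_counts)
    n_hat = sum(first_detection_counts) / p_any
    return p_hat, p_any, n_hat

# Hypothetical data: birds first detected in minutes 0-5, 5-10 and 10-15.
p_hat, p_any, n_hat = fit_removal_model([30, 12, 6])
print(p_hat, p_any, n_hat)
```

The declining counts across intervals are what carry the information about detectability, so no return visit is needed. The cost is the set of assumptions listed above, in particular that the birds present are equally detectable throughout the count.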

    • Hey Rich, you might be interested in the paper that we recently published, which addresses the accessibility bias using a continuous time removal model, combined with distance sampling:

      Sólymos, P., Matsuoka, S. M., Bayne, E. M., Lele, S. R., Fontaine, P., Cumming, S. G., Stralberg, D., Schmiegelow, F. K. A. & Song, S. J. (2013): Calibrating indices of avian density from non-standardized survey data: making the most of a messy situation. Methods in Ecology and Evolution, 4: 1047-1058.

      Also, as I noted above: repeat visits are not absolutely necessary for correcting for detectability in occupancy and abundance models.

    • Rich,

      The paper below might also be of interest to you. This is the equivalent to the “original” occupancy-detection model, but here for a sampling protocol where detections are collected within a single visit of a given duration (or a transect of a given length); detections at occupied sites are modelled as a Poisson point process.

      Guillera-Arroita G, Morgan BJT, Ridout MS, Linkie M (2011) Species occupancy modeling for detection data collected along a transect. Journal of Agricultural, Biological and Environmental Statistics 16: 301-317.

      See also my earlier comment above. As Peter noted, repeat visits are not necessarily a must.

  33. Pingback: Accounting for detectability is not a waste – a response to Welsh et al | gurutzeta's research

  34. Pingback: Detection probabilities, statistical machismo, and estimator theory | Dynamic Ecology

  35. Pingback: Detection probabilities – back to the big picture … and a poll | Dynamic Ecology

  36. Pingback: Statistical Machismo | brouwern

  37. Pingback: Taking statistical machismo back out of twitter bellicosity | Dynamic Ecology

  38. Pingback: Machismo estatístico (tradução) – Mais Um Blog de Ecologia e Estatística
