Detection probabilities – back to the big picture … and a poll

I have now had two posts (both rather heavily read and rather contentiously debated in the comments) on detection probabilities (first post, second post). Whether you have or haven’t read those posts, they were fairly technical (although my goal was to explain technical issues in an accessible way).

Here I want to pull way back up to 10,000 feet and think about the boots-on-the-ground implications. And for a change of pace, I’m not going to argue a viewpoint. I am just going to present a scenario (one I see every semester, and one that, from conversations when I travel, I know students all over the world face) and ask readers via a poll what they would advise this student.

So you are on the committee of a graduate student. This student’s project is to study the species Debatus interminus, which may be a candidate for threatened listing (little is really known). The primary goals are: 1) to assess overall occupancy levels of D. interminus and 2) to figure out how occupancy varies with four variables (vegetation height, canopy closure, soil moisture, and presence of its one known predator, Thinking clearus). Obviously these four variables are moderately collinear. Given resources, length of project, accessibility of sites, that the student is the only person able to visit the sites, etc., you calculate the student can do exactly 150 visits. Various members of the committee have advised the student that she/he should:

  • Scenario A – identify 150 sites across the landscape and visit each site 1 time, then estimate ψ (occupancy), and do a simple logistic regression to give β, a vector of regression coefficients for how ψ varies with your four variables across 150 sites.
  • Scenario B – identify 50 sites across the landscape and visit each site 3 times, then develop a simple hierarchical model of detection probabilities so you will estimate ψ (occupancy), p (detection probability), and β, a vector of regression coefficients in a logistic regression for how ψ varies with your four variables at 50 sites.
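To make the two scenarios concrete, here is a minimal simulation sketch in Python. Everything numerical in it is an assumption for illustration only: the true occupancy (ψ = 0.5), the per-visit detection probability (p = 0.4), constant ψ and p across sites (so no covariates and no β), and all function names. It contrasts the naive estimate available under Scenario A (the fraction of sites with a detection) with a maximum-likelihood estimate of ψ and p from Scenario B's repeat visits, found by a crude grid search rather than a real optimizer.

```python
import math
import random


def simulate_counts(n_sites, n_visits, psi, p, rng):
    """Simulate the number of detections at each site.

    A site is occupied with probability psi; each visit to an occupied
    site detects the species with probability p. Unoccupied sites never
    yield a detection (no false positives are assumed).
    """
    counts = []
    for _ in range(n_sites):
        occupied = rng.random() < psi
        detections = 0
        if occupied:
            detections = sum(1 for _ in range(n_visits) if rng.random() < p)
        counts.append(detections)
    return counts


def naive_occupancy(counts):
    """Fraction of sites with at least one detection."""
    return sum(1 for c in counts if c > 0) / len(counts)


def log_likelihood(counts, n_visits, psi, p):
    """Log-likelihood of a constant-psi, constant-p occupancy model.

    The binomial coefficient is dropped because it does not depend on
    psi or p.
    """
    n_detected = sum(1 for c in counts if c > 0)
    n_empty = len(counts) - n_detected
    total_detections = sum(counts)
    total_misses = n_detected * n_visits - total_detections
    ll = n_detected * math.log(psi)
    ll += total_detections * math.log(p)
    ll += total_misses * math.log(1.0 - p)
    # A site with no detections is either occupied-but-missed or empty.
    ll += n_empty * math.log(psi * (1.0 - p) ** n_visits + (1.0 - psi))
    return ll


def fit_occupancy(counts, n_visits, step=0.01):
    """Grid-search maximum-likelihood estimates of (psi, p)."""
    grid = [i * step for i in range(1, int(round(1.0 / step)))]
    best = max(
        (log_likelihood(counts, n_visits, psi, p), psi, p)
        for psi in grid
        for p in grid
    )
    return best[1], best[2]


if __name__ == "__main__":
    rng = random.Random(42)  # arbitrary seed, for repeatability only
    true_psi, true_p = 0.5, 0.4  # assumed values, not from the post

    design_a = simulate_counts(150, 1, true_psi, true_p, rng)
    design_b = simulate_counts(50, 3, true_psi, true_p, rng)

    print("Scenario A naive estimate:", naive_occupancy(design_a))
    psi_hat, p_hat = fit_occupancy(design_b, 3)
    print("Scenario B estimates (psi, p):", psi_hat, p_hat)
```

Running the script repeatedly with different seeds gives a rough feel for the trade-off in the poll: the single-visit estimate tends to sit near ψ·p, while the three-visit estimate targets ψ itself but is based on a third as many sites.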

Would you advise the student to follow scenario A or B? And why? Please take our poll (should take less than 5 minutes). I am really curious what our readership will say (and I care about this poll enough that I’ve taken the time to do it in Google polls so I can cross tab the answers with basic demographics – but don’t worry, your anonymity is ensured!).

Depending on level of interest I’ll either post the results in the comments or as a separate post after a few days.

And – as everybody knows – a poll in a blog is not a scientific sample, but it can still be interesting.

17 thoughts on “Detection probabilities – back to the big picture … and a poll”

  1. Brian, in your poll, is the species a tree, a frog, a gastropod, or what? Without this basic information, this hardly seems like a scenario faced by “students all over the world.”

    I would say that a more important scenario is one in which a decision needs to be made based on scientific information. For example, a manager wants to spend money on conservation efforts if psi falls below 0.25. If psi=0.5 and p<0.5, the "simple" method you advocate would result in a bad decision, even if the MSE is lower than that of an occupancy model.
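The arithmetic behind this example can be sketched in a few lines of Python. The threshold (0.25) and true occupancy (0.5) come from the comment; the specific detection probability of 0.4 is an assumption standing in for "p < 0.5", and the function name is made up for illustration. The sketch only computes the expected value of the naive estimate and ignores sampling noise.

```python
def expected_naive_occupancy(psi, p, n_visits=1):
    """Expected fraction of sites with at least one detection.

    An occupied site is detected at least once with probability
    1 - (1 - p)**n_visits, so the naive estimate targets psi times that.
    """
    return psi * (1.0 - (1.0 - p) ** n_visits)


THRESHOLD = 0.25  # management threshold from the comment
psi_true = 0.5    # true occupancy from the comment
p = 0.4           # assumed per-visit detection probability (any p < 0.5)

naive = expected_naive_occupancy(psi_true, p)

print("true psi below threshold? ", psi_true < THRESHOLD)
print("naive value below threshold?", naive < THRESHOLD)
```

With a single visit the naive quantity is ψ·p, so for ψ = 0.5 it drops below 0.25 exactly when p < 0.5, which is the situation the comment describes.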

    More generally, I think few people will favor a method that is biased and has unreliable confidence intervals over one that does not suffer from these problems. I think you disagree in the case where MSE is lower for the simple method. That is fine. You have your opinion. But do you really need to state your opinion and call people macho-ists at the same time?

    • Species is intentionally left vague. If it makes a difference to your opinion you’ll have an option to specify that in the poll.

      You’re welcome to contribute your opinions. I’m curious to see what the overall opinions are.

  2. Dear Brian,

    I don’t really know anything about detection probabilities (including much of the scientific and applied questions behind the issue), but what strikes me in this discussion is that, given the methods you have, there is a certain minimum of data you need (number of sites, number of repeated visits if any, etc.) to answer questions satisfactorily. Shouldn’t you base your sampling design on those numbers and not on your resources? So instead of asking “150 sites, no repeats” or “50 sites, 3 repeats”, wouldn’t it be better to say “That’s what we need!”? Fair enough, resources are always limited, but why not accept the fact that some questions simply cannot be answered reliably without spending more time, more money, more manpower etc.? If you don’t do that, any answers will be too biased or too uncertain to be helpful. In fact such under-resourced projects may even make things worse because they can give contrary results. I guess this is actually a more general problem in ecology.
    Of course this doesn’t touch the issue of historical, already existing data, but maybe even then you have to bite the bullet and not use them (which of course depends on the question you want to answer, about which I don’t know much, as I said).

    • Fair points, but I actually am pretty confident you can get at least a decent answer under either scenario A or B. I’m just interested in what people think would give the best answer. And I see very few projects that say “tell us how much it will cost to get the right answer”; it’s always “we managed to find $78,000 and we need an answer.” Remember that most of these projects, at least in the US, are funded by state or federal government, not by NSF grants where you can apply for whatever you think you need (and then get rejected 96% of the time!).

      • Yes, you get what you pay for, this is how it is. It just concerns me when policies and decisions are made based on answers that cannot actually clarify the questions at stake. Things can then easily be manipulated. In the case of occupancy data, I take your word for it that 150 data points are enough no matter what, but how often is it the case that we know how much we have to invest? And how often do we base our sampling, replication, experiment duration etc. on available resources and not on statistical requirements? These are honest questions; it may be that it is rather the sort of questions we ask (at least in “basic” ecology) that is determined by resources, and not the way they are answered.

      • You make good points. I do think in this particular case the sampling size is big enough (and it seems most of the poll responders agree). But if you’re a scientist and you’re asked to give answers that will inform policy but the money is not enough to cover an adequate sample size what do you do? Walking away and doing nothing doesn’t seem right. Doing work that is likely to be overinterpreted by the politicians doesn’t seem right. Of course explaining and asking for more money is right, but 8 or 9 times out of 10 the answer will be no. No easy answers. Might make an interesting post on its own.

  3. Regarding the statement: “I feel like there is a lot of pressure/political correctness from reviewers to use detection probabilities,” I have to say this is true except when you are publishing an animal study, and modeled detectability of anything other than animals. Go figure.

  4. I would happily answer the poll, but what comes to my mind first is…what is the detection probability😉 ? Yes I do get that we don’t know much about this species, but surely we can do a few exploratory surveys (with multiple visits) of some sites of confirmed presence (from past works, reports…) to get an (admittedly bad) estimate of the number of visits needed to achieve high (>0.8) detection probability (at the site level, not at the visit level). Then I would select the number of sites I can survey, and ask myself if I think I have a reasonably good coverage of the variability of my 4 environmental variables. If the answer is yes I would go for design A. If the answer is no I would increase the number of sites, reduce the number of visits, go for design B.
    Very much empirical, but I think the coverage of the environmental space (better with more sites) is definitely something that must come into the thinking of the design. A random distribution of a few sites (very well surveyed then) might easily end up failing to cover the full variability of the environment (particularly important in the days of climate/habitat changes), and also fail to separate the effects of the environmental variables (I guess that’s what you wanted us to say, i.e. your comment on moderately collinear variables…;-)).
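The visit-count calculation described in this comment can be sketched as follows. The 0.8 target and the 150-visit budget come from the thread; the pilot estimates of per-visit detection probability (0.2, 0.4, 0.5) are made-up placeholders, the function names are hypothetical, and detections on repeat visits are assumed independent.

```python
def site_level_detection(p, n_visits):
    """Probability of at least one detection in n_visits to an occupied site."""
    return 1.0 - (1.0 - p) ** n_visits


def visits_needed(p, target=0.8):
    """Smallest number of visits whose site-level detection exceeds target."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    n = 1
    while site_level_detection(p, n) <= target:
        n += 1
    return n


def sites_affordable(p, total_visits=150, target=0.8):
    """Number of sites that fit in the visit budget at that visit count."""
    return total_visits // visits_needed(p, target)


if __name__ == "__main__":
    # Hypothetical pilot estimates of per-visit detection probability.
    for p_pilot in (0.2, 0.4, 0.5):
        k = visits_needed(p_pilot)
        print(p_pilot, "->", k, "visits per site,",
              sites_affordable(p_pilot), "sites within 150 visits")
```

The resulting site count is the number to weigh against how well those sites would cover the four environmental variables.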

  5. (I may have answered several times but you should be able to tell from my comments)

    This is an interesting question to me working on several different taxa. In the bird world, detectability is huge, and rightly I think*. In the herp world, it’s not at all an issue that I’ve seen and the science really suffers from it**. Now in the freshwater world, detectability isn’t corrected for in exactly the same way but fish biologists are well aware of what happens if you don’t correct for unit effort (how hard it is to catch the fish) – fisheries will collapse. Your catch mass may be the same over time (i.e. your apparent occurrence) but if you’re taking more time to catch the fish or using different gear, you’re not tracking true occurrence.

    *Especially for more realistic questions involving multiple species and/or multiple habitat types and/or multiple observers and/or surveys done over time. Or even in your questions – how is the detectability of the predator? Probably lower than the prey species (given predators use larger habitat patches and are less likely to be in any given area when you survey)
    **Hey, it turns out that herps are really hard to find! Even the common ones. Do only 5 reported Rainbow Snakes over 20 years and 13 states mean they’re endangered, or are they just really hard to find? Who knows!

    • Thanks ATM – definitely good to hear real organisms mentioned in this post! I appreciate hearing about the contrasting perspectives you have observed. And I don’t doubt some of my views of detection probability come from the fact that at heart I am a tree person.

    • Gah, I meant to make that very point in the poll and forgot: only doing one sample doesn’t mean you can’t have some measure of detection probability, as long as you record your search effort in some quantitative way. We look for frogs under rocks, so we count the rocks. Bird people always time their counts. Fish people use standard trawls. Tree people use per area, although they don’t think about it that way.

  6. I selected “90+% of the time go with design B (50 sites w/ 3 repeats+detection modelling)” because:

    1. Without any idea about how detection probability co-varies with the 4 environmental variables or the general level of detection probability one should expect (you give no such information), inference from design A would be iffy.

    2. Design B would be a lot more informative when it comes to designing new studies about this species.

    3. It would be useful to have some idea about occupancy probability (psi), and not just the slopes with environmental covariates (beta) even though these estimates may be very uncertain (prior knowledge/assumptions may be built into priors). Estimates of beta’s from design A would also be biased estimates of beta’s reflecting psi.

    4. I would prefer to have imprecise estimates about something biologically meaningful (psi and beta’s reflecting psi) than precise estimates of something that depends on your (arbitrary) efforts (psi*p and beta’s reflecting psi*p).

    5. Because results from design A depend on effort, how good the observer is etc. (which is hard to describe and replicate), it would be difficult to use information from this study in other studies (e.g., meta-analyses)

  7. Never worked with detection probabilities (yet; though I would have done lots of it if, graduating one year earlier, I had gone to study a small Patagonian marsupial).
    Anyway, in genetic sequencing there is a similar trade-off between getting more biologically different samples, or replicating to get “depth” (and, among other things, detect errors in the sequencing process).
    In both cases, it seems that a good compromise can be made by doing an unbalanced design, where some sites are replicated and some aren’t. The choice of which sites to replicate could be random or guided by some covariate thought to affect detection, and of course this design scheme needs careful thought but … Is there an obvious error in that?

    • Not that I’m aware of. If you are happy to assume the same detection process holds across all sites, including the ones you’ve only sampled once, then you can do this. In our work on optimizing survey designs we assumed that all sites received the same number of visits, but that was for convenience; it reduced the size of the design space we were searching. More directly to Brian’s point, before you can optimize a design you need to choose an optimization criterion: minimizing the bias or variance of the estimator, or maximizing the ability to detect a trend, as we did:

  8. Pingback: Detection probability survey results | Dynamic Ecology
