They’re just not that into you: the no-excuses truth to understanding proposal reviews (guest post)

Note from Jeremy: This is a guest post from Peter Adler.

*********************************

We’ve all been outraged by the nit-picky, irrelevant, or downright wrong-headed criticisms that show up in proposal reviews. How could they have rejected the proposal because they didn’t like the number of replicates? Or the details of the statistical model? Or because you didn’t cite the right references? Or even worse, because they didn’t read closely enough to see that you had cited those references? The answer may be that they didn’t reject your proposal for those stupid reasons. They rejected it because they just weren’t that into it.

While some reviewers are honest and self-aware enough to say that, and perceptive enough to explain why, most reviewers want to back up a negative review with a long list of specific, seemingly objective, criticisms. Sometimes they may not even be fully aware that they are just not that into your proposal. I know that if I’m simply not excited by the core ideas of a proposal I am reviewing, I am more likely to get annoyed by those stupid details. In contrast, if the ideas really grab me, I will be very forgiving of the details.

In order to revise and improve your rejected proposal, it is critical to recognize when the reviewers just weren’t that into your original submission. When that is the case, you need to ignore the details of their review and focus all your energy on making the ideas and the pitch more compelling. Altering details of the experimental design or citing additional references won’t make a difference the next time around. In fact, spending your time chasing after those details means less time and effort addressing the fundamental issue: how to better communicate the novelty, insight, and importance of the work.

The challenge is distinguishing the “just not into it” criticisms from legitimate, well-justified concerns. This is easiest when you know the reviewers are making a valid point. Right now I am conducting some pilot experiments because my last NSF proposal got hammered for not having enough (ok, any) preliminary data. While demands for preliminary data can be a “just not into it” complaint, we were proposing ambitious and expensive work in a new system, and the reviewers’ reaction was understandable. Recognizing the valid criticisms is also easy when many reviewers converge on the same point. Beyond that, it gets trickier. If a reviewer gives strong positive feedback on the conceptual goals and shows real understanding of the problem you are tackling, that may indicate that they “get it” and you should take to heart any accompanying negative feedback.

Could this same advice apply to interpreting manuscript reviews as well? My gut reaction is NO, but I’m not sure how well I can articulate why I feel that way. A practical reason is that revised manuscripts often will be returned to the original reviewers. Obviously, ignoring their comments is a bad strategy. But I think the main reason I see manuscript reviews so differently is that reviewers don’t have to be “so into it” to give a paper a positive review. Papers don’t require reviewers to exercise much imagination: the data has been collected, the analyses are complete, and the story, for better or worse, has an ending. It’s a relatively objective situation for a reviewer. The uncertainty and open-ended nature of proposals makes reviewing them a much different experience. To get funded, you must convince reviewers that your uncollected, unanalyzed data will lead to a story with a better ending than your competitors’ unfinished stories. You are asking the reviewer to imagine a rosy future based on limited information and then commit to your proposal over the others. As the word “proposal” implies, it is a form of courtship, which is why pop culture relationship advice is relevant.

p.s. First, apologies to author Greg Behrendt. Second, if I’m making any implicit claim to authority on proposal writing, it comes much more from my experience as a reviewer and panelist than from my very mixed record of success in getting proposals funded.

Watch Jeremy’s talk on blogging at Utah State University

Earlier this month I gave a talk on blogging at Utah State University. A video of the talk is now online.

The lighting and sound are a bit rocky right at the beginning, but they’re great after that. It’s a really nicely produced video, thanks very much to the folks at Utah State for producing it and putting it up.

There’s a video of the research talk I gave as well, if you really can’t get enough of me saying “um”. (I need to work on that…)

Friday links: climate change (for women and the planet), yeast mail, grumpy frog, and more (UPDATED)

Also this week: questioning the evidence for p-hacking, hamster wheel desks, are academics becoming more selfish, new faculty advice, resources for modelers, 35 years of “Spandrels”, zombie ideas in other fields, making nerd fury work for you, and more.

From Meg:

The obvious next step for my treadmill desk: a hamster wheel desk. (Jeremy adds: So Meg, is the next step after that a habitrail?)

I really enjoyed this profile of space scientist Maggie Aderin-Pocock. Her story is really inspiring, including her current work to inspire the next generation of scientists.

Here’s a new article in the Journal of the American Medical Association on the impacts of global climate change on human health. There was also an editorial associated with the article, which includes this statement, “Today, in the early part of the 21st century, it is critical to recognize that climate change poses the same threat to health as the lack of sanitation, clean water, and pollution did in the early 20th century.” (ht: Seth Mnookin)

Melissa Wilson-Sayres had a post this week with good advice for pre-tenure faculty.

I enjoyed this post from Jacquelyn Gill on how we all need to work to change the climate for women, instead of just talking about the problems with the climate. My small step of the morning was to send an email trying to get the ball rolling on using the Clancy et al. study as a justification for including training related to sexual harassment in the research ethics course.

This week, the Royal Society announced its 43 new University Research Fellows. Only 2 are women. According to the Royal Society:

This year women accounted for 19% of applications for the URF scheme but only accounted for 13% of those shortlisted, 9% of those interviewed and less than 5% of those awarded. Last year women accounted for roughly 20% at all stages of the process.

From Jeremy:

I’ve linked in the past to text-mining studies looking at the distribution of published p-values and finding that in some fields there’s an excess of barely-significant p-values, suggesting p-hacking. Turns out that at least some of those results may well be artifacts of how authors report p-values (e.g., reporting p<0.05 rather than an exact value). See the linked post for a lengthy discussion of proper “p-curve” construction and analysis, and discussion of how to test for publication bias more broadly.

Yeast mail. I’m not kidding. (ht Yoav Ram, via Twitter)

Amy Parachnowitsch wonders if competition for jobs and grants is causing academics and their students to just put their heads down, beaver away on their own work, and focus too much on the things that are of obvious direct benefit to their own careers. Possibly to the detriment of their own careers as well as to others around them and the field as a whole.

I enjoyed biomechanicist John Hutchinson marking the 35th anniversary of “Spandrels”, and not just because he was kind enough to link to my own post on the topic.

Zombie ideas, humanities and social sciences edition: Harry Brighouse wonders when philosophers, psychologists, and sociologists will quit teaching students the standard–and in his view discredited–versions of Milgram’s obedience experiment and the Kitty Genovese case. In the comments there’s debate about what aspects of these cases have been discredited.

Why planned Federal college ratings won’t prompt most US colleges and universities to cut tuition.

MIT used a lot of pre- and post-testing to show that one of their MOOCs was as effective as or more effective than the classroom-based version, and that it was equally effective for all students regardless of their background preparation. The link goes to a press release and I haven’t seen the actual paper. And since students weren’t randomly assigned to a regular classroom vs. a MOOC, you need to be careful about what inferences you draw here. But if it’s true it would go against other things I’ve read about MOOCs.

A bit outside our usual territory: lots of discussion in the economics blogosphere this week on two major reports arguing that the net economic costs of fighting climate change would be small or even zero. The devil is in the details with these sorts of reports, because alternative assumptions about all sorts of technical matters make a huge difference to one’s final estimate of the economic costs. To help get you up to speed and give you some sense of the range of informed opinion (and it’s notable that views on these reports aren’t tightly correlated with people’s political views), here are comments from Paul Krugman, Peter Dorman, Robert Stavins, and John Quiggin. (UPDATE: and one more from Tyler Cowen) I’m no expert, so please do share links to other commentaries in the comments. (ht Economist’s View)

Nothing to do with ecology, but I have a personal bias so I’ll share it: Ethan Zuckerman’s lovely convocation address to his (and my) alma mater, on the importance of expanding your horizons.

How to do it all. Your mileage may vary, of course. And note that following some of these suggestions will lead to the sort of behavior that Amy Parachnowitsch bemoans in the link above. (ht Marginal Revolution)

Theoretical ecologist Marissa Baskett maintains a nice list of resources for ecological modeling. Includes programming advice, classic papers on the philosophy of modeling, links to summer programs for grad students, and more.

How to make online nerd fury work for you. The linked post is about using it to find places to eat in new cities. But it’s fun to think about using this approach to get advice on any matter on which lots of people consider themselves “experts” and have strongly-held views. Maybe I should use this approach to figure out the stats for my next experiment. :-) (ht Marginal Revolution)

So what’s an “individual” organism, anyway? How do you differentiate one from another? Evolutionary biologist Charles Goodnight is mulling it over in a series of posts. Starts here.

This week in back of the envelope calculations: all of the ants are not as heavy as all of the humans.

And finally, grumpy frog. :-)

Detection probability survey results

Last week, I highlighted some new results from a paper on detection probabilities and placed detection probabilities in the context of estimator theory. This in turn led to a reader poll to try to get a sense of how people thought about experimental design with detection issues.

Although I don’t want to spend too much time on it here, I wanted to briefly highlight a great paper that just came out, “Assessing the utility of statistical adjustments for imperfect detection in tropical conservation science” by Cristina Banks-Leite and colleagues. They look at several real-world scenarios focused on identifying covariates of occupancy (rather than absolute occupancy levels) and show that the results are not much different with or without statistical adjustment. They draw a distinction between a priori control for covariates of detection probability in setting up a good study design vs. a posteriori statistical control for detection probability, and point out that both are valid ways of dealing with detection issues. The take-home quote for me was “We do not believe that hard-won field data, often on rare specialist species, should be uniformly discarded to accord with statistical models”. Whereas my last post was very theoretical/statistical, this paper is grounded in real-world, on-the-ground conservation, yet it makes many of the same points. It is definitely worth a read.

Turning now to the survey … at the time of analysis Wednesday morning there were 168 respondents. You can view the raw results here. There was a reasonably good cross section of career stages and organisms represented, although the employment sector skewed very heavily to university. And of course “readers of a blog who chose to respond to a poll” is in no way a scientifically designed sample. If I had to speculate, I’d guess this particular post attracted a lot of people interested in detection probabilities, but what exact bias that would introduce is hard to predict.

Recall I presented two scenarios. Scenario A was to visit 150 sites once. Scenario B was to visit 50 sites 3 times each. The goal was to estimate how occupancy varied with four collinear environmental variables.

Probably the lead result is the recommended scenario:

[Figure: poll results for which scenario respondents recommended]

Scenario B (50 sites 3 times) was the most common recommendation, but it by no means dominated. Over 10% went for scenario A outright. And 20% noted that choosing required more information, with most people saying the critical information was more knowledge about the species – well represented in this quote on what the choice would depend on: “A priori expectation of potential for detection bias, based on species biology and survey method.” It should be noted that a non-trivial fraction of those who went for B did it not to support detection probabilities but for reasons of sampling across temporal variability (a goal that is contradictory with detection probability modelling, which assumes constant conditions and even constant individuals across the repeat visits). Another 17% went for B but with hesitation (either putting the statistical expertise of others over their own field intuition or else feeling it was necessary to publish).

There was a trend (but definitely not statistically significant) for more graduate students to recommend B and more senior career people (while still favoring B) to switch to “it depends”. Similarly there was a non-significant trend for people who worked on vertebrates to favor B and for people who worked on plants and inverts to switch a bit to scenario A (with scenario B still a majority).

Quite a few people argued for a mixed strategy. One suggestion was to visit 100 sites with 2 repeat visits to 25 of them. Another suggested visiting 25 sites 3 times, then making a decision how to proceed. And there were quite a few variations along this line.

The story for my question about whether there was pressure or political correctness to use detection probabilities was similar (not surprisingly). There was a weak trend to yes (mean score of 3.09) but not significant (p=0.24). Graduate students were the most likely to think there was PC-ness and senior career people the least likely. People working in verts and plants were more likely to see PC-ness than people working on inverts (again all non-significant).

So the overall pattern is a lean to scenario B but a lot of diversity, complexity and nuance. And not much if any perception of PC-ness around having to use detection probabilities ON AVERAGE (some individuals felt rather strongly about this in both directions).

In short, I think a majority of respondents would have agreed with this quote from one respondent: “… the most important part of study design is…thinking. Each situation is different and needs to be addressed as a unique challenge that may or may not require approaches that differ from those used in similar studies.” Which nicely echoes the emphasis in this blog on the need to think and not just apply black-and-white universal rules for statistics and study design.

What belongs in the appendices vs. the main text in scientific papers?

So, how do you decide what material to include in the main text of your papers, vs. in appendices?

Some things are easy. Raw data, code, and lengthy derivations belong in appendices.* Alternative ways of running the analysis that lead to the same conclusions belong in an appendix. And these days some journals increasingly only want a summary of your methods in the main text, with the details relegated to appendices.

But what about tougher calls? For instance, you might say that the main text is where you tell the “main” story of the work and present “key” results and analyses, with mere “supporting details” relegated to appendices. I’d say that myself. But if you take that approach to an extreme, it basically says that your main text is your abstract (plus maybe a couple of figures), and everything else is mere supporting detail! I have the impression that Nature and Science papers have kind of gone down this road. I’m old enough to remember a time when Nature and Science papers really were quite different beasts, in both form and content. A good Nature or Science paper told a clear, deep story in a very compact, incisive way. Nowadays, it seems like the main text of many Science and Nature papers is like an extended abstract of an ordinary paper, with the rest of the paper buried in lengthy appendices.

This isn’t just an issue in where to draw the line between main text and appendices. It’s also about which material goes on which side of the line. A result that one person thinks is a mere supporting detail may be viewed by someone else as the most important and interesting result in the paper. Now, you could argue that’s no big deal. After all, the main text and appendices are all there, so anyone who’s more interested in Appendix 27 than in the main text can just read accordingly. But I’m not so sure I buy that. We present our work in narratively-structured chunks–called “papers”–for a reason. As readers, we want and need authors to tell us what to focus on, not to just give us an unstructured stream of material and say “figure it out for yourselves”. As a reader, I’m trusting you as an author to make a good professional judgement about what the story is. I might not necessarily agree with your judgement, of course, though ordinarily I probably will. I need to trust your judgement because frankly, while in principle I could read all your appendices, in practice hardly anyone (even specialists on the topic of your paper) is ever going to do so.

So, how do you decide what goes in the main text vs. appendices? Have you ever had a decision you struggled with? And does anyone share my sense that the advent of online appendices isn’t just allowing people to report lots of supporting detail that otherwise would’ve gone unreported, but is actually changing how we write our papers in more subtle ways?

p.s. A recent post of Brian’s discussed a case in which he found an appendix of a paper very interesting and important and so wondered why it wasn’t discussed at greater length in the main text. My post was actually in the works before that, so the timing is fortuitous. And that particular case was already discussed in the comment thread on Brian’s post and I don’t want to reopen that discussion. My interest here is in the general issue I’ve raised, not in debating specific examples.

*In ecology, of course–if it’s a math paper the derivation is the paper.

Theory vs. models in ecology (UPDATED)

Katia Koelle delivered the opening talk in the Ignite session on “theory vs. empiricism” at the ESA meeting.* I thought she raised several interesting issues that weren’t really touched on in the rest of the session. I was struck by one remark in particular: that theory in ecology is dying, or at least going out of fashion, and is being replaced by modeling.

Theory here means trying to discover or derive general principles or laws–the fundamental simplicity underlying and unifying the apparent polyglot complexity of nature. Think of evolution by natural selection, the laws of thermodynamics, general relativity, MaxEnt, and statistical “attractors” like the central limit theorem and extreme value theory.

In contrast, modeling here means building a mathematical description of some specific system, in order to explain or predict some aspect of that system. The model need not include every detail about that specific system, but it is tailored to that system. So there’s no hope or expectation that it will explain or predict any other system (though importantly, there could still be commonalities or analogies with other systems). Think of global climate change models, or models of various cycling species, or Meg’s award winning work on host-parasite dynamics in Daphnia.**

If you want to personify the contrast: John Harte is a theoretician. Tony Ives is a modeler.

Hopefully it goes without saying that both theory and models are hugely valuable in science (indeed, both John and Tony note this in the links above). But there’s much more that can be said about the distinction (and I do think it’s a real distinction, or at least two ends of a continuum). Here are my thoughts (strap in, there’s a lot of them!):

  • I think Katia and Karen are right that modeling is the hot thing right now in ecology, while theory’s not, except for a small number of theories that are hot because it’s possible to treat them like models in the sense that you can fit them to data (e.g., MaxEnt). I think John Harte gets at one big reason why in the interview linked to above: advances in software and computing power mean that it’s now easier than ever to simulate complicated, analytically-intractable models, and to fit those models to data using computationally-intensive statistical approaches. Water flows downhill, following the path of least resistance, and so does science. If X becomes easier to do than it used to be, people are going to do more of X. Which is a good thing, at least up to a point. I mean, if there was something we wanted to do more of, but couldn’t because it wasn’t technically feasible, then surely we ought to do more of it once it becomes technically feasible! The danger, of course, is that people start doing X just because it’s easy (never mind if it’s the right thing to do), or because it’s what everyone else is doing (a bandwagon). There’s a thin line between hammering nails because you’ve just been given a hammer, and thinking everything is a nail because you’ve just been given a hammer (or thinking that, because you’ve just been given a hammer, the only thing worth doing is hammering nails). There’s an analogy here to adaptive evolution. The direction in which a population evolves under natural selection depends both on the direction of selection, and on the genetic variance-covariance matrix. The “direction of selection” in science is what we’d do if we were unconstrained by technology, time, money, or effort. The “genetic variance-covariance matrix” is the constraints that define the paths of least resistance, and the intractable dead ends. The art of doing science well is figuring out the optimal “direction of evolution”, balancing what we’d like to do and what’s easiest to do.
  • I think the trend away from theory and towards modeling in ecology is a long-term trend. See for instance this essay from the early ’90s from Jim Brown, arguing for the continuing value of theory (well, maybe; more on that in a second), and the response from Peter Kareiva, arguing that ecologists need to get away from general theories and move towards system-specific modeling. I think Kareiva’s point of view is winning. As evidence for this, recall that in recent decades, the most cited papers in ecology have not been theory papers, in contrast to earlier decades.
  • That Kareiva essay gets at another reason why I think modeling is ascendant over theory in ecology: theory often is hard to test. It’s not merely that lots of different theories tend to predict the same patterns, so that those patterns don’t really provide a severe test of any of the theories, although that’s often part of it. It’s also that, because theories aren’t system-specific, they’re often hard to link to data from any specific system (and all data come from some specific system or systems). How do you tell the difference between a theory that “captures the essence” of what’s going on but yet doesn’t match the data well because it omits “inessential” details, and a theory that’s just wrong? The link between theory and data (as opposed to model and data) often involves a lot of hand-waving. And while I do think there’s such a thing as good hand-waving, so that “good hand wavers” are better at testing theory than bad hand wavers, I admit I can’t really characterize “good hand waving” except to say that I think I know it when I see it.
  • If the previous two bullets are right, then that means ecologists are getting over Robert MacArthur. That is, they’re getting away from doing the sort of theory MacArthur did, and trying to test theory in the way that MacArthur did (e.g., by looking for a fuzzy match between qualitative theoretical predictions and noisy observational data). On balance, and with no disrespect at all to MacArthur (a giant who helped invent ecology as a professional science), I think that’s progress. But I’m not sure. Maybe it’s progress in some respects, but retrogression in other respects, with the net result being difficult or impossible to calculate? Brian for one seems to have mixed feelings. On the one hand, he has called for mathematical descriptions of nature to start “earning their keep” more than they have (e.g., by making bold, quantitative predictions that are testable with data). Which would seem to be a call for more models and less theory. But on the other hand, he’s also lamented that ecologists seem to be running out of big theoretical ideas. And Morgan Ernest has expressed mixed feelings about how we’re becoming more rigorous but less creative, better at answering questions but less good at identifying questions worth answering.
  • As Tony Ives notes in the interview linked to above, being a modeler as opposed to a theoretician doesn’t mean just becoming a mathematical stamp collector and giving up on the search for generalities. Because there often are analogies and similarities between apparently-different systems. One way to model a specific system is to recognize the ways in which that system is analogous to other systems. See this old post for further discussion, and this excellent piece for a discussion in a related context.
  • It’s tempting to think that the divide between theory and models might have cultural roots, much as the divide between theory and empiricism ultimately is cultural. Perhaps it reflects a cultural divide among mathematicians between theory builders and problem solvers.*** Maybe theoreticians in ecology are really mathematicians or physicists at heart, while modelers are biologists or engineers at heart. Maybe theoreticians care about simplicity and elegance, while modelers revel in complexity. Maybe theoreticians care about fundamental questions while modelers care about practical applications. But I’m not sure. For instance, in that interview linked to above, theoretician John Harte talks about the value of theory (as opposed to models) for conservation, and for getting policy makers to take ecologists seriously. He also talks about how important it is to him to do field work and to get out in nature. Conversely, Ben Bolker is a modeler rather than a theoretician, but in describing his own motivations he talks about loving the ideas of physics and mathematics and being only loosely anchored in the natural history of particular systems. So I’m not sure that the divide here is a cultural one; it might be more of a personal, different-strokes-for-different-folks thing. And in any case I hope it’s not cultural, since cultural divides are pretty intractable and tend to give rise to mutual misunderstanding and incomprehension.
  • That linked piece from the previous bullet on the two cultures of mathematicians suggests that there are areas of mathematics where you need theory to get anywhere, and others where you need modeling to get anywhere. That’s a fascinating suggestion to me–do you think the same is true in ecology? For instance, to use John Harte and Tony Ives as examples again, maybe you need theory to make headway in macroecology, as John Harte has been doing in his MaxEnt work? While maybe you need modeling to make headway on population dynamics, as Tony Ives has been doing?
  • The difference between theories and models isn’t always clear. For instance, is the “metabolic theory of ecology” a theory? I’m honestly not sure. The core of it–West et al. 1997–looks like a model to me. For instance, it’s got a pretty large number of parameters, and it’s got different simplifying assumptions tailored to circulatory systems that have, or lack, pulsatile flow. Ecologists refer to the “theory” of island biogeography–but isn’t that really just a very simplified model of colonization and extinction on islands? The same way the Lotka-Volterra predator-prey “model” is a very simplified model of predator-prey dynamics? Maybe theory and models are more like two ends of a continuum? The more simplifying assumptions you make, and the less tailored your assumptions are to any particular system, the closer you are to the theory end of the continuum?
  • One can talk about subtypes of theory and models too. For instance, Levins (1966) famously suggested a three-way trade-off between realism, precision, and generality in modeling. Models that sacrifice generality for precision and realism are what I’m calling “models”. While models that sacrifice precision for realism and generality, and models that sacrifice realism for precision and generality, are different subtypes of what I’m calling “theory”.
  • Some applications of mathematics in ecology kind of fall outside the theory-model dichotomy (or theory-model continuum). I’m thinking for instance of partitions like the Price equation, or Peter Chesson’s approach to coexistence theory. They aren’t models or theories themselves. Rather, they tell you something about the properties that any model or theory will have (e.g., any model or theory of stable coexistence will operate via equalizing mechanisms and stabilizing mechanisms).
  • I’m curious how aware empirically-oriented ecologists are of the theory-model distinction. And how their awareness of it, or lack thereof, affects their attitudes towards mathematical approaches generally.
  • As a grad student, I got into microcosms because that seemed like a system in which theories were models, or at least close to being models. That is, the drastic simplifying assumptions of the theories in which I was interested (“community modules”, as Bob Holt calls them) were closer to being met in microcosms than in most other systems. So that theories could be tested in a rigorous way, much as system-specific models are tested. But I’ve found myself increasingly getting away from that, and wanting to build models for microcosms. And more broadly, I’ve found myself becoming more excited about the Tony Ives approach of using models tightly linked to data to solve system-specific puzzles. I think that many of the most impressive successes in ecology over the last couple of decades have come from that approach. Even if you’re interested in general theories (and I still am), increasingly I feel like bringing data to bear on those theories is best done by bringing data to bear on models that incorporate theoretical ideas.
  • After I wrote this post, I was alerted to a new paper on theory in ecology that covers much of the same ground. It’s very interesting, looks like good fodder for a future post.

*On behalf of Karen Abbott, who couldn’t make it. UPDATE: Marm Kilpatrick and Kevin Gross also contributed a lot to the intro talk.

**Yes, I know others have defined “theory” and “model” differently. Which is why I defined my own usage for purposes of this post.

***A theory builder being someone like David Hilbert, as opposed to a problem solver like Paul Erdős.

Friday links: sexual assault, NSF preproposals, is lecturing ethical, and more

Jeremy is traveling this week, so I’m in charge of the Friday link fest!

From Jeremy:
KELP :-) (ht imachordata.com)

From Meg:
Hope Jahren had a very powerful op-ed in the NYTimes about science’s sexual assault problem. It’s a must read. I agree with this tweet from Anne Jefferson (which was in response to Hope’s piece).

Terry McGlynn is clearly participating in and amplifying the conversation. He had a great post in response to Hope’s op-ed in which he thinks about what we can do, especially those of us who are PIs. We need to have this discussion.

Joan Strassmann had a post on why NSF preproposals are a failed idea. Her proposed solution? One full proposal cycle per year. On the same topic, there is this new BioScience paper by Leslie Rissler and John Adamec, which reports that 49% of respondents to a survey done by NSF were satisfied or very satisfied with the switch to pre-proposals; 20% of respondents were neutral. 80% of respondents were dissatisfied or very dissatisfied with the switch to one submission per year. 69% of respondents thought the changes would harm faculty without tenure.

I love this animation from the NYTimes on seeing the invisible. It focuses on Antonie van Leeuwenhoek and all the cool little critters that were first seen by him. It even includes dancing Daphnia! (ht: @LKluber)

Karen Lips had a post on how scientists can participate in the policy process (beyond simply saying “more data are needed”). Karen’s post reminded me that I had been meaning to link to this post from Josh Tewksbury on his transition from a traditional academic career to his current position as the director of the WWF’s Luc Hoffmann Institute, which includes advice for new PhDs and others who want to work at NGOs.

The Bearded Lady Project aims to highlight the work of female paleontologists and the challenges they face. They say:

The Bearded Lady Project: Challenging the Face of Science’s mission is twofold. First, to celebrate the inspirational and adventurous women who choose to dedicate their lives in the search of clues to the history of life on earth. And second, to educate the public on the inequities and prejudices that exist in the field of science, with special emphasis on the geosciences.

The vision for this project is to complete a short live-action documentary as well as to develop and display a touring portrait series. We hope our film and portrait series will inspire young women to pursue a career in geoscience. In an effort to do all that we can, both film and portrait series will dedicate their proceeds to a scholarship fund to support future female scientists.

Male scientists want to be involved dads, but few are. (ht: @phylogenomics via @ctitusbrown)

Melissa Wilson-Sayres live-tweeted a seminar by Scott Freeman that asked whether lecturing is ethical. The storification of her tweets is here.

Finally, sciwo had a post at Tenure, She Wrote on being a mid-career academic. I am also trying to adjust my mindset to being “mid-career” now – it’s a little weird! But I think I will try to take sciwo’s advice to “embrace my status as a mid-career woman and to own the idea that younger colleagues, especially women, will see me as a mentor.”

Detection probabilities – back to the big picture … and a poll

I have now had two posts (both rather heavily read and rather contentiously debated in the comments) on detection probabilities (first post, second post). Whether you have or haven’t read those posts, they were fairly technical (although my goal was to explain technical issues in an accessible way).

Here I want to pull way back up to 10,000 feet and think about the boots-on-the-ground implications. And for a change of pace, I’m not going to argue a viewpoint. I’m just going to present a scenario (one I see every semester, and one that, from conversations when I travel, I know students all over the world face) and ask readers via a poll what they would advise this student.

So you are on the committee of a graduate student. This student’s project is to study the species Debatus interminus, which may be a candidate for threatened listing (little is really known). The primary goals are: 1) to assess overall occupancy levels of D. interminus and 2) to figure out how occupancy varies with four variables (vegetation height, canopy closure, soil moisture, and presence of its one known predator, Thinking clearus). Obviously these four variables are moderately collinear. Given resources, length of project, accessibility of sites, the fact that the student is the only person able to visit the sites, etc., you calculate the student can do exactly 150 visits. Various members of the committee have advised the student that she/he should:

  • Scenario A – identify 150 sites across the landscape and visit each site 1 time, then estimate ψ (occupancy) and do a simple logistic regression to give β, a vector of regression coefficients for how ψ varies with your four variables across 150 sites.
  • Scenario B – identify 50 sites across the landscape and visit each site 3 times, then develop a simple hierarchical model of detection probabilities so you will estimate ψ (occupancy), p (detection probability), and β, a vector of regression coefficients in a logistic regression for how ψ varies with your four variables at 50 sites. (A rough code sketch of what each analysis might look like follows below.)
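For readers who think in code, here is a rough, hypothetical sketch of what the two analyses might look like in R. The covariate names and simulated data are placeholders I made up, and Scenario B assumes the occu() interface of the unmarked package (the hierarchical model could equally be coded by hand). It is meant only to make the contrast concrete, not to prescribe an implementation:

```r
# Hypothetical sketch of the two committee recommendations (placeholder data).
library(unmarked)
set.seed(1)
make_covs <- function(n) data.frame(veg = rnorm(n), canopy = rnorm(n),
                                    moisture = rnorm(n), predator = rbinom(n, 1, 0.5))

# Scenario A: 150 sites visited once; ignore detection, simple logistic regression
sitesA <- make_covs(150)
sitesA$detected <- rbinom(150, 1, 0.4)              # placeholder 0/1 detections
fitA <- glm(detected ~ veg + canopy + moisture + predator,
            family = binomial, data = sitesA)       # beta = coef(fitA)

# Scenario B: 50 sites visited 3 times; hierarchical occupancy model, psi(covs) p(.)
sitesB <- make_covs(50)
yB <- matrix(rbinom(50 * 3, 1, 0.3), nrow = 50)     # placeholder detection histories
umf <- unmarkedFrameOccu(y = yB, siteCovs = sitesB)
fitB <- occu(~ 1 ~ veg + canopy + moisture + predator, data = umf)
```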

Would you advise the student to follow scenario A or B? And why? Please take our poll (it should take less than 5 minutes). I am really curious what our readership will say (and I care about this poll enough that I’ve taken the time to do it in Google polls so I can cross-tab the answers with basic demographics – but don’t worry, your anonymity is ensured!)

Depending on level of interest I’ll either post the results in the comments or as a separate post after a few days.

And – as everybody knows – a poll in a blog is not a scientific sample, but it can still be interesting.

Listen to Jeremy talk about ecology and blogging on Utah Public Radio Wed. 9 am Mountain Time (UPDATED)

I’m in Utah right now giving a couple of talks at Utah State University. Which led to me getting a very flattering invitation to tape an interview for Utah public radio. The interview will be broadcast on Wed. Sept. 16 at 9 am Mountain Time. It should be available online around the same time at the above link, under the Programs tab (either the Access Utah program, or the Science Questions program, not sure which). (UPDATE: It’s Access Utah; here’s the link to the show. The interview with me is the first half of the show.)

In the interview, I talk about how I got into ecology, microcosms, spatial synchrony, my scientific role models, the biggest recent advances in ecology, and the blog. See if you can guess which questions I’d prepared an answer for in advance, and which ones caught me out and forced me to stall for time.* :-)

This was my first radio interview (indeed, my first interview for any non-ecologist audience), so it was a new experience. I tried to draw on some long ago, vaguely-remembered media training I had, and also on my limited experience explaining myself in other venues (e.g., I reminded myself of my elevator pitch). And I listened to a previous podcast from the show. My main goal was to avoid jargon and try to keep things non-technical.

I’m sure many of you have much more training and experience at this sort of thing than I have. So if you do decide to give it a listen, I’d love any feedback you have, especially what I could do better. I think it went pretty well, but I’m not the best judge, and I’m sure there’s room for improvement.

*This should not be difficult.

Detection probabilities, statistical machismo, and estimator theory

Detection probabilities are a statistical method that uses repeated sampling of the same site, combined with hierarchical statistical models, to estimate the true occupancy of a site.* See here for a detailed explanation including formulas.
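To make the idea concrete, here is a minimal sketch (mine, not taken from the linked explanation) of the simplest version of such a model – constant occupancy Ψ and constant detection probability p, with K repeat visits per site – fit by maximum likelihood in base R:

```r
# Minimal single-season occupancy model with constant psi and p (illustrative sketch).
# y is a sites-by-visits matrix of 0/1 detections; parameters are on the logit scale.
occu_nll <- function(par, y) {
  psi <- plogis(par[1]); p <- plogis(par[2])
  K <- ncol(y); d <- rowSums(y)                             # detections per site
  ll <- ifelse(d > 0,
               log(psi) + dbinom(d, K, p, log = TRUE),      # occupied, detected d of K times
               log(psi * (1 - p)^K + (1 - psi)))            # all zeros: missed or truly absent
  -sum(ll)
}

# Simulate data (55 sites, 2 visits, psi = 0.4, p = 0.5) and fit the model
set.seed(1)
S <- 55; K <- 2
z <- rbinom(S, 1, 0.4)                                      # true presence/absence
y <- matrix(rbinom(S * K, 1, rep(z * 0.5, K)), nrow = S)    # observed detections
fit <- optim(c(0, 0), occu_nll, y = y)
plogis(fit$par)                                             # estimated (psi, p)
```

For comparison, the “ignore detection” estimate discussed below is just mean(rowSums(y) > 0) on the same data (or, with a single visit per site, the raw proportion of sites where the species was seen).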

Statistical machismo, as I define it in this blog, is the pushing of complex statistical methods (e.g. reviewers requiring the use of a method, authors claiming their paper is better solely because of the use of a complex method) when the gains are small or even occur at some cost. By the way, the opposite of statistical machismo is an inclusive approach that recognizes every method has trade-offs and there is no such thing as a best statistical method.

This post is a fairly technical statistical discussion. If you’re interested in detection probabilities but don’t want to follow the details, skip to the last section for my summary recommendations.

Background

I have claimed in the past that I think there is a lot of statistical machismo around detection probabilities these days. I cited some examples from my own experience where reviewers insisted that detection probabilities be used on data sets that had high value in their spatial and temporal coverage but for which detection probabilities were not possible (and in some cases when I wasn’t even interested in occupancy). I also discussed a paper by Welsh, Lindenmayer and Donnelly (or WLD) which used simulations to show limitations of detection probability methods in estimating occupancy (clearly driven by their own frustrations at being on the receiving end of statistical machismo for their own ecological papers).

In July the detection probability proponents fired back at WLD with a rebuttal paper by Guillera-Arroita and four coauthors (hereafter GLMWM). Several people have asked me what I think about this paper, including some comments on my earlier blog post (I think usually in the same way one approaches a Red Sox fan and asks them about the Yankees – mostly hoping for an entertaining reaction).

The original WLD paper basically claimed that in a number of real-world scenarios, just ignoring detection probabilities gave a better estimator of occupancy. Three real-world scenarios they invoked were: a) when the software had a hard time finding the best-fit detection probability model, b) a scenario with moderate occupancy (Ψ=40%) and moderate detection probabilities (about p=50%), and c) a scenario where detection probabilities depend on abundance (which they obviously do). In each of these cases they showed, using Mean Squared Error (or MSE, see here for a definition), that a simple logistic regression of occupancy, ignoring detection probabilities, had better behavior (lower MSE).

GLMWM basically pick different scenarios (higher occupancy Ψ=80%, lower detection p=20%, and a different species abundance distribution) and show that detection probability models have a lower MSE. They also argue extensively that software problems finding best fits are not that big a problem.** This is not really a deeply informative debate. It is basically, “I can find a case where your method sucks.” “Oh yeah, well, I can find a case where your method sucks.”

Trying to make sense of the opposing views

But I do think that by stepping back, thinking a little deeper, framing this debate in the appropriate technical context – the concept of estimation theory – and pulling out a really great appendix in GLMWM that unfortunately barely got addressed in their main paper, a lot of progress can be made.

First, let’s think about the two cases where each works well. Ignoring detection worked well when detection probability, p, was high (50%). It worked poorly when p was very low (20%). This is just not surprising. When detection is good you can ignore it; when it is bad you err if you ignore it! Now WLD did go a little further – they didn’t just say that you can get away with ignoring detection probability at a high p, they actually showed you get a better result than if you don’t ignore it. That might at first glance seem a bit surprising – surely the more complex model should do better? Well, actually no. The big problem with the detection probability model is identifiability – separating out occupancy from detection. What one actually observes is Ψ*p (i.e. that % of sites will have an observed individual). So how do you go from observing Ψ*p to estimating Ψ (and p in the case of the detection model)? Well, ignoring p is just the same as taking Ψ*p as your estimate. I’ll return to the issues with this in a minute. But in the detection probability model you are trying to disentangle Ψ vs. p just from the observed % of sites, with very little additional information (the fact that observations are repeated on a site). Without this additional information Ψ and p are completely unseparable – you cannot do better than randomly picking some combination of Ψ and p that together multiply to give the % of sites observed (and again the non-detection model essentially does this by assuming p=1, so it will be really wrong when p=0.2 but only a bit wrong when p=0.8). The problem for the detection model is that if you only have two or three repeat observations at a site and p is high, then at most sites where the species is actually present it will show up in all two or three observations (and of course not at all when it is not present). So you will end up with observations of mostly 0/0/0 or 1/1/1 at a given site. This does not help differentiate (identify) Ψ from p at all. Thus it is actually completely predictable that detection models shine when p is low and ignoring detection shines when p is high.
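A toy calculation (my own, with made-up numbers) shows why the repeat visits carry the information that separates Ψ from p: two very different (Ψ, p) combinations can produce identical single-visit data while implying different distributions of repeat-visit detection histories.

```r
# Two (psi, p) combinations that single visits cannot tell apart: both give
# P(detect the species on one visit) = psi * p = 0.4. With K = 3 repeat visits,
# however, they imply different distributions of the number of detections per site.
K <- 3
for (pars in list(c(psi = 0.8, p = 0.5), c(psi = 0.5, p = 0.8))) {
  psi <- pars[["psi"]]; p <- pars[["p"]]
  d <- 0:K
  prob <- psi * dbinom(d, K, p)      # site occupied, detected d times out of K
  prob[1] <- prob[1] + (1 - psi)     # plus unoccupied sites, which always give d = 0
  cat(sprintf("psi = %.1f, p = %.1f: single-visit detection prob = %.2f\n", psi, p, psi * p))
  cat("  P(0,1,2,3 detections out of 3) =", round(prob, 3), "\n")
}
```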

Now, what to make of the fact – something that GLMWM make much of – that just using Ψ*p as an estimate for Ψ is always wrong anytime p<1? Well, they are correct about it always being wrong. In fact, using the observed % of sites present (Ψ*p) as an estimator for Ψ is wrong in a specific way known as bias. Ψ*p is a biased estimator of Ψ. Recall that bias is when the estimate consistently overshoots or undershoots the true answer. Here Ψ*p consistently undershoots the real answer by a very precise amount, Ψ*(1-p) (so by 0.2 when Ψ=40% and p=50%). Surely this must be a fatal flaw – to intentionally choose an approach that you know on average is always wrong? Actually, no. It is well known in statistics that sometimes a biased estimator is the best estimator (by criteria like MSE).

Estimation theory

Pay attention here – this is the pivotal point. A good estimator has two properties: it’s on average close to right (low bias), and the spread of its guesses (i.e. the variance of the estimate over many different samples of the data) is small (low variance). And in most real-world examples there is a trade-off between bias and variance! More accurate on average (less bias) means more spread in the guesses (more variance)! In a few special cases you can pick an estimator that has both the lowest bias and the lowest variance. But anytime there is a trade-off you have to look at the nature of the trade-off to minimize MSE (the best overall estimator by at least one criterion). (Since Mean Squared Error MSE=Bias^2+Variance, one can actually minimize MSE if one knows the trade-off between bias and variance.) This is the bias/variance trade-off to a statistician (Jeremy has given Friday links to posts on this topic by Gelman).
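Written out in symbols (my notation, not from either paper): for an estimator θ̂ of a true quantity θ, with expectations taken over repeated samples of the data,

```latex
\mathrm{MSE}(\hat{\theta}) \;=\; E\big[(\hat{\theta}-\theta)^2\big]
\;=\; \underbrace{\big(E[\hat{\theta}]-\theta\big)^2}_{\text{Bias}^2}
\;+\; \underbrace{E\big[(\hat{\theta}-E[\hat{\theta}])^2\big]}_{\text{Variance}}
```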


Figure 1 – Bias and Variance - Here estimator A is biased (average guess is off the true value) but it has low variance. Estimator B has zero bias (average guess is exactly on the true value) but the variance is larger. In such a case Estimator B can (and in this example does) have a larger Mean Squared Error (MSE) – a metric of overall goodness of an estimator. This can happen because MSE depends on both bias and variance – specifically MSE=Bias^2+Variance.

This is exactly why the WLD ignore-detection-probabilities method (which GLMWM somewhat disparagingly call the naive method) can have a lower Mean Squared Error (MSE) than using detection probabilities despite always being biased (starting from behind, if you will). Detection probabilities have zero bias and non-detection methods have bias, but in some scenarios non-detection methods have so much lower variance than detection methods that their overall MSE is lower. Not so naive after all! Or in other words, being unbiased isn’t everything. Having low variance (known in statistics as being an efficient estimator) is also important. Both the bias of ignoring detection probabilities (labelled “naive” by GLMWM) and the higher variances of the detection methods can easily be seen in Figures 2 and 3 of GLMWM.
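If you want to see this trade-off with your own eyes, here is a rough simulation sketch (my own, with illustrative parameter values only – it is not a reproduction of either paper’s analysis). It compares the naive estimator (proportion of sites with at least one detection) to the maximum-likelihood occupancy estimator on bias, variance, and MSE:

```r
# Sketch: bias, variance and MSE (= bias^2 + variance) of the naive occupancy
# estimator vs. the maximum-likelihood occupancy estimator. Illustrative values only.
set.seed(1)
occu_nll <- function(par, y) {
  psi <- plogis(par[1]); p <- plogis(par[2])
  K <- ncol(y); d <- rowSums(y)
  -sum(ifelse(d > 0,
              log(psi) + dbinom(d, K, p, log = TRUE),
              log(psi * (1 - p)^K + (1 - psi))))
}
compare <- function(psi, p, S = 55, K = 2, nsim = 500) {
  est <- replicate(nsim, {
    z <- rbinom(S, 1, psi)                                  # true occupancy states
    y <- matrix(rbinom(S * K, 1, rep(z * p, K)), nrow = S)  # detection histories
    c(naive  = mean(rowSums(y) > 0),                        # ignore detection (collapsed)
      detmod = plogis(optim(c(0, 0), occu_nll, y = y)$par[1]))
  })
  bias <- rowMeans(est) - psi
  vari <- apply(est, 1, var)
  rbind(bias = bias, variance = vari, MSE = bias^2 + vari)
}
compare(psi = 0.4, p = 0.5)   # a WLD-style scenario
compare(psi = 0.8, p = 0.2)   # a GLMWM-style scenario
```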

When does ignoring detection probabilities give a lower MSE than using them?

OK – so we dove into enough estimation theory to understand that both WLD and GLMWM are correct in the scenarios they chose (and the authors of both papers were probably smart enough to pick in advance scenarios that would make their side look good). Where does this leave the question most readers care about most – “should I use detection probabilities or not?” Well, the appendix to GLMWM is actually exceptionally useful (although it would have been more useful if they had bothered to discuss it!) – specifically supplemental material tables S2.1 and S2.2.

Let’s start with S2.1. This shows the MSE (remember, low is good) of the ignore-detection model in the top half and the MSE of the use-detection model in the bottom half, for different sample sizes S, repeat visits K, and values of Ψ and p. They color code the cases red when ignore beats use detection, and green when use detection beats ignore (and no color when they are too close to call). Many of the differences are small, but some are gigantic in either direction (e.g. for Ψ=0.2, p=0.2, ignoring detection has an MSE of 0.025 – a really accurate estimator – while using detection probabilities has an MSE of 0.536 – a really bad estimate given Ψ ranges only from 0-1; but similar discrepancies can be found in the opposite direction too). The first thing to note is that at smaller sample sizes the red, green and no-color regions are all pretty equal! I.e., ignoring or using detection probabilities is a tossup! Flip a coin! But we can do better than that. When Ψ (occupancy) is <50%, ignore wins; when Ψ>50%, use detection wins; and when p (detection rate) is high, say >60%, it doesn’t matter. In short, the contrasting results between WLD and GLMWM are general! Going a little further, we can see that when sample sizes (S but especially the number of repeat visits K) creep up, using detection probabilities starts to win much more often, which also makes sense – more complicated models always win when you have enough data, but don’t necessarily (and here don’t) win when you don’t have enough data.

Bias, Variance and Confidence Intervals

Figure 2 – Figure 1 with confidence intervals added

Now let’s look at Table S2.2. This is looking at something that we haven’t talked about yet. Namely, most estimators produce, for a given set of data, a guess about how much variance they have. This is basically the confidence interval in Figure 2. In Figure 2, Estimator A is a better estimator of the true value (it is biased, but the variance is low so MSE is much lower), but Estimator A is overconfident – it reports a confidence interval (estimate of variance) that is much smaller than reality. Estimator B is a worse estimator, but it is at least honest – it has really large variance and it reports a really large confidence interval. Table S2.2 in GLMWM shows that ignoring detection probabilities is often too cocky – the reported confidence intervals are too small (which has nothing to do with, and in no way changes, the fact that ignoring detection probabilities is in many cases still a better or equally good estimator of the mean – the conclusion from Table S2.1). But using detection probabilities is just right – not too cocky, not too pessimistic – its confidence intervals are very accurate – when there’s a lot of variance, it knows it! In short, Figure 2 is a good representation of reality over a large chunk of parameter space, where method A is ignore detection (lower MSE on the estimate of Ψ but overconfident confidence intervals) and method B is use detection-based methods (worse MSE for the estimation of Ψ but very accurate confidence intervals).

(As a side-note, this closely parallels the situation for ignoring vs. statistically treating spatial, temporal and phylogenetic autocorrelation. In that case both estimators are unbiased. In principle the variance of the methods treating autocorrelation should be lower, although in practice they can have larger variance when bad estimates of autocorrelation occur, so they are both roughly equally good estimators of the regression coefficients. But the methods ignoring autocorrelation are always overconfident – their reported confidence intervals are too small.)

So which is better – a low MSE (a metric of how good the estimator is at guessing the mean) or an honest, not cocky estimator that tells you when it’s got big error bars? Well, in some regions you don’t have to choose – using detection probabilities gives a better estimate of the mean by MSE and you get good confidence intervals. But in other regions – especially when Ψ and p are low – you have to pick: there is a trade-off, and more honesty gets you worse estimates of the occupancy. Ouch! That’s statistics for you. No easy obvious choice. You have to think! You have to reject statistical machismo!

Summary and recommendations

Let me summarize four facts that emerge across the WLD and GLMWM papers:

  1. Ignoring detection probabilities (sensu WLD) can give an estimate of occupancy that is better than (1/3 of parameter space), as good as (1/3 of parameter space), or worse than (1/3 of parameter space) estimates using hierarchical detection probability models, in terms of estimating the actual occupancy. Specifically, ignoring detection guarantees bias, but may result in sufficiently reduced variance to give an improved MSE. These results come from well-known proponents of using detection probabilities using a well-known package (unmarked in R), so they’re hard to argue with. More precisely, ignoring detection works best when Ψ is low (<50%) and p is low, using detection works best when Ψ is high (>50%) and p is low, and both work very well (and roughly equally well) when p is high (roughly when p>50% and certainly when p>80%) regardless of Ψ.
  2. Ignoring detection probabilities leads to overconfidence (reported confidence intervals that are too small) except when p is high (say >70%). This is a statement about confidence intervals. It does not affect the actual point estimate of occupancy which is described by #1 above.
  3. As data size gets very large (e.g. 4-5 repeat visits to 165 sites) detection probability models generally get noticeably better – the results in #1 mostly apply at smaller, but in my opinion more typically found, sample sizes (55 sites, 2 repeat visits).

And one thing talked about a lot which we don’t really know yet:

  4. Both WLD and GLMWM talk about whether working with detection probabilities requires larger samples than ignoring detection probabilities. Ignoring detection probabilities allows Ψ to be estimated with only single visits to a site, while hierarchical detection probabilities require a minimum of 2 and, as GLMWM show, really shine with 3 or 4 repeat visits. To keep a level playing field, both WLD and GLMWM report results where the non-detection approach uses the repeat visits too (it just makes less use of the information by collapsing all visits into either species seen at least once or never seen). Otherwise you would be comparing a model with more data to a model with less data, which isn’t fair. However, nobody has really fully evaluated the real trade-off – 50 sites visited 3 times with detection probabilities vs. 150 sites visited once with no detection probabilities. And in particular nobody has really visited this in a general way across the whole parameter space for the real-world case where the interest is not in estimating Ψ, the occupancy, but the β’s, or coefficients in a logistic regression of how Ψ varies with environmental covariates (like vegetation height, food abundance, predator abundance, degree of human impact, etc). My intuition tells me that with 4-5 covariates that are realistically covarying (e.g. correlations of 0.3-0.7), getting 150 independent measures of the covariates will outweigh the benefits of 3 replicates of 50 sites (again especially for accurate estimation of the β’s), but to my knowledge this has never been measured. The question of whether estimating detection probabilities requires more data (site visits) remains unanswered by WLD and GLMWM but badly needs to be answered (hint: free paper idea here – a bare-bones skeleton for such a simulation is sketched below).
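To be concrete about what such an evaluation might look like, here is a bare-bones, hypothetical skeleton (mine – parameter values are invented, a single covariate stands in for the 4-5 collinear ones, and detection is held constant). Design A fits a naive logistic regression to 150 single-visit sites; Design B fits a hand-coded occupancy model with a covariate on Ψ to 50 sites visited 3 times; the quantity of interest is the slope β:

```r
# Hypothetical skeleton for the unanswered design question: 150 sites x 1 visit
# (naive logistic regression) vs. 50 sites x 3 visits (occupancy model with a
# covariate on psi). All parameter values are made up for illustration.
set.seed(42)
b0 <- 0; b1 <- 1; p_det <- 0.5            # true intercept, slope, detection probability

fit_occu_cov <- function(y, x, K) {
  # MLE for logit(psi) = b0 + b1*x with constant detection probability p
  nll <- function(par) {
    psi <- plogis(par[1] + par[2] * x); p <- plogis(par[3])
    d <- rowSums(y)
    -sum(ifelse(d > 0,
                log(psi) + dbinom(d, K, p, log = TRUE),
                log(psi * (1 - p)^K + (1 - psi))))
  }
  optim(c(0, 0, 0), nll)$par
}

slopes <- replicate(200, {
  # Design A: 150 sites, one visit, detection ignored
  xA <- rnorm(150); zA <- rbinom(150, 1, plogis(b0 + b1 * xA))
  yA <- rbinom(150, 1, zA * p_det)
  # Design B: 50 sites, three visits, hierarchical occupancy model
  xB <- rnorm(50); zB <- rbinom(50, 1, plogis(b0 + b1 * xB))
  yB <- matrix(rbinom(50 * 3, 1, rep(zB * p_det, 3)), nrow = 50)
  c(A = unname(coef(glm(yA ~ xA, family = binomial))[2]),
    B = fit_occu_cov(yB, xB, K = 3)[2])
})
rowMeans(slopes) - b1    # bias of the estimated slope under each design
apply(slopes, 1, sd)     # sampling spread under each design
```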

So with these 3 facts and one fact remaining unknown, what can we say?

  1. Detection probabilities are not an uber method that strictly dominates ignoring them. As first found by WLD and now clearly shown to be general in the appendices of GLMWM, there are fairly large regions of parameter space where the primary focus – the estimate of Ψ – is more accurate if one ignores detection probabilities! This is news the detection probability machismo-ists probably don’t want you to know (which could be an explanation for why it is never discussed in GLMWM).
  2. Detection probabilities clearly give better estimates of their certainty (or in a lot of cases uncertainty) – i.e. the variance of the estimates.
  3. If you’re designing data collection (i.e. deciding # of sites vs. # of visits/site before you’ve taken measurements – e.g. visit 150 sites once or 50 sites 3 times), I would recommend something like the following decision tree:
    1. Do you care more about the estimate of error (confidence intervals) than the error of the estimate (accuracy of Ψ)? If yes, then use detection probabilities (unless p is high).
    2. If you care more about accuracy of Ψ, do you have a pretty good guess that Ψ is much less or much greater than 50%, or that p is much greater than 70%? If so, then you should use detection probabilities if Ψ is much greater than 50% and p is less than or equal to 50-60%, but ignore them if Ψ is much less than 50% or p is clearly greater than 50-60%.
    3. If you care more about accuracy of Ψ and don’t have a good idea in advance of roughly what Ψ or p will be, then you have really entered a zone of judgement call where you have to weigh the benefits of more sites visited vs. more repeat visits (or hope somebody answers my question #4 above soon!).
    4. And always, always, if you’re interested in abundance or species richness, don’t let somebody bully you into switching over to occupancy because of the “superiority” of detection models (which, as we’ve seen, are not even always superior at occupancy). Both the abundance and species richness fields have other well-established methods (e.g. indices of abundance, rarefaction and extrapolation) for dealing with non-detection.
    5. Similarly, if you have a fantastic dataset (e.g. a long-term monitoring dataset) set up before detection probabilities became fashionable (i.e. no repeat visits), don’t let the enormous benefits of long-term (and perhaps large spatial scale) data get lost just because you can’t use detection probabilities. As we’ve seen, detection probabilities are a good method, but also a flawed method which is clearly outperformed in some cases, just like every other method in statistics. They are not so perfect that they mandate throwing away good data.

The debate over detection probabilities has generated a lot more heat and smoke than light, and there are clearly some very machismo types out there, but I feel like if you read carefully between the lines and into the appendices, we have learned some things about when to use detection probabilities and when not to. Question #4 still remains a major open question just begging for a truly balanced, even-handed assessment. What do you think? Do you use detection probabilities in your work? Do you use them because you think they’re a good idea or because you fear you can’t get your paper published without them? Has your opinion changed with this post?



*I’m aware there are other kinds of detection probabilities (e.g. distance based) and that what I’m really talking about here are hierarchical detection probabilities – I’m just trying to keep the terminology from getting too thick.

**Although I have to say I found it very ironic that the software code GLMWM provided in an appendix, which uses the R package unmarked (arguably the dominant detection probability estimation software), apparently had enough problems finding optima that they reran each estimation problem 10 times from different starting points – a pretty sure sign that optima are not easy to find.