About Brian McGill

I am a macroecologist at the University of Maine. I study how human-caused global change (especially global warming and land cover change) affects communities, biodiversity and our global ecology.

The secret recipe for successful working group meetings?

As Meg noted recently, science is increasingly involving working groups. This is the big science aspect I discussed a while back in a lengthy footnote (and distinct from big data). Although the original (at least in ecology) synthesis center at NCEAS is no longer funded by NSF (but is still very much alive, funded by conservation NGOs), there are three other synthesis centers in the US (NESCent, NIMBios, SESynC), a somewhat differently functioning synthesis center iPlant, and centers in Canada, Australia, France, Germany and many other countries (http://synthesis-consortium.org/). And I’m increasingly seeing work done in “working group format” even when it is not tied to a center. The NSF RCN (Research Coordination Network) grant program is an example, but quite a few PIs on regular NSF grants or NGO/conservation grants are also choosing the working group format.

I am a self-confessed working group junkie. I take (undue?) pride in the fact that I’ve been to groups at all five US centers (and led working groups at two of them), been part of an RCN, been to meetings at Germany’s sDiv and (although not an official synthesis center) the UK’s Kavli meetings, and will be at Canada’s CIEE in May and, if funded, at CESAB in France soon. That leaves Australia as the only big miss on my list (at least for an ecologist), although I did participate remotely in an NGO-funded working group in Australia as well.

Working groups are a personal preference. Some people like them more than others. And some people are better at being part of them than others too! There is no best way to do science. But I think they’re a great format for doing a number of things including – addressing both sides of a debate and building consensus, reviewing a field, doing meta-analysis or assembling and analyzing large datasets, and coalescing ideas and energy at key points in the trajectory of a field (including at its launch and at its coming down from bandwagon status). Certainly they have been influential – NCEAS is one of the most cited institutions in ecology.

But working groups are not a format people are trained to work in, let alone lead. Our whole PhD training is focused primarily on solo work with a few interactions. Most “regular” papers have 1-5 authors. Then we throw people into a group of 15-20 with social dynamics that are an order of magnitude more complex, with no training. What follows is my distillation of the key success factors of working groups. Unfortunately, despite the title, they do not come together into a magic recipe that guarantees success. And there is of course some variation depending on goals. But in my experience, if you get all of the following ingredients you’ve got a good shot at success.

During the working group proposal process

  1. Group composition #1 – personalities matter – Working groups are first and foremost social enterprises (I will be repeating this sentence several times). And with the competing demands on everyone’s time and only having a week to pull things together, you are on the edge of failure right from the start. So it may be tempting to get the biggest name in the field, but if they’re a colossal ego who doesn’t play well with others, avoid the temptation. One bad apple really can spoil the barrel. Indeed, only invite people that you know, either personally or indirectly through a colleague, to be good collaborators. Twice I’ve been part of groups where the goal was explicitly to bring in people from opposing camps – but even here considerable effort was expended to only bring in people who could be part of a civil give-and-take dialogue, and some of the extremists were intentionally left out.
  2. Group composition #2 – career stages – In my experience the ideal working group has a pyramid shape, with the largest group being postdocs, the next largest being early tenure track, and a much smaller group of senior ecologists. I’ve never actually seen a truly pyramidal group, so maybe a more realistic goal is a rectangle – equal representation of postdocs, early career, and senior. But definitely think about this.
  3. Meet for 5 days per session – There is a wide variety of opinion on this. And I’ve been part of 2-day meetings that were successful. But if you’re going to fly in people from around the world who are giving up 2-3 days to travel and jet lag, why would you meet for less than 4-5 days? Also, in my experience it really does take that long to allow the social processes and buy-in to a common goal to take place. It may be efficient to have small subset groups that meet for shorter periods (or extensions to the 5 days). And if everybody already knows each other, so the social processes and goals are well worked out, sometimes fewer days works. But in most cases 5 days is the optimal number in my experience. And if people can’t commit the 5 days, they’re not going to be big contributors anyway. The working group process is a slow one. It has many other advantages, but speed is not one of them.
  4. Who will do the work between meetings? – This is one of the motivations for #2 – everybody will leave a group meeting with good intentions. But who will actually spend more than 5 hours moving the project forward (i.e. assembling data, running simulations, doing analyses, writing)? If the PIs of the working group aren’t going to do this (and if they aren’t prepared to do this they probably shouldn’t be the PIs) and there aren’t any postdocs looking for good projects, then odds are nobody will do this. There are some exceptions I’ve seen, where say the goal was a meta-analysis and during the meeting everybody was assigned say 10 papers to code before the next meeting. A discrete chunk like that can reasonably be expected between meetings. And I’ve seen plenty of meetings where somebody unplanned stepped up to carry a load (but they were almost always postdocs or occasionally early career).

Beginning of meeting

  5. Do a PowerPoint death march on the first day – This is my tongue-in-cheek name for the idea of letting everybody at the group stand up and give a presentation about their work related to the topic. This is oft-debated, with many arguing it is a waste of time. But in my experience if you don’t give people a clear window to get their opinion out, they will spend the whole rest of the meeting slipping it in edgewise. I have seen this happen more than once, and it can be really annoying when the whole group is converging and somebody is still going on about their preconceived idea of how to do it – better to get it out of the way on the first day. It is in the long run more efficient to spend a day doing this. That said, the PIs can make this efficient or painful. Give people very clear instructions on what you want them to present on. And give them absolute time limits (typically 10 or 15 minutes). Then ENFORCE the time limits rigidly. Conversation about a presentation is fine to run over a little, since conversation is the point of a working group. But DO NOT let anybody deliver a monologue one minute over their planned time. This only needs to be done the first time a group meets.
  6. Do a regrouping and group agenda setting after the PowerPoint death march – After everybody has been heard from, spend some time setting the agenda for the rest of the meeting. Many times the PIs will have a pretty clear idea. Other times, the goal really is to brainstorm the agenda together. But either way, put it on a white board and talk it out a bit as a group and be open to changes. This will get you buy-in and understanding of the agenda. It will also get you the sum-is-greater-than-the-parts synergy that you are hoping for from a working group.
  7. PIs need to take their role as cruise director seriously – Working groups are first and foremost social enterprises (I promised you that idea would come back). I’ve never seen a successful working group that didn’t spend a lot of time going out to dinners. The PIs need to take the lead to make sure that these are organized by early afternoon so everybody knows, and they need to set the example that this is an expected activity. There is an age-old debate between group members who want to go to dinner right after work stops and those who want a couple of hours to go exercise first. Some compromise is needed. Some of the best working groups I’ve been part of have actually knocked off early one afternoon and gone for a hike or field trip. It might seem a waste of time, but trust me, it pays off.
  8. Lead a discussion about authorship expectations early – There is no right or wrong answer about who should be a co-author on papers from the group. But setting expectations in a group discussion up front is essential. Most groups I’ve been part of have decided that everybody present should be part of the core synthesis or review paper(s). You want to create an attitude where everybody is chipping in and not holding back their best ideas. Authorship is the best way to do this. Authorship rules on more subsidiary papers vary, but they should be collectively agreed up front.

Middle part of the meeting (e.g. days 2-4)

  9. Do the work – This is of course the end goal. But it’s the hardest to give generic advice about because the nature of the work varies. It may be finding and coding papers for a meta-analysis or assembling data sets. It might be a fairly large group discussion about the consensus state of the field. It might be simulations. It might be a mixture of these things. But it probably occupies the bulk of the meeting – especially the middle days. And it probably involves breaking out into subgroups with different tasks or topics to cover.
  10. Regroup once or twice a day – Even if much of the work will happen in breakout groups (and it almost certainly will), bring the whole group back for 30 minutes before lunch and 30 minutes before dinner and have each group report in. This keeps everybody rowing in the same direction. It is also where much of the magic of working groups happens, as recurring themes and areas of disagreement emerge.
  11. Follow a diamond trajectory – This is true really of any brainstorming or group process. The goal in the beginning is to broaden out – open up minds, create crazy ideas, capture every thought. Then when things have gotten impossibly wide, it is time to pivot and turn energies into focusing and narrowing down. A key to a good working group is for the PIs to have the nerve to let things broaden out for a while (often several days) and then have the leadership to firmly rein it back in to a focus.
  12. Know when to force a turning of the corner to writing – Closely related to #11. In no case should you start writing immediately. And one or two people will probably do the bulk of the writing after you go home. But you should definitely start writing (or at least detailed outlining) before you scatter. You might even assign sections and end up writing a whole draft while you’re at the working group. But this is another key decision point for the leaders – when to stop the talking/analyzing and start the writing. It should start (again, at a minimum to outline stage) before you leave.
  13. Pace yourself – It is tempting to treat the working group time as so precious that you should work 12-hour days. But this is a big mistake. Aside from the great importance of social bonding (#7), you are doing a creative activity that requires fresh, bright minds. Many of your members will have flown 12-24 hours to get there and be jet lagged. And the rest will be exhausted by an intense pace long before the week is over. I’ve personally found that keeping the working group to 9-5 with at least an hour for lunch (plus joint dinners that are social) keeps things productive through day 5, while anything more leads to severe drooping by the end.
  14. Manage the email and phone calls – Everybody will want/need to keep up on email and may make an occasional phone call to their lab managers, other collaborations, etc. In my experience the best way is to tackle this head on by building in time for it and then putting out a pretty clear expectation to be fully focused on the meeting the rest of the time. I usually allow 60 minutes for lunch (this is a social enterprise …) and then a good 30-45 minutes immediately after lunch for phone calls and catching up on email. This way people can run a little long on lunch or end a little early and have more time for email as they wish. And you can expect (and demand) full attention the rest of the time.

End of the meeting (e.g. Day 5)

  15. When the meeting really ends – If you tell people the meeting ends at noon, they will book flights out at 9. If you tell people the meeting ends at 5, they will book flights out at 12 or 1. So tell them it ends at 5 and secretly (don’t let on your real plan) know that you really will end at 1:00 PM. But don’t forget that long-distance travellers will usually not fly out until the next day. You can still get some work done, and have one last dinner. You just won’t have everybody. As a PI you should definitely plan to stay until the day after the meeting is officially over and lead this tail end.
  16. Leave with clear assignments – Well before people start peeling out – i.e. the morning of the last day – put a list on the projector or white board of tasks, deadlines and 1-2 names attached to each (5 names attached is the same as no names attached). Discuss this with the whole group.
  17. Accountability – Find a way to keep work flowing between meetings. Emails with reminders of tasks are a good way to do this. Circulating draft versions of papers or working versions of datasets is a good way too. In my experience scheduling a monthly phone call is also a good idea. Having somebody set up to be a “nagger” (either a PI or a postdoc) who keeps track of timelines is important too.

So – being a good leader of a working group just requires staying on top of 17 different things! If it sounds like leading a working group is exhausting – it is! Being a participant at a working group is exhausting, but being a leader and riding herd on the whole process is a whole other level of exhausting.

Obviously my 17 points are not a magic formula. It’s just the wisdom I’ve pieced together over a couple of dozen working group meetings. And a couple, like #11 and #12, require serious judgement on the PIs’ part – all I can do is highlight the question. And some will disagree with my list – I know from discussions I’ve had that #3 and #5 are definitely not universally agreed upon.

What are your experiences? What are the ingredients in your secret recipe to a successful working group? What works and doesn’t work?

In praise of slow science

It’s a rush-rush world out there. We expect to be able to talk to (or text) anybody anytime, anywhere. When we order something from half a continent away we expect it on our doorstep in a day or two. We’re even walking faster than we used to.

Science is no exception. The number of papers being published is still growing exponentially at a rate of over 5% per year (i.e. doubling every 10 years or so). Statistics on growth in the number of scientists are harder to come by – the last good analysis I can find is a book by Derek de Solla Price in 1963 (summarized here) – but it appears the doubling time of the number of scientists, while also fast, is a bit longer than the doubling time of the number of papers. This means the individual rate of publication (papers/year) is going up. Students these days are being pressured to have papers out as early as their second year*. Before anxiety sets in, it should be noted that very few students meet this expectation and it is probably more of a tactic to ensure publications are coming out in year 4 or so. But even that is a speed-up from publishing a thesis in year 6 or so and then whipping the chapters into shape for publication, which seemed to be the norm when I was in grad school. I’ve already talked about the growing number of grant submissions.

Some of this is modern life. Some of this is a fact of life of being in a competitive field (and there are almost no well-paying, intellectually stimulating jobs that aren’t highly competitive).

But I fear we’re losing something. My best science has often been tortuous, with seemingly as many steps back as forward. My first take on what my results mean is often wrong and much less profound than my 3rd or 4th iteration. The first listed hypothesis of my NSF postdoc proposal turned out to be false (tested in 2003-2004); I think I’ve finally figured out what is going on 10 years later. My first two papers did not come out until the last year of my PhD (thankfully I did not have an adviser who believed in hurry-up science). But both of them had been churning around for several years. In both cases I felt like my understanding and my message greatly improved with the extra time. The first of these evolved from a quick and dirty test of neutral theory to some very heavy thinking about what it means to do models and test theory in ecology. This caused the second paper (co-authored with Cathy Collins) to evolve from a single-prediction to a many-prediction paper. It also led to a paper in its own right. And it has influenced my thinking to this day. And in a slightly different vein, since it was an opinion paper, my most highly cited paper was the result of more than 6 months of intense back-and-forth debate among the four authors (polite, but literally hundreds of emails) that I have no doubt resulted in a much better paper.

I don’t think I’m alone in appreciating slow science. There is even a “slow science” manifesto, although it doesn’t seem to have taken off. I won’t share the stories of colleagues without permission, but I have heard plenty of stories of a result that took 2-3 years to make sense of. And I’ve always admired the people who took that time, and in my opinion they’ve almost always gotten much more important papers out of it. I don’t think it’s a coincidence that Ecological Monographs is cited more frequently than Ecology – papers in Ecological Monographs are often magnum-opus-type studies that come together over years. Darwin spent 20 years polishing and refining On the Origin of Species. Likewise, Newton developed and refined the ideas and presentation behind Principia for over a decade after the core insight came.

Hubbell’s highly influential neutral theory was first broached in 1986 but he then worked on the details in private for a decade and a half before publishing his 2001 book. Would his book have had such high impact if he hadn’t ruminated, explored, followed dead ends, followed unexpected avenues that panned out, combined math with data and literature and ecological intuition and generally done a thorough job? I highly doubt it.

I want to be clear that this argument for “slow science” is not a cover for procrastination, nor the fear of writing, nor the fear of releasing one’s ideas into print (although I confess the latter influenced some of the delay in one of my first papers and probably had a role with Darwin too). Publication IS the sine qua non of scientific communication – it’s just a question of when something is ready to write up. There are plenty of times (a majority) when I collect data and run an analysis and I’m done. It’s obvious what it means. Time to write it up! So not all science is or should be slow science. Nor is this really the same as the fact that sometimes challenges and delays happen along the way in executing the data collection (as Meg talked about yesterday).

But there are those other times, after the data is already collected, where there is this nagging sense that I’m on to something big but haven’t figured it out yet. Usually this is because I’ve gotten an unexpected result and there is an intuition that it’s not just noise or a bad experiment or a bad idea but a deeper signal of something important. Often there is a pattern in the data – just not what I expected. In the case of the aforementioned paper I’ve been working on for a decade, I got a negative correlation when I (and everybody else) expected a positive correlation (and the negative correlation was very consistent and indubitably statistically and biologically different from zero). Those are the times to slow down. And the goal is not procrastination nor fear. It is a recognition that truly big ideas are creative, and creative processes don’t run on schedules. They’re the classic examples of solutions that pop into your head while you’re taking a walk not even thinking about the problem. They’re also the answers that come when you try your 34th different analysis of the data. These can’t be scheduled. And these require slow science.

Of course one has to be career-conscious even when practicing slow science. My main recipe for that is to have lots of projects in the pipeline. When something needs slowing down, you can put it on the back burner and spend time on something else. That way you’re still productive. You’re actually more productive, because while you’re working on that simpler paper, your subconscious mind is churning away on the complicated slow one too.

What is your experience? Do you have a slow science story? Do you feel it took your work from average to great? Is there still room for slow science in this rush-rush world? or is this just a cop-out from publishing?


*I’m talking about the PhD schedule here. Obviously the Masters is a different schedule but the same general principle applies.

How many terms should you have in your model before it becomes statistical machismo?

Before the holidays, I ran a poll asking why people’s models have gotten bigger and more complex (i.e. more terms in regressions). First it is worth noting that essentially nobody disagreed with me that models have gotten more complex. So I am taking it as a given that my original characterization is accurate: the typical model has grown from a 2-way ANOVA (maybe with an interaction term) to one with say 4-8 terms (several of which may be interaction terms or random factors), just in the last 15 years.

Like every topic I place under the statistical machismo header, there is no one answer. No right or wrong. Rather it is a question of trade-offs, where I hope to make people pause and question conventional wisdom which seems to always and only lead to ever-increasing complexity. Here I definitely hope to make people pause and think about why they are building models with 5-8 terms. (NB: the following is a typically long-winded blog post for me; feel free to skip to the bold summary paragraph at the bottom.)

In econometrics this issue is taught under the title “omitted variable bias” (and it is frequently covered in econometrics courses and often in psychology). One can mathematically prove that if you leave out a variable which is correlated with the variables you include, this will bias your estimates of the slopes for the variables you did include. The trade-off is that including more variables in a regression leads to a loss of efficiency (bigger error bars around your slope estimates). This seems then to boil down to a classic bias vs variance trade-off. I’m personally not too sold on this viewpoint for this particular problem. First, the mathematical proof has a very unrealistic assumption that there is a single definitive set of variables which alone cause the dependent variable – but this is never the real world. Second, although it might introduce bias, there is no way to know whether it biases slopes positively or negatively, which means in practice you don’t know how it’s biased, which in a weird meta way goes back to being effectively unbiased. The whole omitted variable bias argument is pretty decisively shredded in Clarke 2005. The toy simulation below illustrates the basic trade-off.
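To make the bias vs efficiency trade-off concrete, here is a minimal R sketch (all variable names and numbers are invented for illustration): two correlated predictors both affect y with true slopes of 1, and omitting one shifts the estimated slope of the other while giving it a somewhat smaller standard error.

```r
# Omitted variable bias vs efficiency: a toy simulation (illustrative only)
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n, sd = 0.7)   # x2 is correlated with x1
y  <- x1 + x2 + rnorm(n)              # both predictors have true slope = 1

full    <- lm(y ~ x1 + x2)            # both predictors included
omitted <- lm(y ~ x1)                 # x2 left out

summary(full)$coefficients["x1", ]    # slope near 1, but a larger standard error
summary(omitted)$coefficients["x1", ] # slope biased upward (about 1.7), smaller SE
```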

Most ecologists, I think, are instead coming pretty much from Hurlbert’s extremely influential paper on pseudoreplication (which got a lot of confirmation in the survey). Hurlbert introduced the idea of pseudoreplication as a problem and made two generations of ecologists live in fear of being accused of pseudoreplication. However, nobody seems to recall that adding more terms to a regression is NOT one of the solutions Hurlbert suggested! And I’m willing to bet it would not be his suggested solution even with modern regression tools so easily available. His primary argument is for better, more thoughtful experimental design! There is no easy post hoc fix through statistics for bad experimental design, and I sometimes think we statisticians are guilty of selling our wares by this alluring but flawed idea. Beyond careful experimental design, Hurlbert basically points out there are two main issues with pseudoreplication: confoundment and bad degrees of freedom/p-values. Let me address each of these issues in the context of adding variables to complexify a regression to solve pseudoreplication.

1) Confoundment – Hurlbert raises the possibility that if you have only a few sites you can accidentally get some unmeasured factor that varies across your sites, leading you to mistakenly think the factor you manipulated was causing things when in fact it’s the unmeasured factor that is confounded with your sites by chance. However, and this is a really important point – Hurlbert’s solution (and that of anybody who thinks for five minutes about experimental design) is to make sure your treatment is applied within sites, not just across sites, thereby breaking the confoundment. Hurlbert also goes into much more detail about the relative advantages of random vs. intentional interspersion of treatments, etc. But the key point is that confoundment is fixed through experimental design. This is harder to deal with in observational data (one of the main reasons people extol experiments as the optimal mode of inference). But in the social sciences and medicine it is very common to deal with confoundment in observational data by measuring and building in known confounding factors. Thus nearly every study controls for factors like age, race, income, education, weight, etc. by including them in the regression. For example, propensity to smoke is not independent of age, gender or income, which in turn are not independent of health, so decisive tests of the health effects of smoking need to “remove” these co-factors (by including them in the regression). Either Hurlbert’s experimental design or social science’s inclusion of co-factors makes sense to me. But in ecology, we instead tend to throw in so-called nuisance factors like site (and plot within site) and year, but this does NOT fix confoundment (and is more motivated by the non-independence of errors discussed below). To me confoundment is NOT a reason for the kinds of more complex models we are seeing in ecology. If you are doing an experiment, then control confoundment in the experimental design. And if it is observational, include more direct causal factors (the analogs of age and demographics) like temperature, soil moisture and vegetation height instead of site and year nuisance factors if you are worried about the confoundment problem of pseudoreplication.

2) Bad degrees of freedom/p-values – Hurlbert’s second concern with pseudoreplication (which is totally unrelated to confoundment and is not fixable by experimental design) relates to p-values. This is because non-independence of error terms violates assumptions and essentially leads us to think we have more degrees of freedom than we really have; since degrees of freedom go into the p-value calculation, the reported p-values end up smaller than they should be (i.e. p-values are wrong in the bad way – technically known as anti-conservative). This is a mathematically true statement, so the debate comes in with how worried we should be about inflated p-values. If we decide we are worried, we can just stop using p-values (recall this was Hurlbert’s recommendation, but very few remember that part of the paper!). Nor does Hurlbert imply there are larger problems than the p-value inflation (and the confoundment raised above). In fact Hurlbert says pseudoreplication without p-values can be a rational way forward. A small simulation of this inflation follows.
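For readers who want to see the inflation rather than take it on faith, here is a hedged R sketch (site counts, sample sizes and variances are all invented): the treatment is applied at the site level, it has no true effect, yet an analysis that treats every observation as independent rejects the null far more than 5% of the time.

```r
# Pseudoreplication: ignoring site-level grouping inflates the Type I error rate
set.seed(1)
n_sites <- 6; n_per <- 10
one_run <- function() {
  site      <- rep(1:n_sites, each = n_per)
  treatment <- rep(c(0, 1), each = (n_sites / 2) * n_per) # applied to whole sites
  site_eff  <- rnorm(n_sites, sd = 1)[site]               # shared site-level noise
  y         <- site_eff + rnorm(n_sites * n_per, sd = 1)  # NO true treatment effect
  summary(lm(y ~ treatment))$coefficients["treatment", "Pr(>|t|)"]
}
p_vals <- replicate(2000, one_run())
mean(p_vals < 0.05)   # typically far above the nominal 0.05
```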

The question of whether to report p-values or not interacts in an interesting way with one of the main results of the survey. Many people feel that having more complex models is justified because they are switching to model selection approaches (i.e. mostly AIC). This approach is advocated by two of the best and deservedly most popular ecology stats books (Crawley and Zuur et al). But I have to confess that I am uncomfortable with this approach for several reasons. First, the whole point of model selection initially (e.g. Burnham and Anderson’s book) was to move away from lame models like the null hypothesis and compete really strong models against each other, as Platt (1964 Science) recommended in his ode to strong inference. Comparing a model with and without a 5th explanatory factor does not feel like comparing strongly competing models, so it does not feel like strong inference to me. Second, model selection is a great fit for prediction because it finds the most predictive model with some penalty for complexity (recall that in the world of normally distributed errors AIC is basically a function of the SSE plus a penalty of twice the number of parameters, and SSE is also the quantity that determines R², making a precise mathematical link between AIC and R² – the short check below makes this explicit). But model selection is a really bad fit for hypothesis testing and p-values (again, as anybody who has read the Burnham and Anderson book will have seen, but few follow this advice). Although I don’t go as far as Jeremy and Andrew Gelman (I think doing one or two very simple pre-planned comparisons such as with or without an interaction term and then reporting a p-value is probably OK), I strongly believe that one should not do extensive model selection and then present it as a hypothesis test. While I agree with Oksanen’s great take-down of the pseudoreplication paper that argues p-values are only courtesy tools, I don’t think most people using model selection and then reporting p-values treat them that way. I’m fine – more than fine – with pure exploratory approaches, but I think a lot of people are noodling around with really complex models and lots of model selection and then reporting p-values like they’re valid hypothesis tests. Indeed, I have had reviewers insist I do this on papers. This strikes me as trying to have your cake and eat it too, and I think it is one of the reasons I am so uncomfortable with the increasingly complex models – because they are so highly intertwined with model selection approaches.
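Here is that AIC–SSE link written out for a Gaussian linear model: up to a constant, AIC = n·log(SSE/n) + 2k, so ranking models by AIC is ranking them by a penalized SSE, the same quantity that sits inside R². A quick, purely illustrative R check (simulated data, names invented):

```r
# For Gaussian errors, AIC is just a function of SSE plus a parameter penalty
set.seed(7)
d   <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$y <- 2 * d$x1 + rnorm(50)
fit <- lm(y ~ x1 + x2, data = d)

n   <- nrow(d)
sse <- sum(residuals(fit)^2)
k   <- length(coef(fit)) + 1                  # regression coefficients + error variance
aic_by_hand <- n * (log(2 * pi * sse / n) + 1) + 2 * k
c(aic_by_hand, AIC(fit))                      # the two values match
```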

I do think it is important to note that, whatever the motive, there are genuine costs to building really complex models. The biggest cost is the loss of interpretability. We know exactly what a model with one explanatory factor is saying. We have a pretty good idea what a model with two factors is saying. And I even have a really precise idea what a 2-way ANOVA with an interaction term is referring to (the interaction is the non-additivity). But I have yet to ever see a really convincing interpretation of a model with 5 factors (other than “these are things that are important in controlling the dependent variable”, at which point you should be doing exploratory statistics). And interaction terms (often more than one these days!) are barely interpretable in the best circumstances, like when the hypothesis is explicitly about the interaction. And while mixed models with random effects are a great advance, I don’t see too many people interpreting random effects in any meaningful way (e.g. variance partitioning), and the most commonly used mixed model tool – lmer – pretty much guarantees you don’t know what your p-values are (for good reasons): the most common workarounds are wrong and often anti-conservative, to such a degree that the authors of the package refuse to provide p-values (e.g. this comment and Bolker’s comments on Wald tests). Again – if you want to do exploratory statistics, go to town and include 20 variables. But if you’re trying to interpret things in the specific context of whether particular factor X has an important effect, you’re making your life harder with more variables.

Another big problem with throwing lots of terms in is collinearity – the more terms you have, the more likely you are to have some highly correlated explanatory variables. And when you have highly correlated explanatory variables, the “bouncing beta problem” means you are basically losing control of the regression (i.e. depending on arbitrary properties of your particular data, the computer algorithm can assign almost all of the explanatory power – i.e. slope – to either one or the other correlated variable; in other words, if you drop even one data point the answer can completely change). The sketch below shows how little it takes.
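A minimal R illustration of bouncing betas (numbers invented; nothing here is tied to any real dataset): two nearly collinear predictors share the explanatory power, and refitting after dropping a few points can substantially reshuffle which one gets the credit.

```r
# Bouncing betas: slope estimates are unstable when predictors are collinear
set.seed(3)
n <- 40
d <- data.frame(x1 = rnorm(n))
d$x2 <- d$x1 + rnorm(n, sd = 0.05)               # x2 is almost a copy of x1
d$y  <- d$x1 + d$x2 + rnorm(n)                   # both have true slope = 1

coef(lm(y ~ x1 + x2, data = d))                  # explanatory power split arbitrarily
coef(lm(y ~ x1 + x2, data = d[-(1:3), ]))        # drop 3 rows: the split can shift a lot
summary(lm(y ~ x1 + x2, data = d))$coefficients  # note the large standard errors
```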

So, in summary, adding variables is a very weak substitute for good up-front experimental design. It might be justified when the added variables are known to be important and are used to control for confoundment due to sampling problems in an observational context. But that’s about it. And the techniques often invoked to make complex models viable, such as random effects and model comparison, pretty much guarantee your p-values are invalid. I find it very ironic that so many people go to great lengths, including nuisance terms to avoid pseudoreplication (to ensure their p-values are valid), and then guarantee their p-values are invalid by using random effects and model selection. And good luck interpreting your complex model, especially when coefficients are being assigned to collinear variables arbitrarily! So to my mind complex regression models straddle the fence very uncomfortably between clean hypothesis testing contexts (X causes Y and I hypothesized it in advance) and pure exploratory approaches – this fence-sitting complex model approach to my mind has the worst of both worlds, not the best of both worlds.

To put it in blunt terms, it would appear from popular answers in the survey that many people are complexifying their models in response to Hurlbert’s issues of pseudoreplication and Burnham and Anderson’s call for model comparison, but seem to forget that both of them actually call for abandoning p-values to solve these problems. And that Hurlbert’s paper was really a call for better experimental design, and Burnham and Anderson’s book was a call for a return to strong inference by competing strong models against each other, not for tweaks on regressions. So these were both calls for clear, rigorous thinking before starting the experiment, NOT for post hoc fixes by adding terms to regression models.

So, I have to at least ask, how much of this proclivity for ever more complex models is a result of peer pressure, fear of reviewers and statistical machismo? I was a little surprised to see that no small fraction of the poll respondents acknowledged these factors directly. So I urge you to think about why you are complexifying your model. Is it an observational study (or weakly controlled experimental study) where you need to control for known major factors? Should you really switch to an exploratory framework? Are you willing to give up p-values and the hypothesis testing framing? If not, say no to statistical machismo and keep your model simple!

#ESA100 – big concepts and ideas in ecology for the last 100 years

ESA (Ecological Society of America) is celebrating its 100th anniversary in 2015. This will culminate in the 100th annual conference in Baltimore in August 2015. As part of the buildup, ESA has asked various people to discuss today (Dec 3) via social media “big ideas or discoveries that have had the greatest impact on ecological science over the last century”. So I’m sharing my thoughts today. Meg and Jeremy will add their own over the next few weeks. Check out the Twitter hashtag #ESA100 as well. (And the Brits reading this don’t have to remind us that 2013 was the 100th anniversary of the British Ecological Society and they got there first – I was at, and enjoyed, their 100th conference.)

A couple of months back I took a stab, using Wordles, at how ecology has changed in 25 years. For this longer time frame of 100 years, I’m not going to pass the buck to technology; I’m going straight to my own (lengthy!) opinions. I am going to divide this into three sections – core ideas that spanned most or all of 1915-2014, ideas that emerged over the latter half of that 100-year period and are dominant now, and ideas that I predict will dominate the next 100 years. I will also divide each section into tools/methods and ecological concepts.

Tools/methods 1915-2014

  • Differential equations as models of population abundance – without a doubt this has been one of the most dominant ideas of the last 100 years. It started as a way of modelling dynamical chemical reactions and then moved into ecology in the 1920s in the work of Lotka and Volterra (although Verhulst presaged this work with the logistic equation of human population growth in the 1830s, rediscovered by Pearl in 1920). By the 1930s full treatises applied to competition, predation and mutualism had appeared, such as Lotka’s excellent 1925 book Elements of Mathematical Biology (worth reading still today!), Gause’s 1934 book The Struggle for Existence and the review paper by Gause and Witt 1935 (Am Nat). If you look at any modern theoretical ecology book (or any undergraduate ecology textbook) you will see these core ideas explained in great detail and then elaborated on with age structure, stochasticity, time lags etc. In the 1970s there was a movement led by Robert MacArthur and EO Wilson to define this use of differential equations focused on populations as the sine qua non of good ecology, with profound effects (the highly influential population biology graduate program at UC Davis as one example). I would argue population-level differential equations have been THE dominant tool for the last 100 years. I am personally a little ambivalent about this. While I think quantification and math are important in science, I don’t think populations are the only important scale to study (and it’s not obvious what variable would be at the center of differential equations at other scales), we have only been able to capture parameters for rates of change of populations in highly phenomenological (almost circular) fashion, and the differential equation approach has led to an overemphasis on equilibria (something easy to solve for in differential equations but not so obviously a prominent feature of nature).

Concepts 1915-2014

  • Succession – succession of plants in the Indiana sand dunes was the 1898 thesis topic of Henry Cowles, a founder of the ESA. Frederic Clements also worked on this in the early decades of the 20th century. Succession has been at the center of one of ecology’s great, ongoing defining debates: individualistic responses vs. species interactions and community structure (Gleason vs Clements). Succession played a central role in Odum’s and Whittaker’s undergrad textbooks in the 1970s and you can still find a full chapter devoted to succession in every popular textbook today. At the same time succession has become passé in the last 30 years (e.g. the 2009 Princeton Guide to Ecology has almost 100 entries but not one on succession). Deserved death or a pendulum swing that will come back? (Can I say both of the above?)
  • Competitive Exclusion, Limiting Similarity, Niche Overlap, etc. – The fact that one of the four outcomes of the Lotka-Volterra differential equation model of competition is competitive exclusion, followed in short order by Gause’s 1934 experimental confirmations in microcosms, has led to a central role for competitive exclusion and related ideas like limiting similarity, niche overlap, body size ratios, closely related species not co-occurring in communities in phylogenetic community ecology, and so on. If I were to pick one concept that dominated ecology from the 1930s to the present day, this would be it. Indeed, I would probably go further and say it crossed the line to become an obsession. Competition incorrectly received primacy over predation, disease and mutualism. And the blindingly obvious fact that species coexist outside our window, even if they don’t in homogeneous bottle systems, has not prevented an over-focus on how two species coexist instead of the more important question of what controls whether it is 2, 5, 20, or 200 species coexisting.
  • Food webs – the food web idea loosely goes back to Stephen Forbes, arguably the first ecologist in America, with his 1887 essay “The lake as a microcosm”. Food webs sensu stricto, as a graph of who eats whom, have run as a key idea through the work of Charles Elton, Robert Paine’s starfish removals and keystone species, Joel Cohen, Stephen Carpenter’s trophic cascades, work on alternative stable states, and right into the present day with efforts to model population dynamics in a food web context using differential equations. Network theory is hot these days and is a clear extension of food webs. Food webs sensu lato have also served as a metaphor, both in research ecology and in the environmental movement, for the idea that everything is connected to everything. We love these stories – remove one little insect and watch the whole ecosystem collapse.
  • Ecophysiology culminating in mechanisms driving biomes – naturalists all the way back to von Humboldt and Darwin, some of the early German founders of ecology (Warming, Schimper), and a line running through Robert H Whittaker and his 1975 book have noted that there is very systematic variation in vegetation structure and type across the globe with climate (tall trees in wet tropics, savannas in dry tropics, thorn scrub in deserts, grasslands in dryish summer-wet places, Mediterranean scrub in dryish winter-wet places, etc). This topic remains active into the present day with attempts to include realistic models of vegetation in global circulation models and carbon models. I am hard pressed to pinpoint a single turning point (although Gates’ book Biophysical Ecology is a good stab), but we have gradually worked out the core physiological principles driving this (water balance, heat balance, photosynthesis controls, etc) and some of the biggest names in ecology (Hal Mooney, Stuart Chapin, Christian Korner, Monteith, Graham Farquhar, Ian Woodward) have worked in this area. The animal people have not been quite as successful in prediction of distribution and abundance, probably because there is not as much variation in growth forms as in plants, but great progress has still been made, especially in lizards and/or thermal ecology, by the likes of Ray Huey, Warren Porter, Bruce McNab etc. Whether you call this field physiological ecology, functional ecology, biophysics or something else, it is one of the few areas of ecology to have become predictive from first principles.
  • Importance of Body Size – if you could only know two simple facts about an organism, probably the two things you would want to know are which taxonomic group it belongs to (bird, mammal, angiosperm, fern) and its body size. Body size makes good predictions about who will eat whom, degree of thermal stress, etc. but also, in a relationship that is unusually precise for ecology, metabolic rate and a whole host of connected things including calorie requirements, growth rates, life span, age of maturity, intrinsic rate of population increase, dispersal distance, etc. The central role of body size has been understood at least since 1932 (Kleiber in a German publication, and a 1947 paper “Body size and metabolic rate” in English). Two 1980s books (Peters’ 1983 The Ecological Implications of Body Size and Calder’s 1984 Size, Function and Life History) showed just how central body size is. This work received significant recent attention through some of the most highly cited papers in ecology by Jim Brown, Brian Enquist and Geoff West among others. Although the potential of this discovery to inform us about poorly understood species of conservation concern is in my opinion still underappreciated, there have been some very clever applications, including a paper by Pereira and Gordon 2004 in Ecological Applications, a fun one on dinosaurs by Farlow 1976 in Ecology and John Lawton’s 1996 calculation of the population dynamics of the Loch Ness monster in Oikos. Mechanisms are still hotly debated but the sheer statistical predictive power of body size is rare in ecology.

Tools 1950s to 2050s

This section and the next contain ideas on tools and concepts that got their start in the latter half of the 20th century and are arguably still in their infancy today but with much work going on.

  • Stable Isotopes – I’ve never personally used this technique, but the ability to quantify the ratio of different isotopes of an element (say oxygen 16 and 18) in small samples has revolutionized what we can measure. We can measure how old things are (when they died) (carbon), where they came from (strontium), how hot or cold it was when tissue was laid down (hydrogen among others), where/when water used by a plant came from (again hydrogen among others), how high up the trophic chain a species eats (nitrogen), and on and on. I’m sure we’re nowhere near the end of novel measurements that can be done with stable isotopes.
  • Phylogenetics – starting with Willi Hennig’s 1950 book on cladistics and Felsenstein’s early 1970s and 1980s papers and software on methodology followed by many others, the ability to unravel the precise evolutionary history of species has changed not just evolution but ecology. Like any such tool, it has created some bandwagons, but there is no denying it has changed our ability to ask meaningful ecological questions in a macroevolutionary context (how fast do different traits evolve, is higher species richness in the tropics due to speciation or extinction, how do coevolving clades speciate, and etc).

Concepts 1950s to 2050s

  • Space – there has been exponential growth in the study of the role of space in structuring populations and communities. Arguably this started in the 1950s with Andrewartha’s 1954 ecology textbook, which had what we would now call a metapopulation on the cover, and Skellam’s 1951 diffusion equation models. This was followed by Levins’ 1969 paper on metapopulations, Hanski’s development and popularization of metapopulations through the 1980s and 1990s, MacArthur and Wilson’s 1967 island biogeography, Simon Levin’s 1970s work showing space can be a coexistence mechanism, Monica Turner’s and many others’ launch of landscape ecology as a subdiscipline, the increasing interest in the role of regional pools in structuring local communities (accelerated by Hubbell’s 2001 neutral theory), and the growing interest in dispersal ecology and the role of dispersal limitation. The recent recognition of the importance of scale is closely tied to finally putting our understanding into a spatial context. I don’t think we’re done with space yet.
  • Evolutionary Ecology/Optimality – Hutchinson wrote a book in 1965, The Ecological Theater and the Evolutionary Play, arguing for the need for stronger links between the two fields (or more precisely, observing that the links exist whether we ignore them or not). Judging by the proliferation of journals in the field of evolutionary ecology, I think he was heard! The backlash against Wynne-Edwards’ 1962 book (Animal Dispersion in Relation to Social Behaviour), containing group selection arguments, certainly focused our collective minds as well. A great deal of individual behavior and life history as well as sociality is now evaluated through the lens of evolution. So are species interactions (i.e. coevolution). And although it is only a shortcut, optimality with constraints is a very useful shortcut to understanding the evolutionary outcome of behavior ranging from foraging to habitat selection to various forms of game theory.
  • Mutualisms and Facilitation – competition and predation were dominant ideas for the last 100 years (and remain dominant ideas). But it seems mutualism didn’t get much love until recently. While the existence of mutualism was well understood 100 years ago (and the aforementioned Gause and Witt paper gave a model of mutualism population dynamics in 1935), understanding mutualism as a fundamental structuring force of communities came much more recently. The growth of tropical ecology certainly fed an interest in mutualism, as has the increasing study of pollination as an ecosystem service and the idea of facilitation (a gradient from competition to mutualism depending on the harshness of environmental conditions). The role of microbiome mutualisms is likely to be part of Meg’s answer on great conceptual advances.
  • Macroecology – I may be biased on this one … but I think the snapping out of the amnesia induced by the population-only approach, to return to our roots and look at some of the oldest questions in ecology like the controls of species richness, the controls of abundance, species ranges, the distribution of body sizes, etc., has been a very good thing. Not in a replacement sense (of e.g. population biology), but in an additive sense: we have to tackle these questions now and not work up to them in 100 years when population ecology is all figured out. And I think it has happened just in time, with the looming challenge of global change. It is interesting that the arc of the careers of many of the most famous community ecologists (Rosenzweig, Brown, Mittelbach, Ricklefs, Lawton, May and, ahead of his time, MacArthur) all included a turn toward macroecology. And macroecology seems to be at a magic scale such that it has produced many of the most law-like, universal principles in ecology (abundant species are rare, big-bodied animals are rare, species-area relationships, decay of similarity with distance, etc). I could go on for many pages on this topic alone, but I’ll stop here for now!

Tools 21st century

  • Remote sensing – Using digital images taken from elevation so as to cover large areas (usually from airplanes or satellites, but increasingly towers too) has been slowly creeping into ecology for decades. At the moment, remote sensing is more informative about the environment (e.g. topography) and ecosystem aspects (e.g. NDVI as a proxy for productivity or green-up). And these will continue to be growth areas. I am part of a project supplementing ground weather stations with satellite measurements of weather to fill in the gaps on the ground, and I suspect it won’t be too long until we can dispense with ground measurements entirely for coarse-scale measurements of climate at remote sites. And Greg Asner’s work, among others, on using hyperspectral imagery (100s of channels or frequencies instead of the usual 3 or 4 – think very fine gradations of color) to allow detection of nitrogen levels in leaves and the like is impressive. But I also think we’re within a decade or two of remote sensing being able to identify individuals to species in some settings like canopy trees or ungulate herds. And that will open up whole new spatial scales to abundance questions.
  • Biodiversity informatics – Linnaeus became famous for being the first to formally catalog biodiversity. We have had museums and collectors working at this goal ever since. The rapid movement of this data into online databases is opening up whole new vistas. These include generating species ranges for entire classes of organisms (e.g. all vertebrates by NatureServe and others, and soon 100,000+ plants of the New World by BIEN). And changes in species ranges, phylogeny and traits like morphometrics over the last 100 years or so are being evaluated using dates on collection records. And even at the most basic level, having online, real-time updatable, standardized taxonomies is a great boon to those studying poorly known systems. We’re also finally starting to get a handle on some surprising basic trends in biodiversity metrics that make us realize how little we really know about biodiversity trends in response to the Anthropocene.

Concepts 21st century

  • Species Richness – Will the 21st century finally answer one of the greatest questions in ecology first raised in the 19th century – why are there more species in the tropics? And more generally will we get traction on the question of what factors control species richness at different scales and along different gradients? I am optimistic – I just hope it happens in my lifetime.
  • Species Ranges – the species range is one of the most fundamental properties of a species, and there is a pressing need for prediction of how ranges will respond to climate change, habitat destruction, etc. Yet we have mostly just danced around the edges of this problem and really only have anecdotes for specific species and specific range edges plus a giant cottage industry of predicting range shifts using correlative niche models that aren’t that well validated. We’ve got to do better on this one!
  • A predictive theory of the response to global change – I’ve harped on the need for ecology to become more predictive, and I personally think the biggest intellectual challenge and the biggest test of whether ecology has any value for society is being able to predict how the biosphere will respond to human-caused global change (i.e. the Anthropocene).

What I left out

I have left a number of obvious choices out. Some of these are statements of my ignorance. For example, I don’t know enough about ecosystem science to name the big concepts (probably global nutrient cycles and controls of productivity globally belong in the 1950s to 2050s key concepts category, but I don’t have anything intelligent to say on them). And the same for DNA barcoding as a tool for the 21st century.

Other omissions are more intentional. For example, a lot of energy has gone into niches, the diversity-stability debate, disturbance ecology, population cycles, etc., but I’m not sure they have yet earned their keep as key concepts that have fundamentally changed our view of ecology (which is not to say they couldn’t still do it). I’d be curious to hear other nominations for this category in the comments (or disagreement with my intentional omissions). I’ve left out some obvious tools too. For example, I didn’t mention a single statistical method as a key tool. This probably shouldn’t surprise readers who know I’m a pretty firm believer that ecology should be in the driver’s seat and statistics is just a tool. The same with big data. Any number of topics I mentioned above intersect with big data (all the way back to von Humboldt and Kleiber!). But I just don’t see discussion of big data in isolation as a useful way forward – big data is just a tool letting us finally crack the controls of species richness, biodiversity trends, etc. I could have suggested the computer as a key tool, for it certainly has been – it has enabled larger data, much better statistics, null models, complex simulations, not to mention big data. But it’s a little trite and too general to list here.

What do you think? What is missing from my list? What should I have left off? Answer in the comments or Tweet with #ESA100.

Why are your statistical models more complex these days?

I serve on a lot of graduate committees. I also end up as statistical consultant for a lot of students and faculty. So I see a lot of people thinking through their models and statistical approaches.

One trend I am noticing is that most people are staying within the linear framework (e.g. not GAMs or regression trees), but their models are becoming increasingly complex. That is to say, more and more terms are being included. And more and more of them are what I would call blocking/nuisance terms. I’m not talking about “big data” or exploratory or data mining approaches where people have dozens of potential variables and no clue which to use.

I’m talking about traditional field experiments or behavioral/physiological observations of individuals or small-scale observational studies. And I’m noticing that in the dark ages (= when I was in graduate school = 11 years ago) there would be a 2- or at most 3-way ANOVA with maybe one or two interaction terms. Now everybody is running multivariate regressions or, more often, mixed effect models. And there are often 4-5 fixed effects and 3 or 4 (often nested) random effects and many more interaction terms, and sometimes people even want/try to look at interaction terms among random effects (an intensely debated topic I am not going to weigh in on).

As one example – in the past I almost never saw people who collected data over two or three years (i.e. all PhD programs and grants) include year as an explanatory factor (fixed or random) unless there was really extreme variability that got turned into a hypothesis (e.g. El Nino vs La Nina which happened not infrequently in Arizona). Now everybody throws in year as an explanatory factor even when they don’t think there was meaningful year-to-year variability.

And for what it’s worth, putting even two crossed (as opposed to nested) random factors into the lme command in the nlme R package was somewhat arcane and of mixed recommendability, while crossed random effects are easily incorporated in the newer lmer command in lme4. So it might just be evolving software, but I don’t really believe software capacity alone is driving this, because I’m also seeing the number of fixed factors going up, and I never used to hear people complaining about it being hard to include two crossed random factors in lme. But it does show the complexity of models has gone up, since the models I see as commonplace today weren’t even well supported by the software 3-5 years ago.
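
To make the syntax contrast concrete, here is a minimal sketch (using a hypothetical data frame d with columns mass, treatment, site, plot and year) of the nested-vs-crossed difference I am describing:

```r
library(nlme)
library(lme4)

# Nested random effects (plots within sites) were straightforward in nlme::lme ...
m_nested  <- lme(mass ~ treatment, random = ~ 1 | site/plot, data = d)

# ... while crossed random effects (site and year) are the natural case for lme4::lmer
m_crossed <- lmer(mass ~ treatment + (1 | site) + (1 | year), data = d)
```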

Now I am on record that the move to multivariate regression framing instead of ANOVA is a very good thing. And I haven’t said it in this blog but every time I teach mixed effect models I say they’re one of the most important advances in statistics for ecology over the last couple of decades. So I’m not critiquing the modelling tools.

But I am suspicious of the marked increase in the number of factors from approximately 2-3 with few interactions to 4-8 with many interactions (and again this is not in an exploratory framework with dozens of variables and something like regression trees). I’m a notorious curmudgeon, suspicious of any increase in statistical complexity that is not strongly justified by changing our ECOLOGICAL (not statistical) interpretations. But I’m clearly out of the mainstream. And although I can say some specific practices or motives around complex models are wrong, I cannot say that more complex models in general are wrong. So maybe I’m missing something here.

So please enlighten me by taking the below poll on why you think models have become more complex over the last 10 years. You can check up to six boxes but 2-4 is probably more informative.

(I am going to offer my own opinions in a future blog post, but I don’t want to bias the poll because I am genuinely curious about what is driving this phenomenon – and by the same token I’m not going to be active in the comments on this post, but I hope you are.)

Do you have any good examples where your ecological understanding was greatly increased by 4-8 factors instead of 2-3? Do you have an example of a killer interpretation of four factors in one model? Do you think you’re still in a hypothesis-testing framework when you have, for example, five fixed factors and three random factors? What about if you’ve done some model comparisons to get from 5 fixed/3 random down to 3 fixed/2 random?

What math should ecologists teach

Recently Jeremy made the point that we can’t expect ecology grad students to learn everything useful under the sun and asked in a poll what people would prioritize and toss. More math skills was a common answer for what should be prioritized.

As somebody whose undergraduate (bachelor’s) degree is in mathematics, I often get asked by earnest graduate students what math courses they should take if they want to add to their math skills. My usual answer is: none – the way math departments teach math is very inefficient for ecologists, so you should teach yourself. But it’s not a great answer.

In a typical math department in the US, the following sequence is the norm as one seeks to add math skills (each line is a one-semester course, taken roughly in the sequence shown):

  1. Calculus 1 – Infinite series, limits and derivatives
  2. Calculus 2 – Integrals
  3. Calculus 3 – Multivariate calculus (partial derivatives, multivariate integrals, Green’s theorem, etc)
  4. Linear algebra – solving systems of linear equations, determinants, eigenvectors
  5. Differential equations – solving systems of linear differential equations, solving engineering equations (y'' + cy = 0)
  6. Dynamical systems – y_{t+1} = f(y_t) variations, including chaos
  7. Probability theory (usually using measure theory)
  8. Stochastic processes
  9. Operations research (starting with linear programming)

That’s 7 courses over and above 1st-year calculus to get to all the material that I think a well-trained mathematical ecologist needs! There are some obvious problems with this. First, few ecologists are willing to take that many classes. But even if they were, this is an extraordinary waste of time, since over half of what is taught in those classes is pretty much useless in ecology even if you’re pursuing theory deeply. For example – path and surface integrals and Green’s theorem are completely irrelevant. Solving systems of linear equations is useless, thereby making determinants more or less useless. Differential equations as taught – useless to ecologists (very useful to physicists and engineers). Measure-based probability theory – useless. Linear programming – almost useless.

Here’s my list of topics that a very well-trained mathematical ecologist would need (beyond a 1st year calculus sequence):

  1. Multivariate calculus simplified (partial derivatives, volume integrals)
  2. Matrix algebra and eigenvectors
  3. Dynamical systems (equilibrium analysis, cycling and chaos)
  4. Basic probability theory and stochastic processes (especially Markov chains with brief coverage of branching processes and master equations)
  5. Optimization theory focusing on simple calculus based optimization and Lagrange multipliers (and numerical optimization) with brief coverage of dynamic programming and game theory

Now how should that be covered? I can see a lot of ways. I could see all of that material covered in a 3-semester sequence (#1/#2, #3, #4/#5) if you want to teach it as a formal set of math courses. And here is an interesting question. We ecologists often refuse to let the stats department teach stats to our students (undergrad or grad) because we consider it an important enough topic that we want our spin on it. Why don’t we have the same feelings about math? Yet as my two lists show, math departments are clearly focused on somebody other than ecologists (mostly, I think, on other mathematicians in upper-level courses). So should ecology departments start listing a few semesters of ecology-oriented math among their courses?

But I could see less rigorous, more integrative ways to teach the material as well. For example, I think in a year-long community ecology class you could slip in all the concepts: dynamical systems (and partial derivatives) with logistic/Ricker models and then Lotka-Volterra; eigenvectors and Markov chains with Horn’s succession models or age-stage structure, with eigenvectors returning in the Jacobian of predator-prey models; master equations with Neutral Theory; optimization with optimal foraging and game theory. Yes, the coverage would be much less deep than a 3-semester sequence of math-only courses, but it would, I think, be highly successful.
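
As one tiny example of how the eigenvector material can be made ecological rather than abstract, here is a sketch in R with a made-up 3-stage projection matrix (the numbers are purely for illustration):

```r
# Hypothetical 3-stage projection matrix: fecundities in the top row,
# survival/transition rates below (all values invented for illustration).
A <- matrix(c(0.0, 1.5, 2.0,
              0.5, 0.0, 0.0,
              0.0, 0.6, 0.8),
            nrow = 3, byrow = TRUE)

eig    <- eigen(A)
lambda <- Re(eig$values[1])   # dominant eigenvalue = asymptotic population growth rate
w      <- Re(eig$vectors[, 1])
w / sum(w)                    # corresponding eigenvector = stable stage distribution
```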

I say “I think” because I don’t know anywhere that teaches the math this way. I teach a one-semester community ecology grad class and try to get a subset of the concepts across, but I certainly don’t come anywhere close to covering everything that I wish were covered (i.e. my list above). And I know a lot of places have a one-semester modelling course for grad students. But departments teaching their own math courses, or teaching a math-intensive ecology sequence, I haven’t come across.

What do you think? Have I listed too much math? Or left your favorite topic out? How should this be taught? How many of our students (undergrads, all grads, or only a subset of interested grads) should this be taught to?

Detection probability survey results

Last week, I highlighted some new results from a paper on detection probabilities and placed detection probabilities in the context of estimator theory. This in turn led to a reader poll to try to get a sense of how people thought about experimental design with detection issues.

Although I don’t want to spend too much time on it here, I wanted to briefly highlight a great paper that just came out, “Assessing the utility of statistical adjustments for imperfect detection in tropical conservation science” by Cristina Banks-Leite and colleagues. They look at several real-world scenarios focused on identifying covariates of occupancy (rather than absolute occupancy levels) and show the results are not much different with or without statistical adjustment. They draw a distinction between a priori control for covariates of detection probability in setting up a good study design vs a posteriori statistical control for detection probability, and point out that both are valid ways of dealing with detection issues. The take-home quote for me was “We do not believe that hard-won field data, often on rare specialist species, should be uniformly discarded to accord with statistical models”. Whereas my last post was very theoretical/statistical, this paper is very grounded in real-world, on-the-ground conservation, but in many ways makes many of the same points. It is definitely worth a read.

Turning now to the survey … at the time of analysis (Wednesday morning) there were 168 respondents. You can view the raw results here. There was a reasonably good cross section of career stages and organisms represented, although the employment sector skewed very heavily to university. And of course “readers of a blog who chose to respond to a poll” is in no way a scientifically designed sample. If I had to speculate, I’d say this particular post attracted a lot of people interested in detection probabilities, but what exact bias that would result in is hard to predict.

Recall I presented two scenarios. Scenario A was to visit 150 sites once. Scenario B was to visit 50 sites 3 times each. The goal was to estimate how occupancy varied with four collinear environmental variables.

Probably the lead result is the recommended scenario:

Figure: poll results for the recommended scenario.

Scenario B (50 sites, 3 times each) was the most common recommendation, but it by no means dominated. Over 10% went for scenario A outright. And 20% noted that choosing required more information, with most people saying the critical information was more knowledge about the species – well represented in this quote on what the choice would depend on: “A priori expectation of potential for detection bias, based on species biology and survey method.” It should be noted that a non-trivial fraction of those who went for B did it not to support detection probabilities but for reasons of sampling across temporal variability (a goal that is contradictory with detection probability modelling, which assumes constant conditions and even constant individuals across the repeat visits). Another 17% went for B but with hesitation (either putting the statistical expertise of others over their own field intuition or else feeling it was necessary to publish).

There was a trend (but definitely not statistically significant) for more graduate students to recommend B and more senior career people (while still favoring B) to switch to “it depends”. Similarly there was a non-significant trend for people who worked on vertebrates to favor B and for people who worked on plants and inverts to switch a bit to scenario A (with scenario B still a majority).

Quite a few people argued for a mixed strategy. One suggestion was to visit 100 sites with 2 repeat visits to 25 of them. Another suggested visiting 25 sites 3 times, then making a decision how to proceed. And there were quite a few variations along this line.

The story for my question about whether there was pressure or political correctness to use detection probabilities was similar (not surprisingly). There was a weak trend to yes (mean score of 3.09) but not significant (p=0.24). Graduate students were the most likely to think there was PC-ness and senior career people the least likely. People working in verts and plants were more likely to see PC-ness than people working on inverts (again all non-significant).

So the overall pattern is a lean to scenario B but a lot of diversity, complexity and nuance. And not much if any perception of PC-ness around having to use detection probabilities ON AVERAGE (some individuals felt rather strongly about this in both directions).

In short, I think a majority of respondents would have agreed with this quote from one respondent:  “… the most important part of study design is…thinking. Each situation is different and needs to be addressed as a unique challenge that may or may not require approaches that differ from those used in similar studies.” Which nicely echoes the emphasis in this blog on the need to think and not just apply black and white universal rules for statistics and study design.

Detection probabilities – back to the big picture … and a poll

I have now had two posts (both rather heavily read and rather contentiously debated in the comments) on detection probabilities (first post, second post). Whether you have or haven’t read those posts, they were fairly technical (although my goal was to explain technical issues in an accessible way).

Here I want to pull way back up to 10,000 feet and think about the boots-on-the-ground implications. And for a change of pace, I’m not going to argue a viewpoint. I’m just going to present a scenario (one I see every semester, and one that I know from conversations when I travel that students all over the world face) and ask readers via a poll what they would advise this student.

So you are on the committee of a graduate student. This student’s project is to study the species Debatus interminus, which may be a candidate for threatened listing (little is really known). The primary goals are: 1) to assess overall occupancy levels of D. interminus and 2) to figure out how occupancy varies with four variables (vegetation height, canopy closure, soil moisture, and presence of its one known predator, Thinking clearus). Obviously these four variables are moderately collinear. Given resources, the length of the project, accessibility of sites, the fact that the student is the only person able to visit the sites, etc., you calculate the student can do exactly 150 visits. Various members of the committee have advised the student that she/he should:

  • Scenario A – identify 150 sites across the landscape and visit each site 1 time, then estimate ψ (occupancy), and do a simple logistic regression to give β, a vector of regression coefficients  for how ψ varies with your four variables across 150 sites.
  • Scenario B – identify 50 sites across the landscape and visit each site 3 times, then develop a simple hierarchical model of detection probabilities so you will estimate ψ (occupancy), p (detection probability), and β, a vector of regression coefficients in a logistic regression for how ψ varies with your four variables at 50 sites.
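
For concreteness, here is a minimal sketch of what the two analyses might look like in R. The object names are hypothetical placeholders, not real data: site_covs is a per-site data frame with the four covariates (plus, for Scenario A, a 0/1 column detected from the single visit), and y is a 50 x 3 matrix of 0/1 detections for Scenario B.

```r
library(unmarked)

# Scenario A: 150 sites visited once - plain logistic regression, detection ignored
fit_A <- glm(detected ~ veg_height + canopy + soil_moisture + predator,
             family = binomial, data = site_covs)

# Scenario B: 50 sites visited 3 times - hierarchical occupancy/detection model
umf   <- unmarkedFrameOccu(y = y, siteCovs = site_covs)   # y is the 50 x 3 detection matrix
fit_B <- occu(~ 1 ~ veg_height + canopy + soil_moisture + predator, data = umf)
```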

Would you advise the student to follow scenario A or B? And why? Please take our poll (should take less than 5 minutes). I am really curious what our readership will say (and I care about this poll enough that I’ve taken the time to do it in Google polls so I can cross tab the answers with basic demographics – but don’t worry your anonymity is ensured!)

Depending on level of interest I’ll either post the results in the comments or as a separate post after a few days.

And – as everybody knows – a poll in a blog is not a scientific sample, but it can still be interesting.

Detection probabilities, statistical machismo, and estimator theory

Detection probabilities are a statistical method using repeated sampling of the same site combined with hierarchical statistical models to estimate the true occupancy of a site*. See here for a detailed explanation including formulas.

Statistical machismo, as I define it in this blog, is the pushing of complex statistical methods (e.g. reviewers requiring the use of a method, authors claiming their paper is better solely because of the use of a complex method) when the gains are small or even occur at some cost. By the way, the opposite of statistical machismo is an inclusive approach that recognizes every method has trade-offs and there is no such thing as a best statistical method.

This post is a fairly technical statistical discussion. If you’re interested in detection probabilities but don’t want to follow the details, skip to the last section for my summary recommendations.

Background

I have claimed in the past that I think there is a lot of statistical machismo around detection probabilities these days. I cited some examples from my own experience where reviewers insisted that detection probabilities be used on data sets that had high value in their spatial and temporal coverage but for which detection probabilities were not possible (and even in some cases when I wasn’t even interested in occupancy). I also discussed a paper by Welsh, Lindenmayer and Donnelly (or WLD) which used simulations to show limitations of detection probability methods in estimating occupancy (clearly driven by their own frustrations of being on the receiving end of statistical machismo for their own ecological papers).

In July the detection probability proponents fired back at WLD with a rebuttal paper by Guillera-Arroita and four coauthors (hereafter GLMWM). Several people have asked me what I think about this paper, including some comments on my earlier blog post (I think usually in the same way one approaches a Red Sox fan and asks them about the Yankees – mostly hoping for an entertaining reaction).

The original WLD paper basically claimed that in a number of real-world scenarios, just ignoring detection probabilities gave a better estimator of occupancy. Three real-world scenarios they invoked were: a) when the software had a hard time finding the best-fit detection probability model, b) a scenario with moderate occupancy (Ψ=40%) and moderate detection probabilities (about p=50%), and c) a scenario where detection probabilities depend on abundance (which they obviously do). In each of these cases they showed, using Mean Squared Error (or MSE, see here for a definition), that a simple logistic regression of occupancy ignoring detection probabilities had better behavior (lower MSE).

GLMWM basically pick different scenarios (higher occupancy Ψ=80%, lower detection p=20%, and a different species abundance distribution for the abundances) and show that detection probability models have a lower MSE. They also argue extensively that software problems finding best fits are not that big a problem**. This is not really a deeply informative debate. It is basically, “I can find a case where your method sucks.” “Oh yeah, well, I can find a case where your method sucks.”

Trying to make sense of the opposing views

But I do think that by stepping back, thinking a little deeper, framing this debate in the appropriate technical context – estimation theory – and pulling out a really great appendix in GLMWM that unfortunately barely got addressed in their main paper, a lot of progress can be made.

First, let’s think about the two cases where each works well. Ignoring detection worked well when detection probability, p, was high (50%). It worked poorly when p was very low (20%). This is just not surprising. When detection is good you can ignore it; when it is bad you err if you ignore it! Now WLD did go a little further – they didn’t just say that you can get away with ignoring detection probability at a high p, they actually showed you get a better result than if you don’t ignore it. That might at first glance seem a bit surprising – surely the more complex model should do better? Well, actually, no. The big problem with the detection probability model is identifiability – separating out occupancy from detection. What one actually observes is Ψ*p (i.e. that fraction of sites will have an observed individual). So how do you go from observing Ψ*p to estimating Ψ (and p in the case of the detection model)? Well, ignoring p is just the same as taking Ψ*p as your estimate. I’ll return to the issues with this in a minute. But in the detection probability model you are trying to disentangle Ψ vs. p from the observed fraction of sites with very little additional information (the fact that observations are repeated at a site). Without this additional information Ψ and p are completely inseparable – you cannot do better than randomly pick some combination of Ψ and p that together multiply to give the fraction of sites observed (and again the non-detection model essentially does this by assuming p=1, so it will be really wrong when p=0.2 but only a bit wrong when p=0.8). The problem for the detection model is that if you only have two or three repeat observations at a site and p is high, then at most sites where the species is actually present it will show up in all two or three observations (and of course not at all where it is not present). So you will end up with observations of mostly 0/0/0 or 1/1/1 at a given site. This does not help differentiate (identify) Ψ from p at all. Thus it is actually completely predictable that detection models shine when p is low and ignoring detection shines when p is high.

Now what to make of the fact, something that GLMWM make much of, that just using Ψ*p as an estimate for Ψ is always wrong anytime p<1? Well, they are correct about it always being wrong. In fact, using the observed fraction of sites with a detection (Ψ*p) as an estimator for Ψ is wrong in a specific way known as bias. Ψ*p is a biased estimator of Ψ. Recall that bias is when the estimate consistently overshoots or undershoots the true answer. Here Ψ*p consistently undershoots the real answer by a very precise amount, Ψ*(1-p) (so by 0.2 when Ψ=40% and p=50%). Surely it must be a fatal flaw to intentionally choose an approach that you know on average is always wrong? Actually, no. It is well known in statistics that sometimes a biased estimator is the best estimator (by criteria like MSE).
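
You can see the undershoot in a two-minute simulation. Here is a minimal single-visit sketch in R (the sample size and seed are arbitrary) using the same Ψ=40%, p=50% numbers as above:

```r
set.seed(1)
psi <- 0.4; p <- 0.5; n_sites <- 150

naive_est <- replicate(10000, {
  z <- rbinom(n_sites, 1, psi)    # true presence/absence at each site
  y <- rbinom(n_sites, 1, p) * z  # a single visit: detections only where present
  mean(y)                         # "naive" estimate = fraction of sites with a detection
})

mean(naive_est)  # ~0.20, i.e. psi * p: undershoots psi by psi * (1 - p) = 0.2
var(naive_est)   # the (small) variance of this estimator across repeated samples
```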

Estimation theory

Pay attention here – this is the pivotal point. A good estimator has two properties: it’s on average close to right (low bias), and the spread of its guesses (i.e. the variance of the estimate over many different samples of the data) is small (low variance). And in most real-world examples there is a tradeoff between bias and variance! More accurate on average (less bias) means more variance in the guesses (more variance)! In a few special cases you can pick an estimator that has both the lowest bias and the lowest variance. But anytime there is a trade-off you have to look at the nature of the trade-off to minimize MSE (the best overall estimator by at least one criterion). Since Mean Squared Error or MSE = Bias^2 + Variance, one can actually minimize MSE if one knows the trade-off between bias and variance. This is the bias/variance trade-off to a statistician (Jeremy has given Friday links to posts on this topic by Gelman).
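
For readers who want to see where that identity comes from, here is the standard one-step derivation (written in LaTeX notation; θ is the true quantity, θ̂ the estimator, and the expectation is over repeated samples of the data):

```latex
\begin{aligned}
\mathrm{MSE}(\hat\theta)
  &= \mathbb{E}\big[(\hat\theta - \theta)^2\big]
   = \mathbb{E}\big[(\hat\theta - \mathbb{E}\hat\theta + \mathbb{E}\hat\theta - \theta)^2\big] \\
  &= \underbrace{\mathbb{E}\big[(\hat\theta - \mathbb{E}\hat\theta)^2\big]}_{\text{Variance}}
   + \underbrace{\big(\mathbb{E}\hat\theta - \theta\big)^2}_{\text{Bias}^2},
\end{aligned}
```

where the cross term drops out because E[θ̂ − Eθ̂] = 0.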


Figure 1 – Bias and Variance - Here estimator A is biased (average guess is off the true value) but it has low variance. Estimator B has zero bias (average guess is exactly on the true value) but the variance is larger. In such a case Estimator B can (and in this example does) have a larger Mean Squared Error (MSE) – a metric of overall goodness of an estimator. This can happen because MSE depends on both bias and variance – specifically MSE=Bias^2+Variance.

This is exactly why the WLD ignore-detection-probabilities method (which GLMWM somewhat disparagingly call the naive method) can have a lower Mean Squared Error (MSE) than using detection probabilities despite always being biased (starting from behind, if you will). Detection probabilities have zero bias and non-detection methods have bias, but in some scenarios non-detection methods have so much lower variance than detection methods that the overall MSE is lower if you ignore detection. Not so naive after all! Or in other words, being unbiased isn’t everything. Having low variance (known in statistics as being an efficient estimator) is also important. Both the bias of ignoring detection probabilities (labelled “naive” by GLMWM) and the higher variances of the detection methods can easily be seen in Figures 2 and 3 of GLMWM.

When does ignoring detection probabilities give a lower MSE than using them?

OK – so we dove into enough estimation theory to understand that both WLD and GLMWM are correct in the scenarios they chose (and that the authors of both papers were probably smart enough to pick in advance scenarios that would make their side look good). Where does this leave the question most readers care about most – “should I use detection probabilities or not?” Well, the appendix to GLMWM is actually exceptionally useful (although it would have been more useful if they had bothered to discuss it!) – specifically supplemental material tables S2.1 and S2.2.

Let’s start with S2.1. This shows the MSE (remember, low is good) of the ignore-detection model in the top half and of the use-detection model in the bottom half for different sample sizes S, repeat visits K, and values of Ψ and p. They color code the cases red when ignore beats use detection, and green when detection beats ignore (and no color when they are too close to call). Many of the differences are small, but some are gigantic in either direction (e.g. for Ψ=0.2, p=0.2, ignoring detection has an MSE of 0.025 – a really accurate estimator – while using detection probabilities has an MSE of 0.536 – a really bad estimate given Ψ ranges only from 0-1; similar discrepancies can be found in the opposite direction too). The first thing to note is that at smaller sample sizes the red, green and no-color regions are all pretty equal! I.e. ignoring or using detection probabilities is a tossup! Flip a coin! But we can do better than that. When Ψ (occupancy) is <50% ignore wins, when Ψ>50% use detection wins, and when p (detection rate) is high, say >60%, it doesn’t matter. In short, the contrasting results between WLD and GLMWM are general! Going a little further, we can see that when sample sizes (S but especially the number of repeat visits K) creep up, using detection probabilities starts to win much more often, which also makes sense – more complicated models always win when you have enough data, but don’t necessarily (and here don’t) win when you don’t have enough data.

Bias, Variance and Confidence Intervals

Figure 2 – Figure 1 with confidence intervals added

Now let’s look at table S2.2. This is looking at something we haven’t talked about yet. Namely, most estimators produce, for a given set of data, a guess about how much variance they have. This is basically the confidence interval in Figure 2. In Figure 2, Estimator A is a better estimator of the true value (it is biased, but the variance is low so MSE is much lower), but Estimator A is over-confident – it reports a confidence interval (estimate of variance) that is much smaller than reality. Estimator B is a worse estimator, but it is at least honest – it has really large variance and it reports a really large confidence interval. Table S2.2 in GLMWM shows that ignoring detection probabilities is often too cocky – the reported confidence intervals are too small (which has nothing to do with, and in no way changes the fact that, ignoring detection probabilities is in many cases still a better or equally good estimator of the mean – the conclusion from table S2.1). But using detection probabilities is just right – not too cocky, not too pessimistic – its confidence intervals are very accurate: when there’s a lot of variance, it knows it! In short, Figure 2 is a good representation of reality over a large chunk of parameter space, where method A is ignore detection (lower MSE on the estimate of Ψ but over-confident confidence intervals) and method B is use detection-based methods (worse MSE for the estimate of Ψ but very accurate confidence intervals).

(As a side note, this closely parallels the situation for ignoring vs statistically treating spatial, temporal and phylogenetic autocorrelation. In that case both estimators are unbiased. In principle the variance of the methods treating autocorrelation should be lower, although in practice they can have larger variance when bad estimates of autocorrelation occur, so they are both roughly equally good estimators of the regression coefficients. But the methods ignoring autocorrelation are always over-confident – their reported confidence intervals are too small.)

So which is better – a low MSE (a metric of how good the estimator is at guessing the mean) or an honest, not cocky estimator that tells you when it’s got big error bars? Well, in some regions you don’t have to choose: using detection probabilities is a better estimator of the mean by MSE and you get good confidence intervals. But in other regions – especially when Ψ and p are low – you have to pick; there is a tradeoff, and more honesty gets you worse estimates of the occupancy. Ouch! That’s statistics for you. No easy, obvious choice. You have to think! You have to reject statistical machismo!

Summary and recommendations

Let me summarize three facts that emerge across the WLD and GLMWM papers:

  1. Ignoring detection probabilities (sensu WLD) can give an estimate of occupancy that is better than (1/3 of parameter space), as good as (1/3 of parameter space), or worse than (1/3 of parameter space) estimates using hierarchical detection probability models in terms of estimating the actual occupancy. Specifically, ignoring detection guarantees bias, but may result in sufficiently reduced variance to give an improved MSE. These results come from well-known proponents of using detection probabilities using a well-known package (unmarked in R), so they’re hard to argue with. More precisely, ignoring detection works best when Ψ is low (<50%) and p is low, using detection works best when Ψ is high (>50%) and p is low, and both work very well (and roughly equally well) when p is high (roughly when p>50% and certainly when p>80%) regardless of Ψ.
  2. Ignoring detection probabilities leads to overconfidence (reported confidence intervals that are too small) except when p is high (say >70%). This is a statement about confidence intervals. It does not affect the actual point estimate of occupancy which is described by #1 above.
  3. As data size gets very large (e.g. 4-5 repeat visits of 165 sites) detection probability models generally get noticeably better – the results in #1 mostly apply at smaller, but in my opinion more typically found, sample sizes (55 sites, 2 repeat visits).

And one thing talked about a lot which we don’t really know yet:

  4. Both WLD and GLMWM talk about whether working with detection probabilities requires larger samples than ignoring detection probabilities. Ignoring detection probabilities allows Ψ to be estimated with only single visits to a site, while hierarchical detection probabilities require a minimum of 2 and, as GLMWM shows, really shine with 3 or 4 repeat visits. To keep a level playing field both WLD and GLMWM report results where the non-detection approach uses the repeat visits too (it just makes less use of the information by collapsing all visits into either species seen at least once or never seen). Otherwise you would be comparing a model with more data to a model with less data, which isn’t fair. However, nobody has really fully evaluated the real trade-off – 50 sites visited 3 times with detection probabilities vs 150 sites visited once with no detection probabilities. And in particular nobody has really visited this in a general way across the whole parameter space for the real-world case where the interest is not in estimating Ψ, the occupancy, but the β’s, the coefficients in a logistic regression of how Ψ varies with environmental covariates (like vegetation height, food abundance, predator abundance, degree of human impact, etc). My intuition tells me that with 4-5 covariates that are realistically covarying (e.g. correlations of 0.3-0.7), getting 150 independent measures of the covariates will outweigh the benefits of 3 replicates of 50 sites (again especially for accurate estimation of the β’s), but to my knowledge this has never been measured. The question of whether estimating detection probabilities requires more data (site visits) remains unanswered by WLD and GLMWM but badly needs to be answered (hint: free paper idea here – see the simulation sketch just below this list for one way such a comparison could be set up).
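
Here is a rough sketch, in R, of how that 150-sites-once vs 50-sites-three-times comparison for estimating the β’s might be simulated. Everything here – the coefficient values, the correlation among covariates, the detection probability, the function name – is a hypothetical placeholder, not a result from WLD or GLMWM:

```r
library(MASS)      # mvrnorm, for correlated covariates
library(unmarked)  # occu, for the hierarchical occupancy/detection model

simulate_betas <- function(n_sites, n_visits,
                           beta = c(0, 0.8, -0.5, 0.4, 0.3),  # intercept + 4 slopes (made up)
                           p = 0.4, rho = 0.5) {
  Sigma <- matrix(rho, 4, 4); diag(Sigma) <- 1
  X <- mvrnorm(n_sites, mu = rep(0, 4), Sigma = Sigma)        # 4 correlated covariates
  psi <- plogis(cbind(1, X) %*% beta)                         # true occupancy probability
  z <- rbinom(n_sites, 1, psi)                                # true presence/absence
  y <- matrix(rbinom(n_sites * n_visits, 1, p), n_sites) * z  # detection histories
  covs <- as.data.frame(X); names(covs) <- paste0("x", 1:4)

  if (n_visits == 1) {
    # 150-sites-once style analysis: plain logistic regression, detection ignored
    coef(glm(as.vector(y) ~ x1 + x2 + x3 + x4, family = binomial, data = covs))
  } else {
    # 50-sites-three-times style analysis: hierarchical occupancy model
    umf <- unmarkedFrameOccu(y = y, siteCovs = covs)
    coef(occu(~ 1 ~ x1 + x2 + x3 + x4, data = umf), type = "state")
  }
}

# Compare bias and variance of the slope estimates across many replicates, e.g.:
# betas_A <- replicate(200, simulate_betas(150, 1))
# betas_B <- replicate(200, simulate_betas(50, 3))
```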

So with these 3 facts and one fact remaining unknown, what can we say?

  1. Detection probabilities are not an uber-method that strictly dominates ignoring them. As first found by WLD and now clearly shown to be general in the appendices of GLMWM, there are fairly large regions of parameter space where the primary focus – the estimate of Ψ – is more accurate if one ignores detection probabilities! This is news the detection probability machismo-ists probably don’t want you to know (which could be an explanation for why it is never discussed in the main text of GLMWM).
  2. Detection probabilities clearly give better estimates of their certainty (or in a lot of cases uncertainty) – i.e. the variance of the estimates.
  3. If you’re designing data collection (i.e. deciding on # of sites vs # of visits/site before you’ve taken measurements – e.g. visit 150 sites once or 50 sites 3 times), I would recommend something like the following decision tree:
    1. Do you care more about the estimate of error (confidence intervals) than the error of the estimate (accuracy of Ψ)? If yes, then use detection probabilities (unless p is high).
    2. If you care more about accuracy of Ψ, do you have a pretty good guess that Ψ is much less or much greater than 50%, or that p is much greater than 70%? If so, then you should use detection probabilities if Ψ is much greater than 50% and p is less than or equal to 50-60%, but ignore them if Ψ is much less than 50% or p is clearly greater than 50-60%.
    3. If you care more about accuracy of Ψ and don’t have a good idea in advance of roughly what Ψ or p will be, then you have really entered a zone of judgement call where you have to weigh the benefits of more sites visited vs. more repeat visits (or hope somebody answers my question #4 above soon!).
    4. And always, always, if you’re interested in abundance or species richness, don’t let somebody bully you into switching over to occupancy because of the “superiority” of detection models (which, as we’ve seen, are not even always superior at occupancy). Both the abundance and species richness fields have other well-established methods (e.g. indices of abundance, rarefaction and extrapolation) for dealing with non-detection.
    5. Similarly, if you have a fantastic dataset (e.g. a long-term monitoring dataset) set up before detection probabilities became fashionable (i.e. no repeat visits), don’t let the enormous benefits of long-term (and perhaps large spatial scale) data get lost just because you can’t use detection probabilities. As we’ve seen, detection probabilities are a good method, but also a flawed method that is clearly outperformed in some cases, just like every other method in statistics. They are not so perfect that they mandate throwing away good data.

The debate over detection probabilities has generated a lot more heat and smoke than light, and there are clearly some very machismo types out there, but I feel like if you read carefully between the lines and into the appendices, we have learned some things about when to use detection probabilities and when not to. Question #4 still remains a major open question just begging for a truly balanced, even-handed assessment. What do you think? Do you use detection probabilities in your work? Do you use them because you think they’re a good idea or because you fear you can’t get your paper published without them? Has your opinion changed with this blog post?

 


*I’m aware there are other kinds of detection probabilities (e.g. distance based) and that what I’m really talking about here are hierarchical detection probabilities – I’m just trying to keep the terminology from getting too thick.

**Although I have to say I found it very ironic that the software code GLMWM provided in an appendix, which uses the R package unmarked, arguably the dominant detection probability estimation software, apparently had enough problems finding optima that they reran each estimation problem 10 times from different starting points – a pretty sure sign that optima are not easy to find.

25 years of ecology – what’s changed?

I am giving/gave a talk this morning at a Festschrift celebrating the 25th anniversary of the Graduate Ecology program at the Universidade Federal de Minas Gerais (UFMG), the large state university in one of the larger states/cities in Brazil. So first, congratulations to the program, and many thanks to the organizers (especially Marco Mello, Adriano Paglia and Geraldo Fernandes) for inviting and hosting me.

I was invited to give the talk based on my blogging, which is sort of a new trendy thing in ecology. So I foolishly offered to give a perspective on the past 25 years of ecology and what the next 25 years of ecology will contain, because I like to think about such things. But as I prepared my slides I increasingly got nervous because these are topics no one person should claim expertise on!

However, I did come up with a couple of data-driven graphics that I thought readers might find interesting.

Publication trends

First I did some statistics on rates of publishing by country (using Web of Science, so biased to English journals). I picked out the US, several European countries, Brazil and China. What would you guess the trends are? First, the total # of papers published per decade is increasing at a phenomenal rate, so everybody is publishing more. But as a percent of published papers, most European countries are holding steady (although some countries like Germany started to publish in English later than other countries like Sweden, so they show a big increase in the 1980s or 1990s), the US is slowly declining, and China and Brazil are increasing rapidly.

Figure: total ecology papers published per decade, according to Web of Science (which is English journal-biased). RoW is rest of world.

Research topic trends

Secondly, and more interesting to me, I did a Wordle on the titles of the top 200 cited papers in 1989 and the top 200 cited papers in 2012 (yes, it is 2014, but I found I had to go back to 2012 to get citations that had settled down enough to identify papers that were truly the top, instead of just the ones published in January).
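
In case anyone wants to reproduce this kind of figure, a rough sketch of the process in R is below. The titles object (a character vector of paper titles exported from Web of Science) and the crude short-word filter standing in for a real stopword list are hypothetical:

```r
library(wordcloud)

words <- tolower(unlist(strsplit(titles, "[^[:alpha:]]+")))  # split titles into words
words <- words[nchar(words) > 3]                             # crude stand-in for stopword removal
freq  <- sort(table(words), decreasing = TRUE)

wordcloud(names(freq), as.numeric(freq), max.words = 100)    # draw the word cloud
```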

The two Wordles are for 1989:

Figure: word cloud of the titles of the top 200 cited papers in 1989.

And 2012:

Figure: word cloud of the titles of the top 200 cited papers in 2012.

There are some obvious differences. But before I comment, I am curious to see what you all see (that is the point of a word cloud, after all). I hope you all will share your thoughts on what has or has not changed in 25 years (OK, 23). I’ll try to add my thoughts in the comments after others have had a chance at analysis.

 

PS – if you’re curious you can download my slides for the talk from figshare. The first 1/3 matches what you read above. The last 2/3 mostly matches themes I’ve hit on before here in my posts on DE. Although students might enjoy the next to last slide on advice to students.