About Brian McGill

I am a macroecologist at the University of Maine. I study how human-caused global change (especially global warming and land cover change) affects communities, biodiversity and our global ecology.

Why AIC appeals to ecologists' lowest instincts

It is my sense of the field that AIC (the Akaike information criterion) has moved past bandwagon status into a fundamental and still increasingly used paradigm in how ecologists do statistics. For some quick and dirty evidence I looked at how often different core words were used at least once in an article in Ecology Letters in 2004 and 2014. Regression was used in 41% and 46% respectively. Significance was used in 40% and 35%. Richness was 41% and 33%. And competition was 46% and 49%. Perhaps a trend or two in there but all pretty steady. AIC has gone from being in 6% of the articles in 2004 to 19% of the articles in 2014. So in summary – AIC has tripled in usage, is now found in nearly 20% of all articles, and is used more than half as often as the most widely used statistical technique of significance.

I have a theory about why this has happened which does not reflect favorably on how AIC is used. Please note the qualification “how AIC is used”. AIC is a perfectly valid tool. And like so many tools, its original proponents made reasonable and accurate claims about it. But over time, the community takes ownership of a concept and uses it how they want, not how it was intended.

And I would suggest how people want to use AIC is in ways that appeal to two low instincts of ecologists (and all humans for that matter). First, humans love rankings. Most newspapers contain the standings of all the teams in your favorite sport every day. We pay more attention to the ranking of a journal's impact factor than to its absolute value. Any number of newspapers produce rankings of universities. It is ridiculous to think that something as complex as journal quality or university quality can be reduced to one dimension (which is implicit in ranking – you can't rank in two dimensions). But we force it on systems all the time. Second, we like to have our cake and eat it too. Statistics have multiple modalities or goals. These include: estimation of parameters, testing of hypotheses, exploration of covariation, prediction into new conditions, selection among choices (e.g. models), etc. Conventional wisdom is that you need to be clearly based in one goal for an analysis. But we hate to commit.

You can probably already see where I'm headed. The primary essence of what AIC delivers is to boil choices down to a single dimension (more precisely, it provides one specific weighting of the two dimensions of likelihood and number of parameters to give a single dimension) and then rank models. And comparing AIC scores is so squishy. It manages to look like all five statistical goals at once. It certainly does selection (that is its claim to fame). But if you've ever assessed whether ΔAIC>2 you have done something that is mathematically close to testing whether p<0.05.
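(For those who want the back-of-the-envelope algebra behind that last claim – just a sketch, for the simplest case of two nested models differing by one parameter:

$$\Delta \mathrm{AIC} = \mathrm{AIC}_{\mathrm{simple}} - \mathrm{AIC}_{\mathrm{complex}} = 2(\ln L_{\mathrm{complex}} - \ln L_{\mathrm{simple}}) - 2 = \mathrm{LR} - 2$$

where LR is the likelihood ratio statistic, which is approximately chi-squared with 1 degree of freedom under the null. Preferring the bigger model only when ΔAIC>2 means requiring LR>4, while a likelihood ratio test at the 0.05 level requires LR>3.84 – nearly the same decision rule.)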

Just to be clear, likelihood also can be used towards all those goals. But with likelihood the paths are much more clearly divergent. If you're doing hypothesis testing you're doing likelihood ratios. If you're doing estimation you're maximizing. If you're doing selection you can't proceed unless you specify what criteria to use in addition to likelihood. You have to actually slow down and choose what mode of inference you're doing. And you have to make more choices. With AIC you present that classic table of ΔAIC and weights and voila! You've sort of implied doing all five statistical goals at once.

I want to return to my qualification of “how AIC is used”. The following is a simple example to illustrate how I perceive AIC being used these days. Take the example of species richness (hereafter S). Some people think that productivity is a good predictor (hereafter prod). Some people think seasonality is a better predictor (hereafter seas). Some people suggest energy is the true cause (hereafter energ). And most people recognize that you probably need to control for area sampled (area). Now you could do full-blown variable selection where you try all 16 models of every possible combination of the four variables and use AIC to pick the best. That would be a pretty defensible example of exploratory statistics. You could also do an analysis of variable importance with a similar goal by scaling all four variables, throwing them into one model, and comparing coefficients or doing some form of variance partitioning. These would also be true exploratory statistics. You could also use AIC to do variable importance ranking (compare AIC of S~prod, S~seas, S~energ). This is at least close to what Burnham and Anderson suggested in comparing models. You could even throw in S~area, at which point you would basically be doing hypothesis testing against a null, although few would acknowledge this. But my sense is that what most people do is some flavor of what Crawley and Zuur advocate, which is a fairly loose mix of model selection and variable selection. This might result in a table that looks like this*:

Model               ΔAIC   weight
S~prod+seas+area    0.0    31%
S~prod+energ+area   0.5    22%
S~prod+energ        1.1    15%
S~energ+seas        3.2     9%
S~energ             5.0     2%

There are a couple of key aspects of this approach. It seems to be blending model selection and variable selection (indeed it is not really clear that there are distinct models to select from here, but it is not a very clear-headed variable selection approach either). It's a shame nobody ever competes genuinely distinct models with AIC, as that was one of the original claims for the benefit of AIC (e.g. Wright's area-energy hypothesis S~energ*area vs. the more-individuals hypothesis, an SEM with two equations: S~numindiv and numindiv~prod). But I don't encounter it too often. Also note that more complicated models came out ranked better (a near universal feature of AIC). And I doubt anybody could tell me how science has advanced from producing this table.

Which brings me to the nub of my complaint against AIC. AIC as practiced is appealing to base human instincts to rank and to be wishy-washy about inferential frameworks. There is NO philosophy of science that says ranking models is important. It's barely better than useless to science. And there is no philosophy of science that says you don't have to be clear what your goal is.

There is plenty of good debate to have about which inferential approach advances science the best (a lot has happened on this blog!). I am partial to Lakatos and his idea of risky predictions (e.g. here). Jeremy is partial to Mayo's severe tests, which often favor hypothesis testing done well (e.g. here). And I've argued before there are times in science when exploratory statistics are really important (here). Many ecologists are enamored with Platt's strong inference (two posts on this) where you compare models and decisively select one. Burnham and Anderson cite Platt frequently as an advantage of AIC. But it is key to note that Platt argued for decisive tests where only one theory survives. And arguably still the most mainstream view in ecology is Popperian falsification and hypothesis testing. I can have a good conversation with proponents of any of these approaches (and indeed can argue for any of these approaches as advancing science). But nowhere in any of these approaches does it say keeping all theories around but ranking them is helpful. And nowhere does it say having a muddled view of your inferential approach is helpful. That's because these two practices are not helpful. They're incredibly detrimental to the advance of science! Yet I believe that AIC has been adopted precisely because it ranks without going all the way to eliminating theories and because it lets you have a muddled approach to inference.

What do you think? Has AIC been good for the advance of science (and ecology)? Am I too cynical about why hordes are embracing AIC? Would the world be better off if only we went back to using AIC as intended (if so, how was it intended)?

UPDATE – just wanted to say be sure to read the comments. I know a lot of readers usually skip them. But there has been an amazing discussion with over 100 comments down below. I’ve learned a lot. Be sure to read them.

*NB this table is made up. In particular I haven’t run the ΔAIC through the formula to get weights. And the weights don’t add to 100%. I just wanted to show the type of output produced.
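For readers who haven't produced one of these tables themselves, here is a minimal sketch (in Python with statsmodels; the column names S, prod, seas, energ and area are the hypothetical ones from the example above, and this is not the analysis behind the made-up table) of how the usual all-subsets comparison, ΔAIC and Akaike weights get computed:

```python
# Sketch of the standard "all subsets + AIC table" workflow (illustrative only).
from itertools import combinations

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def aic_table(df, response="S", predictors=("prod", "seas", "energ", "area")):
    """Fit every subset of the predictors with OLS and return a dAIC/weight table."""
    rows = []
    for k in range(len(predictors) + 1):
        for subset in combinations(predictors, k):
            rhs = " + ".join(subset) if subset else "1"  # "1" = intercept-only model
            fit = smf.ols(f"{response} ~ {rhs}", data=df).fit()
            rows.append({"model": f"{response} ~ {rhs}", "AIC": fit.aic})
    tab = pd.DataFrame(rows).sort_values("AIC").reset_index(drop=True)
    tab["dAIC"] = tab["AIC"] - tab["AIC"].min()
    # Akaike weights: w_i = exp(-dAIC_i / 2) / sum_j exp(-dAIC_j / 2)
    rel = np.exp(-tab["dAIC"] / 2)
    tab["weight"] = rel / rel.sum()
    return tab

# Made-up data just to show the output format (16 rows: every subset of 4 predictors)
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({v: rng.normal(size=n) for v in ("prod", "seas", "energ", "area")})
df["S"] = 2 + 1.5 * df["prod"] + 0.8 * df["area"] + rng.normal(size=n)
print(aic_table(df).head())
```

The whole apparent decisiveness of the table comes from that one transformation of ΔAIC into weights; the ranking itself is just a sort on AIC.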

Mistakes happen in science

Meg recently wrote a post acknowledging that crying in science was pretty commonplace. It really touched a nerve and went viral. Meg's opening syllogism was masterful: humans cry, scientists are human, therefore scientists will cry.

I want to touch on an even more sensitive syllogism: humans make mistakes, scientists are human, therefore scientists will make mistakes. And a corollary – some mistakes will make it into print.

People obsessed with preserving a united front against science deniers might try to pretend this isn't true. But it is true. This rarely acknowledged truth about scientists is fresh in everybody's minds because of a recent retraction of an ecology paper (due to an honest mistake). I'm not even going to link to it since it is a distraction from my main point to single out one group of individuals when I'm talking about collective responsibility (but if it's too distracting not to know, Jeremy linked to it on Friday).

What I am finding revealing is not that a retraction occurred but other people's reactions to the fact that a retraction occurred. There seems to be a lot of distancing and blaming. The first commenter on Retraction Watch even went one step further and very sloppily and inaccurately started throwing around the phrase “fraud scandal” (really? is the topic of mistakes so taboo that we can't differentiate the profound difference between mistake and fraud?).

My reactions were rather different. In order of occurrence (and probably in order of increasing profundity), they were:

  1. Ouch – I feel bad for the authors
  2. I’m impressed with the way the authors handled this – it took a lot of courage
  3. That’s science working the way it is supposed to
  4. It could have been me

There’s no need to expand on the first one (except it's worth noting I don't know any of the authors personally, so this was more of a one-degree-removed, member-of-my-community form of empathy).

But I think it is worth dwelling on the second one for a moment. It must have been very tempting to bluster, deny that there were substantive enough mistakes to require a retraction, and hope it all faded away. We all know this strategy has a decent shot at working. In an infamous case in evolution (UPDATE the link in Jeremy’s post is broken – follow this link), it worked for years until a co-author took it upon himself to self-publish and blow the whistle (nobody talks about this but the journals have an obvious interest in not highlighting a mistake). But these authors didn’t weasel in any fashion. And they thought about the good of science before the good of their careers. Good for them!

As for the 3rd reaction – this is not a failure of science. It is a success of science! It is science working as it is supposed to. And it is exactly why science has a claim to a degree of rigor that other modes of thought don’t have. The reason my syllogism doesn’t eliminate science as a paragon of correctness is that – contrary to the popular view about lone geniuses – science is not about individuals or single papers. It is about the community and the total body of evidence. One individual can be right, wrong, a crack-pot, a genius, mistaken, right for the wrong reasons, etc. But the community as a whole (given time) checks each other and identifies wrong ideas and mistakes. The hive mind will get the important things right with some time. If you read the details, this is exactly what happened. Good for science!

The last reaction is the touchiest of all (it could have been me*). Of course I do not knowingly have any mistakes in print. But I could have a mistake out there I don’t know about. And I’ve caught some that came close. And I could make one in the future. Should I be thinking that? Should I be admitting that in a public blog? I sure hope your answer to both of these questions is yes. If I’m not asking the first question (and admitting the possibility), how can I be putting my best effort into avoiding mistakes? The same goes for the community context. And I’m pretty sure no honest scientist can say they are 100% sure they have never made a mistake and never will make a mistake. 95% sure – I hope so. Maybe even 99% sure. But 100% sure? I don’t trust you if that is what you claim. Every lab I’ve ever worked in or been close to (meaning dozens) has had challenges and errors with data and coding and replicability of analysis. Most of them are discovered and fixed (or sadly prevent publication). But has anybody here ever run an analysis, gotten a particular t-statistic/p-value and written it up, and then run the analysis later and gotten a slightly different number and never been able to recreate the original? Anybody have one or two sample IDs that got lost in the shuffle and you don’t know what they are? These are admittedly small mistakes that probably didn’t change the outcome. But it is only a difference of degree. And I bet most of you know of bigger mistakes that almost got out the door.

I want to speak for a minute more specifically about coding. In this day and age nearly every paper has some coding behind it. It might just be an R script to run the analyses (and probably dropping some rows with incomplete data etc along the way). But it might be like the stuff that goes on in my lab, including 1000+ line computer simulations and 1000+ line big data analyses. Software engineers have done a lot of formal analysis of coding errors. And to summarize a lot of literature, they are numerous and the best we can do is move asymptotically towards eliminating them. Getting rid of even 90-95% of the errors takes a lot of work. Even in highly structured anti-error environments like NASA or the medical field mistakes slip through (like the mis-transcribed formula that caused a rocket to crash). And science is anything but a highly structured anti-error environment (and we shouldn’t be – our orientation is on innovation). In a future post, I will go through some of the tricks I use to validate and have faith in my code. But that would be a distraction here (so you might want to save your comments on how you do it for that post too). The bottom line though is I know enough software engineering not to fool myself. I know there are errors in my code. I’ve caught a couple of one-line mistakes that totally changed the results while I was in the middle of writing up my first draft. I think and hope that the remaining errors are small. But I could be wrong. And if I am wrong and made a whopping mistake, I hope you find my mistake!

The software industry’s effort at studying errors was just mentioned. But the medical and airline industries have recently devoted a lot of attention to the topic of mistakes as well (their mistakes are often fatal). The Institute of Medicine released a report entitled “To Err is Human” with this telling quote:

“.. the majority of medical errors do not result from individual recklessness or the actions of a particular group–this is not a “bad apple” problem. More commonly, errors are caused by faulty systems, processes, and conditions that lead people to make mistakes or fail to prevent them.”

Broad brushing the details, both medicine and the airlines have come to the conclusion that the best way to avoid mistakes is to 1) destroy the myth of infallibility, 2) eliminate the notion that raising the possibility of a mistake is offensive, 3) introduce a culture of regularly talking about the possibility of mistakes and analyzing mistakes made for lessons learned, and 4) make avoiding mistakes a collective group responsibility.

I think arguably science figured this all out a couple of hundred years ago. But it is worth making explicit again. And per #3 it is worth continuously re-evaluating how we’re doing. In particular we do #4 extremely well. We have peer review, post-publication review (which is stronger for prominent and surprising results), attempts at replication etc. We’re professional skeptics. We also do pretty well at #2; you expect and accept your work being criticized and picked apart (even if nobody enjoys it!). #1 is more of a mixed bag. I’ve heard a lot of “it could never happen in my lab” comments recently, which is exactly the myth of infallibility. And the same for #3 – I haven’t yet heard anybody say “I’m going to change X in my lab” in response to the recent incident. And more generally across #1-#4, I would suggest that coding is novel enough in ecology that we have not yet fully developed a robust set of community practices around preventing coding errors.

In conclusion, I am sure somebody is going to say I am glorifying mistakes in science. I’m not. Mistakes* are unfortunate and we all need to (and I think all do) put a lot of effort into avoiding them. But I sincerely believe there is no way to guarantee individual scientists do not make mistakes. At the same time, I also sincerely believe that a well-constructed scientific community is robust enough to find and correct all important mistakes over time. Which means it really matters whether we respond to mistakes by finger pointing or by examining our common culture and how to improve it. The latter is the conversation I want to have.


*Probably important to reiterate here that I’m talking about mistakes, not fraud. Whole different kettle of fish. I presume most people can see that, which is why I am not belaboring it.

A curmudgeon’s musings on modern pedagogy

(warning this is long – you can skip to the conclusions or even bottom-bottom line at the end if you want)

I am not an expert on pedagogical methods. But I have been on the teacher side of university education for almost 20 years. And I’ve literally taught 100, 200, 300, 400, 500 and 600 level classes. I’ve taught classes ranging from 630 students to 3. Math-oriented to field-based. In short, a pretty typical mid-career teaching history. About 8 years ago, I took over a 600+ student intro bio class (basically BIO 100) and spent a lot of time thinking about goals, which led to my introducing clickers, which in turn led to my becoming the lead academic (working with the campus learning center) on clicker introduction in basic science classes across campus. And I was a TA in a class before and after the introduction of active learning. (My most recent experience with changing pedagogy in a class is discussed below.) So I’ve formed a few opinions along the way.

I am by no means at a settled state of where I think university education should go. But the following are a few thoughts and musings. (NB Meg has a series of good posts on this topic as well: here, here, and Friday links here, and Terry has a bunch of good posts over at Small Pond here and here.)

Point #1 – Buzzword blur – we tend to just lump all the trends together, but they are not the same. You can do one without the other (and there are distinct goals and rationales in each case). Here is a quick tour:

  • Active learning – activities in which the students are not just passively listening but actively producing knowledge via inquiry, answering questions, discussing, etc. This was one of the earliest movements (in ascendancy in the late 90s).
  • Peer instruction – a model in which students teach each other. Often students are given a question and then discuss the answer with their peers. This draws on research showing most people learn better in a social context. When tested via before-and-after versions of the same question using clickers, I am astonished at the improvement (often 10% right to 95% right).
  • Flipped classroom – the buzzword du jour – this starts from the notion that lecturing is a relic from the days when textbooks were rare (hand copied). Flipping means students do passive learning (reading, watching lectures) at home on their own schedule, and then use the classroom with the instructor present to do something more active where the instructor can intervene and assist. This can be as simple as having students do what used to be their homework in class and raise their hands for help, to much newer approaches like peer instruction.
  • Just-in-Time-Teaching – the notion that the teacher will dynamically adapt the material being taught based on real-time feedback on what students are not understanding. This implies an ability to reteach material in a new way. It also implies real-time feedback, either from quizzes just before class or some in-class feedback mechanism (clickers, hands raised) or – although nobody talks about it – old-fashioned sensitivity to puzzled looks on students’ faces.
  • Inquiry based learning/Investigative learning – instead of teaching material, giving students problems (specifically non-trivial problems) to solve. The teacher’s role is as a facilitator, helping students discover first the process they need to use and then the answer to the questions themselves.

Point #2 – Clickers – clickers are just a tool – they can be used for any of the above techniques or for purposes not listed above. At one end, clickers can be used to pose simple multiple choice questions and then reward or penalize based on attendance (there is a difference and both are possible). Clickers can also be used in peer instruction (get clicker answers, show what everybody answered, discuss for 2 minutes with peers, then revote – amazing improvement occurs). Clickers can also be an important tool in just-in-time-teaching if the teacher is flexible enough (i.e. they’re a great way to find out if the students really understood what you just taught, if you’re brave enough to deal with a “no they didn’t” answer). Generally one should only expect as much out of clickers as one puts into them. And clickers have real issues around cost – old-fashioned methods like hand raising can do many of the same things (although it’s harder to force 100% participation). Honestly, I think the single biggest value of clickers is to serve as a disruptor and force you to think about how and why you teach. And if you don’t do that thinking, then clickers aren’t doing much.

Point #3 – Remembering why we are doing this – Although often not made explicit, the goal of most of the techniques listed in Point 1 is to elevate learning up Bloom’s taxonomy. If this is not the goal, then such techniques are not necessarily the best approach. Bloom’s taxonomy was formulated in three domains: cognitive, emotional & physical, but the most talked about and the relevant one here is the cognitive. This recognizes the simple idea that there are different levels of learning starting with knowledge (memorize facts), then comprehension (extrapolate/understand), then application (using knowledge), then analysis, then synthesis, then evaluation. The last sentence is immensely oversimplified of course. But this is the central motivation of all of these techniques: to elevate learning up the taxonomy. Much of the origin of these techniques started in physics when people realized students were memorizing formulas enough to plug and chug on tests, but had major failures in basic intuition about how physics works. So they began teaching to develop higher level mastery.

Learning higher up on the taxonomy is obviously a good thing. But the thing I never hear anybody discuss is that it is part of an inherent trade-off. It is essentially a depth vs breadth trade-off. Any realistic application of active learning etc. techniques to elevate learning involves covering less material. Covering better, but covering less. Are there times and places in university courses to cover the breadth rather than the depth? I struggled with this question a lot teaching intro bio. The breadth expected of that course by higher level courses, and indeed the breadth of life, gives a strong demand in the breadth direction. But to cover it meant giving up on deeper understanding of higher level concepts like homoplasy. Which is more important: a) truly understanding homoplasy rather than just being able to regurgitate a definition of homoplasy (e.g. being able to produce new examples of homoplasy, which would probably be the applying or 3rd level of Bloom’s taxonomy) or b) remembering platyhelminthes and their acoelomate architecture and basal position (level 1 or remembering)? Maybe some of you out there are such fantastic teachers you can achieve both in a semester. But in my experience this trade-off is very real (not on just these two exact topics of course, but on these two levels of learning across all of the material to cover in an intro bio class). I never did fully decide what I thought about this and I’d be curious to hear what others say. But I do strongly believe there is a trade-off between breadth and depth (moving up the taxonomy) that is not talked about enough.

Point #4 – Notetaking – I find it ironic that in this day and age of focus on active learning and moving up the taxonomy, teachers have largely capitulated to giving students copies of PowerPoint slides, eliminating a very effective method for doing real-time active learning while listening to lectures (many studies show that note taking is a very effective learning method). And nobody is calling this out.

Point #5 – You can go halfway (or 10%) in – It seems to me the conversation is very binary: all-in flipped/active learning/peer instruction 100% of the time, or boring old traditional. This is totally bogus. If active learning has value, then one five minute exercise per hour (or even every other class) has value. And practically, it is very possible to choose anywhere on the spectrum from 0% to 100% flipped/active. This is also my reason for being pedantic and breaking apart the ideas in point #1. One can flip without inquiry-based learning, do active learning without just-in-time, etc.

Point #6 – This is not new – Another thing that is not discussed very often is that these techniques are hardly new (but see this link and commentary of Terry’s). Socrates was demanding productive/active learning using inquiry based techniques and peer instruction 2500 years ago. And many teachers have been doing the same for decades (and millennia).

Point #7 – How hard is it to do? – You can find various opinions about how much work it is to flip a classroom (see Meg here and Terry here). My main experience was also the first time I taught the class, so it is hard to separate the two. I don’t think I have an informed opinion. But I do think that for those of us raised in the traditional lecture mode, it can take more creativity and emotional energy to do something new and different.

Point #8 – Does it work? – My sense of the overall empirical literature on how effective these techniques are is that the answer is complex, which matches my own experiences. There is a lot of evidence that active learning etc. approaches match what we know from cognitive psychology about how we learn best, but this is indirect evidence for superior learning occurring. Students on average also enjoy these techniques. This is also indirect evidence (but very relevant in its own right). More directly, studies show statistically significant improvements in level of learning with active approaches, but the pedagogical significance is tougher to assess. A good recent meta-analysis is Freeman et al. They show a one-half standard deviation improvement, which amounts to about 6 points out of 100 (less on traditional exams, more on higher level learning concept inventories). But there are a lot of issues with these studies (e.g. are more motivated teachers more likely to adopt active learning techniques but succeed primarily because of the motivation, not the method – or are they likely to teach better because the change in technique is forcing new energy and attention to teaching regardless of technique?).

My own experience with a partial commitment to such techniques in the BIO 100 course is that the students scored exactly the same average (and I mean to 2 significant digits) on the final exam as they did in the earlier version of the course. It was a rewritten exam, and I would like to argue that it was testing further up the taxonomy. But this was not formally measured. And it wasn’t miles up the taxonomy (it was still multiple choice for goodness sake). My overall impression is that there is an improvement in “learning” (hard as that is to define and measure) but it is not stunning or even obvious in size (i.e. I would have to use statistics to tease apart the improvements). It’s certainly not like every student is suddenly moving up a grade (e.g. B to A) in class or anything. Freeman suggests 4-5 points on traditional exams, which might be a B- to a B. This still sounds a little high compared to the experiences I know of, but not outrageously high. But I am more confident (based on experience and literature) that students are enjoying things more, paying attention more, and probably developing a slightly more sophisticated understanding. And that is nothing to sneeze at.

My most recent personal experience with pedagogy reform

This year I abandoned PowerPoint (except for occasional graphs and pictures) and did a lot of chalkboarding, but in the end you would have to say they were “traditional” lecture classes (in fact really old school lectures without visual aids except the chalk board). But the students took lots of notes (no PowerPoints to give). And I spent a lot of time asking and being asked questions (there were <20 students so everybody was involved). Indeed, despite always making dialogue during class a top priority, a lot more of it happened this year – somehow PowerPoint seems to introduce a wall and turns the goal into finishing the slides instead of teaching/learning. I did some peer instruction and also just gave ecological math problems to do individually in class, but most of it was more in the vein of Socratic inquiry (i.e. teacher asking a question and getting multiple responses back). So I wasn’t following too many of the buzzwords, but it felt like a much improved class to me. Was this good pedagogy or bad? NB: I am officially combining point #4 with this experience to launch a new pedagogical movement called “PowerPoint is evil”. If this takes off, you heard it here first! But then again, it’s possible that getting rid of the PowerPoint was just the disruptor (as mentioned above with clickers) that made me pay more attention to my teaching, and five years from now adding PowerPoint back in will improve my teaching. (UPDATE – just found this lovely piece on teaching with chalk by Chris Buddle)

Point #9 – Class size – Thinking about class sizes above raises another big point – one that I’m sure administrations won’t like. But how much can pedagogy innovation do to fundamentally change learning (or lack thereof) in a classroom of 600 (or 300 or even 100) students? Teaching a class of 15 students effectively is pretty easy. Teaching a class of 300 effectively is impossible no matter what. The aforementioned meta-analysis by Freeman showed pretty clearly that active learning is most effective in classes with <50 students and decreases in effectiveness pretty quickly in larger classes. Is pedagogy improvement just a giant distraction from the real issue?

Conclusions

Overall, I think the emphasis on pedagogical methods is fantastic (and largely unprecedented in higher ed – most previous reform movements have focused on curricular reform). And I do think there is something real and of value in the active learning movement. But it’s not ginormous. And I also think we have gotten overly simplistic, reducing teaching to a one-dimensional bad (traditional) vs good (100% active learning) axis. The reality is that even the concept of learning is multidimensional (with the Bloom taxonomy being but a single dimension) and that pro-con trade-offs exist on all of these dimensions. This makes it impossible to say what the “best” teaching method is without specifying the learning goal. In practice, I think we are better off thinking of the traditional vs active/flipped axis as a dial we should tune depending on the situation and goals. And this dial has positions everywhere in between 0 and 100. And it is not one-dimensional: it has multiple dimensions, including 0-100% flipped, 0-100% just-in-time, 0-100% peer instruction, and 0-100% inquiry-based learning, each independent of the others. And, although I haven’t fully worked it out for myself, I believe in some contexts breadth is a more important goal than higher taxonomy learning. We don’t have a set of best practices for breadth-oriented learning yet, but I wish we did.

One big thing I hope comes out of all of this is that we spend a lot more time in our departments and among colleagues having discussions about what our learning goals are (and no, I don’t mean the kind my university requires me to list on my syllabus under the heading “goals” that are just lists of topics covered). I mean talking about how far up the taxonomy a class should go. What breadth is necessary and appropriate in this class to set up future years. Which classes are appropriate for different kinds of learning. Perhaps ecology and genetics should focus on high level learning and BIO 100 should focus on memorizing the phyla of life? Or maybe not? How important is a 6 point increase on an exam (and maybe half of that in a large class)? Would we be better off scrapping exams and lectures and active learning and putting students in hands-on labs? Or taking ecology students out in the field to design their own experiments? Recall that there are finite resources, so there are trade-offs and limits. How can we measure and assess whether we are succeeding? We need to start having discussions about pedagogical goals in departments. Logically that should precede decisions about classroom pedagogical methods, but I’m not sure this is how things have happened.

Bottom bottom line – Modern pedagogy (=active learning/flipped class/etc) is not a silver bullet and it should not become the good end of a one-dimensional value judgement (flipped=good, not flipped=bad teaching). But these techniques definitely have some benefits. There are probably other issues we should be talking about just as much, ranging from the simple (like the declining art of notetaking) to the difficult (like class sizes). And maybe just mixing up our teaching approach periodically is more important than any specific technique. More broadly we need to think deeply and talk regularly about our pedagogical goals, especially depth vs breadth, and the best ways to get there.

What are your experiences with the modern pedagogy movement? Has flipping classrooms become a bandwagon? Is this a good thing or a bad thing? Is there a breadth vs depth (=up the taxonomy) tradeoff? Should we ever choose breadth? Which of the techniques in point #1 do you think are most important?

The secret recipe for successful working group meetings?

As Meg noted recently, science increasingly involves working groups. This is the big science aspect I discussed a while back in a lengthy footnote (and distinct from big data). Although the original (at least in ecology) synthesis center at NCEAS is no longer funded by NSF (but still very much alive, funded by conservation NGOs), there are three other synthesis centers in the US (NESCent, NIMBioS, SESYNC), a somewhat differently functioning synthesis center (iPlant), and centers in Canada, Australia, France, Germany and many other countries (http://synthesis-consortium.org/). And I’m increasingly seeing work done in “working group format” even when it is not tied to a center. The NSF RCN (Research Coordination Network) grant program is an example, but quite a few PIs on regular NSF grants or NGO/conservation grants are also choosing the working group format.

I am a self-confessed working group junkie. I take (undue?) pride in the fact that I’ve been to groups at all five US centers (and led working groups at two of them), been part of an RCN, been to meetings at Germany’s sDiv and – although not an official synthesis center – the UK’s Kavli meetings, and will be at Canada’s CIEE in May and, if funded, at CESAB in France soon. That leaves Australia as the only big miss on my list (at least for an ecologist), and I did participate remotely in an NGO-funded working group in Australia as well.

Working groups are a personal preference. Some people like them more than others. And some people are better at being part of them than others too! There is no best way to do science. But I think they’re a great format for doing a number of things, including: addressing both sides of a debate and building consensus, reviewing a field, doing meta-analysis or assembling and analyzing large datasets, and coalescing ideas and energy at key points in the trajectory of a field (including at its launch and at its coming down from bandwagon status). Certainly they have been influential – NCEAS is one of the most cited institutions in ecology.

But working groups are not a format people are trained to work in, let alone lead. Our whole PhD is focused primarily on solo work with a few interactions. Most “regular” papers are 1-5 people. Then we throw people into a group of 15-20 people, with social dynamics an order of magnitude more complex, with no training. What follows is my distillation of the key success factors of working groups. They do not, unfortunately (despite the title), come together into a magic recipe that guarantees success. And there is of course some variation depending on goals. But in my experience, if you get all of the following ingredients you’ve got a good shot at success.

During the working group proposal process

  1. Group composition #1 – personalities matter – Working groups are first and foremost social enterprises (I will be repeating this sentence several times). And with the competing challenges on everyone’s time and only having a week to pull things together, you are on the edge of failure right from the start. So it may be tempting to get the biggest name in the field, but if they’re a colossal ego who doesn’t play well with others, avoid the temptation. One bad apple really can spoil the barrel. Indeed, only invite people that you know, either personally or indirectly through a colleague, to be good collaborators. Twice I’ve been part of groups where the goal was explicitly to bring in people from opposing camps – but even here considerable effort was expended to only bring in people who could be part of a civil give-and-take dialogue, and some of the extremists were intentionally left out.
  2. Group composition #2 – career stages – In my experience the ideal working group has a pyramid shape, with the largest group being postdocs, the next largest group being early tenure track, and a much smaller sample of senior ecologists. I’ve never actually seen a truly pyramidal group; maybe a more realistic goal is rectangular – with equal representation of postdocs, early career, and senior. But definitely think about this.
  3. Meet for 5 days per session – There is a wide variety of opinion on this. And I’ve been part of 2 day meetings that were successful. But if you’re going to fly in people from around the world who are giving up 2-3 days to travel and jet lag, why would you meet for less than 4-5 days? Also in my experience it really does take that long to allow some of the social processes and buy-in to a common goal to take place. It may be efficient to have small subset groups that meet for shorter periods (or extensions to the 5 days). And if everybody already knows each other so the social processes and goals are well worked out, sometimes fewer days works. But in most cases 5 days is an optimal number in my experience. And if people can’t commit the 5 days, they’re not going to be a big contributor anyway. The working group process is a slow one. There are many other advantages, but speed is not one.
  4. Who will do the work between meetings? – This is one of the motivations for #2 – everybody will leave a group meeting with good intentions. But who will actually spend more than 5 hours moving the project forward (i.e. doing data assembly, simulations, analysis, writing)? If the PIs of the working group aren’t going to do this (and if they aren’t prepared to do this they probably shouldn’t be the PIs) and there aren’t any postdocs looking for good projects, then odds are nobody will do this. There are some exceptions I’ve seen, where say the goal was a meta-analysis and during the meeting everybody was assigned say 10 papers to code before the next meeting. This very discrete chunk can be expected between meetings. And I’ve seen plenty of meetings where somebody unplanned stepped up to carry a load (but they were almost always postdocs or occasionally early career).

Beginning of meeting

  5. Do a Powerpoint death march on the first day – This is my tongue-in-cheek name for the idea of letting everybody at the group stand up and give a presentation about their work related to the topic. This is oft-debated, with many arguing it is a waste of time. But in my experience if you don’t give people a clear window to get their opinion out, they will spend the whole rest of the meeting slipping it in edgewise. I have seen this happen more than once and it can be really annoying when the whole group is converging and somebody is going on about their preconceived idea of how to do it – better to get it out of the way on the first day. It is in the long run more efficient to spend a day doing this. That said, the PIs can make this efficient or painful. Give people very clear instructions on what you want them to present on. And give them absolute time limits (typically 10 or 15 minutes). Then ENFORCE the time limits rigidly. Conversation about a presentation is fine to run over a little, since conversation is the point of a working group. But DO NOT let anybody deliver a monologue one minute over their planned time. This only needs to be done the first time a group meets.
  6. Do a regrouping and group agenda setting after the Powerpoint death march – After everybody has been heard from, spend some time setting the agenda for the rest of the meeting. Many times the PIs will have a pretty clear idea. Other times, the goal really is to brainstorm the agenda together. But either way put it on a white board and talk it out a bit as a group and be open to changes. This will get you buy-in and understanding of the agenda. It will also get you the sum-is-greater-than-the-parts synergy that you are hoping for from a working group.
  7. PIs need to take their role as cruise director seriously – Working groups are first and foremost social enterprises (I promised you that idea would come back). I’ve never seen a successful working group that didn’t spend a lot of time going out to dinners. The PIs need to take the lead to make sure that these are organized by early afternoon so everybody knows, and they need to set the example that this is an expected activity. There is an age-old debate amongst group members who want to go to dinner right after work stops and those who want a couple of hours to go exercise first. Some compromise is needed. Some of the best working groups I’ve been part of have actually knocked off early one afternoon and gone for a hike or field trip. It might seem a waste of time, but trust me it pays off.
  8. Lead a discussion about authorship expectations early – There is no right or wrong answer about who should be a co-author on papers from the group. But setting expectations in a group discussion up front is essential. Most groups I’ve been part of have decided that everybody present should be part of the core synthesis or review paper(s). You want to create an attitude where everybody is chipping in and not holding back their best ideas. Authorship is the best way to do this. Authorship rules on more subsidiary papers vary, but they should be collectively agreed up front.

Middle part of the meeting (e.g. days 2-4)

  9. Do the work – this is of course the end goal. But it’s the hardest to give generic advice about because the nature of the work varies. It may be finding and coding papers for a meta-analysis or assembling data sets. It might be a fairly large group discussion about the consensus state of the field. It might be simulations. It might be a mixture of these things. But it probably occupies the bulk of the meeting – especially the middle days. And it probably involves breaking out into subgroups with different tasks or topics to cover.
  10. Regroup once or twice a day – even if much of the work will happen in breakout groups (and it almost certainly will) – bring the whole group back for 30 minutes before lunch and 30 minutes before dinner and have each group report in. This keeps everybody rowing in the same direction. It is also where much of the magic of working groups happens, as recurring themes and areas of disagreement emerge.
  11. Follow a diamond-trajectory – This is true really of any brainstorming or group process. The goal in the beginning is to broaden out – open up minds, create crazy ideas, capture every thought. Then when things have gotten impossibly wide, it is time to pivot and turn energies into focusing and narrowing down. A key to a good working group is for the PIs to have the nerve to let things broaden out for a while (often several days) and then have the leadership to firmly rein it back into a focus.
  12. Know when to force a turning of the corner to writing – closely related to #11. In no case should you start writing immediately. And one or two people will probably do the bulk of the writing, probably after you go home. But you should definitely start writing (or at least detailed outlining) before you scatter. You might even assign sections and end up writing a whole draft while you’re at the working group. But this is another key decision point for the leaders – when to stop the talking/analyzing and start the writing. It should start (again, at a minimum to the outline stage) before you leave.
  13. Pace yourself – it is tempting to treat the working group time as so precious that you should work 12 hour days. But this is a big mistake. Aside from the great importance of social bonding (#7), you are doing a creative activity that requires fresh bright minds. Many of your members will have flown 12-24 hours to get there and be jet lagged. And the rest will be exhausted by an intense pace long before the week is over. I’ve personally found that keeping the working group to 9-5 with at least an hour for lunch (plus joint dinners that are social) keeps things productive through day 5, while anything more leads to severe drooping by the end.
  14. Manage the email and phone calls – everybody will want/need to keep up on email and may make an occasional phone call to their lab managers, other collaborations, etc. In my experience the best way is to tackle this head on by building in time for it and then putting out a pretty clear expectation to be fully focused on the meeting the rest of the time. I usually allow 60 minutes for lunch (this is a social enterprise …) and then a good 30-45 minutes immediately after lunch for phone calls and catching up on email. This way people can run a little long on lunch or end a little early and have more time for email as they wish. And you can expect (and demand) full attention the rest of the time.

End of the meeting (e.g. Day 5)

  15. When the meeting really ends – If you tell people the meeting ends at noon, they will book flights out at 9. If you tell people the meeting ends at 5, they will book flights out at 12 or 1. So tell them it ends at 5 and secretly (don’t let on your real plan) know that you really will end at 1:00 PM. But don’t forget that long distance travellers will usually not fly out until the next day. You can still get some work done, and have one last dinner. You just won’t have everybody. As a PI you should definitely plan to stay until the day after the meeting is officially over and lead this tail end.
  16. Leave with clear assignments – well before people start peeling out – i.e. the morning of the last day – put a list on the projector or white board of tasks, deadlines and 1-2 names attached (5 names attached is the same as no names attached). Discuss this with the whole group.
  17. Accountability – Find a way to keep work flowing between meetings. Emails with reminders of tasks are a good way to do this. Circulating draft versions of papers or working versions of datasets is a good way too. In my experience scheduling a monthly phone call is also a good idea. Having somebody set up to be a “nagger” (either a PI or a postdoc) who keeps track of timelines is important too.

So – being a good leader of a working group just requires staying on top of 17 different things! If it sounds like leading a working group is exhausting – it is! Being a participant at a working group is exhausting, but being a leader and riding herd on the whole process is a whole other level of exhausting.

Obviously my 17 points are not a magic formula. They’re just the wisdom I’ve pieced together over a couple of dozen working group meetings. And a couple, like #11 and #12, require serious judgement on the PIs’ part – all I can do is highlight the question. And some will disagree with my list – I know from discussions I’ve had that #3 and #5 are definitely not universally agreed upon.

What are your experiences? What are the ingredients in your secret recipe to a successful working group? What works and doesn’t work?

In praise of slow science

It’s a rush-rush world out there. We expect to be able to talk to (or text) anybody anytime anywhere. When we order something from half a continent away we expect it on our doorstep in a day or two. We’re even walking faster than we used to.

Science is no exception. The number of papers being published is still growing exponentially at a rate of over 5% per year (i.e. doubling every 10 years or so). Statistics on growth in the number of scientists are harder to come by – the last good analysis I can find is a book by Derek de Solla Price in 1963 (summarized here) – but it appears the doubling time of scientists, while also fast, is a bit longer than the doubling time of the number of papers. This means the individual rate of publication (papers/year) is going up. Students these days are being pressured to have papers out as early as their second year*. Before anxiety sets in, it should be noted that very few students meet this expectation and it is probably more of a tactic to ensure publications are coming out in year 4 or so. But even that is a speed-up from publishing a thesis in year 6 or so and then whipping the chapters into shape for publication, which seemed to be the norm when I was in grad school. I’ve already talked about the growing number of grant submissions.

Some of this is modern life. Some of this a fact of life of being in a competitive field (and there are almost no well paying, intellectually stimulating jobs that aren’t highly competitive).

But I fear we’re losing something. My best science has often been tortuous, with seemingly as many steps back as forward. My first take on what my results mean is often wrong and much less profound than my 3rd or 4th iteration. The first listed hypothesis of my NSF postdoc proposal turned out to be false (tested in 2003-2004). I think I’ve finally figured out what is going on 10 years later. My first two papers did not come out until the last year of my PhD (thankfully I did not have an adviser who believed in hurry-up science). But both of them had been churning around for several years. In both cases I felt like my understanding and my message greatly improved with the extra time. The first of these evolved from a quick and dirty test of neutral theory to some very heavy thinking about what it means to do models and test theory in ecology. This caused the second paper (co-authored with Cathy Collins) to evolve from a single-prediction to a many-prediction paper. It also led to a paper in its own right. And influenced my thinking to this day. And in a slightly different vein, since it was an opinion paper, my most highly cited paper was the result of more than 6 months of intense (polite, but literally 100s of emails) back and forth debate among the four authors that I have no doubt resulted in a much better paper.

I don’t think I’m alone in appreciating slow science. There is even a “slow science” manifesto, although it doesn’t seem to have taken off. I won’t share the stories of colleagues without permission, but I have heard plenty of stories of a result that took 2-3 years to make sense of. And I’ve always admired the people who took that time, and in my opinion they’ve almost always gotten much more important papers out of it. I don’t think it’s a coincidence that Ecological Monographs is cited more frequently than Ecology – the Ecological Monographs papers are often magnum-opus-type studies that come together over years. Darwin spent 20 years polishing and refining On the Origin of Species. Likewise, Newton developed and refined the ideas and presentation behind Principia for over a decade after the core insight came.

Hubbell’s highly influential neutral theory was first broached in 1986 but he then worked on the details in private for a decade and a half before publishing his 2001 book. Would his book have had such high impact if he hadn’t ruminated, explored, followed dead ends, followed unexpected avenues that panned out, combined math with data and literature and ecological intuition and generally done a thorough job? I highly doubt it.

I want to be clear that this argument for “slow science” is not a cover for procrastination, nor the fear of writing, nor the fear of releasing one’s ideas into print (although I confess the latter influenced some of the delay in one of my first papers and probably had a role with Darwin too). Publication IS the sine qua non of scientific communication – it’s just a question of when something is ready to write up. There are plenty (a majority) of times I collect data and run an analysis and I’m done. It’s obvious what it means. Time to write it up! So not all science is or should be slow science. Nor is this really the same as the fact that sometimes challenges and delays happen along the way in executing the data collection (as Meg talked about yesterday).

But there are those other times, after the data is already collected, where there is this nagging sense that I’m on to something big but haven’t figured it out yet. Usually this is because I’ve gotten an unexpected result and there is an intuition that it’s not just noise or a bad experiment or a bad idea but a deeper signal of something important. Often there is a pattern in the data – just not what I expected. In the case of the aforementioned paper I’ve been working on for a decade, I got a negative correlation when I (and everybody else) expected a positive correlation (and the negative correlation was very consistent and indubitably statistically and biologically different from zero). Those are the times to slow down. And the goal is not procrastination nor fear. It is a recognition that truly big ideas are creative, and creative processes don’t run on schedules. They’re the classic examples of solutions that pop into your head while you’re taking a walk not even thinking about the problem. They’re also the answers that come when you try your 34th different analysis of the data. These can’t be scheduled. And these require slow science.

Of course one has to be career-conscious even when practicing slow science. My main recipe for that is to have lots of projects in the pipeline. When something needs slowing down, you can put it on the back burner and spend time on something else. That way you’re still productive. You’re actually more productive, because while you’re working on that simpler paper, your subconscious mind is churning away on the complicated slow one too.

What is your experience? Do you have a slow science story? Do you feel it took your work from average to great? Is there still room for slow science in this rush-rush world? or is this just a cop-out from publishing?


*I’m talking about the PhD schedule here. Obviously the Masters is a different schedule but the same general principle applies.

How many terms should you have in your model before it becomes statistical machismo?

Before the holidays, I ran a poll asking why people’s models have gotten bigger and more complex (i.e. more terms in regressions). First it is worth noting that essentially nobody disagreed with me that models have gotten more complex. So I am taking it as a given that my original characterization is accurate: the typical model has grown from a 2-way ANOVA (maybe with an interaction term) to roughly 4-8 terms (several of which may be interaction terms or random factors) in just the last 15 years.

Like every topic I place under the statistical machismo header, there is no one answer. No right or wrong. Rather it is a question of trade-offs where I hope to make people pause and question conventional wisdom which seems to always and only lead to ever increasing complexity. Here I definitely hope to make people pause and think about why they are building models with 5-8 terms. (NB the following is a typically long-winded blog post for me, feel free to skip to the bold summary paragraph at the bottom).

In econometrics this issue is taught under the title “omitted variable bias” (and it is frequently taught in econometrics and often in psychology). One can mathematically prove that if you leave out a variable which is correlated with the variables you include, this will lead to a bias in your estimation of the slopes for the variables you did include. The trade-off is that including more variables in a regression leads to a loss of efficiency (bigger error bars around your slope estimates). This seems then to boil down to a classic bias vs variance trade-off. I’m personally not too sold on this viewpoint for this particular problem. First, the mathematical proof has a very unrealistic assumption that there is a single definitive set of variables which alone cause the dependent variable – but this is never the real world. Second, although omission might introduce bias, there is no way to know whether it biases slopes positively or negatively, which means in practice you don’t know how it’s biased, which in a weird meta way goes back to being effectively unbiased. The whole omitted variable bias argument is pretty decisively shredded in Clarke 2005.
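To make the trade-off concrete, here is a minimal simulation sketch (the variable names and parameter values are mine, purely for illustration, not from any real study): omitting a correlated predictor biases the slope on the included one, while including it inflates the standard error.

```r
# Sketch: omitted variable bias vs. loss of efficiency (illustrative values only)
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n, sd = 0.5)   # x2 is correlated with x1
y  <- 1 * x1 + 1 * x2 + rnorm(n)      # both predictors truly matter, slope 1 each

short <- lm(y ~ x1)        # omits x2: the x1 slope absorbs x2's effect (biased toward ~1.7)
full  <- lm(y ~ x1 + x2)   # includes x2: the x1 slope is ~1 but with a larger standard error

summary(short)$coefficients["x1", c("Estimate", "Std. Error")]
summary(full)$coefficients["x1", c("Estimate", "Std. Error")]
```

Of course this is exactly the textbook setup I just complained about – it assumes we know the one true model – but it does show where the bias vs. efficiency framing comes from.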

Most ecologists, I think, are instead coming pretty much from Hurlbert’s extremely influential paper on pseudoreplication (which got a lot of confirmation in the survey). Hurlbert introduced the idea of pseudoreplication as a problem and made two generations of ecologists live in fear of being accused of it. However, nobody seems to recall that adding more terms to a regression is NOT one of the solutions Hurlbert suggested! And I’m willing to bet it would not be his suggested solution even with modern regression tools so easily available. His primary argument is for better, more thoughtful experimental design! There is no easy post hoc fix through statistics for bad experimental design, and I sometimes think we statisticians are guilty of selling our wares by this alluring but flawed idea. Beyond careful experimental design, Hurlbert basically points out there are two main issues with pseudoreplication: confoundment and bad degrees of freedom/p-values. Let me address each of these issues in the context of adding variables to complexify a regression to solve pseudoreplication.

1) Confoundment – Hurlbert raises the possibility that if you have only a few sites you can accidentally get some unmeasured factor that varies across your sites, leading you to mistakenly think the factor you manipulated was causing things when in fact it’s the unmeasured factor that is confounded with your sites by chance. However, and this is a really important point – Hurlbert’s solution (and that of anybody who thinks for five minutes about experimental design) is to make sure your treatment is applied within sites, not just across sites, thereby breaking confoundment. Hurlbert also goes into much more detail about the relative advantages of random vs. intentional interspersion of treatments, etc. But the key point is that confoundment is fixed through experimental design. This is harder to deal with in observational data (one of the main reasons people extol experiments as the optimal mode of inference). But in the social sciences and medicine it is very common to deal with confoundment in observational data by measuring and building in known confounding factors. Thus nearly every study controls for factors like age, race, income, education, weight, etc. by including them in the regression. For example, propensity to smoke is not independent of age, gender or income, which in turn are not independent of health, so decisive tests of the health effects of smoking need to “remove” these co-factors (by including them in the regression). Either Hurlbert’s experimental design or social science’s inclusion of co-factors makes sense to me. But in ecology, we instead tend to throw in so-called nuisance factors like site (and plot within site) and year, and this does NOT fix confoundment (it is more motivated by the non-independence of errors discussed below). To me confoundment is NOT a reason for the kinds of more complex models we are seeing in ecology. If you are doing an experiment, then control confoundment in the experimental design. And if it is observational, include more direct causal factors (the analogs of age and demographics) like temperature, soil moisture, vegetation height, etc. instead of site and year nuisance factors if you are worried about the confoundment problem of pseudoreplication.

2) Bad degrees of freedom/p-values – Hurlbert’s second concern with pseudoreplication (which is totally unrelated to confoundment and is not fixable by experimental design) relates to p-values. This is because non-independence of the error terms violates assumptions and essentially leads us to think we have more degrees of freedom than we really have, and since degrees of freedom feed into the calculation of p-values, we end up reporting p-values that are lower than they should be (i.e. p-values are wrong in the bad way – technically known as anti-conservative). This is a mathematically true statement, so the debate comes in with how worried we should be about inflated p-values. If we decide we are worried, we can just stop using p-values (recall this was Hurlbert’s recommendation, but very few remember that part of the paper!). Nor does Hurlbert imply there are larger problems than the p-value inflation (and the confoundment raised above). In fact Hurlbert says pseudoreplication without p-values can be a rational way forward.
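If you want to see the anti-conservatism for yourself, a simulation along the following lines does it (a sketch with invented numbers, not from any real dataset): the treatment is applied at the site level and has no true effect, plots within sites share site-level noise, and analyzing plots as if they were independent rejects the null far more often than the nominal 5%.

```r
# Sketch: pseudoreplication makes p-values anti-conservative (illustrative values only)
set.seed(1)
pseudorep_p <- function(n_sites = 10, plots_per_site = 20, site_sd = 1) {
  site_effect <- rnorm(n_sites, sd = site_sd)          # shared site-level noise
  treatment   <- rep(c(0, 1), each = n_sites / 2)      # treatment applied to whole sites
  trt <- rep(treatment, each = plots_per_site)
  y   <- rep(site_effect, each = plots_per_site) + rnorm(n_sites * plots_per_site)
  summary(lm(y ~ trt))$coefficients["trt", "Pr(>|t|)"]  # plots treated as independent replicates
}
p_vals <- replicate(2000, pseudorep_p())
mean(p_vals < 0.05)    # rejection rate well above 0.05 even though there is no true effect
```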

The question of whether to report p-values or not interacts in an interesting way with one of the main results of the survey. Many people feel that having more complex models is justified because they are switching to model selection approaches (i.e. mostly AIC). This approach is advocated by two of the best and deservedly most popular ecology stats books (Crawley & Zuur et al). But I have to confess that I am uncomfortable with this approach for several reasons. First, the whole point of model selection initially (e.g. Burnham and Anderson’s book) was to move away from lame models like the null hypothesis and compete really strong models against each other, as Platt (1964 Science) recommended in his ode to strong inference. Comparing a model with and without a 5th explanatory factor does not feel like comparing strongly competing models, so it does not feel like strong inference to me. Second, model selection is a great fit for prediction because it finds the most predictive model with some penalty for complexity (recall that in the world of normally distributed errors AIC is basically n·log(SSE/n) plus 2× the number of parameters, and SSE is also the quantity at the heart of R² (R² = 1 − SSE/SST), making a precise mathematical link between AIC and R²). But model selection is a really bad fit for hypothesis testing and p-values (again, as anybody who has read the Burnham and Anderson book will have seen, but few follow this advice). Although I don’t go as far as Jeremy and Andrew Gelman (I think doing one or two very simple pre-planned comparisons, such as with or without an interaction term, and then reporting a p-value is probably OK), I strongly believe that one should not do extensive model selection and then present it as a hypothesis test. While I agree with Oksanen’s great take-down of the pseudoreplication paper that argues p-values are only courtesy tools, I don’t think most people using model selection and then reporting p-values treat them that way. I’m fine – more than fine – with pure exploratory approaches, but I think a lot of people are noodling around with really complex models and lots of model selection and then reporting p-values like they’re valid hypothesis tests. Indeed, I have had reviewers insist I do this on papers. This strikes me as trying to have your cake and eat it too, and I think it is one of the reasons I am so uncomfortable with the increasingly complex models – because they are so highly intertwined with model selection approaches.
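For what it’s worth, that AIC–SSE link is easy to check numerically. With normal errors, R computes the AIC of a linear model as n·log(SSE/n) plus 2 per estimated parameter (plus constants that cancel when comparing models fit to the same data). A minimal sketch with made-up data:

```r
# Sketch: the AIC-SSE link for a Gaussian linear model (made-up data)
set.seed(7)
n <- 50
x <- rnorm(n)
y <- 2 * x + rnorm(n)
fit <- lm(y ~ x)

sse <- sum(residuals(fit)^2)
k   <- length(coef(fit)) + 1                            # coefficients plus the residual variance
manual_aic <- n * (log(2 * pi) + 1 + log(sse / n)) + 2 * k

c(manual = manual_aic, builtin = AIC(fit))              # the two values should agree
```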

I do think it is important to note that, whatever the motive, there are genuine costs to building really complex models. The biggest cost is the loss of interpretability. We know exactly what a model with one explanatory factor is saying. We have a pretty good idea what a model with two factors is saying. And I even have a really precise idea what a 2-way ANOVA with an interaction term is referring to (the interaction is the non-additivity). But I have yet to ever see a really convincing interpretation of a model with 5 factors (other than “these are things that are important in controlling the dependent variable”, at which point you should be doing exploratory statistics). And interaction terms (often more than one these days!) are barely interpretable in the best circumstances, like when the hypothesis is explicitly about interaction. And while mixed models with random effects are a great advance, I don’t see too many people interpreting random effects in any meaningful way (e.g. variance partitioning). Meanwhile, the most commonly used mixed model tool – lmer – pretty much guarantees you don’t know what your p-values are: they are hard to compute correctly (for good reasons), and the most common workarounds are wrong and often anti-conservative, to such a degree that the authors of the package refuse to provide p-values (e.g. this comment and Bolker’s comments on Wald tests). Again – if you want to do exploratory statistics, go to town and include 20 variables. But if you’re trying to interpret things in the specific context of whether particular factor X has an important effect, you’re making your life harder with more variables.
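To see what I mean about lmer, fit any mixed model and look at the fixed-effects table. A sketch (the data frame and variable names here are hypothetical):

```r
# Sketch: lme4's lmer deliberately reports no p-values for fixed effects
# (assumes a data frame 'dat' with response y, predictor x, and grouping factors site and year)
library(lme4)
fit <- lmer(y ~ x + (1 | site) + (1 | year), data = dat)
summary(fit)   # fixed effects show Estimate, Std. Error and t value -- but no Pr(>|t|) column
```

Packages like lmerTest will bolt approximate denominator degrees of freedom back on, but those approximations are precisely the kind of workaround whose validity is debated.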

Another big problem with throwing lots of terms in is collinearity – the more terms you have, the more likely you are to get some highly correlated explanatory variables. And when you have highly correlated explanatory variables, the “bouncing beta” problem means you are basically losing control of the regression: depending on arbitrary properties of your particular data, the fitting algorithm can assign almost all of the explanatory power (i.e. slope) to either one or the other of the correlated variables – or, in other words, if you drop even one data point the answer can completely change.
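Here is a sketch of the bouncing beta problem with invented numbers: two nearly identical predictors, and the fitted slopes swing when just a few points are dropped.

```r
# Sketch: the "bouncing beta" problem with collinear predictors (illustrative values only)
set.seed(3)
n  <- 40
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)    # x1 and x2 are almost perfectly correlated
y  <- x1 + x2 + rnorm(n)          # together they matter, but their individual slopes are ill-defined

coef(lm(y ~ x1 + x2))                      # the total effect gets split arbitrarily between x1 and x2
coef(lm(y ~ x1 + x2, subset = -(1:3)))     # drop three points and the split can change dramatically
```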

So, in summary, adding variables is a very weak substitute for good up-front experimental design. It might be justified when the added variables are known to be important and are used to control for confoundment arising from sampling limitations in an observational context. But that’s about it. And the techniques often invoked to make complex models viable, such as random effects and model comparison, pretty much guarantee your p-values are invalid. I find it very ironic that so many people go to great lengths, including nuisance terms, to avoid pseudoreplication (to ensure their p-values are valid) and then guarantee their p-values are invalid by using random effects and model selection. And good luck interpreting your complex model, especially when coefficients are being assigned to collinear variables arbitrarily! So to my mind complex regression models straddle the fence very uncomfortably between clean hypothesis testing contexts (X causes Y and I hypothesized it in advance) and pure exploratory approaches – and this fence-sitting approach has the worst of both worlds, not the best of both worlds.

To put it in blunt terms, it would appear from popular answers in the survey that many people are complexifying their models in response to Hurlbert’s issues of pseudoreplication and Burnham & Anderson’s call for model comparison, but seem to forget that both of them actually call for abandoning p-values to solve these problems. And that Hurlbert’s paper was really a call for better experimental design, and Burnham & Anderson’s book was a call for a return to strong inference by competing strong models against each other, not tweaks on regressions. So these were both calls for clear, rigorous thinking before starting the experiment, NOT for post hoc fixes by adding terms to regression models.

So, I have to at least ask: how much of this proclivity for ever more complex models is a result of peer pressure, fear of reviewers and statistical machismo? I was a little surprised to see that no small fraction of the poll respondents acknowledged these factors directly. So I urge you to think about why you are complexifying your model. Is it an observational study (or weakly controlled experimental study) where you need to control for known major factors? Should you really switch to an exploratory framework? Are you willing to give up p-values and the hypothesis testing framing? If not, say no to statistical machismo and keep your model simple!

#ESA100 – big concepts and ideas in ecology for the last 100 years

ESA (Ecological Society of America) is celebrating its 100th anniversary in 2015. This will culminate in the 100th annual conference in Baltimore in August 2015. As part of the buildup, ESA has asked various people to discuss today (Dec 3) via social media “big ideas or discoveries that have had the greatest impact on ecological science over the last century”. So I’m sharing my thoughts today. Meg & Jeremy will add their own over the next few weeks. Check out the Twitter hashtag #ESA100 as well. (And the Brits reading this don’t have to remind us that 2013 was the 100th anniversary of the British Ecological Society and they got there first – I was at their 100th conference and enjoyed it.)

A couple of months back I took a stab, using Wordles, at how ecology has changed in 25 years. For this longer time frame of 100 years, I’m not going to pass the buck to technology and will instead go straight to my own (lengthy!) opinions. I am going to divide this into three sections – core ideas that spanned most or all of 1915-2014, ideas that emerged over the latter half of that 100-year period and are dominant now, and ideas that I predict will dominate the next 100 years. I will also divide each section into tools/methods and ecological concepts.

Tools/methods 1915-2014

  • Differential equations as models of population abundance – without a doubt this has been one of the most dominant ideas of the last 100 years. It started as a way of modelling the dynamics of chemical reactions and then moved into ecology in the 1920s in the work of Lotka and Volterra (although Verhulst presaged this work with the logistic equation of human population growth in the 1830s, rediscovered by Pearl in 1920). By the 1930s full treatises applied to competition, predation and mutualism appeared, such as Lotka’s excellent 1925 book Elements of Mathematical Biology (worth reading still today!), Gause’s 1934 book The Struggle for Existence and the review paper by Gause and Witt 1935 (Am Nat). If you look at any modern theoretical ecology book (or any undergraduate ecology textbook) you will see these core ideas explained in great detail and then elaborated on with age structure, stochasticity, time lags, etc (a minimal numerical sketch follows below). In the 1970s there was a movement led by Robert MacArthur and EO Wilson to define this use of differential equations focused on populations as the sine qua non of good ecology, with profound effects (the highly influential population biology graduate program at UC Davis being one example). I would argue population-level differential equations have been THE dominant tool of the last 100 years. I am personally a little ambivalent about this. While I think quantification and math are important in science, I don’t think populations are the only important scale to study (and it’s not obvious what variable would sit at the center of differential equations at other scales), we have only been able to capture parameters for rates of change of populations in a highly phenomenological (almost circular) fashion, and the differential equation approach has led to an overemphasis on equilibria (something easy to solve for in differential equations but not so obviously a prominent feature of nature).
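For readers who have never played with these models, here is a minimal numerical sketch of Lotka-Volterra competition (using the deSolve package; the parameter values are invented and chosen so the two species coexist):

```r
# Sketch: Lotka-Volterra competition solved numerically (made-up parameter values)
library(deSolve)

lv_competition <- function(t, N, p) {
  with(as.list(c(N, p)), {
    dN1 <- r1 * N1 * (1 - (N1 + a12 * N2) / K1)
    dN2 <- r2 * N2 * (1 - (N2 + a21 * N1) / K2)
    list(c(dN1, dN2))
  })
}

out <- ode(y = c(N1 = 10, N2 = 10), times = seq(0, 100, by = 1),
           func = lv_competition,
           parms = c(r1 = 0.5, r2 = 0.4, K1 = 100, K2 = 80, a12 = 0.6, a21 = 0.7))
matplot(out[, "time"], out[, c("N1", "N2")], type = "l",
        xlab = "time", ylab = "abundance")   # with these values both species persist
```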

Concepts 1915-2014

  • Succession – succession of plants in the Indiana sand dunes was the 1898 thesis topic of Henry Cowles, founder of the ESA. Frederic Clements also worked on this in the early decades of the 20th century. Succession has been at the center of one of ecology’s great, ongoing defining debates: individualistic responses vs. species interactions and community structure (Gleason vs Clements). Succession played a central role in Odum’s and Whittaker’s undergrad textbooks in the 1970s and you can still find a full chapter devoted to succession in every popular textbook today. At the same time succession has become passé in the last 30 years (e.g. the 2009 Princeton Guide to Ecology has almost 100 entries but not one on succession). Deserved death or a pendulum swing that will come back? (Can I say both of the above?)
  • Competitive Exclusion, Limiting Similarity, Niche Overlap, etc – The fact that one of the four outcomes of the Lotka-Volterra differential equation model of competition is competitive exclusion, followed in short order by Gause’s 1934 experimental confirmations in microcosms, has led to a central role for competitive exclusion and related ideas like limiting similarity, niche overlap, body size ratios, and closely related species not co-occurring in communities in phylogenetic community ecology. If I were to pick one concept that dominated ecology from the 1930s to the present day, this would be it. Indeed, I would probably go further and say it crossed the line to become an obsession. Competition incorrectly received primacy over predation, disease and mutualism. And the blindingly obvious fact that species coexist outside our window, even if they don’t in homogeneous bottle systems, has not prevented an over-focus on how two species coexist instead of the more important question of what controls whether it is 2, 5, 20, or 200 species coexisting.
  • Food webs – the food web idea loosely goes back to Stephen Forbes, arguably the first ecologist in America, with his 1887 essay “The lake as a microcosm”. Food webs sensu stricto, as a graph of who eats whom, have run as a key idea through the work of Charles Elton, Robert Paine’s starfish removals and keystone species, Joel Cohen, Stephen Carpenter’s trophic cascades, work on alternative stable states, and right into the present day with efforts to model the population dynamics of a food web using differential equations. Network theory is hot these days and is a clear extension of food webs. Food webs sensu lato have also served as a metaphor, both in research ecology and in the environmental movement, for the idea that everything is connected to everything. We love these stories – remove one little insect and watch the whole ecosystem collapse.
  • Ecophysiology culminating in mechanisms driving biomes – naturalists all the way back to von Humboldt and Darwin, some of the early German founders of ecology (Warming, Schimper), and a line running through Robert H Whittaker and his 1975 book have noted that there is very systematic variation in vegetation structure and type across the globe with climate (tall trees in wet tropics, savannas in dry tropics, thorn scrub in deserts, grasslands in dryish summer-wet places, Mediterranean in dryish winter-wet places, etc). This topic remains active in the present day with attempts to include realistic models of vegetation in global circulation models and carbon models. I am hard pressed to pinpoint a single turning point (although Gates’ 1965 book Biophysical Ecology is a good stab), but we have gradually worked out the core physiological principles driving this (water balance, heat balance, photosynthesis controls, etc) and some of the biggest names in ecology (Hal Mooney, Stuart Chapin, Christian Korner, Monteith, Graham Farquhar, Ian Woodward) have worked in this area. The animal people have not been quite as successful in predicting distribution and abundance, probably because there is not as much variation in growth forms as in plants, but great progress has still been made, especially in lizards and thermal ecology more broadly, by the likes of Ray Huey, Warren Porter, Bruce McNab, etc. Whether you call this field physiological ecology, functional ecology, biophysics or something else, it is one of the few areas of ecology to become predictive from first principles.
  • Importance of Body Size – if you could only know two simple facts about an organism, probably the two things you would want to know are which taxonomic class it belongs to (bird, mammal, angiosperm, fern) and its body size. Body size makes good predictions about who will eat whom, degree of thermal stress, etc but also, in a relationship that is unusually precise for ecology, metabolic rate and a whole host of connected things including calorie requirements, growth rates, life span, age of maturity, intrinsic rate of population increase, dispersal distance, etc. The central role of body size has been understood at least since 1932 (Kleiber in a German publication and a 1947 paper “Body size and metabolic rate” in English). Two 1980s books (Peters’ 1983 The Ecological Implications of Body Size and Calder’s 1984 Size, Function and Life History) showed just how central body size is. This work received significant recent attention through some of the most highly cited papers in ecology by Jim Brown, Brian Enquist and Geoff West among others. Although the potential of this discovery to inform about poorly understood species of conservation concern is in my opinion still underappreciated, there have been some very clever applications, including a paper by Pereira and Gordon 2004 in Ecological Applications, a fun one on dinosaurs by Farlow 1976 in Ecology, and John Lawton’s 1996 calculation of the population dynamics of the Loch Ness monster in Oikos. Mechanisms are still hotly debated but the sheer statistical predictive power of body size is rare in ecology.

Tools 1950s to 2050s

This section and the next contain ideas on tools and concepts that got their start in the latter half of the 20th century and are arguably still in their infancy today but with much work going on.

  • Stable Isotopes – I’ve never personally used this technique, but the ability to quantify the ratio of different isotopes of an element (say oxygen 16 and 18) in small samples has revolutionized what we can measure. We can measure how old things are (when they died) (carbon), where they came from (strontium), how hot or cold it was when tissue was laid down (hydrogen among others), where/when water used by a plant came from (again hydrogen among others), how high up the trophic chain a species eats (nitrogen), and on and on. I’m sure we’re nowhere near the end of novel measurements that can be done with stable isotopes.
  • Phylogenetics – starting with Willi Hennig’s 1950 book on cladistics and Felsenstein’s early 1970s and 1980s papers and software on methodology followed by many others, the ability to unravel the precise evolutionary history of species has changed not just evolution but ecology. Like any such tool, it has created some bandwagons, but there is no denying it has changed our ability to ask meaningful ecological questions in a macroevolutionary context (how fast do different traits evolve, is higher species richness in the tropics due to speciation or extinction, how do coevolving clades speciate, and etc).

Concepts 1950s to 2050s

  • Space – there has been exponential growth in the study of the role of space in structuring populations and communities. Arguably this started in the 1950s with Andrewartha’s 1954 ecology textbook, which had what we would now call a metapopulation on the cover, and Skellam’s 1951 diffusion equation models. This was followed by Levins’ 1969 paper on metapopulations, Hanski’s development and popularization of metapopulations through the 1980s and 1990s, MacArthur and Wilson’s island biogeography, Simon Levin’s 1970s work showing space can be a coexistence mechanism, Monica Turner and many others’ launch of landscape ecology as a subdiscipline, the increasing interest in the role of regional pools in structuring local communities (accelerated by Hubbell’s 2001 neutral theory), and the growing interest in dispersal ecology and the role of dispersal limitation. The recent recognition of the importance of scale is closely tied to finally putting our understanding into a spatial context. I don’t think we’re done with space yet.
  • Evolutionary Ecology/Optimality – Hutchinson wrote a book in 1965, The Ecological Theater and the Evolutionary Play, arguing for the need for stronger links between the two fields (or, more precisely, observing that the links exist whether we ignore them or not). Judging by the proliferation of journals in the field of evolutionary ecology, I think he was heard! The backlash against Wynne-Edwards’ 1962 book (Animal Dispersion in Relation to Social Behaviour), with its group selection arguments, certainly focused our collective minds as well. A great deal of individual behavior and life history as well as sociality is now evaluated through the lens of evolution. So are species interactions (i.e. coevolution). And although it is only a shortcut, optimality with constraints is a very useful way to understand the evolutionary outcome of behavior ranging from foraging to habitat selection to various forms of game theory.
  • Mutualisms and Facilitation – competition and predation were dominant ideas over the last 100 years (and remain dominant ideas). But it seems mutualism didn’t get much love until recently. While the existence of mutualism was well understood 100 years ago (and the aforementioned Gause and Witt paper gave a model of mutualism population dynamics in 1935), understanding mutualism as a fundamental structuring force of communities came much more recently. The growth of tropical ecology certainly fed an interest in mutualism, as have the increasing study of pollination as an ecosystem service and the idea of facilitation (a gradient from competition to mutualism depending on the harshness of environmental conditions). The role of microbiome mutualisms is likely to be part of Meg’s answer on great conceptual advances.
  • Macroecology – I may be biased on this one … but I think snapping out of the amnesia induced by the population-only approach, to return to our roots and look at some of the oldest questions in ecology like the controls of species richness, the controls of abundance, species ranges, the distribution of body sizes, etc, has been a very good thing. Not in a replacement sense (of e.g. population biology), but in an additive sense: we have to tackle these questions now and not work up to them in 100 years when population ecology is all figured out. And I think it has happened just in time, given the looming challenge of global change. It is interesting that the arc of the careers of many of the most famous community ecologists (Rosenzweig, Brown, Mittelbach, Ricklefs, Lawton, May and, ahead of his time, MacArthur) all included a turn toward macroecology. And macroecology seems to be at a magic scale such that it has produced many of the most law-like, universal principles in ecology (abundant species are rare, big-bodied animals are rare, species-area relationships, decay of similarity with distance, etc). I could go on for many pages on this topic alone, but I’ll stop here for now!

Tools 21st century

  • Remote sensing – Using digital images taken from elevation so as to cover large areas (usually from airplanes or satellites, but increasingly towers too) has been slowly creeping into ecology for decades. At the moment, remote sensing is more informative about the environment (e.g. topography) and ecosystem-level properties (e.g. NDVI as a proxy for productivity or green-up). And these will continue to be growth areas. I am part of a project supplementing ground weather stations with satellite measurements of weather to fill in the gaps on the ground, and I suspect it won’t be too long until we can dispense with ground measurements entirely for coarse-scale measurements of climate at remote sites. And Greg Asner’s work, among others, on using hyperspectral imagery (100s of channels or frequencies instead of the usual 3 or 4 – think very fine gradations of color) to allow detection of nitrogen levels in leaves and the like is impressive. But I also think we’re within a decade or two of remote sensing being able to identify individuals to species in some settings like canopy trees or ungulate herds. And that will open up whole new spatial scales to abundance questions.
  • Biodiversity informatics – Linnaeus became famous for being the first to formally catalog biodiversity. We have had museums and collectors working toward this goal ever since. The rapid movement of this data into online databases is opening up whole new vistas. These include generating species ranges for entire classes of organisms (e.g. all vertebrates by NatureServe and others, and soon 100,000+ plants of the New World by BIEN). Changes in species ranges, phylogeny and traits like morphometrics over the last 100 years or so are being evaluated using the dates on collection records. And even at the most basic level, having online, real-time updatable, standardized taxonomies is a great boon to those studying poorly known systems. We’re also finally starting to get a handle on some surprising basic trends in biodiversity metrics that make us realize how little we really know about biodiversity trends in response to the Anthropocene.

Concepts 21st century

  • Species Richness – Will the 21st century finally answer one of the greatest questions in ecology first raised in the 19th century – why are there more species in the tropics? And more generally will we get traction on the question of what factors control species richness at different scales and along different gradients? I am optimistic – I just hope it happens in my lifetime.
  • Species Ranges – the species range is one of the most fundamental properties of a species, and there is a pressing need for prediction of how ranges will respond to climate change, habitat destruction, etc. Yet we have mostly just danced around the edges of this problem and really only have anecdotes for specific species and specific range edges plus a giant cottage industry of predicting range shifts using correlative niche models that aren’t that well validated. We’ve got to do better on this one!
  • A predictive theory of the response to global change – I’ve harped on the need for ecology to become more predictive, and I personally think the biggest intellectual challenge and the biggest test of whether ecology has any value for society is being able to predict how the biosphere will respond to human-caused global change (i.e. the Anthropocene).

What I left out

I have left a number of obvious choices out. Some of these are statements of my ignorance. For example, I don’t know enough about ecosystem science to name the big concepts (probably global nutrient cycles and controls of productivity globally belong in the 1950s to 2050s key concepts category, but I don’t have anything intelligent to say on them). And the same for DNA barcoding as a tool for the 21st century.

Other omissions are more intentional. For example, a lot of energy has gone into niches, the diversity-stability debate, disturbance ecology, population cycles, and so on, but I’m not sure they have yet earned their keep as key concepts that have fundamentally changed our view of ecology (which is not to say they couldn’t still do it). I’d be curious to hear other nominations for this category in the comments (or disagreement with my intentional omissions). I’ve left out some obvious tools too. For example, I didn’t mention a single statistical method as a key tool. This probably shouldn’t surprise readers who know I’m a pretty firm believer that ecology should be in the driver’s seat and statistics is just a tool. The same goes for big data. Any number of topics I mentioned above intersect with big data (all the way back to von Humboldt and Kleiber!). But I just don’t see discussion of big data in isolation as a useful way forward – big data is just a tool letting us finally crack the controls of species richness, biodiversity trends, etc. I could have suggested the computer as a key tool, for it certainly has been – it has enabled larger data, much better statistics, null models, complex simulations, not to mention big data. But it’s a little trite and too general to list here.

What do you think? What is missing from my list? What should I have left off? Answer in the comments or Tweet with #ESA100.

Why are your statistical models more complex these days?

I serve on a lot of graduate committees. I also end up as statistical consultant for a lot of students and faculty. So I see a lot of people thinking through their models and statistical approaches.

One trend I am noticing is that most people are staying within the linear framework (e.g. not GAM or regression tree), but their models are becoming increasingly complex. That is to say more and more terms are being included. And they are more and more of what I would call blocking/nuisance terms. I’m not talking about “big data” or exploratory or data mining approaches where people have dozens of potential variables and no clue which to use.

I’m talking about traditional field experiments or behavioral/physiological observations of individuals or small-scale observational studies. And I’m noticing that in the dark ages (= when I was in graduate school, 11 years ago) there would be a 2- or at most 3-way ANOVA with maybe one or two interaction terms. Now everybody is running multivariate regressions or, more often, mixed effect models. And there are often 4-5 fixed effects and 3 or 4 (often with nesting) random effects and many more interaction terms, and sometimes people even want/try to look at interaction terms among random effects (an intensely debated topic I am not going to weigh in on).

As one example – in the past I almost never saw people who collected data over two or three years (i.e. all PhD programs and grants) include year as an explanatory factor (fixed or random) unless there was really extreme variability that got turned into a hypothesis (e.g. El Nino vs La Nina which happened not infrequently in Arizona). Now everybody throws in year as an explanatory factor even when they don’t think there was meaningful year-to-year variability.

And for what it’s worth, putting even two crossed (as opposed to nested) random factors into the lme command in the nlme R package was somewhat arcane and of mixed recommendability, while crossed random effects are easily incorporated in the newer lmer command in lme4 (see the sketch below). So it might just be evolving software, but I don’t really believe software capacity alone is driving this, because I’m also seeing the number of fixed factors going up, and I never used to hear people complaining about it being hard to include 2 crossed random factors in lme. But it does show that the complexity of models has gone up, since the models I see as commonplace today weren’t even well supported by the software 3-5 years ago.
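For concreteness, the syntax difference is roughly this (a sketch; the data frame and variable names are hypothetical): crossed random effects are a single line in lmer.

```r
# Sketch: crossed random effects in lme4 (hypothetical data frame 'dat' and variable names)
library(lme4)
fit <- lmer(growth ~ treatment + (1 | site) + (1 | year), data = dat)  # site and year are crossed
# nested version for comparison: (1 | site/plot) expands to (1 | site) + (1 | site:plot)
```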

Now I am on record that the move to multivariate regression framing instead of ANOVA is a very good thing. And I haven’t said it in this blog but every time I teach mixed effect models I say they’re one of the most important advances in statistics for ecology over the last couple of decades. So I’m not critiquing the modelling tools.

But I am suspicious of the marked increase in the number of factors from approximately 2-3 with few interactions to 4-8 with many interactions (and again, this is not in an exploratory framework with dozens of variables and something like regression trees). I’m a notorious curmudgeon, suspicious of any increase in statistical complexity that is not strongly justified by changing our ECOLOGICAL (not statistical) interpretations. But I’m clearly out of the mainstream. And although I can say some particular specific practices or motives around complex models are wrong, I cannot say that more complex models in general are wrong. So maybe I’m missing something here.

So please enlighten me by taking the below poll on why you think models have become more complex over the last 10 years. You can check up to six boxes but 2-4 is probably more informative.

(I am going to offer my own opinions in a future blog post but I don’t want to bias the poll, because I am genuinely curious about what is driving this phenomenon – and by the same token I’m not going to be active in the comments on this post, but I hope you are).

Do you have any good examples where your ecological understanding was greatly increased by 4-8 factors instead of 2-3? Do you have an example of a killer interpretation of 4 factors in one model? Do you think you’re still in a hypothesis testing framework when you have, for example, 5 fixed factors and three random factors? What about if you’ve done some model comparisons to get from 5 fixed/3 random down to 3 fixed/2 random?

What math should ecologists teach?

Recently Jeremy made the point that we can’t expect ecology grad students to learn everything useful under the sun and asked in a poll what people would prioritize and toss. More math skills was a common answer of what should be prioritized.

As somebody with an undergraduate (bachelor’s) degree in mathematics, I often get asked by earnest graduate students what math courses they should take if they want to add to their math skills. My usual answer is: none – the way math departments teach math is very inefficient for ecologists, so you should teach yourself. But it’s not a great answer.

In a typical math department in the US, the following sequence is the norm as one seeks to add math skills (each line is a one-semester course, taken roughly in the sequence shown):

  1. Calculus 1 – Infinite series, limits and derivatives
  2. Calculus 2 – Integrals
  3. Calculus 3 – Multivariate calculus (partial derivatives, multivariate integrals, Green’s theorem, etc)
  4. Linear algebra – solving systems of linear equations, determinants, eigenvectors
  5. Differential equations – solving systems of linear differential equations, solving engineering equations (y'' + cy = 0)
  6. Dynamical systems – y_{t+1} = f(y_t) and variations, including chaos
  7. Probability theory (usually using measure theory)
  8. Stochastic processes
  9. Operations research (starting with linear programming)

That’s 7 courses over and above 1st year calculus to get to all the material that I think a well-trained mathematical ecologist needs! There are some obvious problems with this. First, few ecologists are willing to take that many classes. But even if they were, this is an extraordinary waste of time, since over half of what is taught in those classes is pretty much useless in ecology even if you’re pursuing deep theory. For example – path and surface integrals and Green’s theorem are completely irrelevant. Solving systems of linear equations is useless, thereby making determinants more or less useless. Differential equations as taught – useless (to ecologists; very useful to physicists and engineers). Measure-based probability theory – useless. Linear programming – almost useless.

Here’s my list of topics that a very well-trained mathematical ecologist would need (beyond a 1st year calculus sequence):

  1. Multivariate calculus simplified (partial derivatives, volume integrals)
  2. Matrix algebra and eigenvectors
  3. Dynamical systems (equilibrium analysis, cycling and chaos)
  4. Basic probability theory and stochastic processes (especially Markov chains with brief coverage of branching processes and master equations)
  5. Optimization theory focusing on simple calculus based optimization and Lagrange multipliers (and numerical optimization) with brief coverage of dynamic programming and game theory

Now how should that be covered? I can see a lot of ways. I could see all of that material covered in a 3-semester sequence (#1/#2, #3, #4/#5) if you want to teach it as a formal set of math courses. And here is an interesting question. We ecologists often refuse to let the stats department teach stats to our students (undergrad or grad) because we consider it an important enough topic that we want our spin on it. Why don’t we have the same feelings about math? Yet as my two lists show, math departments are clearly focused on somebody other than ecologists (mostly, I think, they’re focused on other mathematicians in upper level courses). So should ecology departments start listing a few semesters of ecology-oriented math among their courses?

But I could see less rigorous, more integrative ways to teach the material as well. For example, I think in a year-long community ecology class you could slip in all the concepts: dynamical systems (and partial derivatives) with logistic/Ricker models and then Lotka-Volterra; eigenvectors and Markov chains with Horn’s succession models or with age/stage structure, with eigenvectors returning as the Jacobian on predator-prey; master equations with neutral theory; optimization with optimal foraging and game theory (a sketch of the Markov chain/eigenvector piece follows below). Yes, the coverage would be much less deep than a 3-semester sequence of math-only courses, but it would, I think, be highly successful.
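As one example of the kind of exercise I have in mind for the Markov chain/eigenvector piece, here is a sketch in the spirit of Horn’s succession models (the species and transition probabilities are invented): the dominant eigenvector of the transition matrix gives the predicted long-run forest composition.

```r
# Sketch: Horn-style succession as a Markov chain (invented transition probabilities)
# P[i, j] = probability that an individual of species i is replaced by one of species j
species <- c("birch", "maple", "beech")
P <- matrix(c(0.50, 0.30, 0.20,
              0.10, 0.60, 0.30,
              0.05, 0.15, 0.80),
            nrow = 3, byrow = TRUE, dimnames = list(species, species))

eig        <- eigen(t(P))           # stationary composition is the left eigenvector for eigenvalue 1
stationary <- Re(eig$vectors[, 1])  # eigenvalue 1 is the largest for a stochastic matrix
stationary / sum(stationary)        # predicted long-run proportion of each species
```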

I say “I think” because I don’t know anywhere that teaches the math this way. I teach a one-semester community ecology grad class and try to get a subset of the concepts across, but certainly don’t come anywhere close to covering everything that I wish were covered (i.e. my list above). And I know a lot of places have a one-semester modelling course for grad students. But a department teaching its own math courses, or a math-intensive ecology sequence, I haven’t come across.

What do you think? Have I listed too much math? Or left your favorite topic out? How should this be taught? And to how many of our students (undergrads, all grads, or only a subset of interested grads) should it be taught?

Detection probability survey results

Last week, I highlighted some new results from a paper on detection probabilities and placed detection probabilities in the context of estimator theory. This in turn led to a reader poll to try to get a sense of how people thought about experimental design with detection issues.

Although I don’t want to spend too much time on it here, I wanted to briefly highlight a great paper that just came out, “Assessing the utility of statistical adjustments for imperfect detection in tropical conservation science” by Cristina Banks-Leite and colleagues. They look at several real-world scenarios focused on identifying covariates of occupancy (rather than absolute occupancy levels) and show the results are not much different with or without statistical adjustment. They draw a distinction between a priori control for covariates of detection probability in setting up a good study design vs a posteriori statistical control for detection probability, and point out that both are valid ways of dealing with detection issues. The take-home quote for me was “We do not believe that hard-won field data, often on rare specialist species, should be uniformly discarded to accord with statistical models”. Whereas my last post was very theoretical/statistical, this paper is very grounded in real-world, on-the-ground conservation, but in many ways makes many of the same points. It is definitely worth a read.

Turning now to the survey … at the time of analysis (Wednesday morning) there were 168 respondents. You can view the raw results here. There was a reasonably good cross-section of career stages and organisms represented, although the employment sector skewed very heavily to universities. And of course “readers of a blog who chose to respond to a poll” is in no way a scientifically designed sample. If I had to speculate, I’d guess this particular post attracted a lot of people interested in detection probabilities, but what exact bias that would produce is hard to predict.

Recall I presented two scenarios. Scenario A was to visit 150 sites once. Scenario B was to visit 50 sites 3 times each. The goal was to estimate how occupancy varied with four collinear environmental variables.
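As a back-of-the-envelope way to think about the trade-off (this arithmetic is mine, not part of the poll): if the per-visit detection probability for an occupied site is p, a single visit detects it with probability p, while three visits detect it at least once with probability 1 − (1 − p)^3 – and the repeat visits are also what let you estimate p at all.

```r
# Sketch: chance of detecting an occupied site at least once under each design (illustrative p values)
p <- c(0.3, 0.5, 0.8)                # hypothetical per-visit detection probabilities
one_visit    <- p                    # Scenario A: each of 150 sites visited once
three_visits <- 1 - (1 - p)^3        # Scenario B: each of 50 sites visited three times
rbind(p, one_visit, three_visits)
```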

Probably the lead result is the recommended scenario:

[Figure: poll results for the recommended scenario]

Scenario B (50 sites 3 times) was the most common recommendation, but it by no means dominated. Over 10% went for scenario A outright. And 20% noted that choosing required more information, with most people saying the critical information was more knowledge about the species – well represented by this quote on what the choice would depend on: “A priori expectation of potential for detection bias, based on species biology and survey method.” It should be noted that a non-trivial fraction of those who went for B did it not to support detection probabilities but for reasons of sampling across temporal variability (a goal that is contradictory with detection probability modelling, which assumes constant conditions and even constant individuals across the repeat visits). 17% also went for B but with hesitation (either putting the statistical expertise of others over their own field intuition or else feeling it was necessary in order to publish).

There was a trend (but definitely not statistically significant) for more graduate students to recommend B and more senior career people (while still favoring B) to switch to “it depends”. Similarly there was a non-significant trend for people who worked on vertebrates to favor B and for people who worked on plants and inverts to switch a bit to scenario A (with scenario B still a majority).

Quite a few people argued for a mixed strategy. One suggestion was to visit 100 sites with 2 repeat visits to 25 of them. Another suggested visiting 25 sites 3 times, then making a decision how to proceed. And there were quite a few variations along this line.

The story for my question about whether there was pressure or political correctness to use detection probabilities was similar (not surprisingly). There was a weak trend to yes (mean score of 3.09) but not significant (p=0.24). Graduate students were the most likely to think there was PC-ness and senior career people the least likely. People working in verts and plants were more likely to see PC-ness than people working on inverts (again all non-significant).

So the overall pattern is a lean to scenario B but a lot of diversity, complexity and nuance. And not much if any perception of PC-ness around having to use detection probabilities ON AVERAGE (some individuals felt rather strongly about this in both directions).

In short, I think a majority of respondents would have agreed with this quote from one respondent: “… the most important part of study design is…thinking. Each situation is different and needs to be addressed as a unique challenge that may or may not require approaches that differ from those used in similar studies.” Which nicely echoes the emphasis in this blog on the need to think and not just apply black and white universal rules for statistics and study design.