About Brian McGill

I am a macroecologist at the University of Maine. I study how human-caused global change (especially global warming and land cover change) affects communities, biodiversity and our global ecology.

What can a journal Editor-in-Chief do to attract you to submit to the journal? a poll

As briefly mentioned previously on this blog, I have accepted the position of Editor-in-Chief (EiC) at Global Ecology and Biogeography. I, of course, think it is a fantastic journal (objectively it ranks top in my field and top 10 in all of ecology) thanks to the great work of outgoing EiC David Currie. As you might imagine, taking on this new role and my ensuing contract negotiations with the journal owner (Wiley) have caused me to think a lot about exactly what the job of EiC should entail. This is a question of current relevance not just to me but to all of ecology and science; the world of academic publishing is changing so quickly that everything in it is being rethought these days, including the role of EiC. The recently announced move of the ESA journals to Wiley is a case in point. While this will not result in significant changes to the editorial staff or processes, anytime there is such major institutional change, roles and expectations will be revisited. I expect many of you have thought little if at all about the EiC at journals, but I want to provoke you to think about it and am curious to hear your thoughts.

First a quick review for those less familiar with publishing (skip to the poll if you know all of this). Journals typically have an EiC and a panel of associate or handling editors (hereafter AE). The typical flow is:

  1. A paper is submitted electronically
  2. The EiC evaluates the paper for quality and goodness of fit and either issues an editorial reject without review or assigns it to a handling editor. These days the EiC editorially rejects 30-90% of all submitted manuscripts, with 50% being a quite typical number (publishing hint: cover letters didn’t use to matter much, but they are now critical in making it past this first screen).
  3. If the EiC decides to send it to review, s/he assigns it to a specific AE (publishing hint: recommending AEs who are expert in the topic of the paper is helpful, but the EiC knows the AEs quite well so this is not particularly subject to gaming – further hint: your letter better be more snappy than your abstract, not just a rehash because that is the one other thing they will read).
  4. The AE may choose to recommend editorial reject without review as well, although typically this is much rarer than the EiC doing it (but maybe 5-10% of all submissions).
  5. The AE provides a list of 5 or so potential reviewers (publishing hint: this is critical to the ultimate decision, but I have no clue on how to game this aspect of who gets picked as reviewers – I don’t think it can be gamed).
  6. An editorial assistant, increasingly often based at the publisher’s office, will contact the prospective reviewers until (usually) 2 people say yes. Sometimes it may take asking as many as 10-15 people (especially in the middle of the summer). In my experience, how hard it is to get reviewers to say yes says nothing about the quality of the paper – so don’t take it as a bad sign if you get a note saying there have been delays in finding reviewers.
  7. Once the reviews are back, the editorial assistant will contact the AE to request a recommendation.
  8. The AE will read the reviews and should read the paper in full, and then make a recommendation (the dramatic accept/major revision/minor revision/reject that everybody pays attention to, but also a summary of the reviews and a focused list of the most important, must-have changes that you should pay a lot of attention to).
  9. The recommendation then goes to the EiC who makes a final decision. Many EiCs follow the AE’s recommendation unless there are serious red flags, but a few insert their own evaluations into the process.

Some journals also have Deputy EiCs – at some journals these DEiCs effectively act like fancy AEs, while at other journals they are effectively co-EiCs. Journals also have a managing editor who is responsible for the business side. In most society journals the managing editor reports to the society, but in journals owned by the publishing company the managing editor is part of the publishing company.

So, everybody who has ever submitted a paper is likely pretty clear on the roles of the reviewers and the AE. What exactly does the EiC do, or what should they do? I have my own opinions, which I will review in a few days in the comments, but as a reader and author, I am curious: what is most important to you for the EiC to devote her/his energies to? (Everything in the poll below is a job of the EiC, but obviously some are more important than others.) To put it another way, which features would make you more likely to submit to a journal if you knew the EiC was prioritizing time on them?

Please take the poll below. (Note: mss=manuscripts)

Different types of hands-on projects in a natural history course

In what seems to be becoming my annual think-out-loud post about my fall teaching assignment (see last year’s post on community ecology classes), I am thinking about a field-oriented natural history course I’ll be teaching this fall and what assignments/evaluation tools I should use. Or more broadly, you hear a lot about best pedagogical approaches to classroom learning (including many great posts from Meg), but less about outdoor pedagogy. I think we all assume that since we’re ecologists this is obvious. Or maybe outdoor learning is so obviously the active, project-based, real-world learning we’re trying to bring into classrooms that we don’t have to worry about it. But really, outdoor pedagogy is pretty much teach-as-we-were-taught every bit as much as classroom pedagogy has been. I’ve increasingly been appreciating how much deep thinking is required to really get pedagogy right, and since I’m taking over a field course, I’ve been thinking a lot about my goals and how to align them with teaching and evaluation tools outdoors. I’d be really curious to hear your thoughts.

To make this concrete while still keeping it fairly generic: I am looking at a natural history course, the center-piece of which is multiple half-day field trips to a variety of ecosystems, and I am looking for an integrative project that spans the semester. I am considering three different projects:

  1. Do your own research/experiment – this is fairly typical in the OTS-type model where you are spending a few weeks at a field station (I’ve taught such a course myself). Here you mentor students through the process of designing, executing and writing up a discrete piece of novel research. Pros – this teaches the scientific method, is fairly open-ended, and clearly requires stretching their critical thinking skills and at least one form of writing. Cons – many students aren’t really ready to do independent research as undergrads (especially lower level) and so often find this assignment more frustrating/intimidating than inspiring, and in some cases do such low-level work I’m not sure they learn much (or worse, learn a very simplistic view of science). It’s also not particularly integrative (i.e. good at teaching the scientific method, bad at helping students make connections and insights in natural history).
  2. Do a digital specimen collection – this is also a fairly typical assignment in “ology” classes (I did one myself in my graduate days in an entomology class). Since my class cuts across many taxa (requiring many types of collection equipment) I would probably have this be a digital collection instead of a physical collection where students take photos, put them into a document and annotate each photo with species ID, location, and notes about the species. Pros – this reinforces the goal of learning to ID species, paying attention while outdoors and seems likely to be retained as a tool useful to students after they graduate. Cons – less integrative than the other two choices, although this comes down a lot to what and how much I make them write in addition to the photos.
  3. Write a natural history journal – I haven’t encountered this one as much but a colleague suggested it. The assignment basics would be: 1) pick a small piece of land, 2) study it in depth from the soil to the sky, 3) make repeated visits, 4) write 5 pages about this location and its dynamics and interconnections in the spirit of Thoreau or McPhee. Pros – very integrative, very open-ended, a lot of emphasis on writing which is good (although like most biology courses we’re not really set up to do extensive mentoring on writing). Cons – pretty risky to expect students to observe and write like Thoreau.

There are some course-specific constraints in my own mind for my personal situation (although I think they’re not untypical of many teaching situations): this is a lower-division undergraduate (200-level), largish (44 students vs 2 instructors) course, so there are more limited opportunities for mentorship than would be ideal. It is not a 2-week at-a-field-station type of course, so students will be doing this assignment very independently on their own time in the busyness of the semester (or not doing it until the last minute in some cases). The course is also literally focused on natural history, not principles of ecology or such (we have a separate ecology class). You can, of course, share your thoughts in the context of these constraints, or I would be equally interested to hear your thoughts about the three options in your own context.

I personally have two main goals for this assignment: 1) to be integrative. By integrative I mean they will already have lab exams on species ID and lecture exams on the stages of old-field succession etc. I really want something different that makes them think big picture, have ah-ha moments of connection, and develop critical thinking and writing skills in addition to memorization. 2) just to have fun and inspire. It is shocking how little time the average ecology student spends outdoors in their 4 years (forestry and wildlife do a little better than biology departments but still not great). For many students this is likely to be their primary exposure to in-the-field work until their 4th year. I want them to feel inspired by the awesomeness of nature that made us all go into the subject (while still being able to evaluate learning and give grades).

How important do you think these goals are? Do you think these assignments meet these goals? Any tips or gotchas you’ve learned the hard way on any of these projects? Other goals are imaginable and I’d be curious to hear them. Of course I’d be curious to hear other suggestions for assignments to meet these goals too. What do you see as the relative merits of these three projects? What do you think should be the primary pedagogical goals in a course that represents many students’ first exposure to the wonders of nature in a hands-on fashion? More broadly, is pedagogy for outdoor teaching easy, or do we need to rethink this too?


Big data doesn’t mean biggest possible data

For better and for worse, big data has reached and is thoroughly permeating ecology. Unlike many, I actually think this is a good thing (not to the degree it replaces other things but to the degree to which it becomes another tool in our kit).

But I fear there is a persistent myth that will cripple this new tool – that more data is always better. This myth may exist because “big” is in the name of the technique. Or it may be an innate human trait (especially in America) to value the bigger house, car, etc. Or maybe in science we are always trying to find simple metrics to know where we rank in the pecking order (e.g. impact factors), and the only real metric we have to rank an experiment is its sample size.

And there is a certain logic to this. You will often hear that the point of big data is that “all the errors cancel each other out”. This goes back to statistics 101. The standard error (a description of the uncertainty in our estimate of the mean of a population) is \sigma/\sqrt{n}. Since n (sample size) is in the denominator, the “error” just gets smaller and smaller as n gets bigger. And p-values get correspondingly closer to zero, which is the real goal. Right?

Well, not really. First, \sigma (the standard deviation of the noise) is in the numerator. If all data were created equal, \sigma shouldn’t change too much as we add data. But in reality there is a lognormal-like aspect to data quality: a few very high quality data sets and many low quality data sets (I just made this law up but I expect most of you will agree with it). And even if we’re not going from better to worse data sets, we are almost certainly going from more comparable (e.g. same organisms, nearby locations) to less comparable data sets. The fact that noise in ecology is reddened (variance goes up without limit as temporal and spatial extent increase) is a law (and it almost certainly carries over to increasingly divergent taxa, although I don’t know of a study of this). So as we add data we’re actually adding lower quality and/or more divergent data sets with larger and larger \sigma. So \sigma/\sqrt{n} can easily go up as we add data.
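To make that concrete, here is a minimal simulation sketch (in Python, with invented variance numbers purely for illustration) of how \sigma/\sqrt{n} can climb even as n grows, when each added dataset is noisier than the last:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical scenario: we keep pooling datasets of 20 points each,
# but each new dataset is noisier (larger sigma) than the previous one.
true_mean = 10.0
n_per_dataset = 20
sigmas = [1, 2, 4, 8, 16, 32]   # made-up, increasingly noisy datasets

pooled = np.array([])
for sigma in sigmas:
    new_data = rng.normal(true_mean, sigma, n_per_dataset)
    pooled = np.concatenate([pooled, new_data])
    n = len(pooled)
    se = pooled.std(ddof=1) / np.sqrt(n)   # estimated sigma / sqrt(n)
    print(f"n = {n:3d}, pooled sd = {pooled.std(ddof=1):6.2f}, SE = {se:5.2f}")

# Despite n growing, the standard error climbs, because the pooled
# standard deviation grows faster than sqrt(n) does.
```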

But that is the least of the problems. First, estimating effect size (a difference in means or slopes) is often only one task. What if we care about r2 or RMSE (my favorite measures of prediction)? r2 has \sigma in its denominator and RMSE has it in its numerator, so both metrics only get worse as the noise variance increases.

And then there is the hardest-to-fix problem of all – what if adding bad datasets adds bias? It’s not too hard to imagine how this occurs. Observer effects are a big one.

So more data definitely does NOT mean a better analysis. It means including datasets that are lower quality and more divergent and hence noisier and probably more biased.

And this is all just within the framework of statistical sampling theory. There are plenty of other problems too. Denser data (in space or time) often means worse autocorrelation. And another problem: at a minimum, less observation effort produces smaller counts (of species, individuals, or whatever). Most people know to correct for this crudely by dividing by effort (e.g. CPUE is catch per unit effort). But what if the observation is a non-linear function of effort (e.g. an increasing but decelerating function, as it often is)? Then dividing the observed count by effort will inappropriately downweight all of those high-effort datasets. Another closely related issue involving non-linearity is scale. It is extremely common in meta-analyses and macroecology analyses to lump together studies at very different scales. Is this really wise given that we know patterns and processes often change with scale? Isn’t this likely to be a massive introduction of noise?
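To illustrate the effort problem, here is a hedged sketch (Python, with an invented saturating detection curve and made-up numbers, not any real survey protocol) of how dividing by effort penalizes high-effort surveys when detection decelerates:

```python
import numpy as np

# Hypothetical example: two sites with identical true richness (100 species)
# surveyed with very different effort. Assume detection saturates as
# observed = 100 * (1 - exp(-effort / 50))  -- an invented curve.
true_richness = 100
effort = np.array([10.0, 200.0])                  # survey hours (made up)
observed = true_richness * (1 - np.exp(-effort / 50))

per_unit_effort = observed / effort               # the crude "divide by effort" fix

print("observed richness:      ", observed.round(1))         # ~18 vs ~98 species
print("richness / unit effort: ", per_unit_effort.round(2))  # ~1.81 vs ~0.49

# The high-effort survey found nearly every species, yet the per-effort
# "correction" makes it look far poorer than the low-effort survey,
# precisely because the response to effort is non-linear.
```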

And it goes beyond statistical framing to inferential framing, to what I think of as the depth of the data. What if we want to know about the distribution of a species? It seems pretty obvious that measuring the abundance of that species at many points across its range would be the most informative (since we know abundance varies by orders of magnitude across a range within a species). But that’s a lot of work. Instead, we have lots of datasets that only measure occupancy. But even that is quite a bit of work. We can just query museum records and download hundreds of presence records in 15 minutes. But now we’re letting data quantity drive the question. If we really want to know where a species is and is not found, measuring both sides of what we’re interested in is a far superior approach (and no amount of magic statistics will fix that). The same issues occur with species richness. If we’re really serious about comparing species richness (a good example of the aforementioned case where the response to effort is non-linear), we need abundances to rarefy. But boatloads of papers don’t report abundances, just richness. Should we really throw them all away in our analyses?
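For readers who haven’t run one, here is a minimal Monte Carlo rarefaction sketch (Python, with simulated communities; a toy illustration rather than a vetted implementation) of why abundances are needed to make richness comparable across very different sampling efforts:

```python
import numpy as np

rng = np.random.default_rng(1)

def rarefied_richness(abundances, subsample_size, n_draws=500):
    """Expected species count in a random subsample of individuals
    (simple Monte Carlo rarefaction; a sketch, not production code)."""
    pool = np.repeat(np.arange(len(abundances)), abundances)  # one entry per individual
    draws = [len(np.unique(rng.choice(pool, subsample_size, replace=False)))
             for _ in range(n_draws)]
    return np.mean(draws)

# Two made-up communities with the same 20 species, sampled with very different effort.
community_a = rng.multinomial(100,  np.full(20, 1 / 20))   # 100 individuals counted
community_b = rng.multinomial(2000, np.full(20, 1 / 20))   # 2000 individuals counted

# Raw richness is confounded with effort; rarefying both to 50 individuals is comparable.
print("raw richness:     ", (community_a > 0).sum(), (community_b > 0).sum())
print("rarefied to n=50: ", round(rarefied_richness(community_a, 50), 1),
                            round(rarefied_richness(community_b, 50), 1))
```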

As a side note, a recurring theme in this post and many previous ones is that complex, magic statistical methods will NOT fix all the shortcomings of the data. They cannot. Nothing can extract information that isn’t there or reduce noise that is built in.

So, returning to the question above: should I knowingly leave data on the table and out of the analysis? The trend has been to never say no to a dataset. To paraphrase Will Rogers, “I never met a dataset I didn’t like”. But is this the right trend? I am of course suggesting it is not. I think we would be better off if we only used high quality datasets that are directly relevant to our question and support the necessary analytical techniques. Which datasets should we be omitting? I cannot tell you, of course. You have to think it through in the particulars. But things like sampling quality (e.g. amount of noise, quality control of observation protocols), getting data that make apples-to-apples comparisons, and the depth of the data (e.g. abundance vs occupancy vs presence/absence) may well place you in a realm where less is more!

What do you think? Have you had a situation where you turned away data?

Steering the trait bandwagon

Although the notion of a “bandwagon” technically only means something that is rapidly growing in popularity, calling a scientific research program a bandwagon carries several more connotations. These include the idea that it will crash (people will abandon it) and that people are piling in because they perceive the research program as a way to do something that is “easy” (or even formulaic) but still get into a good journal (i.e. the proverbial something for nothing). Popular and easy are of course two of the worst reasons to choose a research project, but that seems not to matter in the bandwagon phenomenon.

There is little doubt that functional traits are a bandwagon research program right now:

[Figure: papers using “functional trait*” per year (Web of Science)]

The use of the phrase “functional trait*” (per Web of Science) is rising exponentially with a doubling time of less than 4 years. In less than two decades, there are almost 3000 total publications cited 56000 times, 14000 times last year alone (with an astonishing average citation rate of 19 times/article and an h-index for the field of over 100).

For better and worse, I am probably one of a fairly large group of people responsible for this bandwagon, due to this paper, which came out simultaneously with a couple of other papers arguing for a trait-based approach, although (as is likely true of all bandwagons) the idea has been around much longer and builds on the research of many people.

By calling functional trait research a bandwagon, I am implying (and now making explicit) two things: 1) The popularity of the functional trait program is in part due to the fact that people see it as a simple way to do something trendy. I think there is no doubt of this – there are a lot of papers being published right now that just measure a bunch of functional traits on a community or guild and don’t do much more. 2) That this party is about to come to an end. I predict we will see multiple papers in the next two years talking about how functional trait research is problematic and has not delivered on its promise and many people bailing out on the program.

You might think I am worried about the impending crash, but I am not. I actually relish it. It’s after the bandwagon crashes that we lose all the people just looking for a quick paper, while the people who are really serious about the research field stay, take the lessons learned (and identify what they are), and build a less simple, more complex but more realistic and productive world view. In my own career I have seen this with phylogenetic systematics, neutral theory of biodiversity, and – if we go back to my undergraduate days – neutral theory of genetics and island biogeography.

In an attempt to shorten the painful period and hasten the renewal, what follows are my ideas/opinions about what is being ignored right now on the functional trait bandwagon (although by no means ignored by the researchers I expect will still hang around after the crash; I have tried to give citations where possible), which I predict will become part of the new, more complex view of functional traits version 2.0 some 5-10 years down the road.

(As an aside, and as a meta comment on how I think science proceeds, I wanted to briefly note that: a) I think probably many other people are thinking these thoughts right now – they’re in the air, but as far as I know nobody has put them down as a group in ink (or electrons) yet; b) my own thinking on this has been deeply influenced by at least a dozen people, especially Julie Messier as well as Brian Enquist & Marty Lechowicz – fuller acknowledgements are at the bottom; and c) it’s not as easy to assign authorship on these thought pieces as it is on a concrete piece of experiment or analysis – if this were a paper I could easily argue for just myself as author, or 1 more, or 3 more, or 10 more.)

So without further ado, here are 9 things I think we need to change to steer the bandwagon:

  1. What is a trait? – there are a lot of definitions (both the papers linked to above have them). But the two key aspects are: 1) it is measured on a single individual and 2) it is conceivably linked to function or performance (e.g. fitness or a component of fitness such as growth rate). The 2nd is not a high bar to clear. But a lot of people right now are ignoring #1 by taking values that can only be tied to a species or population (such as population growth rate, geographic range size, mortality rates) and calling them functional traits. They’re not. They’re important and interesting, and maybe science will someday decide they’re more important than things you can measure on individuals. But they’re not functional traits if you can’t measure them on one individual. The functional trait program is going from function (behavior and physiology) to communities or ecosystem properties. That is where a lot of the excitement and power of the idea comes from. It is actually, in a subtle way, a rejection of the population approach that dominated ecology for decades.
  2. Where’s the variance? – I believe that the first step in any domain of science is to know at what scales and levels of measurement variation occurs. Only then can you know what needs to be explained. There has been an implicit assumption for a long time that most of the variance in functional traits is between species and/or along environmental gradients. There is indeed variation at these two levels. But there is also an enormous amount of variation between individuals in the same species (even the same population). And there is way more variation between members of a community than between communities along a gradient. Finally, although the previous statements are reasonably general, the exact structure of this variance partitioning depends heavily on the trait measured. Functional traits won’t deliver as a field until we all get our heads around these last three facts, and learn a lot more than we already know about where the variance is (a crude sketch of this kind of variance partitioning appears just after this list). A good intro to this topic is Messier et al 2010 and Violle et al 2012 (warning: I’m a coauthor on both).
  3. Traits are hierarchical (can be placed on a scale from low level to high level) – we tend to lump all traits together, but traits are hierarchical. Some are very low level (e.g. chlorophyll concentration per leaf volume), some one level up (e.g. light absorption), and going on up the ladder from this one trait we have Amax (maximum photosynthetic rate), leaf CO2 fixation/time, CUE (carbon use efficiency, or assimilation over assimilation+respiration), plant growth rate, and fitness. Note that each trait directly depends on the trait listed before it, but also on many other traits not listed in this sequence. Thus traits are really organized in an inverted tree; traits can be identified at any tip or node, and performance sits at the top of the tree. We move from very physiological to very fitness-oriented as we move up the tree. One level is not more important than another, but the idea of different levels, closer to physiology or closer to fitness/performance, is very real and needs to be accounted for. And we need to pick the right level for the question. All traits are not equivalent in how we should think about them! And learning how to link these levels together is vital. A depressing fact in phenotypic evolution is that the higher up the hierarchy a phenotypic character is, the less heritable it is (with fitness being barely heritable), but so far we seem to be having the opposite luck with functional traits – higher level traits covary more with environment than low level traits do (there are a lot of good reasons for this). A good intro paper to this topic is Marks 2007.
  4. Traits aren’t univariate and they’re not just reflections of 1-D trade-offs – How many papers have you seen where trait #1 is correlated with environment, then trait #2 is correlated with environment, and so on? This is WRONG! Traits are part of a complex set of interactions. If you’re a geneticist you call this epistasis and pleiotropy. If you’re a physiologist you call this allocation decisions (of resources). If you are a phenotypic evolution person you call this the phenotypic covariance matrix. Of course we are finding that one trait low in the hierarchy is neither predictive of overall performance nor strongly correlated with environment. It is part of an intricate web – you have to know more about the web. The main response to this has been to identify trade-off axes. The most famous is the leaf economic spectrum (LES), which is basically an r-K-like trade-off between leaf life span and rate of photosynthesis. Any number of traits are correlated with this trade-off (e.g. high nitrogen concentrations are correlated with the fast-photosynthesis, short-life end). And several of the smartest thinkers in traits (e.g. Westoby and Laughlin) have suggested that we will find a handful of clear trade-off axes. I hate to contradict these bright people, but I am increasingly thinking that even the idea of multiple trade-off axes is flawed. First, the correlations of traits with the LES are surprisingly weak (typically 0.2-0.4). Second, I increasingly suspect the LES is not general across all scales. And the search for other spectra has gone poorly. For example, despite efforts, there has not yet emerged a clear wood economic spectrum that I can understand and explain. So to truly deal with traits we need to throw away univariate thinking and even trade-off axes and start dealing with the full complexity of covariance matrices. This is complex and unfortunate, but it has profound implications. Even the question of maintenance of variation simplifies when we adopt this full-blown multivariate view of phenotype (two nice papers by Walsh and Blows and by Blows and Walsh). For a good review of the issue in traits see Laughlin & Messier, newly out just this week in TREE.
  5. Any hope of predicting the performance consequences of traits requires dealing with the TxE (trait-by-environment) interaction – Does high SLA (specific leaf area, basically the thinness of a leaf, a trait strongly correlated with the rapid-photosynthesis end of the LES) lead to high or low performance? The answer blatantly depends on the environment (e.g. it causes lower performance in dry environments or environments with lots of herbivory). Too many studies just look at trait-performance correlations when they really need to look at this in a 3-way fashion, with performance as a 3-D surface over the 2-D space of trait and environment. Presumably this surface will usually be peaked and non-linear as well (again see Laughlin & Messier 2015).
  6. Theory – the field of functional traits is astonishingly lacking in motivating theory. When people tell me that natural history or descriptive science is dead, I tell them it’s just been renamed to functional traits. I personally see descriptive science as essential, but I also see theory and the interaction between theory and description as essential. Key areas where we need to develop theory include:
    1. How exactly filtering on traits works – one of the appealing concepts of traits is that we can move from simply saying a community is a filtered set of the species pool to talking about what is being filtered on. But we aren’t thinking much about the theory of filtering. Papers by Shipley et al 2006 and Laughlin et al 2012 are good starts but are not referenced by most workers in the field. And nowhere do we have a theory that balances the environmental filter that decreases variance against the biotic competition filter that increases variance within a community (and yes Jeremy, other outcomes are certainly possible per Mayfield & Levine 2010, but for good empirical reasons I believe this is the main phenomenon happening in traits).
    2. What is the multivariate structure of trait covariance – This is partly an empirical question but there are many opportunities for theory to inform on this too. In part by thinking about …
    3. Causes of variation – we know variation in traits is due to a combination of genetic variation and adaptive plasticity, and that these respond to environments at many scales. But can we say something quantitative?
  7. Individuals – we are very caught up in using traits as proxies for species, but I increasingly think that filtering happens at the individual level and that we need to shift away from thinking about traits at the species level. The same given trait value (say the optimal value in some environment) can be provided by any of several species, each of which shows considerable variability in traits, so there is significant overlap in the trait distributions between species. This idea can be found in Clark 2010 and Messier et al 2010 among many others. This might seem subtle, but it is a pretty radical idea to move away from populations to individuals to understand community structure.
  8. Interaction traits, reproduction traits and other kinds of traits – most of the traits studied are physiological/structural in nature. This is probably because one of the major roots of functional traits has been seeking to predict the ecosystem function of plants (e.g. CO2 fixation, water flux). But if we are going to develop a fully trait-based theory of ecology we need to address all aspects of an organism including traits related to species interactions (e.g. root depth for competition, chemical defenses for herbivory, floral traits for pollination and reproduction, and even behavioral traits like risk aversion).
  9. Traits beyond plants – the trait literature is dominated by botanists. There is a ton of work in the animal world that deals with morphology and behavior, and some of it is starting to be called “functional traits.” The hegemony of one term is not important, but the animal and plant people thinking about these things (whatever they’re called) need to spend more time communicating with and learning from each other.
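As promised under #2, here is a crude sketch of the kind of variance partitioning I have in mind, splitting trait variance into among-species and within-species components. It is Python with simulated data and made-up variance numbers, purely to show the mechanics (real analyses, e.g. Messier et al 2010, use nested mixed models across site/species/individual/leaf levels):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated trait data: 20 species x 30 individuals each, with invented
# variances set so roughly half the variation is among species and half within.
n_species, n_ind = 20, 30
species_means = rng.normal(0, 1.0, n_species)            # among-species sd = 1
trait = np.concatenate([rng.normal(m, 1.0, n_ind) for m in species_means])
df = pd.DataFrame({"trait": trait,
                   "species": np.repeat(np.arange(n_species), n_ind)})

# Crude variance partition: variance of species means vs mean within-species variance.
among = df.groupby("species")["trait"].mean().var(ddof=1)
within = df.groupby("species")["trait"].var(ddof=1).mean()
total = among + within

print(f"among-species share:  {among / total:.0%}")
print(f"within-species share: {within / total:.0%}")
# The point of #2 above: in real trait data the within-species share is often
# surprisingly large, so species means alone hide a lot of the action.
```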

So there you have it. If you want to predict outcomes (e.g. invasion, abundance, being found at location X or in environment Y, etc.) based on traits, it’s easy. You just have to recognize that it happens in interaction with the environment and many other traits (many of which we haven’t even started studying), and figure out the appropriate level of traits to study for the scale of the question. Sounds easy, right? No, of course not. When is good science ever easy? That’s the problem with bandwagons. Anybody want off the trait bandwagon before we get to that destination? Anybody want on if they know that is the destination?

What do you think? Are traits a bandwagon? Is it about to crash? What will be the story uncovered by those picking up the pieces? Anything I forgot? Anything I should have omitted?

PS – I don’t usually do acknowledgements on informal blog posts, but it is necessary for this one. My thinking on traits has been profoundly influenced by many people. First among them would be Julie Messier, who is technically my student but I am sure I have learned more from her than vice versa. She has also shared with me several draft manuscripts that make important progress on #2, #4 and #5. I also have to highlight my frequent collaborators, Marty Lechowicz and Brian Enquist. Also influencing me greatly at key points were Cyrille Violle, Marc Westoby and Evan Weiher. And this field is being advanced by the work of many other great researchers (some of whom I’ve mentioned above) who were there before the bandwagon started (and many before I got on) and will still be there after it crashes, but whom I won’t try to name for fear of leaving somebody out. Despite it being a bandwagon right now, there is no lack of smart people trying hard to steer constructively!


#EUMacro and macroecology reaching a self-conscious adolescence

I have been enjoying my attendance at the EU Macroecology 2015 meeting (a joint effort of the German, BES and Danish macroecology groups). There has been a great twitter feed at #EUMacro.

There have been tons of great talks, but I’m not going to turn this into a conference report blog (check out the twitter feed if you want that). Just three quick observations on conference life: 1) more Americans should go to European conferences and vice versa – there is very good work on both sides of the pond with not enough cross-fertilization of ideas; 2) I love meetings such as this one of about 100 people specialized on my subdiscipline – easily my favorite kind of meeting; 3) am I the only one who thinks it’s deeply unfortunate that we seem to have abandoned the tradition of putting citations on slides during talks to acknowledge prior work (even if there is not much time to talk about them)?

What I mostly want to do is build on 3 days of macroecology talks and time with colleagues and use it as an opportunity to reflect on the big-picture trajectory of macroecology. Macroecology of course goes back, as an effort without the name, to the 1800s with von Humboldt and Darwin, through to MacArthur’s 1972 book. But macroecology was launched as a named, defined field in 1989 with Brown & Maurer’s paper. That makes macroecology 26 years old. My impression is that 26 years in the aging of a scientific discipline must equate to about 15 human years, which places us squarely in adolescence. In particular, a key feature of adolescence (scientifically documented, if your own high school years are not enough) is acute (often painful) self-consciousness (constantly self-examining and hyperaware of how others are perceiving us). This seems to me to be what macroecology is doing right now.

Don’t get me wrong. While self-examination can go too far (and be painful), I don’t sense that is what is happening. I think this is probably a natural progression of a field growing up. We’ve moved past those early years of just establishing that we exist in the world. There is no doubt that macroecology went through a phase of existential angst and trying to prove we deserved a seat at the ecology table, but that is long gone. And the next stage of learning how to walk and talk certainly seems to have come and gone. We have books, journals, meetings, a common language. We’re really starting to do some impressive things (again see the twitter feed on #EUMacro).

Now we’re adolescents trying to mesh into the world and build a sound foundation for really world-changing actions. So we’re very self-examining and aware of how we’re perceived externally. A few examples: I was part of a symposium at INTECOL that was focused on whether macroecology needs to be more process oriented. I was at a confab of theoretical macroecologists organized by John Harte a couple of years ago that concluded macroecology needed to be able to deal with dynamics as well as static equilibria, in part motivated by a desire to be relevant to global change. These are recurring themes I’ve heard repeatedly since (and again here at EU Macro). And this year at EU Macro the two opening talks by Nick Isaac and Carsten Rahbek nicely presaged recurring themes in ensuing talks about the need to worry about data quality and the increasing focus on variance (the residuals around the line, not just the mean line). Linking across scales is another common theme which I’ve also seen here. In short, a lot of ideas about how we need to do better, reach farther, etc.

Personally, I am less worried about process than many are (but more worried about prediction than most). I am not convinced we will ever be able to link across scales (specifically from populations to continental scales) the way other people seem to wish. But I share most of the other goals, at least if kept in balance (too much obsession with data quality and residuals will turn us into field-based autecologists, but we’re far from that now and I think a little bit of a pendulum swing is a good idea). And I really feel that to come of age we need to start making predictions of changes in response to perturbations. And we probably need to bring paleoecology into the fold to do that (also a theme at this conference). But I’d be very curious to hear what others think.

In short, I think this self-examination is a very natural stage of development. And I think it is pretty productive and on-target so far. In short, I think macroecology is growing up nicely. I can’t wait to see what it does in its mature years!

Why AIC appeals to ecologists’ lowest instincts

It is my sense of the field that AIC (Akaike information criterion) has moved past bandwagon status into a fundamental and still increasingly used paradigm in how ecologists do statistics. For some quick and dirty evidence, I looked at how often different core words were used at least once in an article in Ecology Letters in 2004 and 2014. Regression was used in 41% and 46% respectively. Significance was used in 40% and 35%. Richness was 41% and 33%. And competition was 46% and 49%. Perhaps a trend or two in there, but all pretty steady. AIC, by contrast, has gone from being in 6% of the articles in 2004 to 19% of the articles in 2014. So in summary: AIC has tripled in usage, is now found in nearly 20% of all articles, and is used almost 2/3 as often as the most widely used statistical technique, significance testing.

I have a theory about why this has happened which does not reflect favorably on how AIC is used. Please note the qualification “how AIC is used”. AIC is a perfectly valid tool. And like so many tools, its original proponents made reasonable and accurate claims about it. But over time, the community takes ownership of a concept and uses it how they want, not how it was intended.

And I would suggest that how people want to use AIC is in ways that appeal to two low instincts of ecologists (and all humans for that matter). First, humans love rankings. Most newspapers contain the standings of all the teams in your favorite sport every day. We pay more attention to the ranking of a journal’s impact factor than to its absolute value. Any number of newspapers produce rankings of universities. It is ridiculous to think that something as complex as journal quality or university quality can be reduced to one dimension (which is implicit in ranking – you can’t rank in two dimensions). But we force it on systems all the time. Second, humans like to have their cake and eat it too. Statistics has multiple modalities or goals. These include: estimation of parameters, testing of hypotheses, exploration of covariation, prediction into new conditions, selecting among choices (e.g. models), etc. Conventional wisdom is that you need to be clearly based in one goal for an analysis. But we hate to commit.

You can probably already see where I’m headed. The primary essence of what AIC delivers is to boil choices down to a single dimension (precisely, it provides one specific weighting of the two dimensions of likelihood and number of parameters to give a single dimension) and then to rank models. And comparing AIC scores is so squishy. It manages to look like all 5 statistical goals at once. It certainly does selection (that is its claim to fame). But if you’ve ever assessed whether ΔAIC>2, you have done something that is mathematically close to a p<0.05 significance test.

Just to be clear, likelihood also can be used towards all those goals. But they present much more divergent paths. If you’re doing hypothesis testing you’re doing likelihood ratios. If you’re doing estimation you’re maximizing. If you’re doing selection you can’t proceed unless you specify what criteria to use in addition to likelihood. You have to actually slow down and choose what mode of inference you’re doing. And you have to make more choices. With AIC you present that classic table of ΔAIC and weights and voila! You’ve sort of implied doing all five statistical goals at once.

I want to return to my qualification of “how AIC is used”. The following is a simple example to illustrate how I perceive AIC being used these days. Take the example of species richness (hereafter S). Some people think that productivity is a good predictor (hereafter prod). Some people think seasonality is a better predictor (hereafter seas). Some people suggest energy is the true cause (hereafter energ). And most people recognize that you probably need to control for area sampled (area). Now you could do full-blown variable selection where you try all 16 models of every possible combination of the four variables and use AIC to pick the best. That would be a pretty defensible example of exploratory statistics. You could also do a similarly-goaled analysis of variable importance by scaling all four variables, throwing them into one model, and comparing coefficients or doing some form of variance partitioning. These would also be true exploratory statistics. You could also use AIC to do variable importance ranking (compare the AIC of S~prod, S~seas, S~energ). This is at least close to what Burnham and Anderson suggested in comparing models. You could even throw in S~area, at which point you would basically be doing hypothesis testing against a null, although few would acknowledge this. But my sense is that what most people do is some flavor of what Crawley and Zuur advocate, which is a fairly loose mix of model selection and variable selection. This might result in a table that looks like this*:

Model                ΔAIC   weight
S~prod+seas+area     0      31%
S~prod+energ+area    0.5    22%
S~prod+energ         1.1    15%
S~energ+seas         3.2     9%
S~energ              5.0     2%

There are a couple of key aspects to this approach. It seems to be blending model selection and variable selection (indeed it is not really clear that there are distinct models to select from here, but it is not a very clear-headed variable selection approach either). It’s a shame nobody ever competes genuinely distinct models with AIC, as that was one of the original claims for the benefit of AIC (e.g. Wright’s area-energy hypothesis, S~energ*area, vs. the more-individuals hypothesis, an SEM with two equations: S~numindiv and numindiv~prod). But I don’t encounter it too often. Also note that the more complicated models came out ranked better (a near universal feature of AIC). And I doubt anybody could tell me how science has advanced from producing this table.
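For concreteness, here is a minimal sketch of the workflow being described: fit every subset of the four predictors and convert the resulting AIC values into ΔAIC and Akaike weights (w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), the formula alluded to in the footnote). Everything here is hypothetical (simulated data in Python with statsmodels, standing in for whatever richness dataset one actually has), so it reproduces the kind of table above, not any real analysis:

```python
import itertools
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Invented data standing in for the richness example (S, prod, seas, energ, area).
n = 200
df = pd.DataFrame({c: rng.normal(size=n) for c in ["prod", "seas", "energ", "area"]})
df["S"] = 2 * df["prod"] + df["area"] + rng.normal(scale=2, size=n)

# Fit all 16 models: every subset of the four predictors (empty set = intercept only).
predictors = ["prod", "seas", "energ", "area"]
fits = []
for k in range(len(predictors) + 1):
    for subset in itertools.combinations(predictors, k):
        formula = "S ~ " + (" + ".join(subset) if subset else "1")
        fits.append((formula, smf.ols(formula, data=df).fit().aic))

# Convert AIC to delta-AIC and Akaike weights: w_i = exp(-delta_i/2) / sum_j exp(-delta_j/2)
aic = np.array([a for _, a in fits])
delta = aic - aic.min()
weights = np.exp(-delta / 2)
weights /= weights.sum()

# The classic table: models ranked by delta-AIC with their weights.
order = np.argsort(delta)
for i in order[:5]:
    print(f"{fits[i][0]:<30s} dAIC = {delta[i]:5.1f}  weight = {weights[i]:5.1%}")
```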

Which brings me to the nub of my complaint against AIC. AIC as practiced appeals to base human instincts to rank and to be wishy-washy about inferential frameworks. There is NO philosophy of science that says ranking models is important. It’s barely better than useless to science. And there is no philosophy of science that says you don’t have to be clear what your goal is.

There is plenty of good debate to have about which inferential approach advances science the best (a lot has happened on this blog!). I am partial to Lakatos and his idea of risky predictions (e.g. here). Jeremy is partial to Mayo’s severe tests, which often favor hypothesis testing done well (e.g. here). And I’ve argued before that there are times in science when exploratory statistics are really important (here). Many ecologists are enamored with Platt’s strong inference (two posts on this) where you compare models and decisively select one. Burnham and Anderson cite Platt frequently as an advantage of AIC. But it is key to note that Platt argued for decisive tests where only one theory survives. And arguably still the most mainstream view in ecology is Popperian falsification and hypothesis testing. I can have a good conversation with proponents of any of these approaches (and indeed can argue for any of these approaches as advancing science). But nowhere in any of these approaches does it say that keeping all theories around but ranking them is helpful. And nowhere does it say that having a muddled view of your inferential approach is helpful. That’s because these two practices are not helpful. They’re incredibly detrimental to the advance of science! Yet I believe that AIC has been adopted precisely because it ranks without going all the way to eliminating theories and because it lets you have a muddled approach to inference.

What do you think? Has AIC been good for the advance of science (and ecology)? Am I too cynical about why hordes are embracing AIC? Would the world be better off if only we went back to using AIC as intended (and if so, how was it intended)?

UPDATE – just wanted to say be sure to read the comments. I know a lot of readers usually skip them. But there has been an amazing discussion with over 100 comments down below. I’ve learned a lot. Be sure to read them.

*NB this table is made up. In particular I haven’t run the ΔAIC through the formula to get weights. And the weights don’t add to 100%. I just wanted to show the type of output produced.

Mistakes happen in science

Meg recently wrote a post acknowledging that crying in science is pretty commonplace. It really touched a nerve and went viral. Meg’s opening syllogism was masterful: humans cry, scientists are human, therefore scientists will cry.

I want to touch on an even more sensitive syllogism: humans make mistakes, scientists are human, therefore scientists will make mistakes. And a corollary – some mistakes will make it into print.

People obsessed with preserving a united front against science deniers might try to pretend this isn’t true. But it is true. This rarely acknowledged truth about scientists is fresh in everybody’s minds because of a recent retraction of an ecology paper (due to an honest mistake). I’m not even going to link to it, since it is a distraction from my main point to single out one group of individuals when I’m talking about collective responsibility (but if it’s too distracting not to know, Jeremy linked to it on Friday).

What I am finding revealing is not that a retraction occurred but other people’s reactions to the fact that a retraction occurred. There seems to be a lot of distancing and blaming. The first commenter on Retraction Watch even went one step further and very sloppily and inaccurately started throwing around the phrase “fraud scandal” (really? is the topic of mistakes so taboo that we can’t differentiate the profound difference between a mistake and fraud?).

My reactions were rather different. In order of occurrence, and probably in order of increasing profundity, they were:

  1. Ouch – I feel bad for the authors
  2. I’m impressed with the way the authors handled this – it took a lot of courage
  3. That’s science working the way it is supposed to
  4. It could have been me

There’s no need to expand on the first one (except it’s worth noting I don’t know any of the authors personally, so this was a one-degree-removed, member-of-my-community form of empathy).

But I think it is worth dwelling on the second one for a moment. It must have been very tempting to bluster, deny that there were substantive enough mistakes to require a retraction, and hope this faded away. We all know this strategy has a decent shot at working. In an infamous case in evolution (UPDATE: the link in Jeremy’s post is broken – follow this link), it worked for years until a co-author took it upon himself to self-publish and blow the whistle (nobody talks about this, but the journals have an obvious interest in not highlighting a mistake). But these authors didn’t weasel in any fashion. And they thought about the good of science before the good of their careers. Good for them!

As for the 3rd reaction – this is not a failure of science. It is a success of science! It is science working as it is supposed to. And it is exactly why science has a claim to a degree of rigor that other modes of thought don’t have. The reason my syllogism doesn’t eliminate science as a paragon of correctness is that – contrary to the popular view about lone geniuses – science is not about individuals or single papers. It is about the community and the total body of evidence. One individual can be right, wrong, a crack-pot, a genius, mistaken, right for the wrong reasons, etc. But the community as a whole (given time) checks each other and identifies wrong ideas and mistakes. The hive mind will get the important things right with some time. If you read the details, this is exactly what happened. Good for science!

The last reaction is the touchiest of all (it could have been me*). Of course I do not knowingly have any mistakes in print. But I could have a mistake out there I don’t know about. And I’ve caught some that came close. And I could make one in the future. Should I be thinking that? Should I be admitting that in a public blog? I sure hope your answer to both of these questions is yes. If I’m not asking the first question (and admitting the possibility), how can I be putting my best effort into avoiding mistakes? The same goes for the community context. And I’m pretty sure any other honest scientist cannot say they are 100% sure they have never made a mistake and never will make a mistake. 95% sure – I hope so. Maybe even 99% sure. But 100% sure? I don’t trust you if that is what you claim. Every lab I’ve ever worked in or been close to (meaning dozens) has had challenges and errors with data and coding and replicability of analysis. Most of them are discovered and fixed (or sadly prevent publication). But has anybody here ever run an analysis, gotten a particular t-statistic/p-value and written it up, and then run the analysis later and gotten a slightly different number and never been able to recreate the original? Anybody have one or two sample IDs that got lost in the shuffle and you don’t know what they are? These are admittedly small mistakes that probably didn’t change the outcome. But it is only a difference of degree. And I bet most of you know of bigger mistakes that almost got out the door.

I want to speak for a minute more specifically about coding. In this day and age nearly every paper has some coding behind it. It might just be an R script to run the analyses (and probably dropping some rows with incomplete data etc. along the way). But it might be like the stuff that goes on in my lab, including 1000+ line computer simulations and 1000+ line big data analyses. Software engineers have done a lot of formal analysis of coding errors. And to summarize a lot of literature, they are numerous and the best we can do is move asymptotically towards eliminating them. Getting rid of even 90-95% of the errors takes a lot of work. Even in highly structured anti-error environments like NASA or the medical field, mistakes slip through (like the mis-transcribed formula that caused a rocket to crash). And science is anything but a highly-structured anti-error environment (and we shouldn’t be – our orientation is on innovation). In a future post, I will go through some of the tricks I use to validate and have faith in my code. But that would be a distraction here (so you might want to save your comments on how you do it for that post too). The bottom line though is that I know enough software engineering not to fool myself. I know there are errors in my code. I’ve caught a couple of one-line mistakes that totally changed the results while I was in the middle of writing up my first draft. I think and hope that the remaining errors are small. But I could be wrong. And if I am wrong and made a whopping mistake, I hope you find my mistake!

The software industry’s effort at studying errors was just mentioned. But the medical and airline industries have recently devoted a lot of attention to the topic of mistakes as well (their mistakes are often fatal). The Institute of Medicine released a report entitled “To Err is Human” with this telling quote:

“.. the majority of medical errors do not result from individual recklessness or the actions of a particular group–this is not a “bad apple” problem. More commonly, errors are caused by faulty systems, processes, and conditions that lead people to make mistakes or fail to prevent them.”

Broad brushing the details, both medicine and the airlines have come to the conclusion that the best way to avoid mistakes are to 1) destroy the myth of infallibility, 2) eliminate the notion that raising the possibility of a mistake is offensive, 3) introduce a culture of regularly talking about the possibility of mistakes and analyzing mistakes made for lessons learned, and 4) make avoiding mistakes a collective group responsibility.

I think arguably science figured this all out a couple of hundred years ago. But it is worth making explicit again. And per #3 it is worth continuously re-evaluating how we’re doing. In particular we do #4 extremely well. We have peer review, post-publication review (which is stronger for prominent and surprising results), attempts at replication etc. We’re professional skeptics. We also do pretty well at #2; you expect and accept your work being criticized and picked apart (even if nobody enjoys it!). #1 is more of a mixed bag. I’ve heard a lot of “it could never happen in my lab” comments recently, which is exactly the myth of infallibility. And the same for #3 – I haven’t yet heard anybody say “I’m going to change X in my lab” in response to the recent incident. And more generally across #1-#4, I would suggest that coding is novel enough in ecology that we have not yet fully developed a robust set of community practices around preventing coding errors.

In conclusion, I am sure somebody is going to say I am glorifying mistakes in science. I’m not. Mistakes* are unfortunate and we all need to (and I think all do) put a lot of effort into avoiding them. But I sincerely believe there is no way to guarantee individual scientists do not make mistakes. At the same time, I also sincerely believe that a well-constructed scientific community is robust enough to find and correct all important mistakes over time. Which means it really matters whether we respond to mistakes by finger pointing or by examining our common culture and how to improve it. The latter is the conversation I want to have.


*Probably important to reiterate here that I’m talking about mistakes, not fraud. Whole different kettle of fish. I presume most people can see that, which is why I am not belaboring it.

A curmudgeon’s musings on modern pedagogy

(warning this is long – you can skip to the conclusions or even bottom-bottom line at the end if you want)

I am not an expert on pedagogical methods. But I have been on the teacher side of university education for almost 20 years. And I’ve literally taught 100-, 200-, 300-, 400-, 500- and 600-level classes. I’ve taught classes ranging from 630 students to 3, from math-oriented to field-based. In short, a pretty typical mid-career teaching history. About 8 years ago, I took over a 600+ student intro bio class (basically BIO 100) and spent a lot of time thinking about goals, which led to my introducing clickers, which led to my basically being the lead academic (working with the campus learning center) on clicker introduction in basic science classes across campus. And I was a TA in a class before and after the introduction of active learning (my most recent experience with changing pedagogy in a class is discussed below). So I’ve formed a few opinions along the way.

I am by no means at a settled state of where I think university education should go. But the following are a few thoughts and musings. (NB: Meg has a series of good posts on this topic as well: here, here, and Friday links here; and Terry has a bunch of good posts over at Small Pond here and here.)

Point #1 – Buzzword blur – we tend to just lump all the trends together, but they are not the same. You can do one without the other (and there are distinct goals and rationales in each case). Here is a quick tour:

  • Active learning – activities in which the students are not just passively listening but actively producing knowledge via inquiry, answering questions, discussing, etc. This was one of the earliest movements (in ascendancy in the late 90s).
  • Peer instruction – a model in which students teach each other. Often students are given a question and then discuss the answer with their peers. This draws on research showing most people learn better in a social context. When I have tested this via before-and-after versions of the same question using clickers, I have been astonished at the improvement (often from 10% right to 95% right).
  • Flipped classroom – the buzzword du jour – this starts from the notion that lecturing is a relic from the days when textbooks were rare (hand copied). Flipping means students do the passive learning (reading, watching lectures) at home on their own schedule, and the classroom time with the instructor present is used for something more active where the instructor can intervene and assist. This can range from something as simple as having students do what used to be their homework in class and raise their hands for help, to much newer approaches like peer instruction.
  • Just-in-Time Teaching – the notion that the teacher will dynamically adapt the material being taught based on real-time feedback on what students are not understanding. This implies an ability to reteach material in a new way. It also implies real-time feedback, either from quizzes just before class, from some in-class feedback mechanism (clickers, hands raised), or – although nobody talks about it – from old-fashioned sensitivity to puzzled looks on students’ faces.
  • Inquiry-based learning/investigative learning – instead of teaching material, giving students problems (specifically non-trivial problems) to solve. The teacher’s role is as a facilitator, helping students discover first the process they need to use and then the answer to the questions themselves.

Point #2 – Clickers – Clickers are just a tool – they can be used for any of the above techniques or for purposes not listed above. At one end, clickers can be used to pose simple multiple-choice questions and then reward or penalize based on attendance (there is a difference, and both are possible). Clickers can also be used in peer instruction (get clicker answers, show what everybody answered, discuss for 2 minutes with peers, then revote – amazing improvement occurs). Clickers can also be an important tool in just-in-time teaching if the teacher is flexible enough (i.e. they’re a great way to find out if the students really understood what you just taught, if you’re brave enough to deal with a “no they didn’t” answer). Generally one should only expect as much out of clickers as one puts into them. And clickers have real issues of cost – old-fashioned methods like hand raising can do many of the same things (although it’s harder to force 100% participation). Honestly, I think the single biggest value of clickers is to serve as a disruptor and force you to think about how and why you teach. And if you don’t do that thinking, then clickers aren’t doing much.

Point #3 – Remembering why we are doing this – Although often not made explicit, the goal of most of the techniques listed in Point #1 is to elevate learning up Bloom’s taxonomy. If this is not the goal, then such techniques are not necessarily the best approach. Bloom’s taxonomy was formulated in three domains: cognitive, affective (emotional) and psychomotor (physical), but the most talked about, and the relevant one here, is the cognitive. This recognizes the simple idea that there are different levels of learning, starting with knowledge (memorizing facts), then comprehension (extrapolating/understanding), then application (using knowledge), then analysis, then synthesis, then evaluation. The last sentence is immensely oversimplified of course. But this is the central motivation of all of these techniques: to elevate learning up the taxonomy. Much of this started in physics, when people realized students were memorizing formulas well enough to plug and chug on tests but had major failures of basic intuition about how physics works. So physicists began teaching to develop higher-level mastery.

Learning higher up on the taxonomy is obviously a good thing. But the thing I never hear anybody discuss is that it is part of an inherent trade-off. It is essentially a depth vs breadth trade-off. Any realistic application of active learning and similar techniques to elevate learning involves covering less material. Covering it better, but covering less. Are there times and places in university courses to cover the breadth rather than the depth? I struggled with this question a lot teaching intro bio. The breadth expected of that course by higher-level courses, and indeed the breadth of life itself, creates a strong pull in the breadth direction. But to cover it all meant giving up on deeper understanding of higher-level concepts like homoplasy. Which is more important: a) truly understanding homoplasy rather than just being able to regurgitate a definition of it (e.g. being able to produce new examples of homoplasy, which would probably be the applying or 3rd level of Bloom’s taxonomy), or b) remembering platyhelminthes and their acoelomate architecture and basal position (level 1, remembering)? Maybe some of you out there are such fantastic teachers you can achieve both in a semester. But in my experience this trade-off is very real (not on just these two exact topics of course, but between these two levels of learning across all of the material to cover in an intro bio class). I never did fully decide what I thought about this and I’d be curious to hear what others say. But I do strongly believe there is a trade-off between breadth and depth (moving up the taxonomy) that is not talked about enough.

Point #4 – Notetaking – I find it ironic that in this day and age of focus on active learning and moving up the taxonomy, teachers have largely capitulated and hand out copies of their PowerPoint slides, eliminating a very effective method of real-time active learning while listening to lectures (many studies show that note-taking is a highly effective learning method). And nobody is calling this out.

Point #5 – You can go halfway (or 10%) in – It seems to me the conversation is very binary: all-in flipped/active learning/peer instruction 100% of the time, or boring old traditional. This is totally bogus. If active learning has value, then one five-minute exercise per hour (or even every other class) has value. And practically, it is very possible to choose anywhere on the spectrum from 0% to 100% flipped/active. This is also my reason for being pedantic and breaking apart the ideas in Point #1. One can flip without inquiry-based learning, do active learning without just-in-time, etc.

Point #6 – This is not new – Another thing that is not discussed very often is that these techniques are hardly new (but see this link and commentary of Terry’s). Socrates was demanding productive/active learning using inquiry-based techniques and peer instruction 2500 years ago. And many teachers have been doing the same for decades (and millennia).

Point #7 – How hard is it to do? – You can find various opinions about how much work it is to flip a classroom (see Meg here and Terry here). My main experience with flipping was also the first time I taught the class, so it is hard to separate the two; I don’t think I have an informed opinion. But I do think that for those of us raised in the traditional lecture mode, it can take more creativity and emotional energy to do something new and different.

Point #8 – Does it work? – My sense of the overall empirical literature on how effective these techniques are is that the answer is complex, which matches my own experiences. There is a lot of evidence that active learning and related approaches match what we know from cognitive psychology about how we learn best, but this is indirect evidence for superior learning occurring. Students on average also enjoy these techniques; this too is indirect evidence (but very relevant in its own right). More directly, studies show statistically significant improvements in level of learning with active approaches, but the pedagogical significance is tougher to assess. A good recent meta-analysis is Freeman et al. They show roughly a one-half standard deviation improvement, which amounts to about 6 points out of 100 (less on traditional exams, more on higher-level-learning concept inventories). But there are a lot of issues with these studies (e.g. are more motivated teachers more likely to adopt active learning techniques but succeed primarily because of the motivation, not the method – or are they likely to teach better because the change in technique forces new energy and attention to teaching regardless of technique?).
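To spell out that effect-size-to-points conversion (the ~12-point exam standard deviation here is my own rough assumption about a typical intro-course exam, not a number taken from Freeman et al.):

\[ \Delta_{\text{exam}} \approx d \times \sigma_{\text{exam}} \approx 0.5 \times 12 \approx 6 \text{ points out of } 100. \]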

My own experience with a partial commitment to such techniques in the BIO 100 course is that the students scored exactly the same average (and I mean to 2 significant digits) on the final exam as they did in the earlier version of the course. It was a rewritten exam, and I would like to argue that it was testing further up the taxonomy, but this was not formally measured. And it wasn’t miles up the taxonomy (it was still multiple choice for goodness’ sake). My overall impression is that there is an improvement in “learning” (hard as that is to define and measure) but it is not stunning or even obvious in magnitude (i.e. I would have to use statistics to tease apart the improvements). It’s certainly not like every student is suddenly moving up a grade (e.g. B to A) in the class or anything. Freeman suggests 4-5 points on traditional exams, which might be a B- to a B. This still sounds a little high compared to the experiences I know of, but not outrageously high. But I am more confident (based on experience and literature) that students are enjoying things more, paying attention more, and probably developing a slightly more sophisticated understanding. And that is nothing to sneeze at.
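And for anyone curious what “using statistics to tease apart the improvements” would actually look like, here is a minimal sketch in Python (the score vectors are made-up placeholders with an assumed ~3-point difference and a 12-point spread, not my real BIO 100 data):

import numpy as np
from scipy import stats

# Hypothetical final-exam scores (out of 100) for two cohorts of an intro class.
# Purely illustrative placeholder numbers, not real course data.
rng = np.random.default_rng(1)
before = rng.normal(loc=72, scale=12, size=300)  # pre-reform cohort
after = rng.normal(loc=75, scale=12, size=300)   # post-reform cohort

# Welch's t-test (does not assume equal variances between cohorts)
t_stat, p_value = stats.ttest_ind(after, before, equal_var=False)

# Cohen's d: difference in means scaled by the pooled standard deviation
pooled_sd = np.sqrt((before.var(ddof=1) + after.var(ddof=1)) / 2)
cohens_d = (after.mean() - before.mean()) / pooled_sd

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, d = {cohens_d:.2f}")

With an effect in the Freeman range (a few points against a ~12-point spread), the difference is statistically detectable at these class sizes but nowhere near a grade-level jump, which matches my impression above.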

My most recent personal experience with pedagogy reform

This year I abandoned PowerPoint (except for occasional graphs and pictures) and did a lot of chalkboarding, but in the end you would have to say they were “traditional” lecture classes (in fact really old-school lectures with no visual aids except the chalkboard). But the students took lots of notes (no PowerPoints to hand out). And I spent a lot of time asking and being asked questions (there were <20 students so everybody was involved). Indeed, despite my always making dialogue during class a top priority, a lot more of it happened this year – somehow PowerPoint seems to introduce a wall and turns the goal into finishing the slides instead of teaching/learning. I did some peer instruction and also just gave ecological math problems to do individually in class, but most of it was more in the vein of Socratic inquiry (i.e. the teacher asking a question and getting multiple responses back). So I wasn’t following too many of the buzzwords, but it felt like a much improved class to me. Was this good pedagogy or bad? NB: I am officially combining Point #4 with this experience to launch a new pedagogical movement called “PowerPoint is evil”. If this takes off, you heard it here first! But then again, it’s possible that getting rid of the PowerPoint was just the disruptor (as mentioned above with clickers) that made me pay more attention to my teaching, and five years from now adding PowerPoint back in will improve my teaching. (UPDATE – just found this lovely piece on teaching with chalk by Chris Buddle)

Point #9 – Class size – Thinking about class sizes above raises another big point – one that I’m sure administrations won’t like. How much can pedagogical innovation really do to fundamentally change learning (or the lack thereof) in a classroom of 600 (or 300 or even 100) students? Teaching a class of 15 students effectively is pretty easy. Teaching a class of 300 effectively is impossible no matter what. The aforementioned meta-analysis by Freeman showed pretty clearly that active learning is most effective in classes with <50 students and decreases in effectiveness pretty quickly in larger classes. Is pedagogy improvement just a giant distraction from the real issue?

Conclusions

Overall, I think the emphasis on pedagogical methods is fantastic (and largely unprecedented in higher ed – most previous reform movements have focused on curricular reform). And I do think there is something real and of value in the active learning movement. But it’s not ginormous. And I also think we have gotten overly simplistic, reducing teaching to a one-dimensional bad (traditional) vs good (100% active learning) axis. The reality is that even the concept of learning is multidimensional (with the Bloom taxonomy being but a single dimension) and that pro-con trade-offs exist on all of these dimensions. This makes it impossible to say what the “best” teaching method is without specifying the learning goal. In practice, I think we are better off thinking of the traditional vs active/flipped axis as a dial we should tune depending on the situation and goals. And this dial has positions everywhere between 0 and 100. Nor is it really one dial: there are multiple dimensions – 0-100% flipped, 0-100% just-in-time, 0-100% peer instruction, 0-100% inquiry-based learning – each independent of the others. And, although I haven’t fully worked it out for myself, I believe in some contexts breadth is a more important goal than higher-taxonomy learning. We don’t have a set of best practices for breadth-oriented learning yet, but I wish we did.

One big thing I hope comes out of all of this is that we spend a lot more time in our departments and among colleagues having discussions about what our learning goals are (and no, I don’t mean the kind my university requires me to list on my syllabus under the heading “goals” that are just lists of topics covered). I mean talking about how far up the taxonomy a class should go. What breadth is necessary and appropriate in this class to set up future years? Which classes are appropriate for which kinds of learning? Perhaps ecology and genetics should focus on high-level learning and BIO 100 should focus on memorizing the phyla of life? Or maybe not? How important is a 6-point increase on an exam (and maybe half of that in a large class)? Would we be better off scrapping exams and lectures and active learning and putting students into hands-on labs? Or taking ecology students out into the field to design their own experiments? Recall that there are finite resources, so there are trade-offs and limits. How can we measure and assess whether we are succeeding? We need to start having discussions about pedagogical goals in departments. Logically that should precede decisions about classroom pedagogical methods, but I’m not sure this is how things have happened.

Bottom bottom line – Modern pedagogy (= active learning/flipped classroom/etc.) is not a silver bullet, and it should not become the good end of a one-dimensional value judgement (flipped = good teaching, not flipped = bad teaching). But these techniques definitely have some benefits. There are probably other issues we should be talking about just as much, ranging from the simple (the declining art of notetaking) to the difficult (class sizes). And maybe just mixing up our teaching approach periodically is more important than any specific technique. More broadly, we need to think deeply and talk regularly about our pedagogical goals, especially depth vs breadth, and the best ways to get there.

What are your experiences with the modern pedagogy movement? Has flipping classrooms become a bandwagon? Is this a good thing or a bad thing? Is there a breadth vs depth (=up the taxonomy) tradeoff? Should we ever choose breadth? Which of the techniques in point #1 do you think are most important?

The secret recipe for successful working group meetings?

As Meg noted recently, science increasingly involves working groups. This is the big-science aspect I discussed a while back in a lengthy footnote (and distinct from big data). Although the original (at least in ecology) synthesis center, NCEAS, is no longer funded by NSF (but is still very much alive, funded by conservation NGOs), there are three other synthesis centers in the US (NESCent, NIMBioS, SESynC), a somewhat differently functioning synthesis center in iPlant, and centers in Canada, Australia, France, Germany and many other countries (http://synthesis-consortium.org/). And I’m increasingly seeing work done in “working group format” even when it is not tied to a center. The NSF RCN (Research Coordination Network) grant program is an example, but quite a few PIs on regular NSF grants or NGO/conservation grants are also choosing the working group format.

I am a self-confessed working group junkie. I take (undue?) pride in the fact that I’ve been to groups at all five US centers (and led working groups at two of them), been part of an RCN, been to meetings at Germany’s sDiv, been part of the UK’s Kavli meetings (not an official synthesis center), and will be at Canada’s CIEE in May and, if funded, at CESAB in France soon. That leaves Australia as the only big miss on my list (at least for an ecologist), and even there I did participate remotely in an NGO-funded working group.

Working groups are a personal preference. Some people like them more than others. And some people are better at being part of them than others too! There is no single best way to do science. But I think they’re a great format for doing a number of things, including: addressing both sides of a debate and building consensus, reviewing a field, doing meta-analysis or assembling and analyzing large datasets, and coalescing ideas and energy at key points in the trajectory of a field (including at its launch and at its coming down from bandwagon status). Certainly they have been influential – NCEAS is one of the most cited institutions in ecology.

But working groups are not a format people are trained to work in, let alone lead. Our whole PhD training is focused primarily on solo work with a few interactions. Most “regular” papers have 1-5 authors. Then we throw people into a group of 15-20, with social dynamics an order of magnitude more complex, and no training. What follows is my distillation of the key success factors of working groups. Unfortunately, despite the title, they do not come together into a magic recipe that guarantees success. And there is of course some variation depending on goals. But in my experience, if you get all of the following ingredients you’ve got a good shot at success.

During the working group proposal process

  1. Group composition #1 – personalities matter – Working groups are first and foremost social enterprises (I will be repeating this sentence several times). And with the competing demands on everyone’s time and only a week to pull things together, you are on the edge of failure right from the start. So it may be tempting to get the biggest name in the field, but if they’re a colossal ego who doesn’t play well with others, avoid the temptation. One bad apple really can spoil the barrel. Indeed, only invite people that you know, either personally or indirectly through a colleague, to be good collaborators. Twice I’ve been part of groups where the goal was explicitly to bring in people from opposing camps – but even there considerable effort was expended to only bring in people who could be part of a civil give-and-take dialogue, and some of the extremists were intentionally left out.
  2. Group composition #2 – career stages – In my experience the ideal working group has a pyramid shape, with the largest group being postdocs, the next largest being early tenure track, and a much smaller sample of senior ecologists. I’ve never actually seen a truly pyramidal group, so maybe a more realistic goal is rectangular – equal representation of postdocs, early career, and senior. But definitely think about this.
  3. Meet for 5 days per session – There is a wide variety of opinion on this. And I’ve been part of 2-day meetings that were successful. But if you’re going to fly in people from around the world who are giving up 2-3 days to travel and jet lag, why would you meet for less than 4-5 days? Also, in my experience it really does take that long to allow the social processes and the buy-in to a common goal to take place. It may be efficient to have small subset groups that meet for shorter periods (or for extensions to the 5 days). And if everybody already knows each other, so the social processes and goals are well worked out, sometimes fewer days works. But in most cases 5 days is the optimal number in my experience. And if people can’t commit to the 5 days, they’re not going to be big contributors anyway. The working group process is a slow one. It has many other advantages, but speed is not one of them.
  4. Who will do the work between meetings? – This is one of the motivations for #2 – everybody will leave a group meeting with good intentions. But who will actually spend more than 5 hours moving the project forward (i.e. assembling data, running simulations, doing analyses, writing)? If the PIs of the working group aren’t going to do this (and if they aren’t prepared to do this they probably shouldn’t be the PIs) and there aren’t any postdocs looking for good projects, then odds are nobody will. There are some exceptions I’ve seen – where, say, the goal was a meta-analysis and during the meeting everybody was assigned 10 papers to code before the next meeting. That kind of very discrete chunk can reasonably be expected to get done between meetings. And I’ve seen plenty of meetings where somebody stepped up unplanned to carry a load (but they were almost always postdocs or occasionally early-career people).

Beginning of meeting

  5. Do a PowerPoint death march on the first day – This is my tongue-in-cheek name for the idea of letting everybody in the group stand up and give a presentation about their work related to the topic. This is oft-debated, with many arguing it is a waste of time. But in my experience, if you don’t give people a clear window to get their opinion out, they will spend the whole rest of the meeting slipping it in edgewise. I have seen this happen more than once, and it can be really annoying when the whole group is converging and somebody is still banging on about their preconceived idea of how to do it – better to get it out of the way on the first day. It is, in the long run, more efficient to spend a day doing this. That said, the PIs can make this efficient or painful. Give people very clear instructions on what you want them to present on. And give them absolute time limits (typically 10 or 15 minutes). Then ENFORCE the time limits rigidly. Conversation about a presentation is fine to run over a little, since conversation is the point of a working group. But DO NOT let anybody deliver a monologue one minute over their planned time. This only needs to be done the first time a group meets.
  6. Do a regrouping and group agenda setting after the PowerPoint death march – After everybody has been heard from, spend some time setting the agenda for the rest of the meeting. Many times the PIs will have a pretty clear idea. Other times, the goal really is to brainstorm the agenda together. But either way, put it on a whiteboard and talk it out a bit as a group and be open to changes. This will get you buy-in and understanding of the agenda. It will also get you the sum-is-greater-than-the-parts synergy that you are hoping for from a working group.
  7. PIs need to take their role as cruise director seriously – Working groups are first and foremost social enterprises (I promised you that sentence would come back). I’ve never seen a successful working group that didn’t spend a lot of time going out to dinners. The PIs need to take the lead to make sure these are organized by early afternoon so everybody knows the plan, and they need to set the example that this is an expected activity. There is an age-old debate between the group members who want to go to dinner right after work stops and those who want a couple of hours to go exercise first; some compromise is needed. Some of the best working groups I’ve been part of have actually knocked off early one afternoon and gone for a hike or field trip. It might seem a waste of time, but trust me, it pays off.
  8. Lead a discussion about authorship expectations early – There is no right or wrong answer about who should be a co-author on papers from the group. But setting expectations in a group discussion up front is essential. Most groups I’ve been part of have decided that everybody present should be part of the core synthesis or review paper(s). You want to create an attitude where everybody is chipping in and not holding back their best ideas, and authorship is the best way to do this. Authorship rules on more subsidiary papers vary, but they should be collectively agreed up front.

Middle part of the meeting (e.g. days 2-4)

  9. Do the work – This is of course the end goal. But it’s the hardest to give generic advice about, because the nature of the work varies. It may be finding and coding papers for a meta-analysis or assembling datasets. It might be a fairly large group discussion about the consensus state of the field. It might be simulations. It might be a mixture of these things. But it probably occupies the bulk of the meeting – especially the middle days. And it probably involves breaking out into subgroups with different tasks or topics to cover.
  10. Regroup once or twice a day – Even if much of the work will happen in breakout groups (and it almost certainly will), bring the whole group back for 30 minutes before lunch and 30 minutes before dinner and have each group report in. This keeps everybody rowing in the same direction. It is also where much of the magic of working groups happens, as recurring themes and areas of disagreement emerge.
  11. Follow a diamond trajectory – This is true really of any brainstorming or group process. The goal in the beginning is to broaden out – open up minds, create crazy ideas, capture every thought. Then, when things have gotten impossibly wide, it is time to pivot and turn energies into focusing and narrowing down. A key to a good working group is for the PIs to have the nerve to let things broaden out for a while (often several days) and then have the leadership to firmly rein it back into a focus.
  12. Know when to force the turn toward writing – Closely related to #11. In no case should you start writing immediately. And one or two people will probably do the bulk of the writing, probably after you go home. But you should definitely start writing (or at least detailed outlining) before you scatter. You might even assign sections and end up writing a whole draft while you’re at the working group. But this is another key decision point for the leaders – when to stop the talking/analyzing and start the writing. It should start (again, at a minimum to outline stage) before you leave.
  13. Pace yourself – It is tempting to treat the working group time as so precious that you should work 12-hour days. But this is a big mistake. Aside from the great importance of social bonding (#7), you are doing a creative activity that requires fresh, bright minds. Many of your members will have flown 12-24 hours to get there and be jet lagged. And the rest will be exhausted by an intense pace long before the week is over. I’ve personally found that keeping the working group to 9-5 with at least an hour for lunch (plus joint dinners that are social) keeps things productive through day 5, while anything more leads to severe drooping by the end.
  14. Manage the email and phone calls – Everybody will want/need to keep up on email and may make an occasional phone call to their lab managers, other collaborations, etc. In my experience the best way is to tackle this head on by building in time for it and then putting out a pretty clear expectation of being fully focused on the meeting the rest of the time. I usually allow 60 minutes for lunch (this is a social enterprise …) and then a good 30-45 minutes immediately after lunch for phone calls and catching up on email. This way people can run a little long on lunch or end a little early and have more time for email as they wish. And you can expect (and demand) full attention the rest of the time.

End of the meeting (e.g. Day 5)

  15. When the meeting really ends – If you tell people the meeting ends at noon, they will book flights out at 9. If you tell people the meeting ends at 5, they will book flights out at 12 or 1. So tell them it ends at 5 and secretly (don’t let on your real plan) know that you really will end at 1:00 PM. But don’t forget that long-distance travellers will usually not fly out until the next day. You can still get some work done, and have one last dinner. You just won’t have everybody. As a PI you should definitely plan to stay until the day after the meeting is officially over and lead this tail end.
  16. Leave with clear assignments – Well before people start peeling off – i.e. the morning of the last day – put a list on the projector or whiteboard of tasks, deadlines and the 1-2 names attached to each (5 names attached is the same as no names attached). Discuss this with the whole group.
  17. Accountability – Find a way to keep work flowing between meetings. Emails with reminders of tasks are a good way to do this. Circulating draft versions of papers or working versions of datasets is a good way too. In my experience scheduling a monthly phone call is also a good idea. Having somebody set up to be the “nagger” (either a PI or a postdoc) who keeps track of timelines is important too.

So – being a good leader of a working group just requires staying on top of 17 different things! If it sounds like leading a working group is exhausting – it is! Being a participant at a working group is exhausting, but being a leader and riding herd on the whole process is a whole other level of exhausting.

Obviously my 17 points are not a magic formula. It’s just the wisdom I’ve pieced together over a couple of dozen working group meetings. And a couple of them, like #11 and #12, require serious judgement on the PIs’ part – all I can do is highlight the question. And some will disagree with my list – I know from discussions I’ve had that #3 and #5 are definitely not universally agreed upon.

What are your experiences? What are the ingredients in your secret recipe to a successful working group? What works and doesn’t work?

In praise of slow science

It’s a rush-rush world out there. We expect to be able to talk to (or text) anybody anytime, anywhere. When we order something from half a continent away, we expect it on our doorstep in a day or two. We’re even walking faster than we used to.

Science is no exception. The number of papers being published is still growing exponentially at a rate of over 5% per year (i.e. doubling every 10 years or so). Statistics on growth in the number of scientists are harder to come by – the last good analysis I can find is a book by Derek de Solla Price in 1963 (summarized here) – but it appears the doubling time for scientists, while also fast, is a bit longer than the doubling time for papers. This means the individual rate of publication (papers/year) is going up. Students these days are being pressured to have papers out as early as their second year*. Before anxiety sets in, it should be noted that very few students meet this expectation, and it is probably more of a tactic to ensure publications are coming out in year 4 or so. But even that is a speed-up from publishing a thesis in year 6 or so and then whipping the chapters into shape for publication, which seemed to be the norm when I was in grad school. I’ve already talked about the growing number of grant submissions.
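For anyone who wants the arithmetic behind converting an annual growth rate into a doubling time (standard exponential growth, nothing specific to publication data):

\[ T_{\text{double}} = \frac{\ln 2}{\ln(1+r)}, \qquad r = 0.05 \Rightarrow T_{\text{double}} \approx 14 \text{ years}, \qquad r = 0.07 \Rightarrow T_{\text{double}} \approx 10 \text{ years}, \]

so a doubling every 10 years or so corresponds to growth closer to 7% per year – still comfortably “over 5%”.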

Some of this is modern life. Some of this is a fact of life of being in a competitive field (and there are almost no well-paying, intellectually stimulating jobs that aren’t highly competitive).

But I fear we’re losing something. My best science has often been torturous, with seemingly as many steps back as forward. My first take on what my results mean is often wrong and much less profound than my 3rd or 4th iteration. The first listed hypothesis of my NSF postdoc proposal turned out to be false (tested in 2003-2004); I think I’ve finally figured out what is going on 10 years later. My first two papers did not come out until the last year of my PhD (thankfully I did not have an adviser who believed in hurry-up science). But both of them had been churning around for several years. In both cases I felt like my understanding and my message greatly improved with the extra time. The first of these evolved from a quick and dirty test of neutral theory into some very heavy thinking about what it means to do models and test theory in ecology. This caused the second paper (co-authored with Cathy Collins) to evolve from a single-prediction to a many-prediction paper. It also led to a paper in its own right. And it has influenced my thinking to this day. And in a slightly different vein, since it was an opinion paper, my most highly cited paper was the result of more than 6 months of intense (polite, but literally 100s of emails) back-and-forth debate among the four authors, which I have no doubt resulted in a much better paper.

I don’t think I’m alone in appreciating slow science. There is even a “slow science” manifesto, although it doesn’t seem to have taken off. I won’t share the stories of colleagues without permission, but I have heard plenty of stories of a result that took 2-3 years to make sense of. And I’ve always admired the people who took that time – in my opinion they’ve almost always gotten much more important papers out of it. I don’t think it’s a coincidence that Ecological Monographs is cited more frequently than Ecology – Monographs papers are often magnum-opus-type studies that come together over years. Darwin spent 20 years polishing and refining On the Origin of Species. Likewise, Newton developed and refined the ideas and presentation behind the Principia for over a decade after the core insight came.

Hubbell’s highly influential neutral theory was first broached in 1986 but he then worked on the details in private for a decade and a half before publishing his 2001 book. Would his book have had such high impact if he hadn’t ruminated, explored, followed dead ends, followed unexpected avenues that panned out, combined math with data and literature and ecological intuition and generally done a thorough job? I highly doubt it.

I want to be clear that this argument for “slow science” is not a cover for procrastination, nor for the fear of writing or the fear of releasing one’s ideas into print (although I confess the latter influenced some of the delay in one of my first papers, and probably had a role with Darwin too). Publication IS the sine qua non of scientific communication – it’s just a question of when something is ready to write up. There are plenty (a majority) of times I collect data and run an analysis and I’m done. It’s obvious what it means. Time to write it up! So not all science is or should be slow science. Nor is this really the same as the fact that sometimes challenges and delays happen along the way in executing the data collection (as Meg talked about yesterday).

But there are those other times, after the data is already collected, when there is this nagging sense that I’m on to something big but haven’t figured it out yet. Usually this is because I’ve gotten an unexpected result and there is an intuition that it’s not just noise or a bad experiment or a bad idea but a deeper signal of something important. Often there is a pattern in the data – just not what I expected. In the case of the aforementioned paper I’ve been working on for a decade, I got a negative correlation when I (and everybody else) expected a positive correlation (and the negative correlation was very consistent and indubitably statistically and biologically different from zero). Those are the times to slow down. And the goal is not procrastination nor fear. It is a recognition that truly big ideas are creative, and creative processes don’t run on schedules. They’re the classic examples of solutions that pop into your head while you’re taking a walk, not even thinking about the problem. They’re also the answers that come when you try your 34th different analysis of the data. These can’t be scheduled. And these require slow science.

Of course one has to be career-conscious even when practicing slow science. My main recipe for that is to have lots of projects in the pipeline. When something needs slowing down, you can put it on the back burner and spend time on something else. That way you’re still productive. You’re actually more productive, because while you’re working on that simpler paper, your subconscious mind is churning away on the complicated slow one too.

What is your experience? Do you have a slow science story? Do you feel it took your work from average to great? Is there still room for slow science in this rush-rush world? Or is this just a cop-out from publishing?


*I’m talking about the PhD schedule here. Obviously a Masters is on a different schedule, but the same general principle applies.