Play the Dynamic Ecology GeoGuessr Challenge!

In an old Friday linkfest I plugged GeoGuessr, which uses Google Street View images to simulate the experience of being dropped in a random location somewhere on Earth. Your challenge is to figure out where you are solely by driving around looking for visual clues. You get points based on how close your guess is. It’s dangerously good fun. If you’ve never tried it, don’t click the link if you’re planning to have a productive day. :-) And it helps to be a good field ecologist or botanist, because most of the world’s roads are in quite remote places where your only clues are the appearance of the landscape and the local vegetation.

I just discovered GeoSettr, which lets you create your own GeoGuessr challenges by choosing the locations into which players are dropped. Which inspired the first…

Dynamic Ecology GeoGuessr Challenge!

I’ve chosen five locations in GeoGuessr. But not just any locations. All of them are locations of famous field sites in ecology! (Well, as close as I could get; field ecologists don’t usually work right by the side of the road.) To take the challenge and see if you can figure out where they are, click this link. In the comments, tell us your score. Highest score wins, though if it’s very close I’ll probably ask you to name the field sites as a tiebreaker (so leave a valid email address if you want me to be able to do that). I’ll buy the winner a drink at the ESA meeting.

Deadline for entries is one week from now. Good luck and have fun!

p.s. No cheating! You’re only allowed to use the information provided by GeoGuessr, plus your own knowledge. No googling words you see on road signs or anything like that.

2-3 ecology grad student positions in Jeremy Fox’s lab

Through a mixture of circumstances, I suddenly have 2-3 grad student positions open in my lab for fall 2014 or winter 2015. So while I know this is very much not the season for this sort of thing, if you’re looking to do an M.Sc. or Ph.D. asking fundamental questions in population, community, or evolutionary ecology, please drop me a line.

For background on my lab, visit my lab website. Briefly, my own work mostly involves modeling and experiments on population and community dynamics using laboratory-based microbial model systems. But most of my students have worked in other systems, including alpine plants, plant-pollinator interactions, and bean beetles. Basically, I’m happy for my students to work in any system as long as I and my Calgary colleagues have the financial and intellectual resources needed to support the work. Some examples of the questions I and my students have been working on recently:

  • Quantifying local adaptation of lake bacteria to spatial and temporal variation in water chemistry. This is a really neat project in which we’re freezing bacteria and water samples at -80 C so that we can do reciprocal transplants among sites and times to test for local adaptation in the field. Gets at fundamental questions about the relative importance of spatial vs. temporal environmental variation for the maintenance of genetic diversity. A paper based on pilot work is in review right now, the next step is to scale things up in a big way.
  • Eco-evolutionary dynamics of competing bean beetles. My student Stephen Hausch (co-supervised by my colleague Steve Vamosi) has been doing some really ambitious experiments with bean beetles to look at how standing genetic variation affects the outcome of interspecific competition in both the short and long term. He’s got some amazing results, some of which we understand, some of which we don’t (e.g., the beetles don’t seem to evolve character displacement…) Lots of opportunities to build on this work.
  • Interplay of drift and determinism in community dynamics. A line of research I got into because I found that my protist microcosms often exhibit more among-replicate variability in their dynamics than can be accounted for by sampling error. Even in highly-controlled systems with large population sizes in a constant environment, stochasticity seems to play an important role in population and community dynamics. We’re doing experiments to try to nail that down, for instance by manipulating culture vessel size, thereby directly manipulating total population size and thus the strength of drift. One goal is to understand the circumstances under which the deterministic component of the community dynamics amplifies or damps out stochasticity.
  • Spatial synchrony of population dynamics. A long-running collaborative project with my former postdoc Dave Vasseur, now at Yale. Mixes mathematical modeling and protist microcosm experiments. Various directions one could go with this, including branching out into other systems. I have what I think are some neat ideas to use flour beetles as a model system to study spatial synchrony of chaotic population dynamics.
  • Other projects include facilitation, coexistence, and plant species distributions along elevation gradients, and various other developing ideas. And I’m also open to students (especially Ph.D. students) who have their own ideas.

The Department of Biological Sciences at the University of Calgary has a strong group of about a dozen ecologists and evolutionary biologists, with strength in depth in evolutionary ecology, population ecology, plant-insect interactions, fish ecology, and other areas. The department has two field stations in the mountains, next-generation sequencing facilities, access to various high-performance computing clusters, and everything else you’d expect from a big, well-equipped research university.

Grad students in the department are guaranteed a minimum of $21,000/year through a mixture of TAships, RAships, and other sources like fellowships.

Calgary is a city of over 1 million people, a 45-minute drive from the Canadian Rockies, with all the opportunities for field work and recreation that implies.

If you’re interested, please email me ASAP. Tell me a bit about your background, interests, and long-term goals, and about what specifically attracts you to my lab and/or Calgary more broadly. Please also include a cv, undergraduate and any graduate transcripts (unofficial is fine), and contact details for three references.

Friday links: women in science x3, Peter Godfrey-Smith at 3 am, Dr. Zen on postpublication review, and more

Also this week: Meg has started reading the internet again, happy 50th birthday kin selection theory, the changing nature of academic celebrity, pretty pictures of Jeremy’s “backyard”. Also a link specifically for longtime reader Jim Bouldin!

From Meg:

Here’s an interesting infographic showing where people who start biology PhD programs end up (hint: mostly not in tenure track positions).

I feel like maybe I’ve linked to this in the past, but I can’t remember for sure. Either way, I enjoyed this take on being a mom in academia. The author talks about her strategies for balancing work as an academic with having three young children. I found the last part of the post particularly interesting:

So there you have it. A confluence of luck, good choices, hard work, and support has meant that – whisper it – it’s not terribly stressful to be an academic working mother, for me. It would be much, much harder work to stay at home looking after 3 small boys day in, day out. I’ve done it. Believe me.

I don’t like being called superwoman. It suggests I’m heading for a fall, in lots of ways. So how about this, I’ll let you call me superwoman if I maintain my academic trajectory and my boys all make it to a happy, healthy adulthood, and are fulfilled and settled in their own ways (whatever that may turn out to be). Then you can call me superwoman. But for now, I’m just a woman who happens to have a larger-than-usual young family and a job that I really enjoy (and how lucky am I, in both counts?). There are lots of us around, all doing our best: it can be done without fanfare.

You’ll also spot that I haven’t mentioned “work-life balance”. I don’t believe in it. There are only 24 hours in a day, and it’s all my life. My work is my life and my home is my life and my family is my life and my addiction to mid-century Belgian ceramics on eBay is my life. Going to the British Museum for a work meeting is as much my life as scraping squashed peas off the floor from under the dining room table, or cranking out a book chapter, or leading a sing-song of She’ll Be Coming Round the Mountain, or looking at the UX of an iPhone app. Life is full, round, packed, joyous, tiring, exhilarating, exhausting, fast, fun, and being lived. I love my family. I love my job. And this is how I do it.

Here’s an interesting, interactive infographic on the percentage of women authors in different academic fields through time. For the 1991-2010 time period, ecology and evolution papers had 22.8% female authors; if you click through, you’ll be able to look at the percentages in different subdisciplines.

And keeping with my women in science theme, this article reports the results of a study (from a few years ago) on differences in the amount of departmental service men and women perform. I found this part particularly interesting:

One duty studied was serving as director of an academic department’s undergraduate program. Of associate professors, one third of women but only 17 percent of men had served as undergraduate directors. “Because undergraduate directors spent more time teaching and working with undergraduates — tasks that research universities tend to undervalue — gendered norms may contribute to women associate professors spending more time in devalued roles,” the paper says.

In fact, the study finds that women associate professors who served as undergraduate directors took, on average, 12 years (rather than the typical 7) after receiving tenure to be promoted to full professor. Male associate professors who served as undergraduate directors moved to full professor at the normal pace. While the study acknowledges that a range of factors beyond serving as undergraduate director may be involved, it notes the significant gap in years.

From Jeremy:

3:AM magazine has been doing a long series of interviews with philosophers. Here’s an interview with Peter Godfrey-Smith, a top philosopher of biology. I’ve read some of his work, it’s really good, very much at the interface of biology and philosophy and motivated by the same questions practicing evolutionary biologists ask. Indeed, he’s actually published papers in biology journals as well as in philosophy journals (e.g., this). The interview touches on things like why he became a philosopher, his views on evolutionary theory, and why biologists should pay attention to philosophers.

Zen Faulkes (“Dr. Zen”, a longtime blogger at NeuroDojo) has a nice paper in Neuron on postpublication review. Good overview of the current state of play, and I agree with most of it. Makes some points that aren’t made often enough, like the fact that much postpublication review actually is positive rather than critical. Interesting suggestion at the end that we don’t really need a centralized venue or formal system for postpublication review. Instead, we should just view online postpublication review in its various forms (blog posts, tweets, Facebook comments, whatever) as more like the informal exchanges at scientific conferences rather than part of the formal scientific record. I can appreciate the motivation for that, but I don’t buy it. It’s true that blogs, tweets, etc. are importantly unlike peer reviewed papers and so shouldn’t be treated as such (and it’s to push back against that mistake that Zen suggests the alternative analogy to scientific conferences). But they’re also importantly unlike scientific conferences, and so shouldn’t be treated as such.

Kin selection theory is 50 years old. Here’s a good article recounting its history and marking its continued relevance. (ht Ed Yong)

A nice post on what mechanistic models are, their contrasts to standard sorts of statistical models (including so-called “causal” models like structural equation models), how mechanistic models often have non-mechanistic bits, and more. Uses an ecological example (SIR models).

Anecdotal but interesting article on the changing nature of academic celebrity. In particular, how new ways of becoming a well-known public intellectual, such as TED talks and to a lesser extent blogs, seem to favor certain fields over others. And also favor certain sorts of people over others. Includes comments from evolutionary biologist Sara Lewis, one of the few ecologists or evolutionary biologists to become famous via TED talks. (Aside: I hope Sara’s right that TED can be a vehicle for shifting the public’s views on controversial issues like evolution vs. creationism, but I highly doubt it) Of course, previous ways of becoming a public intellectual also favored certain fields, and certain sorts of people, over others–just different sorts of fields and people than are favored by blogs and TED talks. I wonder if the same changing dynamic of fame is happening on a smaller scale within fields; I suspect it is.

An experimental study of a psychological mechanism by which zombie ideas persist.

My Calgary colleague Lee Jackson is a really good nature photographer (one of several in my department). The link goes to a gallery of his best shots, many of which depict our local mountains.

This is old but I missed it at the time: how the US Forest Service helped reduce the rate at which maple baseball bats shatter. Don’t say I never did anything for you, Jim. :-)

And finally: it’s final exam season! So here’s a music video that gives voice to what your undergraduate students are probably thinking right now. (NSFW) :-) (ht Frances Woolley, via Twitter)

When a series of entirely reasonable decisions leads to biased outcomes: thoughts on the Waterman Award

The National Science Foundation just announced the winner of the 2014 Alan T. Waterman Award, the highest award it gives to a scientist or engineer under the age of 35. The winner is Feng Zhang, a molecular biologist at the Broad Institute and Harvard. In addition to being a huge honor, the award comes with $1 million of research funding. It’s a big deal. And, for that reason, I was concerned to see that, just like the previous 10 winners, this year’s winner was a man.

Now, I want to be clear: Feng Zhang is clearly a very impressive scientist, and is highly deserving of this award. So is each man who won the award in the previous decade. But when male scientists win an award 10 times in a row (in one year, two men won), I would suggest it’s worth examining the process for unintended biases.

Why this focus on the Waterman Award? Because, in my opinion, this is a good example of a common phenomenon in academia, including in ecology. It happens with society awards (another link) and university awards, faculty searches, invitations to speak at meetings, and departmental seminar series – a committee of well-meaning people who are not trying to be exclusive ends up selecting primarily men.* In each case, there’s a pool of people who might be deserving of the award (or position or seminar/talk slot); the concern is when the people chosen from that pool end up being a biased sample.

As I said, even in cases where each individual decision or outcome seems entirely justifiable, if there is a consistent pattern, the process as a whole needs to be examined. Let’s split the process into two halves to consider:

1. The nomination process. It is really, really common for people to initially think of men when asked to nominate people, either for awards or talks. This is part of why things like Anne’s List of women neuroscientists are valuable. People interested in increasing the diversity of a seminar series, for example, could skim such a list to look for people to consider inviting. As far as I know, no such list exists for ecology and evolution, though I know that my Michigan colleague Gina Baucom has been doing some behind-the-scenes work to try to get one going.

For something like the Waterman Award, this is more challenging. I would guess that the lack of a diverse pool of nominees is probably a large part of the problem. Given how many of the winners have also won the Presidential Early Career Award for Scientists and Engineers (PECASE), one option would be to write to the departmental chairs of all PECASE winners to suggest they submit a nomination. The program contact for the Waterman Award, Mayra Montrose, already writes to PECASE winners to suggest that they ask to be nominated for the Waterman Award – they are clearly trying to increase the number of nominations (which is good!). I have no idea what percentage of PECASE recipients ask someone else to nominate them, but I’m guessing that, given things like imposter syndrome, women and underrepresented minorities might be less likely to follow through on that suggestion. (This piece in The Atlantic also suggests this would be the case.) Asking the chairs directly might help overcome this. If you have other ideas for how to increase the diversity of the pool of nominees for this sort of award, please suggest them in the comments!

2. The selection process. Everyone has implicit biases, and this affects how we evaluate women and people of color. As the title of this Nature correspondence piece by Marlene Zuk and Gunilla Rosenqvist puts it, “Evaluation bias hits women who aren’t twice as good”. So, one key step is (after soliciting a diverse pool of nominees) to try to evaluate nominations in a way that reduces bias (e.g., by being aware of biases and by having a set of specific criteria – agreed upon prior to reviewing any of the nominations – on which nominees are evaluated). Another important step is to acknowledge that there is a history of implicit biases – letters written will tend to be shorter and less glowing (pdf link), grants tend to be harder to get, etc. (Here is a pdf from the University of Michigan STRIDE Committee on best practices related to faculty awards.)

For the Waterman Award, coming up with a single, specific set of criteria seems like it could be particularly challenging. The award goes to one person, but that person could be in any of a wide range of fields, so the awards committee presumably needs to find a way to compare a sociologist, an astronomer, a cell biologist, and an engineer. Each field will have differences in terms of productivity – I have no idea how to try to compare the publication records of a pure mathematician and a computer scientist, for example.

Another potential challenge with the Waterman Award is that, because it goes to people under the age of 35, it will be evaluating people on their productivity at an age right around when many women have children. (This article talks about the “baby penalty” faced by women in science.) Again, I don’t have good ideas for how to deal with this for this sort of award (other than for the selection panel to keep it in mind), but hopefully others will.

Hopefully it is clear that I don’t think there is an easy solution to the male bias in Waterman Award winners, and I am most definitely not saying that there is any intentional bias going on. But I am saying that having 11 consecutive male winners suggests that there might be biases, and that possible ways to improve the diversity of awardees should be considered. My goal here is to try to start a conversation. Moreover, my hope is that thinking about the biases that might influence this one particular award will lead us to also examine biases in society awards, hiring, and seminar/talk invitations, which affect most of our readers more directly.

Does the diversity of award winners matter? I think so. In the case of the Waterman Award, there is a lot of prestige (and money!) that come with having won it. For society awards, there usually isn’t much money, but there is prestige. Plus, for younger scientists just starting out, seeing pictures of winners (of the Waterman Award or a society award) and seeing that very few look like them sends the message that maybe they don’t belong. Women and people of color in the sciences often ask “Should I be here?” We want to make sure that our seminar series, our faculty, and our award winners send the message that they do.

If you have thoughts on how to deal with this problem – either for the Waterman Award in particular or for the more general problem – please share them in the comments!

*This pattern tends to be even more extreme for racial and ethnic minorities, of course.

Other posts on related topics:
1. Supporting other women in science (from Tenure, She Wrote, by scitrigrrl)
2. Creating a diverse speaking series (from Jabberwocky Ecology, by Morgan Ernest)
3. Best practices in faculty hiring (University of Oregon)

Our least-read posts

Just for fun, I just looked up our least-read posts. Some of what I found was unsurprising, but other things were kind of interesting. Any of you who are thinking of starting your own blogs might find some useful tips here as to what sort of posts people don’t want to read.

  • Unsurprisingly, announcements about the blog (e.g., “Meg’s away for a while”) don’t draw readers. Which is fine, sometimes you just need to say things for the record.
  • Our Friday linkfests drew very few readers early on. They’ve become more popular, which has convinced us to keep doing them. Plus, I find that doing them is a good source of new post ideas. But they’re still easily the least-read sort of post that we do regularly. I’m not surprised. People mostly share links via social media these days. And apparently, our brief commentary on our links doesn’t add enough value to draw many readers.
  • But even less-read than Friday linkfests are short posts sharing a single link. Even if they’re pretty cool links to old stuff like this. That’s actually one reason we do the Friday linkfest posts–the alternative is to do lots of short, single-link posts that nobody would read.
  • Joke posts like this one mostly get ignored. Clearly, our readers have no sense of humor. :-) This is why I’ve mostly stopped doing joke posts.
  • The job ads that we occasionally post for ourselves and for friends don’t draw many readers, unsurprisingly.
  • Posts that mostly consist of an interesting or provocative blockquote mostly don’t get read. Not even if the person quoted is Charles frickin’ Elton.
  • Somewhat to my disappointment, posts where we just pose a question to readers, or ask readers a poll question, draw few readers. If we just toss out a question and “open the floor” for discussion, we usually get next to no response. The partial exceptions seem to be our “ask us anything” posts, and posts that pose really fun questions.
  • Meeting previews and recaps mostly draw few readers, unless we write them in such a way as to effectively turn them into regular posts (and sometimes not even then). I know there are some readers who really appreciate these posts as a way to attend scientific meetings vicariously. But I’m sorry, there just aren’t that many of you. I’ve been dialing back on the effort I put into doing ESA preview and review posts, and I’ll probably dial back further in future.

The general principle seems to be that readers want to read what we think, preferably at length. They don’t want us to just pose questions for them to think about, or write little mini-posts, or toss out links. And they only want jokes about bird poop. :-)

More interesting are the substantive posts that were little read, but for no obvious reason that I can see. Here, in no particular order, are our least-read substantive posts. Note that the list includes a couple of posts I think are quite good, which just shows how much I know. :-) Note also that all of them are by me; Meg and Brian are nowhere to be seen on this list. Make of that what you will. :-)

Getting over Robert MacArthur. Sparked a good discussion, but in retrospect it’s a bad post. I did a poor job of articulating what was bugging me. So good on y’all for not bothering to read it.

Against live-tweeting talks. A brief post linking to and agreeing with something someone else wrote. It drew very few readers, which is probably for the best as it’s a lousy post. I was mostly wrong, as commenters and some other bloggers politely pointed out. I was also unclear, which caused some people to misread me. And given that the post touched on a hot-button issue for some readers (a fact I suspected when I wrote it), it really needed to be above average rather than below average.

Yes, the IDH is a zombie: a response to Karl Cottenie. A good, productive exchange with Karl Cottenie on the IDH, but hardly anyone read it, despite the fact that the whole “zombie ideas” meme is one of our most popular memes. Maybe this post just seemed too much like a private, technical conversation?

Is macroecology like astronomy? I think this is one of my best efforts and it sparked a really good discussion. Heck, I even think the title is pretty good! But hardly anybody read it. Go figure.

Take-home messages vs. the devil in the details. Argues that we all (including me) skim in many situations when we ought to be reading carefully. But you wouldn’t know that because you didn’t read it. :-)

Thoughts on NutNet. I think NutNet is one of the great ecology experiments of the past decade, and we should all be racking our brains for other questions ripe for attack with the same approach. If you have no idea what I’m talking about, you really should click through.

Why are some ecological ideas controversial? Ok, this one got few readers because I posted it on New Year’s Day. #amateurhour

p.s. Yes, I am aware of the irony that this post isn’t likely to get many readers. :-)

Is requiring replication statistical machismo?

A recent post of mine about why Biosphere 2 was a success stirred mixed reactions. But one of the most common negative reactions was that there was no replication in Biosphere 2, which of course EVERYBODY knows is a hallmark of good science. This actually spilled into a spirited discussion in the comments. So, do we need replication to do good science?

Anybody who has read some of my older posts (e.g. my one true route post or my statistical machismo post) will know that my answer is going to be no. I’m not going to tell a heliologist that they are doing bad science because they only have one sun (they do have the stars, but most of the phenomena they study, like sun spots, are not yet studyable on other stars). Nor am I going to say that to people who have developed theories about why our inner solar system contains rocky planets and the outer solar system contains giant gaseous planets (although in the last 2-3 years we are actually getting to the point where we have data on other solar systems, these theories were all developed and accepted well before then). And Feynman’s televised demonstration that a bad interaction between cold weather and a rubber O-ring led to the demise of the Space Shuttle Challenger definitely did not need and would not tolerate replication. Closer to home, I am not going to tell the people who have been measuring CO2 on top of Mauna Loa (aka the Keeling Curve, one of the most well-known graphs in popular science today) that their science is bad because they only have one replicate. Nor am I going to tell people who study global carbon cycling to give up and go home because CO2 is a well-mixed gas on only one planet (I mean, come on, N=1, why waste our time!?). In short, no, good science does not REQUIRE replication.

Let me just state up front that replication IS good. The more replication the better. It always makes our inferences stronger. We DO need replication when it is feasible. The only problem is that replication is not always possible (sometimes not even with infinite amounts of money, and sometimes only due to real-world time and money constraints). So the question of this post is NOT “do we need replication?” It IS “do we HAVE to have replication?” and “what do you do in these trade-off or limitation situations?” Give up and go home – don’t study those questions – seems to be some people’s answer. It’s not mine. Indeed, any philosophy of science that leads to the idea that we should stop studying questions that inconveniently fail to fit a one-stop-shopping approach to science is not something I will endorse. This is the statistical machismo I have talked about before – making the statistics so beautiful AND difficult that few can achieve the standard you have set, so that you can then reject others’ work as WRONG, WRONG, WRONG. Careful thinking (and perusing the examples in the last paragraph) leads to a number of ways to do good, rigorous science without replication.

First let’s step back and define what replication is and why it is important. Wikipedia has several entries on replication, which in itself is probably informative about the source of some of the confusion. When ecologists think about replication they are usually thinking about it in the context of statistics (the Wikipedia entry on statistical replication) and pretty quickly think of Hurlbert’s pseudoreplication (also see Meg’s post on the paper). This is an important context, and it is pretty much the one being violated in the examples above. But this definition only says you need replication to have good statistics (which is not the same as good science). Wikipedia also has an entry on “replication – scientific method” which redirects to “reproducibility”. This definition is the sine qua non of good science, the difference between science and pseudoscience. Reproducibility means that if you report a result, somebody else can replicate your work and get the same thing. If somebody is doing science without reproducibility, call them out for bad science. But don’t confuse it with replication for statistics. Ecologists confuse these two all the time. Thus to an ecologist, replication means multiple experimental units well separated in space (not well separated = pseudoreplication; not multiple = no replication = degrees of freedom too small). As I said, those are both good goals (which I teach in my stats class and push students to achieve). But they are not the sine qua non of good science.

It is instructive to think about an example that came up in the comments on the Biosphere 2 post: the LHC (Large Hadron Collider) and the hunt for the Higgs boson. Pretty blatantly they did not have ecological replication. The LHC cost billions of dollars and there is only one (ditto for Biosphere 2). But the physicists actually had an extremely well worked out notion of rigorous reproducibility. Despite only having one experimental unit, they did have multiple measurements (observed particle collisions). Thus this is a repeated-measures scenario, but notice that since there was only one “subject” there was no way to correct for the repeated measures. The physicists made the assumption that despite being done on one experimental unit, the measures were independent. But what I find fascinating is that the physicists had two teams working on the project that were “blinded” to each other’s work (even forbidden to talk about work with each other) to tackle the “researcher degrees of freedom” problem that Jeremy has talked about. They also had a very rigorous a priori standard of 5σ (p<0.0000003) to announce a new particle (I seem to recall that at 3σ they could talk about results being “consistent with” but not “proof of” a new particle, but I haven’t found a good reference for this). So, in summary, the Higgs test had an interesting mix of statistical replication (5σ), reproducibility (two separate teams), and pseudoreplication (uncorrected repeated measures) from an ecologist’s perspective.
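For anyone curious where the p<0.0000003 figure comes from: “5σ” refers to the upper-tail probability of a standard normal distribution, which you can compute with nothing but the standard library. A minimal sketch in Python (the function name is my own, not from any physics package):

```python
import math

def one_sided_p(sigma):
    """Upper-tail probability of a standard normal beyond `sigma` s.d."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))

# The "discovery" threshold (5 sigma) and the weaker
# "consistent with" threshold (3 sigma) mentioned above:
p5 = one_sided_p(5)   # roughly 2.9e-7, i.e. the p < 0.0000003 in the text
p3 = one_sided_p(3)   # roughly 1.3e-3
print(p5, p3)
```

The half-erfc identity is just the survival function of the standard normal, so no statistics library is needed for this back-of-the-envelope check.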

So what do we get out of statistical replication? The biggest thing is that it allows us to estimate σ² (the amount of variance). We might want to do this because variance is innately interesting. For instance, rather than ask whether density dependence exists, I would rather ask what percent of the year-to-year variance is explained by density dependence (as I did in chapter 8 of this book and as I argued one should do in this post on measures of prediction). Or we might want to quantify σ² because it lets us calculate a p-value, but this is pretty slippery and even circular – our p-value gets better and better as we add more replication (even though our effect size and variance explained don’t change at all). This improved p-value due to more replication is often treated as equally good science, but that is poppycock. Although there are valid reasons to want a p-value (see the Higgs boson), pursuit of p-values quickly becomes a bad reason for replication. Thus for me, arguing for replication in order to estimate σ² is a decidedly mixed bag – sometimes a good thing, sometimes a bad thing, depending on the goal.
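The circularity is easy to see with a toy calculation: hold the standardized effect size fixed and just add replicates, and the p-value plummets even though nothing about the underlying biology has changed. A hedged sketch under a normal approximation (the 0.2σ effect size is an arbitrary choice for illustration):

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z-statistic under a normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))

d = 0.2  # a fixed standardized effect size (hypothetical); it never changes below
for n in (10, 100, 1000):
    z = d * math.sqrt(n)          # the test statistic grows as sqrt(n)...
    print(n, two_sided_p(z))      # ...so p shrinks from ~0.53 to ~0.05 to ~2.5e-10
```

The same effect goes from “nonsignificant” to overwhelmingly “significant” purely through replication, which is exactly why a better p-value is not, by itself, better science.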

However – and to me this is the biggest message in Hurlbert’s paper, though it is often forgotten against the power of the word “pseudoreplication” – the #1 problem driving everything else in the paper is confoundment. If you only have one site (or two or three), you really have to worry about whether you got the effect you observed because of peculiarities of that site and any weird covariances between your variable of interest and hidden variables (Hurlbert’s demonic intrusions). Did you get more yield because of pest removal, as you think, or because the plot is downhill and the soil is wetter? One way to kill the demon of confoundment is to have 100 totally independent, randomly chosen sites. But this is expensive. And it’s just not true that it is the ONLY way to kill the demon. I don’t think anybody would accuse the LHC of confoundment despite it having only one site. You could spin a story about how the 23rd magnet is wonky and imparts a mild side velocity (or spin, or – I don’t know my particle physics well enough to be credible here …) that fools everybody into thinking they saw a Higgs boson. But I don’t hear anybody making that argument. The collisions are treated as independent and unconfounded. The key here is that there is no way to measure or statistically prove that. It is just an argument made between scientists that depends on good judgement, and so far the whole world seems to have accepted the argument. It turns out that is a perfectly good alternative to 100s of spatial replicates.

Let me unpack all of these examples and be more explicit about the alternatives to replication as ecologists usually think of it (many spatially separated experimental units). Again, these alternatives are only to be used when necessary, because replication is too expensive or impossible – but that occurs more often in ecology than we admit:

  1. Replication in time - repeated measures on one or a few subjects do give lots of measurements and estimates of σ² - it’s just that the estimate can be erroneously low (dividing by too many degrees of freedom) if the repeated measures are not independent. But what if they are independent? Then it’s a perfectly valid estimate. And there is no way to prove independence (when you have only one experimental unit to begin with). This is a matter for mature scientists to discuss and use judgement on, as with the LHC – not a domain for unthinking slogans about “it’s pseudoreplicated”. Additionally, there are well-known experimental designs that deal with this, specifically BACI, or Before-After-Control-Impact (just Google “BACI experimental design”). Basically one makes repeated measures before a treatment to quantify innate variability, then repeated measures after the treatment to further quantify innate variability, and then compares the before-after difference in means vs. the innate variability. The Experimental Lakes Area eutrophication experiments are great examples of important BACI designs in ecology, and nobody has ever argued those were inconclusive.
  2. Attention to covariates - if you can only work at two sites (one treatment and one control), you can still do a lot of work to rule out confoundment. Specifically, you can measure the covariates that you think could be confounding – moisture, temperature, soils, etc. – and show that they’re the same or go in the opposite direction of the effect observed (and before that, you can pick two sites that are as identical as possible on these axes).
  3. Precise measurements of the dependent variable - what if σ²=0? Then you don’t really need a bunch of measurements. This is far from most of ecology, but it comes up sometimes in ecophysiology. For a specific individual animal under very specific conditions (resting, postprandial), metabolic rate can be measured fairly precisely and repeatably. And we know this already from dozens of replicated trials on other species. So do we need a lot of measurements the next time? A closely related case is when σ²>0 but the amount of error is very well measured, so we can do an error analysis that propagates all the error bars through the calculations. Engineers use this approach a lot.
  4. We don’t care about σ² - what if we’re trying to estimate global NPP? We may have grossly inaccurate measurement methods and our error bars may be huge. But since we have only one planet, we can’t do replication and estimate σ² – does that mean we should not try to estimate the mean? This is a really important number; should we give up? (Note – sometimes the error analyses mentioned in #3 can be used to put confidence intervals on such estimates, but they have a lot of limitations in ecology.) And note I’m not saying having no confidence intervals is good; I’m saying dropping entire important questions because we can’t easily get confidence intervals is bad.
  5. Replication on a critical component - the space shuttle is a good example of this. One would not want to replicate on space shuttles (even with human lives taken out of the equation, cost alone is prohibitive). But individual components could be studied through some combination of replication and precise measurement (#3 above). The temperature properties of the O-ring were well known, and engineers tried desperately to cancel the launch. They didn’t need replicate measures at low temperatures on the whole shuttle. Sometimes components of a system can be worked on in isolation with replication and the results still generalize to the whole system, where replication is not possible.
  6. Replication over the community of scientists - what if you have a really important question at really big scales, so that you can only afford one control and one treatment unit, but if it pans out you think it could launch a whole line of research leading to confirmation by others in the future? Should you just skip it until you can convince a granting agency to cough up 10x as much money with no pilot data? We all know that is not how the world works. This is essentially the question Jeff Ollerton asked in the comments section of the Biosphere 2 post.
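The BACI logic in #1 above can be sketched in a few lines (all numbers made up; a real analysis would also test the effect against the innate before/after variability with a proper repeated-measures model):

```python
from statistics import mean

# Made-up numbers: repeated measures before and after a treatment at an
# impact site, plus the same measures at an untreated control site.
control_before = [5.1, 4.9, 5.0, 5.2]
control_after  = [5.0, 5.1, 4.9, 5.1]
impact_before  = [5.2, 5.0, 5.1, 4.9]
impact_after   = [7.8, 8.1, 7.9, 8.2]   # treatment applied here

# BACI effect = change at the impact site minus change at the control site
effect = (mean(impact_after) - mean(impact_before)) - \
         (mean(control_after) - mean(control_before))
print(f"BACI effect estimate: {effect:.2f}")
```

Subtracting the control site’s change is what protects you from demonic intrusions that affect both sites (a wet year, say).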

So, in conclusion: ecologists have an overly narrow definition of what replication is and what its role in good science is. A high number of spatially separated experimental units is great when you can get it. But when you can’t, there are lots of other things you can do to address the underlying reasons for replication (estimating σ² and avoiding confoundment). And these are not places for glib one-word (“pseudoreplication”, sneeringly said) dismissals. They are places for complex, nuanced discussions about the costs of replication, how convincingly the package of alternatives (#1-#6) is deployed, and sometimes even how important the question is.

What do you think? Have you done work that you were told was unreplicated? How did you respond? Where do you think theory fits into the need for replication – do we need less replication when we have better theory? Just don’t tell me you have to have replication because it’s the only way to do science!


Friday links: Stein’s paradox, non-significant is the new significant, and more

Also this week: why modeling the process that generated the data is the least of your worries, pretty pictures of mammals, and more…

From Jeremy:

This will blow your mind (unless you’ve heard of it before). Say that you have sample data on three or more variables. They could be anything, they need not be related in any way. You want to estimate the true (population) means of these variables. Your best estimate is the vector of their sample means, right? Wrong. Better estimates can be obtained by shrinking the mean of each variable towards the grand mean of all of them. This is Stein’s Paradox. It’s a famous result in statistics, dating from 1955. It’s totally counterintuitive–until it’s explained properly, and then it makes total sense. And once you get it, you’ll have a much deeper understanding of everything from nonparametric smoothing to empirical Bayes methods. Check out this wonderful, totally non-technical paper on Stein’s Paradox from Brad Efron and Carl Morris. You’ll be glad you did.
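If you want to see the shrinkage in action, here is a minimal sketch of a positive-part James-Stein estimator that shrinks toward the grand mean (the function name and the toy numbers are mine; see the Efron and Morris paper for the real derivation and the exact conditions):

```python
def james_stein(means, var):
    """Shrink sample means toward their grand mean (positive-part
    James-Stein); var is the known variance of each sample mean."""
    k = len(means)
    grand = sum(means) / k
    ss = sum((m - grand) ** 2 for m in means)
    shrink = max(0.0, 1 - (k - 3) * var / ss)
    return [grand + shrink * (m - grand) for m in means]

# Five unrelated quantities, each sample mean measured with variance 1:
raw = [1.0, 3.0, 5.0, 7.0, 14.0]
print(james_stein(raw, 1.0))  # each estimate nudged toward the grand mean (6.0)
```

Counterintuitively, the shrunken estimates have lower total squared error than the raw means, on average, no matter what the true means are.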

Deborah Mayo argues that the replication movement shows signs of becoming a bandwagon. As she puts it, “non-significant is the new significant.” A quote to give you the flavor:

The authors seem to think that failing to replicate studies restores credibility, and is indicative of taking a hard-nosed line, getting beyond the questionable significant results that have come in for such a drubbing. It does not. You can do just as questionable a job finding no effect as finding one. What they need to do is offer a stringent critique of the other (and their own) studies. A negative result is not a stringent critique.

Papers that were publicly accused of reporting fraudulent data on the now-defunct website Science Fraud were seven times more likely to be retracted than otherwise-similar papers for which the accusations were privately reported to the relevant authorities. There are various possible explanations, but it sure does look like journal editors often move only in response to negative publicity. Which is unfortunate for various reasons, including the fact that it encourages people to immediately go public with accusations of fraud–accusations that don’t always stand up to scrutiny. But as one of the folks quoted in the linked piece notes, journals need to recognize that a significant fraction of people no longer trust them, and won’t be satisfied by stonewalling.

Via the BBC, a pretty good popular treatment of how human cultural practices have influenced our genetic evolution. Includes discussion of some of the best-studied examples. Good source of examples for undergraduate classes.

Rookie mistakes in statistical analyses of empirical data. Or, “why correctly modeling the data-generating process is the last thing you should be worried about.” From economics, but it applies to ecology too. (ht Worthwhile Canadian Initiative)

And finally, the winning shots from the Mammal Society’s Mammal Photographer of the Year competition, which focuses on British mammals. Dolphin 1, salmon 0. And THE BROWN HARE IS WATCHING YOU. :-)

Policy relevant science: the unreasonable effectiveness of boundary objects

In a recent post on policy-relevant science I talked about boundary workers and boundary organizations. The boundary I am talking about is the one between science and policy, and the notion of a boundary between scientists and policy-makers is receiving increasing attention from social scientists. Briefly (read the last post if you want more), the idea originated with people who spanned the inside and outside of a company, mutated into the boundary between scientists and others, and led to the addition of a new concept – boundary organizations (e.g. conservation NGOs, land trusts, etc.).

But today, I want to talk about another idea that emerges from thinking about the boundary: the boundary object. As the name implies a boundary object is a thing, not a person or group of people, that helps to span the boundary. In the original corporate model, things like price lists and user licenses were boundary objects. In the science policy boundary, there are many possibilities – maps and models being two of the most commonly cited, but many many other objects can (should?) be thought of as boundary objects as well.

To be a good boundary object, an object needs several properties:

  1. Spans/lives in two worlds/translates – this is the most obvious point. It needs to have genuine uptake of the language and concepts of the scientists but also genuine uptake of the language and concepts of the policy makers. It probably needs to be enough of a compromise to make both sides a bit uncomfortable. A bit too certain for scientists. A bit too quantitative for policy makers. An ANOVA table or regression table does not span (full comfort for scientists, all the discomfort for the policy makers). A bar graph of standardized coefficients is a bit better. A decision (regression) tree is a lot better (and it makes both groups stretch their comfort zones).
  2. Central – a good boundary object needs to get to the heart of the matter and show the key variables enough to stimulate discussion and yes, provoke disagreement, or it is not doing its job. Just a map of the area without any layers is not a boundary object. A map that identifies existing resources, existing practices, proposed regulatory zones is a good boundary object.
  3. Highly public – a boundary object needs to be easily available to everybody on both sides of the boundary – probably on the internet in this day and age. A drawing on a cocktail napkin from a discussion between two people is not a boundary object. But if it is scanned and put on the internet (or emailed to a group) it could be.
  4. Credible – a boundary object needs to be reasonably credible as objective and neutral. If it is seen as a tool for one side to win, it won’t be used. Indeed, even if it is unintentional (e.g. bad initial parameters in a model), just being perceived as biased can be the kiss of death to the life of a boundary object.
  5. Changeable/editable – boundary objects need to be changeable. As the discussion across the boundary changes and moves, the boundary object needs to capture and reflect those changes. In some cases, a boundary object can become the centerpoint of a negotiation.

I stated earlier that both maps and what-if (scenario-driven) models are great boundary objects. Assuming they map or model the right variables, it is pretty clear how they meet the five criteria – especially maps. The ideal model to serve as a boundary object has a number of clear input parameters that can be changed to see how the outcomes change. This is especially powerful when the model is fronted on the web, where anybody can tweak the parameters and play with the model. A model is also powerful when its assumptions can be written down clearly (although just making clear what the inputs and outputs are is useful).
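As a toy illustration of what such a what-if model can look like (the `wetland_loss` function and every number in it are hypothetical, not from any real project): a boundary-object model exposes a few clear inputs that anyone can tweak and immediately see the outcome change:

```python
def wetland_loss(sea_level_rise_cm, shore_slope_pct=2.0, wetland_width_m=500.0):
    """Very rough horizontal inundation of a wetland strip, in meters:
    vertical rise divided by shore slope, capped at the wetland's width."""
    inundation_m = (sea_level_rise_cm / 100.0) / (shore_slope_pct / 100.0)
    return min(inundation_m, wetland_width_m)

# Anyone can re-run the scenarios with their own numbers:
for rise_cm in (20, 50, 100):
    print(f"{rise_cm} cm of sea level rise -> "
          f"{wetland_loss(rise_cm):.0f} m of wetland inundated")
```

The model is deliberately oversimplified; its job is to make the key inputs (rise, slope) concrete enough to argue about.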

As the title of this post suggests, boundary objects can be extraordinarily, surprisingly successful in invoking boundary spanning. I’m sure almost any ecologist who has put themselves in a policy context (hearing, forum, etc) has seen the power of a map. I saw it a couple of weeks ago in my town – there was a public hearing on building a giant student apartment complex in close proximity to some wetlands. The whole conversation centered on the architectural drawings (which were mounted on a 3×5 poster board). And when a scientist got up and started talking about why he thought the soil survey was wrong, he didn’t just say it, he took the time to hook up to a projector and show soil maps. Maps just change the whole conversation. They don’t make people magically agree (of course!). But they make the conversation much more concrete, much less talking past people and not being heard, and ultimately much more productive.

Models are used much less often than maps in environmental policy, in my experience (but still frequently). They can also be game changers. That doesn’t mean people agree with the model. But it does mean people can start to understand what the most important variables are. And they can start to have concrete dialogues about what the right assumptions are. A great example where maps and models intersect is the diagrams being produced of sea level rise in response to climate change. To a large degree the map aspect dominates. But in more nuanced conversations outside of the press, they start to lead to error bars (what does that map look like if the seas go up only 20 cm, or up 1 m?), they start discussions about what we do and don’t know about ice melt, etc. My whole job before I returned to academia was building models to be used as boundary objects in the business world. I spent 5 years of my life modelling the changes to mail sorting that would happen with automation (optical character readers and bar code sorters). These models served as a focal point for launching 100s of discussions, from the impacts on unions, to changes in the number of facilities needed, to what types of machines to buy and what types of mail to incentivize in the future.

Maps and what-if models aren’t the only useful boundary objects. I already mentioned decision trees (the output of a regression tree). While a single regression tree might be less trendy and statistically proper than a random forest, it is a WAY better boundary object. Managers intuitively get regression trees and can immediately start discussing the limitations of the statistical model, matching the model against their mental model of reality, and seeing policy implications. Another boundary object is a forced quantitative weighting of priorities. This can be done with techniques as simple as voting on rank order, or as complex as the analytic hierarchy process. Having a discussion that concludes genetic diversity deserves 27.2% of our attention, taxonomic diversity 37.8%, and functional diversity 35% is totally arbitrary and inherently wrong by being so one-dimensional – but it is a fantastic way to have a constructive conversation! (Again, the theme that a good boundary object takes everybody out of their comfort zone.) Similarly, a “health of nature in Maine” index combining different factors with arbitrary weights would be stupidly oversimplified relative to the reality ecologists know, but a great boundary object. Even reports serve as boundary objects – think of the just-released IPCC report (of course the many maps, charts, and models are each boundary objects), but the wording of the report itself stirred great debate and discussion on what we know, how certain we are, etc. Scenario generation (sensu this paper) is another, less quantitative boundary object.
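For what it’s worth, the arithmetic behind such a weighted index is trivial, which is part of its power as a boundary object; here is a sketch with made-up weights and made-up 0-1 scores:

```python
# Hypothetical priority weights (summing to 1) and hypothetical 0-1 scores.
weights = {"genetic": 0.272, "taxonomic": 0.378, "functional": 0.350}
scores  = {"genetic": 0.60,  "taxonomic": 0.75,  "functional": 0.40}

# The composite index is just a weighted sum of the component scores.
index = sum(weights[k] * scores[k] for k in weights)
print(f"composite index: {index:.3f}")
```

The arguments all live in the weights and scores, not the arithmetic, which is exactly what makes the discussion concrete.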

As a slightly more detailed case study: even simple access to data can serve as a boundary object, so long as the effort is made to genuinely put the data in the middle of the boundary, not just in the scientist’s world. I’m working on a project for the Maine Sustainability Solutions Initiative to take a mix of freely available but technically complex data (e.g. shapefiles) and new data (e.g. model projections) produced by our researchers, and just put it in a simple map-and-trend-chart interface in a web browser. I keep getting told “well, those GIS layers are already available” or “you’re missing the complexity behind the forecasts”, but those comments kind of miss the point of a boundary object. It’s about putting conversation-starting information out in a genuinely spanning (lives-in-two-worlds) context. The average state legislator or town councilor is not going to pull out a GIS layer. But they will spend 5 minutes on the web. And if they do, they will be able to get a trend chart of climate change in their town, or a map of land cover change over the last 20 years in their town, or the changing economy of their town (part of the appeal of maps, and part of the spanning, is that people always want to look at where they live). And they will start putting patterns together. Start comparing the past to the projected future. Start looking for more information on the assumptions behind the models. And have a lot of conversations they wouldn’t have had. Time will tell if this specific project serves its purpose, but if past experiences with boundary objects are any guide, my money is that it will. This ties into the theme of how to make sure research gets across the boundary and doesn’t just molder away in journals – which will be the topic of my next post on boundaries.

But my bottom-line experience is that getting a bunch of people with different opinions in a room and “just talking about it” or “letting everybody be heard” is vastly less effective than getting those same people in a room with a boundary object at the center of the discussion. It focuses things in very concrete ways, steers toward negotiation and compromise, increases the level of understanding, and minimizes the amount of talking past each other between the sides.

One could speculate for a long time about the psychology of why boundary objects work (an irrational belief that anything coming out of a computer is correct, the ability to find “my house” on the map, focusing people in a constructive direction, genuine success at translation and spanning, etc.). These are interesting topics of study in their own right (and are being studied), but not my own field of research. I just notice how well they DO work. It’s almost like magic (except of course the reality is that it’s a lot of hard work behind the scenes). Hence the title: “unreasonable effectiveness”.

What boundary objects have you used in your work? Were they effective? What made them effective? Any experiences with what made them more effective?



The importance of knowing and recognizing the limits of your knowledge

At some point in every qualifying exam, there will be a question that the student doesn’t know the answer to. Actually, that’s not quite accurate – this doesn’t happen just once; it happens repeatedly, in every qualifying exam. That’s part of the point of the exam: to explore what the student knows and what they don’t know.* What I will argue here is that it’s essential in this case – and, indeed, in science in general – to realize when you don’t know something and to admit that. To put it into Donald Rumsfeld’s framework: scientists need to know their unknowns.

If we go back to the scenario of the qualifying exam: Ideally, when asked about something they don’t know, the student says something like “I’m not sure about that topic, but what I think might happen is that…” That is, the student acknowledges that they are moving beyond the limits of their knowledge, and are beginning to speculate. (Usually, in a qualifying exam, the faculty are interested in that speculation, to see how the student works through a new idea or problem. Speculating is fine; BSing and trying to pass it off as knowledge is not.)

While trying to bluff your way through a qualifying exam isn’t a good strategy, it’s also not going to harm anyone else. In other situations, though, failing to recognize and/or acknowledge what you do and do not know is really important, with the potential to cause harm. This is something I’ve discussed with my sister, who is a physician. I think it’s pretty clear that it’s dangerous for a physician to go beyond his or her knowledge. For example, if she isn’t sure of the recommended treatment for a condition, she wouldn’t just guess as to what to prescribe; she would do additional research or refer the patient to a specialist. Clearly there’s the potential for real harm to someone’s health if she doesn’t acknowledge when she doesn’t know about something, or if she thinks she knows something but doesn’t. In the case of scientists, usually physical harm doesn’t result, but it’s possible – for example, if you’ve never set up an acid bath before, you don’t want to just take a guess as to whether to add the acid or water first.**

Aside from physical harm, why is it dangerous for scientists to not realize or admit when they don’t know something? Perhaps the most important reason is that it can severely compromise the data that are collected; in the worst-case scenario, the data are useless but the scientist doesn’t recognize them as useless. Unfortunately, this happens. I know one lab that lost an entire field season’s worth of data because of a very basic mistake a technician was making the whole year. In that case, at least they realized it (though only by chance, right at the end of the field season). What is really scary is that, if they hadn’t noticed it (which easily could have happened), they would have proceeded to analyze and publish those data with no idea that they were not collected properly. Sometimes something about the data will indicate that there was something “weird” going on — say, phosphorus levels being really high might indicate a contamination problem. One of my biggest lab fears is that someone will be collecting data that appear reliable but actually aren’t. Intentionally falsifying data is obviously unacceptable and a major breach of ethics, but, from the point of view of our ability to understand nature, it is just as bad for someone to be accidentally collecting data that seem reliable but aren’t.***

I think one way to minimize the risk is to establish a lab culture that encourages people to ask questions and to admit if they aren’t sure of something. I also spend time trying to make sure that everyone realizes that, if they make a mistake in collecting data, it affects a lot of people. Students (especially undergrads) really respond to this. In lab meetings focused on ethics, we usually talk about the impacts of falsified data on the people who spent their PhD trying to follow up on those results. This always makes an impression on students, and it’s easy to talk about how this also applies to data that are inaccurate due to a lack of care or knowledge. If you have a good lab culture, they will care about their labmates and want to help them. At the same time, you don’t want to paralyze people with fear of making a mistake, and you really don’t want to discourage someone from admitting it if they realize they’ve made a mistake or been doing something wrong. Again, though, they’re more likely to admit that they’ve made a mistake when they realize that it will affect others. If something went wrong, admit it. And, if you’re not sure how to do something, ask.

Not knowing (or recognizing) the limits of your knowledge also results in a trust problem. This is true both for the individual and for scientists as a group. As an example at the individual level: I once went to a talk at a major international meeting where the person was presenting results of a study that was framed as being about Daphnia, but the picture shown when introducing the study system was of Ceriodaphnia. If the person couldn’t figure out the genus of their main study organism, should I really trust that they had figured out how to measure phosphorus correctly? And, at a larger level, if I were to say in public that, say, there was going to be an outbreak of a horrible disease tomorrow and there isn’t,**** that erodes the public’s trust in scientists. This is not to say that speculation is never appropriate – just that you need to make it clear when you are speculating.

So, in short: know what you know, know what you don’t know, and, if you’re speculating, acknowledge that.


* Students often seem kind of terrified at the idea that they will be asked something that they don’t know. But it is completely to be expected that every student will have multiple questions that they don’t know the answer to. No one in the room could answer all of the questions that are being asked. There are usually 3-5 people with many years of experience asking the questions, and they’ve usually been chosen because they have complementary expertise. I have thought “Huh, that’s interesting, I’ve never thought about that” and “Hmm, I think I knew that once, but apparently have forgotten it” countless times during oral exams.

**“Acid to water, just like you oughta”

***This relates to my post on the merits of system-based research: I think you are less likely to make a mistake due to an “unknown unknown” if you know a system really well.

****I think not making stuff up is so important that I am being intentionally vague here. It feels wrong to me to make up a fake prediction and put it here in writing.

Here are the slides from my talk on blogging

My talk on blogging at Virginia Tech went over quite well. I enjoyed the whole visit – it was very well organized (thanks Alex and Greg!) and I had a bunch of really interesting conversations.

For folks at Virginia Tech and elsewhere who wish they could’ve seen my talk, here are my slides. In the notes pages are a few comments on how I put the talk together and why certain otherwise-mysterious slides are included.

virginia tech blogging seminar slides