A recent post of mine about why Biosphere 2 was a success stirred mixed reactions. But one of the most common negative reactions was that there was no replication in Biosphere 2, which of course EVERYBODY knows is a hallmark of good science. This actually spilled into a spirited discussion in the comments. So, do we need replication to do good science?
Anybody who has read some of my older posts (e.g. one true route post, statistical machismo post) will know that my answer is going to be no. I’m not going to tell a heliologist that they are doing bad science because they only have one sun (they do have the stars, but most of the phenomena they study, like sun spots, cannot yet be studied on other stars). Nor am I going to say that to people who have developed theories about why our inner solar system contains rocky planets and the outer solar system contains giant gaseous planets (although in the last 2-3 years we are actually getting to the point where we have data on other solar systems, these theories were all developed and accepted well before then). And Feynman’s televised demonstration that a bad interaction between cold weather and a rubber O-ring led to the demise of the Space Shuttle Challenger definitely did not need and would not tolerate replication. Closer to home, I am not going to tell the people who have been measuring CO2 on top of Mauna Loa (aka the Keeling Curve, one of the most well-known graphs in popular science today) that their science is bad because they only have one replicate. Nor am I going to tell people who study global carbon cycling to give up and go home because CO2 is a well-mixed gas and we only have one planet (I mean come on, N=1, why waste our time!?). In short, no, good science does not REQUIRE replication.
Let me just state up front that replication IS good. The more replication the better. It always makes our inferences stronger. We DO need replication when it is feasible. The only problem is that replication is not always possible (sometimes even with infinite amounts of money, and sometimes only due to real-world time and money constraints). So the question of this post is NOT “do we need replication?” It IS “do we HAVE to have replication?” and “what do you do in these trade-off or limitation situations?” Give up and go home – don’t study those questions – seems to be some people’s answer. It’s not mine. Indeed, any philosophy of science position which leads to the idea that we should stop studying questions that inconveniently fail to fit a one-stop-shopping approach to science is not something I will endorse. This is the statistical machismo I have talked about before – when one makes the statistics so beautiful AND difficult that few can achieve the standard you have set, and you can then reject others’ work as WRONG, WRONG, WRONG. Careful thinking (and perusing the examples in the last paragraph) leads to a number of ways to do good, rigorous science without replication.
First let’s step back and define what replication is and why it is important. Wikipedia has several entries on replication, which in itself is probably informative about the source of some of the confusion. When ecologists think about replication they are usually thinking about it in the context of statistics (the Wikipedia entry on statistical replication) and pretty quickly think of Hurlbert’s pseudoreplication (also see Meg’s post on the paper). This is an important context, and it is pretty much the one that is being violated in the examples above. But this definition is only saying you need replication to have good statistics (which is not the same as good science). Wikipedia has an alternative entry on “replication – scientific method” which redirects to “reproducibility”. This definition is the sine qua non of good science, the difference between science and pseudoscience. Reproducibility means that if you report a result, somebody else can replicate your work and get the same thing. If somebody is doing science without reproducibility, call them out for bad science. But don’t confuse it with replication for statistics. Ecologists confuse these two all the time. To an ecologist, replication means multiple experimental units well separated in space (not well separated = pseudoreplication; not multiple = no replication = degrees of freedom too small). As I said, those are both good goals (which I teach in my stats class and push students to achieve). But they are not the sine qua non of good science.
It is instructive to think about an example that came up in the comments on the Biosphere 2 post: the LHC (Large Hadron Collider) and the hunt for the Higgs boson. Pretty blatantly they did not have ecological replication. Each LHC facility costs billions of dollars and they only had one (ditto for Biosphere 2). But the physicists actually had an extremely well worked out notion of rigorous reproducibility. Despite only having one experimental unit, they did have multiple measurements (observed particle collisions). Thus this is a repeated measures scenario, but notice that since there was only one “subject” there was no way to correct for the repeated measures. The physicists made the assumption that despite being done on one experimental unit, the measures were independent. But what I find fascinating is that the physicists had two teams working on the project that were “blinded” to each other’s work (even forbidden to talk about work with each other) to tackle the “researcher degrees of freedom” problem that Jeremy has talked about. They also had a very rigorous a priori standard of 5σ (p<0.0000003) to announce a new particle (I seem to recall that at 3σ they could talk about results being “consistent with” but not “proof of”, but I haven’t found a good reference for this). So, in summary, the Higgs search had an interesting mix of statistical replication (5σ), reproducibility (two separate teams) and pseudoreplication (uncorrected repeated measures) from an ecologist’s perspective.
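For the curious, those sigma thresholds map onto p-values through the tail of the normal distribution. Here is a quick sketch (the function name is my own; only the 5σ and 3σ conventions come from the physics discussion above):

```python
import math

def one_sided_p(z_sigma):
    """One-sided upper-tail p-value for a z-score, via the normal survival
    function: p = 0.5 * erfc(z / sqrt(2))."""
    return 0.5 * math.erfc(z_sigma / math.sqrt(2))

p5 = one_sided_p(5)  # the "announce a new particle" threshold
p3 = one_sided_p(3)  # the weaker "consistent with" threshold
print(f"5 sigma: p = {p5:.1e}")  # roughly 3e-7, i.e. the p<0.0000003 quoted above
print(f"3 sigma: p = {p3:.1e}")  # roughly 1.3e-3
```

Note that a 5σ standard is vastly stricter than the p<0.05 convention ecologists are used to.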
So what do we get out of statistical replication? The biggest thing is that it allows us to estimate σ² (the amount of variance). We might want to do this because variance is innately interesting. For instance, rather than ask whether density dependence exists, I would rather ask what percent of the year-to-year variance is explained by density dependence (as I did in chapter 8 of this book and as I argued one should do in this post on measures of prediction). Or we might want to quantify σ² because it lets us calculate a p-value, but this is pretty slippery and even circular – our p-value gets better and better as we add more replication (even though our effect size and variance explained don’t change at all). This improved p-value due to more replication is often equated with better science, but that is poppycock. Although there are valid reasons to want a p-value (see the Higgs boson), pursuit of a p-value quickly becomes a bad reason for replication. Thus for me, arguing for replication to estimate σ² is a decidedly mixed bag – sometimes a good thing, sometimes a bad thing depending on the goal.
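To see that circularity concretely, here is a minimal sketch (the function and variable names are my own, and the fixed standardized effect of 0.2 is invented purely for illustration) of how the p-value keeps improving with sample size while the effect size never changes:

```python
import math

def two_sided_p_from_z(z):
    # Two-sided p-value from a normal z-statistic.
    return math.erfc(abs(z) / math.sqrt(2))

effect_size = 0.2  # fixed standardized effect: (mean difference) / sd
for n in (10, 100, 1000):
    # The z-statistic grows as sqrt(n) even though the effect itself is constant.
    z = effect_size * math.sqrt(n)
    print(f"n = {n:4d}: p = {two_sided_p_from_z(z):.2g}")
```

The same biological effect goes from "nonsignificant" to overwhelmingly "significant" purely by adding replicates, which is exactly why a better p-value is not the same thing as better science.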
However – and to me this is the biggest message in Hurlbert’s paper, though it is often forgotten against the power of the word “pseudoreplication” – the #1 problem driving everything else in the paper is the issue of confoundment. If you only have one site (or two or three), you really have to worry about whether you got the effect you observed because of peculiarities of that site and any weird covariances between your variable of interest and hidden variables (Hurlbert’s demonic intrusions). Did you get more yield because of pest removal, as you think, or because the plot is downhill and the soil is wetter? One way to kill the demon of confoundment is to have 100 totally independent, randomly chosen sites. But this is expensive. And it’s just not true that it is the ONLY way to kill the demon. I don’t think anybody would accuse the LHC of confoundment despite only having one site. You could spin a story about how the 23rd magnet is wonky and that imparts a mild side velocity (or spin, or – I don’t know my particle physics well enough to be credible here …) that fools everybody into thinking they saw a Higgs boson. But I don’t hear anybody making that argument. The collisions are treated as independent and unconfounded. The key here is that there is no way to measure that or statistically prove it. It is just an argument made between scientists that depends on good judgement, and so far the whole world seems to have accepted the argument. It turns out that is a perfectly good alternative to hundreds of spatial replicates.
Let me unpack all of these examples and be more explicit about the alternatives to replication as ecologists think about it – far-separated experimental units (again, these alternatives are only to be used when necessary, because replication is too expensive or impossible, but that occurs more often in ecology than we admit):
- Replication in time – repeated measures on one or a few subjects do give lots of measures and estimates of σ² – it’s just that the estimate can be erroneously low (dividing by too many degrees of freedom) if the repeated measures are not independent. But what if they are independent? Then it’s a perfectly valid estimate. And there is no way to prove independence (when you have only one experimental unit to begin with). This is a matter for mature scientists to discuss and use judgement on, as with the LHC – not a domain for unthinking slogans about “it’s pseudoreplicated”. Additionally there are well-known experimental designs that deal with this, specifically BACI or before-after control-impact designs (just Google BACI experimental design). Basically one makes repeated measures before a treatment to quantify innate variability, then repeated measures after the treatment to further quantify innate variability, and then compares the before/after difference in means against the innate variability. The Experimental Lakes Area eutrophication experiments are great examples of important BACI designs in ecology and nobody has ever argued those were inconclusive.
- Attention to covariates – if you can only work at two sites (one treatment and one control) you can still do a lot of work to rule out confoundment. Specifically, you can measure the covariates that you think could be confounding – moisture, temperature, soils, etc. – and show that they’re the same or go in the opposite direction of the effect observed (and before that, you can pick two sites that are as identical as possible on these axes).
- Precise measurements of the dependent variable – what if σ²=0? Then you don’t really need a bunch of measurements. This is far from most of ecology, but it comes up sometimes in ecophysiology. For a specific individual animal under very specific conditions (resting, postprandial), metabolic rate can be measured fairly precisely and repeatably. And we know this already from dozens of replicated trials on other species. So do we need a lot of measurements the next time? A closely related case is when σ²>0, but the amount of error is very well measured and we can do an error analysis that propagates all the error bars through the calculations. Engineers use this approach a lot.
- We don’t care about σ² – what if we’re trying to estimate global NPP? We may have grossly inaccurate measurement methods and our error bars may be huge. But since we have only one planet, we can’t do replication and estimate σ² – does that mean we should not try to estimate the mean? This is a really important number; should we give up? (Note – sometimes the error analyses mentioned in #3 can be used to put confidence intervals on the estimate, but they have a lot of limitations in ecology.) And note I’m not saying having no confidence intervals is good; I’m saying dropping entire important questions because we can’t easily get confidence intervals is bad.
- Replication on a critical component – the space shuttle example is a good example of this. One would not want to replicate on space shuttles (even if human lives were taken out of the equation, cost alone is prohibitive). But individual components could be studied through some combination of replication and precise measurement (#3 above). The temperature properties of the O-ring were well known and engineers tried desperately to stop the launch. They didn’t need replicate measures at low temperatures on the whole shuttle. Sometimes components of a system can be worked on in isolation with replication and still generalize to the whole system where replication is not possible.
- Replication over the community of scientists – what if you have a really important question that is at really big scales so that you can only afford one control and one experimental unit, but if it pans out you think it could launch a whole line of research leading to confirmation by others in the future? Should you just skip it until you convince a granting agency to cough up 10x as much money with no pilot data? We all know that is not how the world works. This is essentially the question Jeff Ollerton asked in the comments section of the Biosphere 2 post.
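To make the BACI idea in the first bullet above concrete, here is a minimal single-site sketch. All the numbers and variable names are invented for illustration, and a real BACI analysis would also include a control site and a proper repeated-measures model:

```python
import statistics

# Hypothetical single-lake, BACI-flavored comparison: repeated measures
# before a treatment quantify the innate variability, repeated measures
# after quantify the shift. Data are made up for illustration.
before = [5.1, 4.8, 5.3, 5.0, 4.9]  # e.g. chlorophyll, pre-treatment
after = [7.9, 8.4, 8.1, 7.6, 8.0]   # the same lake, post-treatment

shift = statistics.mean(after) - statistics.mean(before)
innate_sd = statistics.pstdev(before)  # variability with no treatment applied

# Informal yardstick: is the before/after shift large relative to the
# innate year-to-year noise?
print(f"shift = {shift:.2f}, innate sd = {innate_sd:.2f}, "
      f"ratio = {shift / innate_sd:.1f}")
```

When the shift is many times the innate variability, as in the Experimental Lakes Area eutrophication experiments, nobody seriously argues the result is inconclusive just because there was one lake.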
So, in conclusion: ecologists have an overly narrow definition of what replication is and what its role in good science is. High numbers of spatially separated experimental units are great when you can get them. But when you can’t, there are lots of other things you can do to deal with the underlying reasons for replication (estimating σ² and ruling out confoundment). And these are not places for glib one-word (“pseudoreplication”, sneeringly said) dismissals. They are places for complex, nuanced discussions about the costs of replication and how convincingly the package of alternatives (#1-#6) is deployed, and sometimes even how important the question is.
What do you think? Have you done work that you were told was unreplicated? How did you respond? Where do you think theory fits into the need for replication – do we need less replication when we have better theory? Just don’t tell me you have to have replication because it’s the only way to do science!