Ecologists (and lots of other people) often say that the world, or some feature of it, is ‘random’ or ‘stochastic’. But what *exactly* does that mean?

One view is that randomness is real; some features of the world are inherently probabilistic. Quantum mechanics is the paradigmatic example here, but that doesn’t mean there aren’t others. An alternative view is that calling something ‘random’ is shorthand for our ignorance. If we knew enough about the precise shape of a coin, the force with which it was flipped, the movement of the surrounding air, etc., we could accurately predict the outcome of any particular coin flip; the flip itself is a deterministic process. But we don’t have that information, so we pretend that coin flipping is a random process and make probabilistic statements about the expected aggregate outcome of many flips.

Does the distinction between these two views matter for ecologists? It’s tempting to say no. In practice there’s no possibility we’ll ever have enough information to predict the roll of a die, so we lose nothing by treating it as random. No less an authority than Sewall Wright was of this view. But I’m going to suggest that’s incorrect; I think ecologists do need to decide whether they think randomness is real, or merely determinism disguised by our ignorance. And I’ll further suggest that the appropriate choice can vary from case to case and is only sometimes dictated by empirical facts.

If apparent randomness is just ignorance of relevant information, then when we learn new information the apparent randomness of events should decline. This happens whenever you add an additional predictor variable to a statistical model, increasing explained (deterministic) variation and reducing unexplained (random) variation. A favorite example of mine is recent work on the ‘decision’ by a bacteriophage as to whether to lyse its bacterial host. This decision had been regarded as a paradigmatic example of a probabilistic biological process. But it turns out that the decision is actually quite (although perhaps not entirely) deterministic, and depends on the size of the host cell. Cell size varies, and phage decision making looks random if you don’t account for that variation. This is a specific example of a general principle: if some process (like the lysis decision) is not random with respect to some property or outcome of interest (like cell size), then it’s simply false to treat that process as random.
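The predictor-variable point can be sketched in a few lines of code. This is a purely hypothetical illustration (the variable names and numbers are made up for the sketch, not taken from the phage study): an outcome that looks almost entirely random becomes almost entirely deterministic once a hidden predictor is measured.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: an outcome driven mostly by a hidden deterministic
# predictor (call it 'cell size'), plus a little residual noise.
n = 1000
cell_size = rng.normal(loc=1.0, scale=0.2, size=n)       # the hidden predictor
outcome = 3.0 * cell_size + rng.normal(scale=0.1, size=n)

# With no predictors, all of the variation in the outcome is 'random'.
var_total = outcome.var()

# Add the predictor: fit a simple linear model and look at the residuals.
slope, intercept = np.polyfit(cell_size, outcome, 1)
residuals = outcome - (slope * cell_size + intercept)
var_residual = residuals.var()

# Most of the apparent randomness was just ignorance of cell_size.
print(var_residual / var_total)
```

The ratio printed at the end is the fraction of variation still ‘random’ after accounting for the predictor; here it is small, because by construction most of the variation was deterministic all along.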

But is it always a good idea to try to minimize apparent randomness by incorporating all relevant information? In ecology, Jim Clark has argued as much (if I understand him correctly). But I’m not so sure I agree. If calling something random is merely to statistically summarize the net effects of various unknown deterministic processes, well, *summaries are really useful*. Think for instance of genetic drift, and its ecological equivalent, demographic stochasticity. Genetic drift and demographic stochasticity arise from random variation in the birth and death rates of individuals that is independent of their phenotypes and other properties, and so would occur even if all individuals were otherwise identical. I’m happy to stipulate that, if we knew enough, much or even all of this apparent randomness could be explained away. But why would we *want* to explain it away? What would we gain? I’d argue that we’d actually *lose* a lot. We’d be replacing the generally-applicable concepts of genetic drift and demographic stochasticity (and the associated well-developed, highly elegant, and well-tested mathematical theory) with a stamp collection of inherently case-specific, and hugely complex, deterministic causal stories. The complex deterministic causal factors generating apparently-random variation in the birth and death rates of, say, different *E. coli* genotypes in a laboratory culture have nothing to do with the complex deterministic causal factors generating apparently-random variation in the birth and death rates of, say, introduced rats on a marine island. The important thing is that deterministic causal factors in both cases have apparently-stochastic consequences described by models of genetic drift and demographic stochasticity. 
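The point that demographic stochasticity requires no individual-level differences can be made with a toy simulation (all parameter values below are arbitrary assumptions, chosen only for illustration): every individual has identical per-capita birth and death probabilities, yet replicate populations still diverge purely by chance.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n0=20, b=0.1, d=0.1, steps=50):
    """One replicate population of identical individuals.

    Each timestep, every individual independently reproduces with
    probability b and dies with probability d. With b == d, the
    expected growth rate is exactly zero.
    """
    n = n0
    for _ in range(steps):
        births = rng.binomial(n, b)   # demographic 'coin flips' for births
        deaths = rng.binomial(n, d)   # and for deaths
        n = n + births - deaths
    return n

# 200 replicate populations, identical in every modeled respect.
finals = [simulate() for _ in range(200)]

# Despite zero expected growth, replicates scatter widely (and some go
# extinct): variation with no deterministic cause anywhere in the model.
print(min(finals), max(finals))
```

That scatter across replicates is exactly what the general theory of demographic stochasticity describes, regardless of what case-specific deterministic details would underlie each birth and death in any real population.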
Laplace’s demon, which has perfect information about the position and movement of every particle of matter in a deterministic universe, would see no randomness—thereby making it completely ignorant about one of the most important and best-confirmed concepts in all of ecology and evolution (see here for more on this).

And while Laplace’s demon is a mere philosopher’s dream, even trying to emulate it has its pitfalls. In *Do lemmings commit suicide? Beautiful hypotheses and ugly facts*, population ecologist Dennis Chitty describes his career-long unsuccessful struggle to identify the causes of population cycles in small mammals. His lack of success is almost certainly attributable, at least in part, to his search for a deterministic sequence of causal events that always drives population cycles. Observations that a particular causal factor was apparently weak or absent in some cases (or even absent at one particular time for one particular population of one particular species) repeatedly caused him to modify or abandon his causal hypotheses. In contrast, modern stochastic dynamics has been quite useful for inferring the causes of population fluctuations (e.g., Henson et al. 2002 Oikos). See here for further discussion of the pitfalls of insisting on an overly-detailed ‘low-level’ description of one’s study system.

Bottom line: if randomness is ignorance, sometimes ignorance is bliss.

p.s. The distinction between real and merely apparent randomness crops up outside of science too, for instance in professional sports. Traditionally, events in sports—such as who wins and who loses—often are explained by appeal to specific details associated (or putatively associated) with the event. Perhaps the winning team exhibited a stronger ‘will to win’, or is ‘on a hot streak right now’, while the losers were ‘tired’ and ‘wilted under pressure’. For many traditionalists, much of the appeal of sports is in these explanatory stories. But such claims invariably are post hoc and so impossible to test—had the outcome been different, we’d have told a different story to explain it. Recently, statistically-minded observers (especially of baseball) have begun insisting that many events in sports really are random, or at least are best thought of as random because we are ignorant of their causes (although we may think we’re not). As another example, religious beliefs sometimes have been interpreted as a way for believers to see deterministic causality, order, and a purposeful plan in a universe that would otherwise appear random, uncontrollable, and purposeless.

xkcd hits the nail on the head, as usual.

*Note:* This is a rerun of a post I first wrote for Oikos Blog back in 2011. Sorry, Meghan, Brian and I are all swamped right now. Normal service should resume soon.

Hi Jeremy, nice essay. A few quick responses.

1) I think that stochasticity/randomness is best defined *with respect to a given model*. I do not think that the concepts of stochasticity (or determinism) mean much in any absolute sense. So from this point of view, I suppose I would argue for a slightly more pragmatic take.

2) That we can sometimes very reliably model stochastic aspects of reality with probability reflects a deeper order in the world. For instance, the probability distributions underlying QM work astonishingly well. To me, this is different from an admission of ignorance, yet…

3) De Finetti showed pretty convincingly that whether we think of chances as arising from randomness ‘out there’ or from ignorance due to limited information doesn’t matter! As long as the exchangeability criterion holds, we can proceed “as if” we are dealing with randomness out there. An entertaining if slightly obtuse take on De Finetti is in Diaconis and Skyrms’s recent book “Ten Great Ideas About Chance”.

Re: your 1, I don’t quite follow, can you give an example?

Re: your 2, I’m obviously not a quantum physicist but I’d be perfectly happy to say that quantum phenomena “really” are stochastic.

Re: your 3, I know of De Finetti but haven’t read any of his work, thanks for the pointer.

Hi Jeremy,

Re:re:1) Depending on the model, the same set of factors may be either “deterministic” or “stochastic”. For instance, say we have a model of reforestation incorporating recruitment, growth and mortality. Extended over space, we will need to treat seed arrival and emergence as stochastic. With respect to a given set of patches or quadrats, these patterns will indeed be well modeled as stochastic random variables. However, it is possible to imagine studying an individual seed’s dispersal as a function of mass, geometry, position of release, wind currents, and so forth. Such a model might be practically intractable, but we *could* move in that direction. At one level we have something that looks like “determinism”; at another level it is “stochastic”. Likewise for coin flipping. We can’t accomplish the micro-specification necessary to predict outcomes (Persi Diaconis has a lovely diagram illustrating the phase space of momentum and rotation, and how it is that we end up with near-maximal uncertainty, i.e. 50-50 odds). Where I take this one step beyond mere ‘epistemic uncertainty’ is that for many systems, it doesn’t matter where the randomness comes from! The stochasticity induces the same results – i.e. from the perspective of the reforesting patch, seed arrival *really is* stochastic in every meaningful sense.

2) As another fellow layman when it comes to theoretical physics, I too am satisfied that reality at the level of the quantum may simply be “stochastic”, in that we will always need to describe it using the quantum probabilities. Another plug for “Ten Great Ideas About Chance”: their chapter on physics is great! (Note that the book as a whole has a few issues with clarity at points and could have used another round of editing IMO.)

3) De Finetti is a very obscure thinker and writer. Nevertheless, his Representation Theorem(s) are worth knowing about, given that exchangeability is so important for hierarchical models in general and, some would say, Bayesian modeling as a whole.
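For readers who want the formal statement behind this exchange: de Finetti’s representation theorem says that any infinite exchangeable sequence of 0/1 random variables behaves “as if” it were i.i.d. Bernoulli with a success probability drawn from some prior distribution. In symbols:

```latex
% de Finetti's representation theorem for an infinite exchangeable
% sequence of binary random variables X_1, X_2, \dots: there exists a
% distribution F on [0,1] such that, for every n,
P(X_1 = x_1, \dots, X_n = x_n)
  = \int_0^1 \theta^{\sum_i x_i} \, (1 - \theta)^{\,n - \sum_i x_i} \, dF(\theta)
```

In other words, whether the chances are ‘out there’ (a true θ) or a summary of our ignorance (the mixing distribution F), the observable predictions are the same, which is exactly the point being made above.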

For me, the question of whether we should think of something as random or not is a cost-benefit trade-off. If a given causal driver is hard to measure, if it changes rapidly relative to the time scale of the variable we’re trying to predict (or our measurement scale), if its effect on a variable of interest is very small (relative to other drivers), or if its effect changes sign or magnitude by large amounts depending on context (e.g. with nonlinear dynamics), it’s better to model that driver as part of the random forcing on a system (the error term). Really, it’s an information-theoretic argument: how much will collecting more data on this factor increase my ability to predict the variable I care about?

This means that whether we should treat a given effect as part of the error term or measure it explicitly also depends on the question. If we were looking at, say, the effect of temperature on growth rates, it might make sense to model temperature as a noise term if we’re studying the species in a relatively consistent environment, but we couldn’t ignore it in an area where the climate is changing rapidly and directionally. As an example: Ransom Myers has a great paper, “When do environment-recruitment correlations work?”, where he argues that including environmental predictors in stock-recruitment functions only seems to work for species at the edge of their range (i.e. where we expect the environment-recruitment relationship to be most predictable); otherwise, adding predictors like temperature often just seems to add noise / non-reproducible functions.

There’s also the regress problem: let’s say we could perfectly measure every effect of environmental variables on, say, tree growth (as Jim Clark’s arguments have focused on). If we want to predict how a forest grows, though, we still need to predict how those environmental variables will change; that just pushes our “is it deterministic or stochastic” question back one step, from tree growth to the environmental variables affecting growth. It’s pretty easy to see that your predictive horizon (the number of things you have to measure to predict a given phenomenon) is only going to grow the further up the causal chain you look.

I agree that these sorts of pragmatic considerations are central to good everyday decision-making about how to model stuff.

The “if we only knew all the relevant variables we could predict the outcome” school of thought potentially has real-world consequences. For example, it underpins the fantasy that we will be able to geoengineer the climate and control the consequences of that meddling. So the recognition that it will never be computationally possible to deal with that many variables, and that even a very small change in one variable could easily alter the outcome of the coin flip, should be an important lesson in humility for scientists (and politicians).

Thanks Jeremy,

Your post got me thinking about uncertainty. In engineering, two types of uncertainty are considered:

Aleatory uncertainty – uncertainty that arises through natural randomness or natural variability.

Epistemic uncertainty – uncertainty associated with the state of knowledge of a physical system.

Aleatory uncertainty can’t be reduced, but it may be characterised. Epistemic uncertainty may be reduced through advances in measurement techniques or improved understanding.

Ang, A. and Tang, W. (2007). *Probability Concepts in Engineering*. Wiley.

One way to rephrase this post would be to say that I don’t think it’s *always* wise to try to reduce epistemic uncertainty. Sometimes it is, as in the phage example. Sometimes not, as in the genetic drift example.

Can you think of examples of purely random processes in ecology? It doesn’t really matter in practice, but it’s an interesting philosophical question.

I think decision making by animals can sometimes be considered random (the same animal can probably make different decisions when faced with the same situation under the same circumstances and with the same internal state), but perhaps not.

If by random you mean “X is random with respect to Y”, then sure. The first one that comes to mind: for her master’s here at Calgary, Susan Bailey did a study of how snail foraging affects the spatial distribution of periphyton in artificial streams. Snails move non-randomly at larger spatial scales; when they find patches with lots of algae, they stay in them and eat all the algae before moving on. But algal distribution is also spatially heterogeneous at very small spatial scales, too small for the snails to detect. So the snails just average across that small-scale heterogeneity in periphyton density and move at random with respect to it. As a result, their foraging does not alter the distribution of algae at small scales the way it does at larger scales.