Tobler’s first law of geography says “Everything is related to everything else, but near things are more related than distant things”. Nobody has ever had the clout to coin the equivalent law in ecology, but it is equally true. Everything in ecology shows autocorrelation. Environmental variables, abundance of a species, species richness, etc. are all autocorrelated in space. They’re all autocorrelated in time too. And traits and life history variables like body size, age at first reproduction, etc. are all autocorrelated phylogenetically. Conceptually, autocorrelation is exactly as in Tobler’s first law: close things are similar, with the similarity decreasing with distance.
The strict mathematical definition of autocorrelation says that if we measure variable x at location i and then at location j, then corr(x_i, x_j) ≠ 0. Of course, for this to make sense we have to think of x_i and x_j as being sampled repeatedly (you can’t compute a correlation from a single pair of values). The sampling is not the kind of sampling most ecologists are used to, i.e. sampling individuals. Mathematically, the sampling is from a “field”, defined by Wikipedia as “a physical quantity that has a value for each point in space and time”: in short, something like abundance of a species, species richness, etc. that is measurable at any point in space and time. I don’t want to go too far down the road of the mathematics. The key point is that the common assumption made to keep the analysis mathematically tractable is stationarity. That is, the mean and the variance are constant across space and time, and the covariance (or correlation) between two points depends on their separation, not on where or when the pair sits. The second major assumption is that the covariance is independent of the direction measured (e.g. north or west), also known as isotropy. Isotropy and stationarity together let us assume that the correlation (or covariance; recall that correlation is just covariance rescaled by the standard deviations of the two variables) is a function only of the distance between two points. That is, corr(x_i, x_j) = f(d_ij), where d_ij is the distance between points i and j. Often, but not always, correlation gets smaller as d_ij increases (i.e. f is a monotonically decreasing function of distance). Distance could be physical distance in space, distance in time, or distance in phylogeny. In an earlier post I described in some detail my preferred way of dealing statistically with autocorrelation, namely GLS (generalized least squares), and I won’t repeat that here.
Suffice it to say that GLS models the correlation structure between points by estimating a full covariance matrix for the errors (not just one with the variances on the diagonal). Thus in one longish paragraph I have only begun to build the mathematical machinery necessary to formalize statistically Tobler’s elegantly stated law. I won’t go further here, but there are a lot of sources out there if you want more detail (my recommended method: go to Google Scholar and search for the type of autocorrelation you’re interested in, e.g. “spatial autocorrelation” or “phylogenetic autocorrelation”).
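For readers who like to see it written down, the machinery described above can be sketched compactly. The symbols y, X, β, ε, Σ, σ² and r are shorthand introduced here for illustration, and the exponential form for f is only one common choice, assumed for the sake of the example.

```latex
% Stationarity + isotropy: correlation depends only on distance
\operatorname{corr}(x_i, x_j) = f(d_{ij})

% One illustrative (assumed) choice of f, decaying with range parameter r > 0
f(d) = e^{-d/r}

% GLS: a linear model whose errors have a full covariance matrix
y = X\beta + \varepsilon, \qquad
\operatorname{Cov}(\varepsilon) = \Sigma, \qquad
\Sigma_{ij} = \sigma^2 f(d_{ij})

% GLS estimator; ordinary least squares is the special case \Sigma = \sigma^2 I
\hat{\beta}_{\mathrm{GLS}} = (X^\top \Sigma^{-1} X)^{-1} X^\top \Sigma^{-1} y
```

With Σ = σ²I (only variances on the diagonal) the estimator collapses to ordinary least squares, which is the sense in which GLS "fills in" the off-diagonal entries.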
What I want to focus on here is what we ecologists should make of the fact that autocorrelation is a universal fact of life. I would argue there are essentially four attitudes we can adopt towards autocorrelation in our systems: unimportant, foe, friend, or ignorance measure. Let me describe each of these in a little more detail.
1. Unimportant – this one is pretty self-explanatory. Just ignore autocorrelation. It is what ecologists have done for most of history and what most ecologists do today.
2. Foe – #1 tends to make a lot of people who are sticklers for statistical assumptions pretty annoyed! Autocorrelation by definition violates the assumption of independence between error terms that is central to most statistics. I’ve argued elsewhere that this is true but maybe not as important as people make it out to be. I’ve also argued elsewhere that in a machine-learning context autocorrelation is much more of an enemy than we’ve acknowledged it to be. I do NOT want to rehash either argument here (you are welcome to add comments to the original posts if you want). But I do want to point out that this approach basically treats autocorrelation as a nuisance, a violation of assumptions that must be worked around. In short, autocorrelation is the enemy.
3. Friend – in statistical terms #2 and #3 need be no different: both can use GLS as the statistical tool of choice. However, the attitude, the use, what receives attention, etc. are totally different between #2 and #3. In #3 ecologists see autocorrelation as representative of important, biologically interesting processes and see the analysis of autocorrelation as revealing about those processes. If instead of focusing on the p-value correction from GLS we focus on the estimated covariance structure, or equivalently on the covariogram (or correlogram, pair-correlation function, Moran’s I, etc.), we can immediately get clues about processes. Autocorrelation results from biologically important processes including dispersal limitation, autocorrelation of underlying environmental variables, integration across space by predators moving at larger scales, etc. In space, positive autocorrelation of abundance is indicative of clumping and negative autocorrelation is indicative of overdispersion (which can start pointing to processes like dispersal limitation and environmental filters vs competition for resources). At what scales is autocorrelation strong and at what scales does it disappear? Is autocorrelation isotropic, or are there some directions where autocorrelation is stronger (a hint about dispersal directions)? Are there scale breaks in the strength of autocorrelation (indicative of different processes at different scales)? In time, is there unusually strong positive autocorrelation every four years? (If yes, then it’s a hint to at least look at El Niño as an important phenomenon in your system.) Is autocorrelation weak in your phylogeny? If yes, then the trait is evolutionarily labile (or not heritable). Autocorrelation is a pattern and therefore not the ultimate endpoint, but it can be a great signpost of where science needs to search.
4. Ignorance Measure and Alternative – autocorrelation can serve as a benchmark for our ignorance, or, phrased more positively, as a model to outperform. In short, autocorrelation is a bit of a null model. Imagine we want to predict the abundance of a species and we develop a model predicting abundance as a function of temperature, soils, etc. To be rigorous we need to predict the abundance at a location that wasn’t used to calibrate the model. What would count as a good prediction? One would hope our mechanistic understanding, or our correlational relationships to explanatory variables, is good enough that we can make a better prediction than pure autocorrelation. But this is not always so. In Bahn and McGill 2007, Volker and I showed that the prediction “abundance is the same as at the nearest observed site (averaging 200 km away)” was as good a prediction as the fanciest available niche models. To me this means we have a lot of work to do in understanding what truly controls the distribution and abundance of organisms (but it gives a really easy and cheap way to make predictions at all the locations where we haven’t yet measured abundance). The early days of weather forecasting suffered the same fate. Early computer models could not beat the forecast “tomorrow will be the same as today” (nor could they beat “tomorrow will be the same as the climatic average for this month”); both of these were forms of autocorrelation prediction. Over time the models improved to the point where they can handily beat these forecasts. I look forward to the day when ecology can beat the null model of autocorrelation as well. We can also use autocorrelation not as a null model but as a prediction model for conservation: e.g. if we don’t know the dispersal distance or other important model parameters for an endangered species, we will often use the value from a better known and measured sister species. This is autocorrelation as prediction.
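To make attitudes #3 and #4 a little more concrete, here is a minimal Python sketch on made-up data. Everything in it is an assumption for illustration: the "abundance" values are a simulated AR(1) series along a hypothetical evenly spaced transect, and the function names (`simulate_transect`, `lag_correlation`, `rmse`) are invented here. It is not the analysis from Bahn and McGill 2007, just a toy showing a correlogram and a nearest-site null model side by side.

```python
import math
import random


def simulate_transect(n=500, rho=0.9, seed=1):
    """Toy AR(1) series standing in for abundance at evenly spaced sites.

    rho is the (assumed) correlation between neighbouring sites.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        noise = math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        x.append(rho * x[-1] + noise)
    return x


def lag_correlation(x, lag):
    """Pearson correlation between sites `lag` steps apart: one correlogram point."""
    a, b = x[:-lag], x[lag:]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def rmse(predicted, observed):
    """Root-mean-square prediction error."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


abundance = simulate_transect()

# Attitude #3: the correlogram itself is the object of interest.
correlogram = {lag: lag_correlation(abundance, lag) for lag in (1, 2, 5, 10, 20)}

# Attitude #4: hold out every second site and predict it two ways.
observed_sites = abundance[0::2]  # sites used for "calibration"
held_out = abundance[1::2]        # sites we pretend not to have measured

# Null model A: same as an adjacent observed site (the one just before;
# the one just after is equally close on this transect).
nearest_pred = observed_sites[:len(held_out)]
# Null model B: the overall mean of the observed sites (no autocorrelation used).
mean_pred = [sum(observed_sites) / len(observed_sites)] * len(held_out)

nn_rmse = rmse(nearest_pred, held_out)
mean_rmse = rmse(mean_pred, held_out)

for lag, r in correlogram.items():
    print(f"lag {lag:2d}: r = {r:+.2f}")
print(f"nearest-site RMSE = {nn_rmse:.2f}, mean-only RMSE = {mean_rmse:.2f}")
```

On strongly autocorrelated data like this, the nearest-site prediction should clearly beat the mean-only prediction. That gap is the bar any covariate-based model would have to clear on the same held-out sites before we could claim it adds understanding beyond autocorrelation.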
In summary, ecology has spent the last 20 years moving from “ignore autocorrelation” to “autocorrelation = enemy”. There has been a lot of heated debate about whether we should or shouldn’t make this transition (see, e.g., the title “Spatial autocorrelation and red herrings in geographical ecology” by Diniz-Filho et al. 2003). I would like to argue that it would be a great advance for ecology to move beyond this debate to #3 (autocorrelation is an informative tool) and #4 (autocorrelation is a kind of null model to assess the progress of understanding and to make predictions in the meantime). This ought to be an agenda everybody can get on board with.