Different fields and subfields of science have different methodological traditions: standard approaches that remain standard because students learn them.
Which to some extent is inevitable. Fields of inquiry wouldn’t exist if they had to continuously reinvent themselves from scratch. You can’t literally question everything. Further, tradition is a good thing to the extent that it propagates good practices. But it’s a bad thing to the extent that it propagates bad practices.
Of course, it’s rare that any widespread practice is just flat-out bad. Practices don’t generally become widespread unless there’s some good reason for adopting them. But even widespread practices have “occupational hazards”. Which presumably are difficult to recognize precisely because the practice is widespread. Widespread practices tend to lack critics. Criticisms of widespread practices tend to be ignored or downplayed on the understandable grounds of “nothing’s perfect” and “better the devil you know”.
Here’s one way to help you recognize when a widespread practice within your own field may be ripe for rethinking: look at whether the practice is used in other fields, and if not, what practices those other fields use instead to address the same problem. Knowing how things are done in other fields helps you look at your own field with fresh eyes.
As an example from ecology, consider randomization-based null models. By which I mean models that start with some observed data, and try to figure out what the data would have looked like in the absence of some ecological process (e.g. interspecific competition) by randomly shuffling the observations under some constraints. The idea is that the random shuffling will remove all and only the effects of the process of interest, while the constraints will retain the effects of other processes. If the randomized data resemble the observed data, that’s taken to show that the process of interest is absent or unimportant. See here, here, here, here, and here for discussion of various examples, such as mid-domain effect models.
Longtime readers will know that I’m critical of randomization-based “null” models as a tool for inferring process from pattern; I think they mostly don’t work for that purpose (follow the links above for discussion). But for purposes of this post, I’m more interested in how researchers in other fields approach the same problem. Do researchers in other fields use randomized “null” models the way ecologists do? If not, what do they use instead?
Researchers in other fields do often start with observed data and then use some model-based approach to subtract out the effects of some particular process on those data, thereby revealing what the data would have looked like in the absence of that process. But in every case I can think of (which I admit isn’t that many), the model is not a constrained randomization of the observed data. In many cases, it’s some independently-validated theoretical model of the process of interest. I’m thinking for instance of how astronomers use the Boltzmann equation and the Saha equation to subtract out the effects of ionization from observed stellar spectrograms in order to infer what stars are made of (see here and many other websites; this is a standard topic of undergraduate astronomy courses).* In other cases, the subtraction is based on independently-validated background knowledge. Think for instance of how opinion pollsters correct for the fact that some people are more likely to respond to polls by weighting their data based on independent knowledge of the population demographics (e.g., census data).
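To make the pollster example concrete, here’s a minimal sketch of that kind of weighting (in Python, with made-up demographic categories and numbers, purely for illustration):

```python
# Minimal sketch of poll weighting by known demographics (all numbers made up).
# Each respondent is weighted by their group's population share divided by its
# sample share, so weighted averages match the census rather than the raw sample.

population_share = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}  # e.g. from census data
sample_share     = {"18-34": 0.15, "35-64": 0.45, "65+": 0.40}  # who actually answered

weights = {g: population_share[g] / sample_share[g] for g in population_share}
print(weights)  # {'18-34': 2.0, '35-64': 1.11..., '65+': 0.5}
```

The point being that the correction comes from independent background knowledge (the census shares), not from randomizing the poll responses themselves.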
In looking into how other fields try to subtract out the effects of particular processes from their observed data, I’ve been struck that they never say or assume that the resulting data will be “random”. As far as I can tell, it’s only ecologists who ever think that correctly subtracting out the effects of process X from observed data would result in random data.
But maybe that just shows my ignorance of how other fields work. So here’s my question: do you know of any field besides ecology that uses randomization-based null models to try to subtract out the effects of particular processes from observed data?
I emphasize that I’m not talking about using randomization-based null hypothesis tests in statistics. Plenty of fields use those. I’m talking about attempts to use the same “logic” to test a substantive scientific hypothesis.
*And if you say that ecologists can’t use that approach because we lack that sort of quantitative theory, how can you have any confidence that your randomized null model is actually working as intended? If you can’t write down an explicit quantitative model of the effects of the process you’re trying to subtract out from your data, how do you know that constrained randomization is the correct implicit model?
Via Twitter, Dave Harris points out another example of a field using a theoretically-based simulation model to figure out what observed data would look like in the absence of a specified process or causal factor: the discovery of the Higgs boson:
Notable that physicists did not figure out what data they’d expect to observe if the Higgs boson didn’t exist via constrained randomization of the observed LHC data.
Is there a hard line between null models in the context you’re describing and the various randomization/resampling statistical analyses people sometimes use to determine whether a purported pattern is “significant” (as in not likely to be random)?
Arguably not, which is kind of the problem. Or a problem. A different justification for randomized null models than the one discussed in the post is “I first need to show that there’s *some non-random pattern or other* in my data before I’m allowed to think about explaining it ecologically”. This justification traces back to, and generalizes the logic of, null hypothesis testing in statistics. But the trouble is, the logic doesn’t generalize beyond its original context. In the context of null hypothesis testing, sampling error is present by definition, is never of any scientific interest, and we have good theory that tells us how to quantify it. It makes some sense to first rule out the possibility that any apparent “signal” in your data is really just sampling error. But in other contexts, the analogous logic doesn’t hold. Ecologists who use randomization-based null models aren’t trying to eliminate all and only the effects of sampling error. They’re trying to eliminate all and only the effects of…something they can’t usually describe very precisely.
A couple of old posts on how the logic of null hypothesis testing is misplaced in other contexts:
https://dynamicecology.wordpress.com/2013/05/13/null-and-neutral-models-are-overrated/
https://dynamicecology.wordpress.com/2012/08/30/why-do-our-null-models-nullify-some-effects-and-not-others/
Hi Jeremy,
I wondered if epidemiology would be a likely place to look. I typed “epidemiology permutation null” into a Google Scholar search, filtered for pubs after 2013. This was my second hit: http://onlinelibrary.wiley.com/doi/10.1111/rssc.12070/full
I skimmed briefly, and it looks like section 4 discusses some recent history of using permutation-based null models outside ecology.
Hmm. Had a quick skim. That looks to me like a strictly statistical application of permutation tests. They’re trying to test statistical null hypotheses about whether two groups truly differ in a bunch of measured attributes.
Hey Jeremy, I’m sorry. I think I’m a bit behind the curve here on the distinctions between different types of null models in this discussion. In my own work, I do that sort of permutation-based hypothesis testing. It seemed a natural fit given I have lots of data with apparent patterns, but little grounding for predictive mechanism, and I approach the patterns with multivariate statistics with unknown confidence intervals. If I had a good idea of mechanism but little data, then a process-based null (like an IBM) would make a lot of sense. What is the kind of randomization null you’re skeptical of? Is the key difference that in the type you are skeptical of, the patterns emerging from the randomization itself stand in for patterns that would emerge over a gradient (or a level on that gradient) absent from the experiment/data collection?
What sort of models did you use in Olito and Fox 2015 to successfully predict “network structure” while failing to predict pairwise interactions? Good kind, bad kind, in the middle?
Thanks!
“What is the kind of randomization null you’re skeptical of?”
Follow the links in the post. But the short version is: the sort of constrained randomization-based “null” models that trace back to Strong/Connor/Simberloff, and that these days are identified with folks like Colwell, Gotelli, and Ulrich.
Olito and Fox 2015 used various statistical models that embodied different (loosely specified) biological hypotheses about the mechanisms governing which pollinator species visit which plant species in an alpine meadow. The models are statistical models that happen to be fairly interpretable biologically. We used Monte Carlo methods as a numerical procedure to figure out what the fitted statistical models predict about various measures of plant-pollinator network structure. For instance, if as a purely statistical matter the best predictors of the frequency with which pollinator species X visits plant species Y are plant abundance and pollinator proboscis length, what would you predict the nestedness of the plant-pollinator network to be?
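Not our actual code, but here’s a minimal sketch of that general workflow, with made-up predictors and coefficients: take a “fitted” model of visit rates, simulate many visitation matrices from it, and summarize a network-level property (connectance here, for brevity, rather than nestedness).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical predictors for 6 plant and 4 pollinator species (not our real data).
plant_abundance = rng.lognormal(size=6)
proboscis_length = rng.normal(5.0, 1.0, size=4)

# Stand-in for a fitted statistical model: expected visit rate for each
# plant-pollinator pair as a simple function of the two predictors
# (the functional form and "coefficients" here are invented, not estimated).
expected_visits = np.outer(plant_abundance, 1.0 / proboscis_length)

def connectance(visits):
    """Fraction of possible plant-pollinator links with at least one visit."""
    return float((visits > 0).mean())

# Monte Carlo step: simulate many visitation matrices from the "fitted" model,
# then summarize the predicted distribution of a network-level property.
sims = rng.poisson(expected_visits, size=(1000, *expected_visits.shape))
predicted = np.array([connectance(m) for m in sims])
print(predicted.mean(), predicted.std())
```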
I’m not sure accounting for ionization in stellar spectra is all that relevant, because that’s a case where there are very well-understood, basic physical processes that are dominant and *have* to be operating. (I’m tempted to guess that an ecological equivalent might be something like the fact that when predator eats prey, the prey population must decrease, or that organisms generally can’t reproduce after they die.) There are lots of other areas of astronomy where multiple, complex, poorly understood processes *might* be operating, and where one could conceivably use a “randomization-based model” as part of the analysis.
Unfortunately, I’m not really sure what exactly is meant by “randomization-based null model” in an ecological context, so I can’t really say how common or rare such an approach is in astronomy. (I mean, it’s pretty clear you don’t mean just any Monte Carlo approach, and you mean something more than just simulating basic sampling errors, but it’s not clear to me what is being “randomized”, or how. And, yes, I took a look at the links, but they assume a pre-existing knowledge of the details….)
But maybe this would be a possible example: this paper from 2011 looked at the question of whether the primary galaxy in a cluster of galaxies (“BCG” = “brightest cluster galaxy”) tended to be elongated in the same direction as the overall distribution of the rest of the galaxies (the “satellites”) in the cluster. Part of their analysis involved making a control sample based on their data catalog of galaxies and clusters: “… we prepare our control sample as follows: we shuffle the BCGs by assigning random positions (RA and DEC) [basically, celestial longitude and latitude] to them, but keep all other information of the BCGs unchanged. Then around each BCG (at new random position), we re-assign ‘cluster satellites’ by choosing those galaxies that are falling within the R_scale from the BCG”. (R_scale being a measurement of the BCG’s original cluster size.) This, I think, would qualify as a randomization-based “null” model — the idea being that in this model, a BCG would not, on average, share an alignment with (new) galaxies in its vicinity, since there’s no possible causal connection between them. And, indeed, they find that their “alignment” measurement averages to zero for the control sample, while it is significantly nonzero for the original (real) sample.
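A toy sketch of that control-sample idea (made-up catalogs and a flat-sky shortcut, not the paper’s data or code): re-position each BCG at random, call nearby galaxies its new “satellites”, and check that the alignment statistic averages out to no signal.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy catalogs (made up): BCG orientations and cluster sizes, plus a field of galaxies.
n_bcg, n_gal = 200, 20000
bcg_pa = rng.uniform(0, 180, n_bcg)          # BCG major-axis position angles (kept unchanged)
bcg_rscale = rng.uniform(0.2, 1.0, n_bcg)    # original cluster size (kept unchanged)
gal_x, gal_y = rng.uniform(0, 100, n_gal), rng.uniform(0, 100, n_gal)

# Control sample: give each BCG a new random position, then call any galaxy
# within r_scale of that new position one of its "satellites".
new_x, new_y = rng.uniform(0, 100, n_bcg), rng.uniform(0, 100, n_bcg)

alignment = []
for i in range(n_bcg):
    near = np.hypot(gal_x - new_x[i], gal_y - new_y[i]) < bcg_rscale[i]
    theta = np.degrees(np.arctan2(gal_y[near] - new_y[i], gal_x[near] - new_x[i])) % 180
    # Angle between the BCG major axis and the direction to each re-assigned
    # satellite, folded into [0, 90] degrees.
    alignment.append(np.abs((theta - bcg_pa[i] + 90) % 180 - 90))

# With no causal link between a BCG and its re-assigned satellites, the mean
# alignment angle should come out near 45 degrees, i.e. no alignment signal.
print(np.concatenate(alignment).mean())
```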
This seems entirely unexceptional to me, so I suspect that the general answer to your question is that “randomization-based models” are fairly common in astronomy.
“I’m not sure accounting for ionization in stellar spectra is all that relevant, because that’s a case where there are very well-understood, basic physical processes that are dominant and *have* to be operating.”
That’s precisely my point. The non-ecological examples of people subtracting out from their data all and only the effects of some process or factor are all cases where the process or factor is well-understood theoretically. Which I would argue should give ecologists pause when they try to subtract out from their data all and only the effects of some process or factor for which they can’t write down a good universally-applicable theoretical model.
“Unfortunately, I’m not really sure what exactly is meant by “randomization-based null model” in an ecological context”
In an ecological context, the canonical example is randomization of species x site matrices in an attempt to remove all and only the effects of interspecific competition. Diamond (1975) observed that some pairs of closely related bird species in the Papua New Guinea archipelago exhibited “checkerboard” distributions: any given island had one species or the other but never both. Diamond (1975) interpreted checkerboard distributions as evidence for interspecific competition, inferring that similar bird species couldn’t coexist. But as critics pointed out, other processes besides interspecific competition could generate checkerboard distributions (Connor and Simberloff 1979, 1983). Perhaps the two species are best-adapted to different environments and each is found in the environments to which it is best-adapted, for instance. Or perhaps their distributions relative to one another are just a matter of random chance.
In response, ecologists began to use randomized null models to infer whether the distributions of species among sites (e.g., occurrences of birds on islands) reflect interspecific competition. The basic idea is to randomly shuffle the observed “species × sites matrix” (the data matrix recording which species occur at which sites), so as to generate hypothetical data in which the probability that a given species will occupy a given site is independent of whether or not any other species occupies the same site. Typically, the randomization is done under constraints to ensure that the randomized data retain other features of the observed data. For instance, many randomized null models randomize species’ occurrences among sites under the constraints that each site end up with the same number of species as it had in the observed data, and that each species occur at the same number of sites as it did in the observed data. From the observed data, one calculates some test statistic that summarizes the extent to which the sampled species exhibit checkerboard distributions. One then compares the observed test statistic to the distribution of values from the randomized data sets. If species tend to co-occur less often than expected under the randomized null model, one infers that interspecific competition is at work.
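Here’s a minimal sketch of that kind of constrained randomization (made-up data, and not any particular published implementation): flip 2×2 “checkerboard” submatrices so every row and column total is preserved, then compare an observed co-occurrence statistic to the randomized distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def swap_randomize(matrix, n_swaps=2000):
    """Shuffle a presence-absence species x site matrix while preserving each
    species' number of occurrences (row totals) and each site's richness
    (column totals) by flipping randomly chosen 2x2 checkerboard submatrices."""
    m = matrix.copy()
    n_sp, n_site = m.shape
    for _ in range(n_swaps):
        r = rng.choice(n_sp, 2, replace=False)
        c = rng.choice(n_site, 2, replace=False)
        sub = m[np.ix_(r, c)]
        # Only [[1,0],[0,1]] or [[0,1],[1,0]] can be flipped without changing margins.
        if sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0] and sub[0, 0] != sub[0, 1]:
            m[np.ix_(r, c)] = 1 - sub
    return m

def n_checkerboard_pairs(m):
    """Crude test statistic: number of species pairs that never co-occur at any site."""
    co = m @ m.T
    iu = np.triu_indices(m.shape[0], k=1)
    return int((co[iu] == 0).sum())

observed = np.array([[1, 0, 1, 0, 1],
                     [0, 1, 0, 1, 0],
                     [1, 1, 0, 0, 1],
                     [0, 0, 1, 1, 0]])

obs_stat = n_checkerboard_pairs(observed)
null_stats = [n_checkerboard_pairs(swap_randomize(observed)) for _ in range(199)]
p = (1 + sum(s >= obs_stat for s in null_stats)) / (1 + len(null_stats))
print(obs_stat, p)  # more checkerboard pairs than the null would be read as "competition"
```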
This approach doesn’t work, for two reasons. One is the “Narcissus effect” (Colwell 1984): the randomized data still retain some effects of interspecific competition. In particular, competitive exclusion means that interspecific competition prevents species from occupying sites they would’ve occupied in the absence of interspecific competition. Constraining the randomized data to retain each species’ observed frequency of occurrence and each site’s observed species richness therefore smuggles in effects of interspecific competition. Second, mathematical models show that interspecific competition doesn’t invariably lead to any particular pattern in the distribution of species among sites (Hastings 1987, Ulrich et al. 2017). Nor does lack of interspecific competition invariably lead to any distinctive patterns that couldn’t possibly be produced by interspecific competition. For these reasons, ecologists largely turned to manipulative field experiments to directly test for interspecific competition and its effects (reviewed in Schoener 1983, Gurevitch et al. 1992).
Thank you for the link, will follow it up.
OK, thanks for the explanation! Based on that, I think I would say that this sort of thing is moderately common in astronomy. (I’m pretty sure the paper I just finished writing uses a variant or two of this, although I wasn’t testing a specific astrophysical mechanism so much as a hypothesis that one of several empirical distributions of a galaxy property was the “correct” one, and that the other distributions could be explained as the correct one plus a specific kind of measurement bias common to the other studies.)
I would actually characterize this statement — “The non-ecological examples of people subtracting out from their data all and only the effects of some process or factor are all cases where the process or factor is well-understood theoretically” — as false, at least if it’s taken to imply that “all non-ecological examples of people subtracting out…” are like this. There are certainly cases where the process(es) is/are well-understood theoretically, but there are many cases where they are not, and people use randomized versions of observed data in hypothesis-testing.
(I may try to track down the “Narcissus effect” paper; it would be interesting to see if something vaguely analogous might show up in astronomical contexts…)
That 1984 Narcissus effect paper by Colwell & Winkler is a book chapter, if memory serves.
If you think of a few other astronomical examples worth passing on, please do!
A few more random examples:
Tytler+1993:
This paper looked at QSO “absorption systems”, which are absorption lines in spectra produced by clouds of gas (possibly gas within galaxies) seen along the line of sight to a distant quasar (QSO). (Light from a distant QSO is absorbed by atoms or ions in intervening gas clouds along the line of sight to us.) Different “systems” correspond to gas clouds at different distances (~ redshifts) along the line of sight. The paper investigated whether the separations between multiple such systems along a single line of sight showed periodicities (there was a previous claim that such systems were preferentially separated by about 100 megaparsecs). Randomized control samples were created with the following properties (a rough code sketch follows the list):
1. QSO sky positions unchanged;
2. Observed redshifts of QSO absorption systems randomly re-arranged;
3. Same number of absorbers per QSO as in original dataset (“automatically conserve the total number of separations between pairs of absorbers”)
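A toy version of that shuffle (made-up redshifts, not their catalog) might look like this: pool the absorber redshifts, shuffle the pool, and deal the values back out so each QSO keeps its original number of absorbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical catalog (not Tytler et al.'s data): absorber redshifts grouped by QSO.
absorbers_by_qso = {
    "QSO_A": [1.92, 2.10, 2.41],
    "QSO_B": [1.55, 1.88],
    "QSO_C": [2.02, 2.35, 2.60, 2.77],
}

# Pool every observed absorption redshift, shuffle the pool, then deal the values
# back out so each QSO keeps its original number of absorbers (and its sky position).
pooled = np.concatenate([v for v in absorbers_by_qso.values()])
rng.shuffle(pooled)

control, start = {}, 0
for qso, zs in absorbers_by_qso.items():
    control[qso] = np.sort(pooled[start:start + len(zs)])
    start += len(zs)

# Pair separations along each line of sight can then be compared between the real
# catalog and many such control catalogs to test the claimed periodicity.
print(control)
```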
Theuns+2002:
This looked at similar kinds of data (but more modern, with much higher spectral resolution). The whole thing is complicated, but there is a “Section 4.2.2”, titled “Randomized spectra”, which notes that “The aim of this procedure is to produce new spectra from the data, in which the absorption lines have the same shapes, but any correlation between the lines is destroyed.”
Van Waerbeke+2000:
This was about measuring the “shear” in the observed shapes (“ellipticities”) and orientations of galaxies due to gravitational lensing by large-scale cosmic structure in between us and the galaxies. Section 4 (“Measured signal”) discusses how to remove a particular term from the estimate of the shear:
“The term $\sigma_{\epsilon}^2 / N$ can be easily removed using a random realization of the galaxy catalogue: each position angle of the galaxies is randomized, and the variance of the shear is calculated again. This randomization allows us to determine $\sigma_{\epsilon}^2 / N$ and the error bars associated with the noise due to the intrinsic ellipticity distribution. At least 1000 random realizations are required in order to have a precise estimate of the error bars.”
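A toy sketch of that procedure (simulated ellipticities, not their catalogue): randomizing each galaxy’s position angle destroys any coherent shear while keeping the intrinsic ellipticity scatter, so the variance of the randomized sample estimates the $\sigma_{\epsilon}^2 / N$ noise term to be subtracted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy catalogue: two ellipticity components for galaxies in n_cells sky cells.
n_cells, n_gal = 500, 50
e = 0.3 * rng.standard_normal((n_cells, n_gal, 2))   # intrinsic ellipticities
e += rng.normal(0.0, 0.05, size=(n_cells, 1, 2))     # small coherent "shear" per cell

def shear_variance(ell):
    """Variance across cells of the mean ellipticity per cell: signal + sigma_e^2 / N."""
    return ell.mean(axis=1).var()

def randomize_position_angles(ell):
    """Rotate each galaxy by a random angle: kills any coherent shear,
    keeps the intrinsic ellipticity distribution (and hence the noise term)."""
    phi = rng.uniform(0.0, np.pi, size=ell.shape[:2])
    e1, e2 = ell[..., 0], ell[..., 1]
    return np.stack([e1 * np.cos(2 * phi) - e2 * np.sin(2 * phi),
                     e1 * np.sin(2 * phi) + e2 * np.cos(2 * phi)], axis=-1)

noise_term = np.mean([shear_variance(randomize_position_angles(e)) for _ in range(200)])
print(shear_variance(e) - noise_term)  # shear signal estimate after subtracting the noise term
```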
Cheers for this, will reply once I follow these up.
@Peter:
Ok, I had a look at Tytler et al. 1993. First of all, as best I can tell (not being an astronomer), it looks to me like a very nice paper. Great example of learning from data by identifying, quantifying, and removing different sources of bias and error.
But I don’t think I’d count it as using randomization-based “null” models in the sense defined in the post. As I understand it, their use of randomization tests is close to the traditional statistical one. Here, they want to figure out what their data would’ve looked like if z.abs values were uncorrelated with QSO sky positions, so as to be able to infer if the observed correlation in their sample of data is a reliable indicator of a true correlation. But they don’t know the data-generating function or a reasonable approximation thereunto, so instead they randomly shuffle the observed z.abs values. In a nice touch, they also justify their chosen procedure by showing that other plausible-seeming procedures mess up other features of the data. This is analogous to the work Gotelli and Graves did in ecology in the mid-90s, testing the ability of different randomization procedures and test statistics for species x site matrices to remove known non-random patterns in those matrices.
But in the sorts of cases I’m criticizing in this post, it’s not the non-random patterns themselves that are of interest. Rather, those patterns are only of interest because they’re thought to be diagnostic symptoms of some specific underlying ecological process such as interspecific competition. Let me try to make an astronomical analogy. Imagine a verbal theory that said that an association between the z.abs values and QSO positions studied by Tytler et al. would arise from a previously-unknown physical force. Imagine trying to test that theory by going out and using a randomization test to check for an association between z.abs values and QSO positions. If you found a significant association, would you report the discovery of a new physical force? Conversely, if you found no significant association, would you infer that the hypothesized physical force doesn’t exist or is too weak to detect? No, because you don’t yet have a good theoretical argument that this force would in fact give rise to an association between z.abs values and QSO positions, and because factors other than this previously-unknown force might also generate such an association.
Hi Jeremy, “network science” is a case where a null model is used to quantify expectation with respect to modularity. Newman’s (2006 PNAS) module definition describes a module as a subcommunity with more links within itself than to other communities. (Note that “community” is the technical term used to describe a block in a matrix, nothing to do with ecological communities.) In the specific binary adjacency matrix case, the null model is equivalent to the fixed-fixed column/row null model known from biogeography. The quantitative version, e.g. in Barber (2007 Phys Rev E), is equivalent to the Patefield algorithm (r2dtable, in R) behind Fisher’s exact test for contingency tables.
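For reference, the null expectation inside that modularity definition is the configuration model: the expected number of links between two nodes given only their degrees. In the standard notation (Newman 2006), modularity is

$$ Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j), $$

where $A_{ij}$ is the adjacency matrix, $k_i$ the degree of node $i$, $m$ the total number of links, and $\delta(c_i, c_j) = 1$ when nodes $i$ and $j$ are assigned to the same module. The $k_i k_j / 2m$ term is the degree-preserving null expectation, which is the sense in which it parallels the fixed-fixed null models mentioned above.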
Pingback: What’s wrong with null models? | theoretical ecology