Statistical machismo?

Are ecologists excessively macho when it comes to statistical methods? I use the word macho in a purely gender-neutral way – I mean it as “posturing to show how tough one is and place oneself at the top of the hierarchy”.

In my experience ecologists have a long list of “must use” approaches to statistics that are more complicated than simpler methods but don’t necessarily change the outcome. To me this is a machismo attitude to statistics – “my paper is better because I used tougher statistics”. It has a Red-Queen dynamic – what starts as a signal of superiority eventually turns into something reviewers expect in every paper. But oftentimes, with a little thinking, there is really no reason the analysis is needed in a particular case (the reviewer requiring it is so far removed from the development of the approach that they have forgotten why it is really used). And even when the more complex approach might be relevant, it can be very costly to implement yet often has very little impact on the final results. Thus what started out as statistical machismo turns into wasted time required by reviewers. Here are some of my hobby horses:

Bonferroni corrections – One should be careful of multiple comparisons and the possibility of increased Type I error. However, this is carried way overboard. First, one is usually told to use the Bonferroni method, which is known to err far on the other side and be excessively conservative. Second, it is used without thought about why it might be required. I recall a colleague who had measured about 35 floral traits in two populations. About 30 of them came back as significantly different. Reviewers told them to do a Bonferroni correction. To anybody who understands the biological question and the statistics, a Bonferroni correction will make no difference to the final answer (OK, only 26 out of 35 traits will be significant after correction, but are we now going to conclude the populations haven’t differentiated?). Now if only 2 or 4 out of 35 were significant at p<0.05, then some proper correction would absolutely be needed, since with 35 tests at α = 0.05 and no real differences we would expect about 1.75 false positives by chance alone (but then the whole conclusion probably ought to change regardless of the Bonferroni outcome). But when 30 out of 35 are significant, are we still going to waste time on Bonferroni corrections?
Phylogenetic corrections – Any time your data points represent different species (i.e. comparative analysis), you are now expected to do some variation of PIC (phylogenetically independent contrasts) or phylogenetic GLS regression. I know there are a handful of classic stories that reverse when PICs are used. But now people are expected to go out and generate a phylogeny before they can publish any comparative analysis. Even when we don’t have good phylogenies for many groups. And when the methods assume the phylogenies are error-free, which they are not. And when the p-values are <0.0000001 and unlikely to change under realistic evolutionary patterns. I was once told I had to do a phylogenetic regression when my dependent variable was the abundance of a species. Now, of all traits that are not phylogenetically conserved, abundance is at the top of the list (there are published data supporting this), guaranteeing there could not be a phylogenetic signal in this regression. When I argued this, my interlocutor eventually fell back on “well, that’s how you do good science” to justify why I should still do it – there was no link back to the real issues.
Spatial regression – Increasingly, reviewers are demanding some form of spatial regression if your data (specifically residuals) are spatially structured. It is true that treating points as independent when they are spatially autocorrelated can lead to Type I error. But it usually doesn’t change your p-value by orders of magnitude, and in real-world cases many spatial regressions have hundreds of points and p-values with 5 or 6 leading zeros. They’re still going to be significant after doing spatial GLS. And here is the important point – ignoring spatial autocorrelation does not bias your estimates of slope under normal circumstances (at worst it makes them less efficient), so ignoring autocorrelation will not introduce error into studying the parameters of the regression. You can also use simple methods to adjust the degrees of freedom, and hence the p-value, without performing spatial regression (e.g. effective-sample-size corrections such as Dutilleul’s modified t-test). Incidentally, I think the most interesting thing to do with spatial autocorrelation is to highlight it and study it as informative, not to use statistical techniques that “correct it out” and let you ignore it – the same thing I would say about phylogenetic correlation. Note that all of these arguments apply to time series as well.
Detection error – I am running into this increasingly with use of the Breeding Bird Survey. Any time you estimate the abundance of a moving organism, you will sometimes miss a few individuals. This is a source of measurement error for abundance estimates known as detection error. There are techniques to estimate detection error, but – and here’s the kicker – they effectively require repeated measures of essentially the same data point (i.e. same time, location and observer), or distance-based sampling where the distance to each organism is recorded, or many covariates. This clearly cuts down the number of sites, species, times and other factors of interest you can sample, and is thus very costly. And even if you’re willing to pay the cost, it’s not something you can do retroactively on a historical dataset like the Breeding Bird Survey. Estimating detection error also requires unrealistic assumptions, such as the assumption that the population is closed (about like assuming a phylogeny has no errors). Now, if one wants to make strong claims about how an abundance has gone from low to zero, detection error is a real issue (see the debate on whether the Ivory-billed woodpecker is extinct). Detection error can also be critical if one wants to claim cryptic species X is rarer than loud, brilliantly colored species Y, since the differential detectabilities bias the result. And detection error alone indubitably biases estimates of site occupancy downward (detection error can only cause you to miss individuals, never to add them), but this assumes detection error is the only or dominant source of measurement error (e.g. mistaken double counting of individuals might partly cancel out detection error). But if one is looking at sweeping macroecological questions, primarily comparing within a species across space and/or time, it is hard to spin a scenario where detection error is more than a lot of noise.
Bayesian methods – this one is a mixed bag (Jeremy has discussed his view on Bayesian approaches previously here and here). There have been real innovations in computational methods that are enabled by Bayesian approaches (e.g. hierarchical process models sensu Clark et al.). Although even here, in most cases it is Markov chain Monte Carlo (MCMC), used as a computational tool to numerically solve complex likelihoods, that is the real innovation – not Bayesian methods as such. (As an aside, to truly make something Bayesian sensu stricto, in my mind you need informative priors, which ecologists rarely have, but I know others enjoy the philosophical differences between credible intervals and confidence intervals, etc.) Notwithstanding these benefits, I have reviewed papers where a Bayesian approach was used to do what was basically a two-sample t-test, a multivariate regression, or even a linear hierarchical mixed model (OK, the last is complicated, but still not as complicated to most people as the Bayesian equivalent). Apparently I was supposed to be impressed at how much better the paper was because it was Bayesian. Nope. The best statistic is one that is as widely understood as possible and good enough for the question at hand.
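To make the floral-trait anecdote concrete, here is a minimal Python sketch that counts significant results three ways: with no correction, with Bonferroni, and with Holm's step-down procedure. Holm's procedure is a standard, less conservative alternative that still controls the family-wise error rate. The 35 p-values are invented for illustration and arranged so that 30 fall below 0.05, as in the anecdote. They are not the colleague's data, and the function names are hypothetical.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def holm(pvals, alpha=0.05):
    """Holm step-down: compare the k-th smallest p (k = 0..m-1) with
    alpha / (m - k), stopping at the first test that fails."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if pvals[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break
    return reject

# Hypothetical p-values for 35 traits: 30 below 0.05 and 5 clearly null.
different = [1e-6] * 20 + [1e-4] * 6 + [0.003] * 2 + [0.02] * 2
null = [0.2, 0.35, 0.5, 0.7, 0.9]
pvals = different + null

print("Bonferroni threshold:", 0.05 / len(pvals))
print("unadjusted:", sum(p <= 0.05 for p in pvals))
print("Bonferroni:", sum(bonferroni(pvals)))
print("Holm:", sum(holm(pvals)))
```

With these invented values, the Bonferroni threshold is 0.05/35 ≈ 0.0014, which keeps 26 of the 30 significant traits. Holm keeps 28. Either way, the biological conclusion (the populations have differentiated) is unchanged.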
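For readers who have not computed one, the following sketch implements Felsenstein's phylogenetically independent contrasts on a hypothetical four-species tree, then regresses the contrasts through the origin. The tree, branch lengths, trait values and function names are all invented for illustration. A real analysis would use an established implementation, such as `pic()` in the R package ape, and would inherit any errors in the phylogeny.

```python
import math

# Hypothetical tree, Newick form: ((A:1,B:1):2,(C:1.5,D:0.5):1.5);
# Tips are (name, branch_length); internal nodes are (left, right, branch_length).
TREE = ((("A", 1.0), ("B", 1.0), 2.0), (("C", 1.5), ("D", 0.5), 1.5), 0.0)

def contrasts(node, traits):
    """Felsenstein's independent contrasts, computed recursively.

    Returns (estimated value at this node, branch length adjusted for
    estimation error, list of standardized contrasts in the subtree)."""
    if isinstance(node[0], str):  # tip: look up its trait value
        name, length = node
        return traits[name], length, []
    left, right, length = node
    v1, l1, c1 = contrasts(left, traits)
    v2, l2, c2 = contrasts(right, traits)
    contrast = (v1 - v2) / math.sqrt(l1 + l2)        # standardized contrast
    value = (v1 / l1 + v2 / l2) / (1 / l1 + 1 / l2)  # inverse-length weighted mean
    extra = l1 * l2 / (l1 + l2)                      # added length for node uncertainty
    return value, length + extra, c1 + c2 + [contrast]

def pic_slope(tree, x, y):
    """Slope of y-contrasts on x-contrasts (regression through the origin)."""
    _, _, cx = contrasts(tree, x)
    _, _, cy = contrasts(tree, y)
    return sum(a * b for a, b in zip(cx, cy)) / sum(a * a for a in cx)

# Hypothetical trait values for the four tips.
trait_x = {"A": 1.0, "B": 1.4, "C": 3.0, "D": 2.2}
trait_y = {"A": 0.5, "B": 0.9, "C": 1.8, "D": 1.6}
print("PIC slope:", pic_slope(TREE, trait_x, trait_y))
```

With n species there are n - 1 contrasts. Because each contrast is a linear combination of tip values, a trait that is exactly twice another gives a contrast slope of exactly 2.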
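The spatial-regression item claims that autocorrelation costs precision rather than biasing the slope, and a small simulation can check this. The sketch makes several simplifying assumptions:

- The errors are AR(1) along a one-dimensional transect (equivalently, a time series). This is a deliberately simple stand-in for real spatial structure.
- The covariate is fixed and unrelated to the errors.

The parameter values and function names are hypothetical.

```python
import random
import statistics

def ar1_errors(n, rho, sd, rng):
    """Stationary AR(1) errors with marginal standard deviation `sd`."""
    innov_sd = sd * (1 - rho ** 2) ** 0.5  # keeps total error variance equal for every rho
    e = [rng.gauss(0, sd)]                  # start in the stationary distribution
    for _ in range(n - 1):
        e.append(rho * e[-1] + rng.gauss(0, innov_sd))
    return e

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def slope_sampling_distribution(rho, n=100, true_slope=0.05, reps=1000, seed=42):
    """Mean and standard deviation of OLS slopes over simulated datasets."""
    rng = random.Random(seed)
    x = list(range(n))  # hypothetical position along a transect (or time step)
    slopes = []
    for _ in range(reps):
        e = ar1_errors(n, rho, 1.0, rng)
        y = [true_slope * xi + ei for xi, ei in zip(x, e)]
        slopes.append(ols_slope(x, y))
    return statistics.fmean(slopes), statistics.stdev(slopes)

for rho in (0.0, 0.8):
    mean, sd = slope_sampling_distribution(rho)
    print(f"rho = {rho}: mean slope = {mean:.4f}, sd of slopes = {sd:.4f}")
```

Across replicates, the mean OLS slope stays close to the true value of 0.05 for both ρ = 0 and ρ = 0.8. The spread of slopes, however, is much wider under autocorrelation. That wider spread is what the naive standard error misses. This is why a degrees-of-freedom adjustment targets the p-value rather than the slope itself.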
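The direction of the detection bias, and why it matters more for some comparisons than others, follows from a simple expectation. Assume a closed population, no false positives and a constant per-visit detection probability p. These are standard but often unrealistic assumptions. Under them, a species occupying a fraction ψ of sites is recorded at a fraction ψ[1 - (1 - p)^K] of sites after K visits. The values below are hypothetical, as is the function name.

```python
def expected_naive_occupancy(psi, p, visits=1):
    """Expected fraction of sites where a species is recorded at least once.

    Assumes (hypothetically) a closed population, no false positives and a
    constant per-visit detection probability p at every occupied site."""
    return psi * (1 - (1 - p) ** visits)

# Same species in two periods with unchanged detectability (hypothetical values).
early = expected_naive_occupancy(psi=0.6, p=0.4)
late = expected_naive_occupancy(psi=0.3, p=0.4)
print("naive occupancy:", early, late)   # both biased low relative to 0.6 and 0.3
print("naive ratio:", late / early)      # matches the true ratio 0.3 / 0.6

# Two species with identical true occupancy but different detectability.
loud = expected_naive_occupancy(psi=0.5, p=0.9)
cryptic = expected_naive_occupancy(psi=0.5, p=0.2)
print("loud vs cryptic:", loud, cryptic)  # the cryptic species looks much rarer

# Extra visits shrink the bias, at the cost of fewer sites for a fixed budget.
print("cryptic, 5 visits:", expected_naive_occupancy(psi=0.5, p=0.2, visits=5))
```

Naive occupancy is always biased low. When detectability is the same in both periods, as in a within-species comparison, the ratio of naive values still equals the true ratio. When detectability differs between species, however, two species with identical true occupancy look very different.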
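A normal-normal conjugate update illustrates the aside about informative priors. The sketch treats an observed difference in means and its standard error as a normal likelihood. This is a large-sample simplification, not a full Bayesian t-test or an MCMC fit. Under it, a vague prior returns essentially the classical estimate and interval, and only an informative prior changes the answer. The numbers and the function name are hypothetical.

```python
import math

def normal_posterior(estimate, se, prior_mean=0.0, prior_sd=1e6):
    """Conjugate normal-normal update: a precision-weighted average of the data
    estimate and the prior mean. Returns (posterior mean, posterior sd)."""
    w_data = 1.0 / se ** 2          # precision of the data
    w_prior = 1.0 / prior_sd ** 2   # precision of the prior (tiny when vague)
    mean = (w_data * estimate + w_prior * prior_mean) / (w_data + w_prior)
    sd = math.sqrt(1.0 / (w_data + w_prior))
    return mean, sd

# Hypothetical difference between two group means and its standard error.
diff, se = 2.3, 0.8

vague_mean, vague_sd = normal_posterior(diff, se)
print("vague prior 95% interval:", vague_mean - 1.96 * vague_sd, vague_mean + 1.96 * vague_sd)
print("classical 95% interval:  ", diff - 1.96 * se, diff + 1.96 * se)

informed_mean, informed_sd = normal_posterior(diff, se, prior_mean=0.0, prior_sd=0.5)
print("informative prior N(0, 0.5^2):", informed_mean, informed_sd)
```

With the vague prior, the 95% credible interval (mean ± 1.96 sd) coincides numerically with the large-sample confidence interval. The informative prior pulls the estimate strongly toward zero. In the vague-prior case, the Bayesian machinery adds complexity without changing the result.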

These techniques all have the following features in common:

a) They are vastly more complex to apply than a well-known simple alternative

b) They are understood by a much narrower circle of readers – in my book intentionally narrowing your audience is a cardinal sin of scientific communication when done unnecessarily (but I secretly suspect this is the main reason many people do it – the fewer people who understand you, the more you can get away with …)

c) They may require additional data that are impossible or expensive to obtain (phylogenies, repeated observations for detection). Sometimes the data (e.g. phylogenies) or assumptions (closed populations in detection analysis) are themselves error-riddled, but it is apparently okay to ignore this. They might also require new software and heavy computational power (e.g. Bayesian methods).

d) They reduce statistical power, inflating the p-value, which means that on average we will need to collect a bit more data. At the same time, they falsely pay homage to p-values instead of important things like variance explained and effect size, and they erroneously prioritize Type I over Type II error.

e) They have not in the grand sweep over many papers fundamentally changed our understanding of ecology in any field I can name (nor even changed the interpretation of most specific results in individual papers).

In short, our collective statistical machismo has caused us to require statistical approaches that are a drag on the field of ecology, allowing them to become (or quickly become) firmly established as “must do” to publish, let alone to be considered high-quality science. I don’t object to having these tools around for when we really need them or when valid questions arise. But can we please stop reflexively and unthinkingly insisting that every paper that possibly could use these techniques use them? Especially, but not only, when we can tell in advance they will have no effect. They have real (sometimes insurmountable) costs to implement.

To make this constructive, here is what I would suggest:

For the techniques that are obsessed with Type I error (Bonferroni, spatial, temporal and phylogenetic regressions), I would say: a) stop wasting our time when p=0.00001 – it ain’t going to become non-significant (or, at a minimum, the burden of proof is now on the reviewer to argue for some highly unusual pathology in the data that makes the correction matter far more than usual, or else that estimation bias is being introduced); b) if p is closer* to 0.05, then have a rational conversation about whether hypothesis testing at p<0.05 is really the main point of the paper, and about how hard it is to get the data for the additional test vs. the importance of the science, and be open to arguments for why the test isn’t needed (e.g. knowing that there is no phylogenetic signal in the variable being studied).
For detection error, apply some common sense about whether detection error is likely to change the result at hand or not. Some questions it is, some it isn’t. Don’t stop all science on datasets that don’t support estimation of detection error.
For Bayesian approaches – out of simple respect for your audience, don’t use Bayesian methods when a simpler approach works. And if you are using a complex approach that requires Bayesian calculations, be clear whether you are just using it as a calculation method on likelihoods or really invoking informative priors and the full Bayesian philosophy. And the burden is still on you to justify that you have answered an ecologically interesting question – including a Bayesian method doesn’t give you a free pass on this.
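The first suggestion, that a very small p-value is not going to become non-significant, can be made quantitative. The sketch below asks how far autocorrelation would have to shrink the effective sample size before a correlation stopped being significant at 0.05. It uses a normal approximation to the t distribution so that it needs only the Python standard library. The correlation, the sample size and the function names are hypothetical.

```python
import math

def corr_p_value(r, n):
    """Two-sided p-value for a Pearson correlation r with (effective) sample size n.
    Uses t = r * sqrt((n - 2) / (1 - r^2)) and a normal approximation to the
    t distribution, which is reasonable for the large n discussed here."""
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))

def min_effective_n(r, alpha=0.05):
    """Smallest integer effective sample size at which r is still significant."""
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    n = 3
    while corr_p_value(r, n) > alpha:
        n += 1
    return n

r, n = 0.3, 300  # hypothetical: a modest correlation across 300 sites
print("nominal p:", corr_p_value(r, n))
n_needed = min_effective_n(r)
print("effective n needed for p <= 0.05:", n_needed)
print("tolerable fold-reduction in information:", n / n_needed)
```

With these hypothetical numbers, the nominal p is far below 0.005. Autocorrelation would have to reduce 300 sites to the information content of about 41 independent ones, roughly a seven-fold loss, before the result crossed 0.05. Whether that is plausible for a given dataset is exactly the conversation the suggestion calls for.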

To those readers who object to this as a way of returning common sense to statistics in ecology, I challenge you to make a case that these kinds of techniques have fundamentally improved our ecological understanding. I know this is a provocative claim, so don’t hold back. But please: 1) don’t tell me you have to do the test “just because”, or because “statisticians agree” (they don’t – most statisticians understand the strengths and weaknesses of these approaches far better than ecologists), or because it violates the assumptions (most statistics reported violate some assumption – it’s just a question of whether they violate the assumptions in an important way); 2) don’t assume that I am a statistical idiot who doesn’t understand the implications for Type I error, etc.; and 3) do address my core issues of the real cost of implementing them, and describe how they improve the state of ecological knowledge (not statistical assumption satisfying) enough to justify this cost. Otherwise, I claim you’re guilty of statistical machismo!

UPDATE – if you read the comments you’re in for a long read (109 and counting). If you want the quick version, I posted a summary of the comments. At the moment it is the last comment (until somebody comments on the original post again). If it’s not the last comment or close to it, you can find it by using your browser to search for “100+”, which should take you straight to the summary.

*(I would propose an order-of-magnitude cut-off: only worry about Type I error correction if p>0.005. I think this is conservative, based on how much I’ve seen these correction factors change p-values.)

Dynamic Ecology

Multa novit vulpes
