The starting point for this post is an old remark of statistician Jeff Leek (sorry, can’t find the link just now) that no statistical technique works at scale. He was defending frequentist techniques like P-values and confidence intervals against the accusation that they’re widely misunderstood or misused, and that we should therefore use Bayesian approaches instead. Jeff’s counter-argument was that if Bayesian approaches were used as widely as P-values and confidence intervals currently are, they’d be just as widely misunderstood and misused.
Does that argument generalize? Is it true that any statistical or other scientific technique gets used increasingly badly on average as the number of people using it rises? Or are there some for which the quality of the average application holds steady or even improves as the number of users increases? And are there some techniques for which the quality of the average application declines only slowly or asymptotically as the number of users increases, vs. other techniques for which the decline is much steeper?
I was also wondering how the relationship between the average quality of application and the number of users affects the number of users. Are there techniques that are hard to use well, and that for that very reason end up being used only by a small number of people who use them well? Versus other techniques that are hard to use well but easy to think you’re using well, and so end up getting used badly by lots of people? Or does the number of people using any given technique mostly depend on other factors?
Off the top of my head, I can think of some techniques that do work just as well “at scale” as they do for “early adopters”. Pipetting, for instance. The vast majority of the time, when somebody pipettes some liquid, they pipette the desired amount. And I doubt the pipetting error rate has increased appreciably over time as more and more scientists and trainees have done more and more pipetting. Same for weighing stuff on balances. Etc. What those examples of “scalable” techniques have in common is that they’re routine. The user doesn’t need to exercise any thought, interpretation, or judgment. So one working hypothesis is that the more thought and judgment a technique requires in order to work well, the worse it will scale to mass use.
If that working hypothesis is right, then the scalability of a technique won’t be solely a matter of how difficult it is to teach or learn the technique in any purely technical sense. For instance, my sense is that widespread abuse and misinterpretation of P-values is mostly not a matter of “purely” technical mistakes. It’s not that lots of people are miscalculating their P-values or don’t know what a P-value literally means. Brian made a similar point in his old post on why AIC appeals to ecologists’ lowest instincts. Unhelpful applications of AIC in ecology mostly aren’t a matter of people making purely technical mistakes in the calculation of AIC values, or being unaware of technical facts about AIC.
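To make that distinction concrete with a sketch of my own (not an example from the post or from Brian’s): one classic failure mode of P-values involves no calculation error at all. Under a true null hypothesis, P-values are uniformly distributed, so if you run enough tests, some will fall below 0.05 by chance alone. Each P-value can be computed perfectly; the misuse lies in the judgment call of treating every small one as a discovery. A minimal simulation, using an assumed one-sample z-test setup with standard-normal data:

```python
# Sketch: simulate many experiments where the null hypothesis is TRUE
# (data drawn from N(0, 1), testing mean = 0), and count how often a
# correctly computed two-sided P-value still falls below 0.05.
import random
from math import erf, sqrt

random.seed(42)  # fixed seed so the run is reproducible

def one_sample_p(n=30):
    """Draw n values from N(0, 1) and return the two-sided P-value
    of a z-test of the (true) null hypothesis mean = 0."""
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    z = (sum(xs) / n) * sqrt(n)  # sample mean / its standard error (1/sqrt(n))
    phi = 0.5 * (1.0 + erf(abs(z) / sqrt(2.0)))  # standard normal CDF at |z|
    return 2.0 * (1.0 - phi)

trials = 2000
false_positives = sum(one_sample_p() < 0.05 for _ in range(trials))
print(false_positives / trials)  # close to 0.05, though every null is true
```

Every P-value in the simulation is calculated correctly; any misuse would lie entirely in the interpretation, which is exactly the kind of judgment-dependent failure the working hypothesis is about.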
I emphasize that my interest in these questions is purely academic curiosity. I do not think that our choice of statistical or other scientific techniques should be dictated by worries about how well they scale up to mass use. And nothing in this post is a criticism of how people use or teach statistics or other scientific techniques. All we can do is use whatever techniques seem best, and teach others to do the same. Perhaps the only practical reason to discuss the issues raised in this post is to identify the “failure modes” of different techniques: the ways in which they tend to be misunderstood or misused, when they are misunderstood or misused. If you know the most common misunderstandings or abuses of a technique, you can try to avoid or counter them in your own work and teaching.
Related old posts
Which big ideas in ecology were successful, and which were unsuccessful? The same questions this post asks about statistical techniques can also be asked about scientific ideas. Big ideas in ecology vary in how successful they’ve been. In at least some cases you can argue that the success, or comparative lack thereof, is related to how widely the idea was taken up.