Today’s post is unusual for me: it’s about a topic on which there’s often disagreement, but on which I’m not sure what to think myself. So I’m going to write about my uncertainty and hope that the resulting conversation will enlighten me.
The topic: what’s a “small” effect, and when are small effects worth caring about?
Possible definitions of a small effect:
- An effect that’s small in some absolute sense. For instance, a weight loss plan that, if followed, will only cause you to lose two pounds at most.
- An effect that’s small relative to other factors that affect the same variable. Though to make this sort of comparison you need to put variation in the factors affecting the variable of interest on a common scale.
- In the context of a difference between two means, a “small” difference might mean “small relative to the within-group standard deviation, say 0.2 standard deviations or less”. That’s Cohen’s d, an effect size measure popular in psychology. (Aside: I agree with the linked post that it’s best to treat arbitrary thresholds like “d<0.2 means it’s a small effect” as rough rules of thumb.)
- An effect that you can only detect with a massive sample size, and/or with a highly-controlled study design and statistical analysis that subtracts out or eliminates other effects that might swamp or confound the small effect.
- An effect that turns out to be smaller than we expected it to be based on previous theory and data.
- Maybe others I haven’t thought of?
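For concreteness, the Cohen's d definition above is easy to compute yourself: it's just the difference in group means divided by the pooled within-group standard deviation. Here's a minimal sketch (the data are made up for illustration):

```python
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference: (mean_a - mean_b) / pooled within-group SD."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Pooled within-group standard deviation (weighted by degrees of freedom)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (mean_a - mean_b) / pooled_sd

# Made-up data: group b is shifted down by 0.04, which is small
# relative to the within-group spread of ~0.22
a = [10.1, 10.4, 9.8, 10.2, 10.0, 9.9]
b = [x - 0.04 for x in a]
print(round(cohens_d(a, b), 2))  # ~0.19, "small" by the d<0.2 rule of thumb
```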
Possible reasons why small effects might be worth caring about (or not):
- It’s interesting/surprising/important that the effect exists at all, no matter how small it is. Perhaps because we have some theoretical reason to expect the effect to be literally zero. Or perhaps because we have some strong moral/policy/management reason to want to reduce the effect to literally zero.
- The small effect is a symptom of some important or interesting entity or state of affairs that can’t be directly observed. The Higgs boson is a good example. To detect its existence, you have to sift through trillions of “events” in your particle collider, checking to see if certain events are very slightly more frequent than expected under a model with no Higgs boson. Another example from physics is stellar parallax, a very subtle effect. Failure to detect it was once used as an argument against heliocentrism. As a third example, relativity theory predicts time dilation. It’s important to know if relativity theory is right, so it’s worth flying airplanes carrying atomic clocks around the world to look for that very small effect.
- An effect that’s small in the sense of difficult to detect might be worth caring about because it characterizes a limiting case. The limiting case might well be difficult to detect, produce, or study–but if it doesn’t behave as we think it does, that would imply some serious problem with our understanding of other cases.
- Detecting a small effect might be impressive because of the effort, ingenuity, and/or technical cleverness required to detect it.
- You might need to quantify or detect a small effect so as to be able to subtract it out and thereby better estimate some other effect. Possibly some other small effect!
- Conversely, you can often argue that if an effect is so small as to require massive sample sizes, highly-controlled study designs, and sophisticated statistics to detect, it’s not big enough to be worth worrying about. For instance, here’s Andrew Gelman criticizing a psychology experiment for needing a sample size of 700,000 people to detect a change in the mean equal to 0.02 standard deviations.
- An effect that’s very small on average might not be worth studying if the sign and magnitude of the effect vary irregularly with all sorts of unknown and difficult-to-control factors. As I understand it, this is Andrew Gelman’s complaint about many social science studies, like this one of whether the outcomes of college football games affect how people vote in elections. So rather than focusing on whether the “true” mean effect is zero or not, you should focus on describing the variation. Because the “true” mean is just an epiphenomenon.
- If a variable is affected by many factors, all of which are of small effect, then you can argue that it’s not worth trying to identify or study those individual factors. Instead, we should just use a statistical distribution summarizing their collective effects. That’s the rationale for quantitative genetics–the phenotypic trait value represents the sum of the small effects of many genes and environmental factors, so you just assume a normal distribution of trait values.
- Maybe others I haven’t thought of?
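The sample-size point above is easy to see with a back-of-the-envelope power calculation. The sketch below uses the standard normal-approximation formula for a two-sample comparison (this is not Gelman's actual calculation, just an illustration of how required sample size scales as 1/d²):

```python
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sample comparison to
    detect standardized mean difference d, via the normal approximation:
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)
    return 2 * ((z_alpha + z_beta) / d) ** 2

# Required n blows up as 1/d^2: tiny effects get expensive fast
for d in (0.5, 0.2, 0.02):
    print(f"d = {d}: ~{round(n_per_group(d))} per group")
```

Halving the effect size quadruples the required sample, so a d of 0.02 demands tens of thousands of subjects per group at conventional power, and far more if you want high power or must adjust for other factors.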
It’s interesting to me that knowledgeable experts often disagree on what constitutes a “small” effect and whether any given “small” effect is worth studying. For instance, Andrew Hendry has a post criticizing several popular bandwagons in ecology and evolution for pairing “high enthusiasm and low R-squared”. I agree with him about some of them, but disagree on others. For instance, Andrew’s right that in biodiversity-ecosystem function (BDEF) experiments, variation in biodiversity often explains 20% or less of variation in ecosystem function. But by other measures of magnitude of effect, biodiversity has a substantial effect on ecosystem function (Hooper et al. 2012).
Heck, sometimes I disagree with myself. I thought it was really cool that Katie Hinde and colleagues used the lactation records of 1.5 million cows together with careful statistical analysis to show that, all else being equal, cows make 2.7% more milk for daughter calves than sons. 2.7% seems like a small effect to me, by multiple definitions. But it’s interesting that the effect was there at all, and that it wasn’t a bias towards sons (as the Trivers-Willard hypothesis would predict). Even if they’d found no effect in either direction, I’d have considered it a good study that was worth doing. As another example, I think it’s worthwhile to try to estimate density dependence as best we can from time series data, even though our estimates often aren’t precise enough to distinguish weak density dependence from density independence (Ziebarth et al. 2012, Knape and de Valpine 2012). As a third example, I think the greatest food chain dynamics experiment of all time is Kaunzinger and Morin 1998, which shows that, under sufficiently-controlled conditions, real food chains behave just like the Oksanen model predicts. I say it’s a great experiment even though–indeed, because–a less-controlled and more “realistic” experiment with multiple species per trophic level, omnivory, etc. would give very different results. It’s an example of my reason #3 for studying “small” effects. But on the other hand, I never understood the now-past enthusiasm for fluctuating asymmetry, and took the difficulty of detecting it and testing hypotheses about it as a sign that it was a wild goose chase.
Mulling over my contrasting reactions to various studies of small effects, I find that I give a lot of weight to reasons 1-3 for studying small effects (really, they’re reasons to study any effect). With the caveat that studies of small effects place a premium on accuracy and precision, and often require cautious interpretation so you don’t fall into the trap the opponents of heliocentrism fell into–concluding from a failure to detect a small effect that the effect doesn’t exist. But I dunno–I find it difficult to systematize my reactions to different studies of small effects.
So you tell me: what’s a “small” effect, and when is a small effect worth studying?