Today’s post is unusual for me: it’s about a topic on which there’s often disagreement, but on which I’m not sure what to think myself. So I’m going to write about my uncertainty and hope that the resulting conversation will enlighten me.
The topic: what’s a “small” effect, and when are small effects worth caring about?
Possible definitions of a small effect:
- An effect that’s small in some absolute sense. For instance, a weight loss plan that, if followed, will only cause you to lose two pounds at most.
- An effect that’s small relative to other factors that affect the same variable. Though to make this sort of comparison you need to put variation in the factors affecting the variable of interest on a common scale.
- In the context of a difference between two means, a “small” difference might mean “small relative to the within-group standard deviation, say 0.2 standard deviations or less”. That’s Cohen’s d, an effect size measure popular in psychology. (Aside: I agree with the linked post that it’s best to treat arbitrary thresholds like “d<0.2 means it’s a small effect” as rough rules of thumb.)
- An effect that you can only detect with a massive sample size, and/or with a highly-controlled study design and statistical analysis that subtracts out or eliminates other effects that might swamp or confound the small effect.
- An effect that turns out to be smaller than we expected it to be based on previous theory and data.
- Maybe others I haven’t thought of?
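To make the Cohen's d definition above concrete, here is a minimal sketch of the usual pooled-standard-deviation version. The function name `cohens_d` and the two data sets are made up for illustration; they don't come from any study mentioned in this post.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled within-group standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical measurements: the treated group is shifted up by 0.5 units.
control = [10.0, 12.0, 14.0, 16.0, 18.0]
treated = [10.5, 12.5, 14.5, 16.5, 18.5]

d = cohens_d(treated, control)
print(f"Cohen's d = {d:.3f}")
```

With these invented numbers the within-group standard deviation is about 3.16, so a 0.5-unit shift comes out below the 0.2 rule of thumb. The same 0.5-unit shift would count as "large" if the groups were less variable, which is exactly why this definition and the "absolute" definition can disagree.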
Possible reasons why small effects might be worth caring about (or not):
- It’s interesting/surprising/important that the effect exists at all, no matter how small it is. Perhaps because we have some theoretical reason to expect the effect to be literally zero. Or perhaps because we have some strong moral/policy/management reason to want to reduce the effect to literally zero.
- The small effect is a symptom of some important or interesting entity or state of affairs that can’t be directly observed. The Higgs boson is a good example. To detect its existence, you have to sift through trillions of “events” in your particle collider, checking to see if certain events are very slightly more frequent than expected under a model with no Higgs boson. Another example from physics is stellar parallax, a very subtle effect. Failure to detect it was once used as an argument against heliocentrism. As a third example, relativity theory predicts time dilation. It’s important to know if relativity theory is right, so it’s worth flying airplanes carrying atomic clocks around the world to look for that very small effect.
- An effect that’s small in the sense of difficult to detect might be worth caring about because it characterizes a limiting case. The limiting case might well be difficult to detect, produce, or study–but if it doesn’t behave as we think it does, that would imply some serious problem with our understanding of other cases.
- Detecting a small effect might be impressive because of the effort, ingenuity, and/or technical cleverness required to detect it.
- You might need to quantify or detect a small effect so as to be able to subtract it out and thereby better estimate some other effect. Possibly some other small effect!
- Conversely, you often can argue that if an effect is so small as to require massive sample sizes, highly-controlled study designs, and sophisticated statistics to detect, it’s not big enough to be worth worrying about. For instance, here’s Andrew Gelman criticizing a psychology experiment for needing a sample size of 700,000 people to detect a change in the mean equal to 0.02 standard deviations.
- An effect that’s very small on average might not be worth studying if the sign and magnitude of the effect vary irregularly with all sorts of unknown and difficult-to-control factors. As I understand it, this is Andrew Gelman’s complaint about many social science studies, like this one of whether the outcomes of college football games affect how people vote in elections. So rather than focusing on whether the “true” mean effect is zero or not, you should focus on describing the variation. Because the “true” mean is just an epiphenomenon.
- If a variable is affected by many factors, all of which are of small effect, then you can argue that it’s not worth trying to identify or study those individual factors. Instead, we should just use a statistical distribution summarizing their collective effects. That’s the rationale for quantitative genetics–the phenotypic trait value represents the sum of the small effects of many genes and environmental factors, so you just assume a normal distribution of trait values.
- Maybe others I haven’t thought of?
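The "you need a massive sample size" point can be made concrete with the standard normal-approximation formula for comparing two group means. This is only a sketch under textbook assumptions (two equal-sized groups, two-sided test, known common standard deviation); the function name `n_per_group` and the default alpha and power are my choices, and this is not a reconstruction of the design of the study Gelman criticized.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect a standardized difference d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


n_tiny = n_per_group(0.02)   # an effect of 0.02 standard deviations
n_medium = n_per_group(0.5)  # an effect of 0.5 standard deviations

print(f"d = 0.02 needs about {n_tiny:,} per group")
print(f"d = 0.5 needs about {n_medium:,} per group")
```

Because the required sample size scales with 1/d², shrinking the effect by a factor of 25 multiplies the sample size by 625. Under these assumptions a 0.02 standard deviation effect needs tens of thousands of subjects per group, versus a few dozen for a 0.5 standard deviation effect.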
It’s interesting to me that knowledgeable experts often disagree on what constitutes a “small” effect and whether any given “small” effect is worth studying. For instance, Andrew Hendry has a post criticizing several popular bandwagons in ecology and evolution for pairing “high enthusiasm and low R-squared”. I agree with him about some of them, but disagree on others. For instance, Andrew’s right that in BDEF experiments, variation in biodiversity often explains 20% or less of variation in ecosystem function. But by other measures of magnitude of effect, biodiversity has a substantial effect on ecosystem function (Hooper et al. 2012).
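Here is a toy calculation of how "low R-squared" and "substantial effect" can both be true at once. It assumes a simple linear model, y = slope * x + noise, with noise independent of x. All the numbers are hypothetical, and this is not the measure of effect magnitude used by Hooper et al. 2012.

```python
from math import sqrt


def r_squared(slope, sd_x, sd_noise):
    """Fraction of the variance in y explained by x when y = slope * x + independent noise."""
    explained = (slope * sd_x) ** 2
    return explained / (explained + sd_noise ** 2)


# Hypothetical values: the noise is twice as variable as the signal.
slope, sd_x, sd_noise = 1.0, 1.0, 2.0

r2 = r_squared(slope, sd_x, sd_noise)
sd_y = sqrt((slope * sd_x) ** 2 + sd_noise ** 2)

# Expected change in y when x moves across two of its standard deviations.
shift_in_y = slope * 2 * sd_x
shift_in_sd_units = shift_in_y / sd_y

print(f"R-squared = {r2:.2f}")
print(f"Moving x by 2 SD shifts y by {shift_in_sd_units:.2f} SD of y")
```

In this toy case x explains only 20% of the variance in y, yet moving x from low to high shifts the expected value of y by most of a standard deviation. Whether you call that small depends on which definition you pick.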
Heck, sometimes I disagree with myself. I thought it was really cool that Katie Hinde and colleagues used the lactation records of 1.5 million cows together with careful statistical analysis to show that, all else being equal, cows make 2.7% more milk for daughter calves than sons. 2.7% seems like a small effect to me, by multiple definitions. But it’s interesting that the effect was there at all, and that it wasn’t a bias towards sons (as the Trivers-Willard hypothesis would predict). Even if they’d found no effect in either direction, I’d have considered it a good study that was worth doing. As another example, I think it’s worthwhile to try to estimate density dependence as best we can from time series data, even though our estimates often aren’t precise enough to distinguish weak density dependence from density independence (Ziebarth et al. 2012, Knape and de Valpine 2012). As a third example, I think the greatest food chain dynamics experiment of all time is Kaunzinger and Morin 1998, which shows that, under sufficiently-controlled conditions, real food chains behave just like the Oksanen model predicts. I say it’s a great experiment even though–indeed, because–a less-controlled and more “realistic” experiment with multiple species per trophic level, omnivory, etc. would give very different results. It’s an example of my reason #3 for studying “small” effects. But on the other hand, I never understood the now-passed enthusiasm for fluctuating asymmetry, and took the difficulty of detecting it and testing hypotheses about it as a sign that it was a wild goose chase.
Mulling over my contrasting reactions to various studies of small effects, I find that I give a lot of weight to reasons 1-3 for studying small effects (really, they’re reasons to study any effect). With the caveat that studies of small effects place premia on accuracy and precision, and often require cautious interpretation so you don’t fall into the trap that opponents of heliocentrism fell into. But I dunno–I find it difficult to systematize my reactions to different studies of small effects.
So you tell me: what’s a “small” effect, and when is a small effect worth studying?
not sure if your #1 reason to care about small effects includes this:
1) small effects have cumulative consequences. So a small selection coefficient can drive the evolution of a trait a long way given enough generations. This is effectively why so many classic trends in paleo (e.g. height of horse teeth) cannot be distinguished from a pure drift model.
2) small effects have big consequences if the sample is big (a small negative side effect of a drug may kill many people if 1 billion people are on the drug)
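A rough sketch of the "cumulative consequences" point, using the simplest deterministic model of haploid selection, in which an allele with advantage s goes from frequency p to p(1 + s)/(1 + ps) each generation. The function names and the values of s and the start and end frequencies are hypothetical. The model deliberately ignores drift, so it says nothing about telling selection apart from drift.

```python
from math import ceil, log


def next_freq(p, s):
    """Allele frequency after one generation of haploid selection with advantage s."""
    return p * (1 + s) / (1 + p * s)


def logit(p):
    return log(p / (1 - p))


def generations_to_spread(s, p_start, p_end):
    """Generations needed to go from p_start to p_end.

    Under this model the log-odds of the allele rise by log(1 + s) every generation.
    """
    return ceil((logit(p_end) - logit(p_start)) / log(1 + s))


# Hypothetical case: a 1% advantage, starting rare (1%) and ending near fixation (99%).
gens = generations_to_spread(s=0.01, p_start=0.01, p_end=0.99)
print(f"About {gens} generations")
```

A 1% advantage per generation is tiny by most definitions, but under these assumptions it carries a rare allele nearly to fixation in under a thousand generations, which is an eyeblink on paleontological time scales.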
I wasn’t thinking of either of those when I wrote #1, so I consider those additions to my list.
+1 on all these responses. Which is why the concept of small effects gives me nightmares. Can be important but largely/effectively impossible to measure accurately if what we want is a causal effect conditional on multiple moderating factors.
I second both of those reasons and would generalize them to the “small effect multiplied by something big is worth knowing about.” Whether it’s a policy decision with millions of people or change in survival that is consistent over hundreds of generations, I think the reason for caring about it is the same.
@David Mellor:
I like that generalization. And note that it highlights the importance of doing something very difficult: precisely and accurately estimating small effects. Because if your estimate is a bit off, and then you multiply it by a big number, your answer will be *way* off.
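A back-of-the-envelope sketch of that point, reusing the "1 billion people on a drug" figure from the earlier comment. The two risk values are invented for illustration, and `scaled_error` is just a name I made up.

```python
def scaled_error(true_effect, estimated_effect, multiplier):
    """Absolute error in the total once a per-unit effect is multiplied up."""
    return (estimated_effect - true_effect) * multiplier


population = 1_000_000_000  # people taking the drug
true_risk = 1e-5            # hypothetical true per-person risk of a fatal side effect
estimated_risk = 2e-5       # hypothetical estimate, off by only 1 in 100,000

true_deaths = true_risk * population
estimated_deaths = estimated_risk * population
error_in_deaths = scaled_error(true_risk, estimated_risk, population)

print(f"True deaths: {true_deaths:,.0f}")
print(f"Estimated deaths: {estimated_deaths:,.0f}")
print(f"Error: {error_in_deaths:,.0f}")
```

The relative error is the same before and after multiplying (the estimate is off by a factor of two either way), but an error of 1 in 100,000 per person turns into an error of roughly ten thousand deaths in the total.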
As somebody who has a great deal of interest in applied ecology, small effects may be important because they are the only effects that we can actively change. If, say, 90% of mortality of a species is driven by random weather events, while 10% of mortality is influenced by various human encroachment factors, that 10% is the variation that we can directly control.
Of course, there are things that you can do to buffer that 90% mortality (building shelters, supplemental feeding, etc.), but the timing of the weather events is out of our control in a way that the 10% is not.
Another example of a small effect repeated many times to generate a large impact: grocery stores have narrow margins but turn over their inventory quickly to generate enough profit to stay in business.
Ha, kind of funny you could also think of reproductive success as analogous to profit margin – lots of offspring but few survivors is like the grocery store (low net margin / effect), few offspring but high success is like Maserati – high net margin or large but uncommon / infrequent effect.
Which brings up the idea that a given effect might have both magnitude and frequency, and makes me wonder if the term “size” of the effect is too general…
Being an MSc student in a completely different field I appreciate the points you’ve made. A small effect size in strength and conditioning might not be as important as in the medical field because of the reasons previously stated. However using reason #6; having a massive sample size within a very specific individual sport at an elite level might be the difference between an athlete winning or losing if they adopt the tested training intervention. Obviously there are endless numbers of issues with that statement.
I just completed my final exam on Research Concepts which was 100% of my grade, so please feel free to critique my response to your post.
Coach Heller, this is a good point and I give a hypothetical example in a paper that is currently in press at the Journal of Experimental Biology. I measured a correlation of -0.07 (this is a large sample so the error on this is very small) between 100 m and 1500 m events for NCAA decathletes. For kicks, we assume that the correlation exactly measures a common causal effect and…
“The consequences of the effect sizes of the WBC quality-free correlations can be explored by computing what-if scenarios. For example, if we could intervene and shift M-P trait values in the direction that would cause 1500 m speeds to increase by two standard deviations, equivalent to running 42.7 s faster (i.e. from middle of the pack among all NCAA decathletes to top 2.5%), 100 m times would slow by only 0.064 s. In the 2014 division I championship, this intervention would drop the 100 m placing by an average of 1.5 places. This effect is not trivial but is quite small given the huge intervention in 1500 m time.”
Another reason small effects might be important is scale. For example, say we have a population for which we are interested in the mortality rate. A certain variable across the whole population may have a small effect on the mortality rate of an average individual, but may in fact have a large effect on some phenotypes and very little to none on others. Therefore the effect appears small on one scale, but may actually be very important given another scale.
I’ll also add that this could happen temporally. A factor may have little effect averaged over a long time period, but may have a very large effect at certain times.
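A tiny numerical sketch of the scale argument: the population-average effect is just the share-weighted mean of the subgroup effects, so a large effect confined to a small subgroup (or a short time window) averages out to something that looks small. The subgroup labels, shares, and effect sizes below are all hypothetical.

```python
# Hypothetical subgroups: (share of the population, increase in mortality rate).
subgroups = {
    "susceptible phenotype": (0.05, 0.20),
    "everyone else": (0.95, 0.00),
}

# The population-level effect weights each subgroup's effect by its share.
average_effect = sum(share * effect for share, effect in subgroups.values())
largest_effect = max(effect for _, effect in subgroups.values())

print(f"Average effect across the population: {average_effect:.3f}")
print(f"Largest subgroup effect: {largest_effect:.3f}")
```

Here the population-wide effect is a one percentage point increase in mortality, while the affected phenotype sees a twenty point increase. The same arithmetic applies to the temporal version, with shares of time in place of shares of the population.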