Statistician Cosma Shalizi is a really sharp guy, who writes wonderfully on everything from technical statistical topics to philosophy of statistics to books about the Soviet Union. The notes for his undergraduate stats courses are online, and they basically comprise a very readable textbook. As part of my prep for teaching intro biostats next term, I was skimming Cosma’s notes, and came across the chapter on linear regression, in which he asks whether R^2 “is a distraction or a nuisance”? Tell us what you really think, Cosma! 🙂
I’ll save you the trouble of clicking through; here are his reasons for dismissing R^2:
- It doesn’t measure goodness of fit. Even if your model is completely correct, R^2 can be made arbitrarily small by making the variance of X small. Conversely, R^2 can be arbitrarily close to 1 even when your model is wrong, as when the true model is nonlinear but the best linear approximation has a non-zero slope and the variance of X is large.
- It doesn’t measure prediction error. You can make R^2 take on any value just by changing the range of X. Mean squared error and out-of-sample error are much better measures of prediction error.
- R^2 doesn’t tell you how big your prediction intervals or confidence intervals are.
- R^2 can’t be compared across data sets.
- R^2 can’t be compared between models with transformed and untransformed Y, and can go down if you transform Y so as to better conform to model assumptions.
- The only situation in which you can compare R^2 values is when you’re fitting different models to the same data. But he says that in that case you might as well just compare mean squared error.
- R^2 is not “the fraction of variance explained” in any scientifically-meaningful sense, because it’s the same whether you regress Y on X or X on Y. (Elsewhere he suggests you think of regression as a smoothing method, and think of R^2 as the fraction of the variance in Y that’s “retained” by the predictions.)
- R^2 is the square of the correlation coefficient, which Shalizi, quoting Tukey, says is also always useless.*
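To make the first two points concrete, here is a minimal simulation sketch (mine, not Shalizi's). It assumes a correctly specified linear model with Gaussian noise; the slope, intercept, noise level, and sample size are arbitrary illustrative choices, and `fit_summary`, `simulate`, and `expected_r2` are hypothetical helper names. Under those assumptions the population R^2 is slope² Var(X) / (slope² Var(X) + noise variance), so shrinking or stretching the spread of X moves R^2 around while the residual mean squared error stays near the noise variance.

```python
import numpy as np

def fit_summary(x, y):
    """Return (R^2, residual mean squared error) for least-squares y ~ x."""
    slope, intercept = np.polyfit(x, y, 1)  # coefficients, highest degree first
    resid = y - (intercept + slope * x)
    mse = np.mean(resid ** 2)
    return 1.0 - mse / np.var(y), mse

def simulate(x_sd, slope=2.0, noise_sd=1.0, n=100_000, seed=0):
    """Draw data from a *correct* linear model and fit it (illustrative values)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, x_sd, n)
    y = 1.0 + slope * x + rng.normal(0.0, noise_sd, n)
    return fit_summary(x, y)

def expected_r2(x_sd, slope=2.0, noise_sd=1.0):
    """Population R^2 for the model above: signal variance over total variance."""
    signal = (slope * x_sd) ** 2
    return signal / (signal + noise_sd ** 2)

for x_sd in (0.05, 0.5, 5.0):
    r2, mse = simulate(x_sd)
    print(f"sd(X)={x_sd}: R^2={r2:.3f} (expected {expected_r2(x_sd):.3f}), MSE={mse:.3f}")
```

The model is equally "right" in all three runs; only the spread of X changes.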
In a footnote, he notes the recent work of Low-Decarie et al., compiling all reported R^2 values in ecology to ask if ecologists have gotten better over time at explaining the phenomena they study. He dismisses such exercises as “pointless”.
None of these technical points are new to me, and I doubt they’re new to anyone who’s had a basic stats course. For instance, when I teach intro biostats I teach that R^2 depends on the range or variance of X, that it depends on how you transformed your data, and that it’s the same whether you regress Y on X or vice-versa. But I don’t draw the conclusion that R^2 is useless. Why not?
Mulling it over, I think it’s for a few reasons:
- I think that points 3 and 5 are strawmen. I don’t see why you would want a measure of goodness of fit (or variance retained, or however you want to describe R^2) to be comparable across data transformations, or to be related to interval widths. So I don’t see why anyone should be bothered that R^2 doesn’t do those things.
- Point 8 puzzles me because I can think of various uses of correlation coefficients off the top of my head.** I can’t imagine that Tukey or Shalizi could possibly be forgetting these, though.
- Point 7 mostly seems like semantics to me. Yes, “explained” has causal connotations that “retained” lacks. But I don’t think it’s that big a deal. Insofar as people misinterpret regressions as demonstrating causality, I don’t think it’s because of the words we use to summarize what R^2 means.
- Point 6 isn’t a criticism of R^2.
- I’m still mulling over 1-2. Is it really so bad to have to keep in mind that the ability of your regression model to explain (ok, “explain”) X-Y covariation depends on how much variation in X the model has to work with? I mean, surely the interpretation of mean squared error depends on context too.
- I’m still mulling over 4. This is Shalizi’s strongest argument, I think, especially in combination with 1-2. As I understand him, he’s saying that our usual informal, global understanding of what constitutes a “low” or “high” R^2 value is meaningless. For instance, take this post of Brian’s, in which he says that models predicting metabolic rate as a function of body size are much better than models predicting abundance as a function of temperature, since the former have a “high” R^2 of ~0.9 while the latter have a “low” R^2 of ~0.2. As I understand him, Shalizi would say that’s a meaningless, apples-to-oranges comparison. Not only are they different datasets, but those datasets have different variables and presumably different underlying “true” models.
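For anyone who wants to check the facts behind points 5, 7, and 8 rather than take them on faith, here is a small sketch on made-up data. The data-generating process (log Y linear in X, with Gaussian noise) and all the numbers are my own illustrative assumptions, and `r_squared` is a hypothetical helper. It shows that in a simple regression R^2 is identical whichever variable you call the response, that it equals the squared correlation, and that it changes when Y is transformed.

```python
import numpy as np

def r_squared(x, y):
    """R^2 of the simple least-squares regression of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    return 1.0 - np.var(resid) / np.var(y)

rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, 5_000)
# Assumed (hypothetical) process: log(Y) is linear in X plus Gaussian noise.
log_y = 0.5 * x + rng.normal(0.0, 1.0, x.size)
y = np.exp(log_y)

r2_y_on_x = r_squared(x, y)              # regress Y on X
r2_x_on_y = r_squared(y, x)              # regress X on Y
corr_sq = np.corrcoef(x, y)[0, 1] ** 2   # squared correlation coefficient
r2_log = r_squared(x, log_y)             # regress log(Y) on X

print(f"Y on X: {r2_y_on_x:.3f}, X on Y: {r2_x_on_y:.3f}, r^2: {corr_sq:.3f}")
print(f"log(Y) on X: {r2_log:.3f}")
```

The first three numbers agree up to floating-point error; the last one differs a lot from them. Which direction a transformation moves R^2 depends on the data, so this one example shouldn't be read as a general rule.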
I always find it interesting when I disagree with people I usually agree with, and when two people I usually agree with (here, Brian and Cosma Shalizi) disagree with each other. It really makes me stop and think. So you tell me: is R^2 a zombie idea?***
*Shalizi says covariances are useful, but converting them to correlations is not.
**For instance, in principal components analysis and other dimension reduction techniques, when one has variables measured in different units, one ordinarily wants to do the dimension reduction on the correlation matrix rather than on the covariance matrix. Cosma Shalizi himself does factor analysis on correlation matrices, so I assume he agrees with me that this is a good use of correlation.
***I think I can guess what Brian will say: that an imperfect or limited measure of goodness of fit (or however you want to describe R^2) is still better than none at all. And if you want to get ecologists to stop caring so much about p-values in favor of caring more about goodness of fit, your only hope is to talk them into using a familiar measure of goodness of fit: R^2.
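On the ** footnote's point about units, here is a rough sketch of why the choice of matrix matters. The two "traits" below are invented: both are driven equally by one shared factor but recorded in wildly different units, and `leading_component` is a hypothetical helper. The first principal component is taken as the eigenvector with the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000
shared = rng.normal(size=n)
# Hypothetical traits with the same dependence on the shared factor,
# but expressed in very different units (grams vs. meters).
mass_g = 1000.0 * (shared + 0.5 * rng.normal(size=n))
length_m = 0.01 * (shared + 0.5 * rng.normal(size=n))
data = np.column_stack([mass_g, length_m])

def leading_component(matrix):
    """Eigenvector for the largest eigenvalue of a symmetric matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)  # eigenvalues ascending
    return eigenvectors[:, -1]

pc1_cov = leading_component(np.cov(data, rowvar=False))
pc1_corr = leading_component(np.corrcoef(data, rowvar=False))

print("PC1 loadings from covariance matrix: ", np.round(pc1_cov, 3))
print("PC1 loadings from correlation matrix:", np.round(pc1_corr, 3))
```

On the covariance matrix, the first component is essentially just the variable with the biggest numbers. On the correlation matrix, both variables load equally (up to sign), which is what you'd want given how the data were constructed.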