In a recent issue of Limnology and Oceanography Bulletin, Stuart Hurlbert reviews (UPDATE: link fixed, venue corrected) the new (4th, 2012) edition of Sokal & Rohlf’s classic biostatistical text, Biometry (HT Carl Boettiger). The first sentence of the review gives you the flavor:
Reader be forewarned: were it allowed the title of this review would be “A readable but overblown, incomplete and error-ridden cookbook”.
Tell us how you really feel, Stuart! And to think that sometimes I worry that I’m too tough on other people’s work…
You should click through and read the whole thing. But if you’re not so inclined, here’s a brief summary of Hurlbert’s beefs with Sokal & Rohlf (the book, not the people; I’ll refer to the book as Sokal & Rohlf because that’s what everyone does). Hurlbert says his beefs apply to all editions, not just the most recent one:
- No coverage of experimental design, or sampling design of observational studies. Relatedly, and worse, incorrect or confusing implications about experimental design and sampling design. For instance, there are no formal definitions of key terms like “experiment”, “experimental unit”, “block”, “repeated measures”, etc. Worse, observational studies often are described using experimental terms like “treatment”, “control”, and “randomized block design”. This leads to serious confusion, even about matters as basic as what an experiment is.
- Too much emphasis on “statistical gimmickry” of little or no practical use, such as standardized effect sizes.
- Superficial, cookbook-type treatment of many procedures, with no conceptual framework for understanding why one might want to use those procedures.
- Incorrect, incomplete, and confusing coverage of other matters, from when it’s appropriate to use a one-tailed test, to whether to correct for multiple comparisons (Hurlbert apparently believes you should never do so, and so slams Sokal & Rohlf for insisting on such corrections), and many more.
- Rigid adherence to Neyman-Pearson null hypothesis testing, at the expense of estimation and more refined, quantitative assessment of the evidence for or against any given hypothesis.*
The only value Hurlbert sees in Sokal & Rohlf is as a reference manual for the “recipes” for how to calculate various statistical procedures. He concludes by blaming the popularity of Sokal & Rohlf for what he sees as decades of poor statistical practice in biology. He also laments that no current biostatistical textbook teaches an appropriately modern philosophy of statistics clearly, with a focus on principles and without errors.
What do you think of all this? I have to say I found it kind of surprising, but not because I revere Sokal & Rohlf. I’ve mostly used it as a reference manual myself. I’d certainly never try to teach from it at any level, if for no other reason than that it’s way too voluminous. I guess I assumed, without really thinking about it, that it was always intended, and mostly used, as a reference manual. Was I wrong to assume that? And while I find Sokal & Rohlf old-fashioned in some ways (e.g., randomization, bootstrapping, and generalized linear models render classical non-parametric tests and data transformations largely irrelevant), that never really bothered me. The first edition came out in 1969; of course it’s going to be old-fashioned. And I don’t know that it’s fair to pick on Sokal & Rohlf and blame it for the purportedly terrible statistical practices of modern biologists, even though the book certainly is popular. Insofar as our statistical practices are terrible (and I don’t know if they are or not), there’s surely plenty of blame to go ’round. And can’t you also give Sokal & Rohlf credit for helping to encourage more biologists to use statistics in the first place? But I’ve never really thought about Sokal & Rohlf all that much, and I actually haven’t cracked it open in years, so I’m sort of a curious bystander here.
As an aside, I found it interesting that such vociferous criticism of Sokal & Rohlf came from someone from basically the same school of statistical thought. Hurlbert isn’t a Bayesian of any stripe, nor is he advocating for computationally-intensive methods, for instance. His criticisms of Sokal & Rohlf mostly aren’t criticisms of what the book sets out to do; they’re mostly criticisms of the book’s execution.
What do you think? Does Sokal & Rohlf deserve the criticism Hurlbert heaps on it? More broadly, what do you see as the biggest problems with how modern biologists teach and use statistics? And what textbook(s) should we be using in our courses in order to fix those problems? (Again, Hurlbert says there’s no biostatistics textbook that’s readable, strong on general principles, and error-free!)
My interest in this isn’t purely academic. I’m not just looking to grab some popcorn and watch proponents and detractors of Sokal & Rohlf argue. 😉 As I noted in a previous post, this fall I’m taking over teaching the introductory undergrad biostats course in my department. So for the first time, I need to think seriously and in great detail about exactly what introductory biostatistical material to teach and how to teach it. I’ve settled on a textbook (Whitlock & Schluter), and I have a tentative list of lectures and the major changes I want to make to the existing labs. But nothing beyond that. And even getting that far has required a lot of thought, in particular about precisely the issues Hurlbert raises. How much emphasis to place on general, unifying principles vs. coverage of specific tests. How much emphasis to place on black-and-white rules of good statistical practice vs. equipping students to make informed judgment calls. Etc.
It occurs to me that teaching biostatistics is something like teaching children good behavior. You start out by teaching kids black-and-white rules, like “don’t lie” and “don’t hit your sister.” And it’s only later that kids learn that good behavior often isn’t black-and-white. Sometimes it’s not only ok to lie (or to hit your sister!), it’s positively a good idea, morally. Heck, there are lots of tricky moral situations that you aren’t even taught about at all until you’re older. And that’s without even getting into competing, mutually-incompatible philosophies as to what good behavior consists of, and what makes it good! So you tell me–what should we be teaching our “kids” about biostatistics if we want to start them down the road towards responsible “adulthood”? (“Don’t ~~hit your sister~~ fail to correct for multiple comparisons!”)
*Hurlbert actually thinks Sokal & Rohlf should’ve based their book on what Hurlbert calls the “neoFisherian” approach. I confess I’d never heard the term “neoFisherian”, which is Hurlbert’s own recent coinage. Hurlbert has a 2009 paper if you want to find out what he means by “neoFisherian” and why he thinks Neyman-Pearson hypothesis testing is so outdated that it should no longer be taught (UPDATE: link fixed). As far as I can tell, what Hurlbert means by “neoFisherian” doesn’t sound too far from Deborah Mayo’s notion of “error statistics” (which itself is actually not all that far from Neyman-Pearson, or even from some forms of Bayesianism). But it’s a little hard to tell because much of Hurlbert’s paper focuses on what seem to me to be rather nit-picky details of current practice (like conventions for reporting P values). Anyway, I think it would’ve been helpful for Hurlbert to briefly elaborate his own philosophy in his review, rather than just refer to it by that label.