Rather than giving advice, this post is me asking for advice. I am chairing a committee to completely revamp my department’s introductory biostatistics course. I would appreciate suggestions on what textbook to use for the revamped course. There are of course many introductory stats textbooks out there. But none of the ones with which I’m familiar seem suitable to me, hence my request for suggestions, which I will bring to the committee.
Before you start chiming in with suggestions, please read the rest of this post so that you understand the context and what sort of textbook we might need.
This is a largish lecture course (100+ students each term), aimed at first- and second-year biology undergraduates at a large public university, with the wide range of abilities and pre-existing interest in the material that that implies. One issue the committee will need to consider is how ambitious we can be, both in how deeply we cover fundamental concepts and in the range of specific techniques we include in the revamped course. There are weekly computer-based labs, which use R.
Only a minority of the students are majoring, or plan to major, in ecology, environmental science, or a related field; the majority will go on to focus on sub-organismal fields of biology. Historically, the course has been taught by ecologists and has used exclusively ecological examples; that’s one aspect of the course that probably needs to change.* So a textbook aimed specifically at ecology students wouldn’t work.
Currently, the content of the course is what I think of as traditional, classical biostatistics. I believe similar courses are taught at many universities. The course focuses on classical frequentist methods of null-hypothesis testing: t-tests, single-factor ANOVA, linear regression, correlation, and chi-squared tests. These various methods are basically presented as distinct and applicable in different situations, rather than as different manifestations of underlying general principles of estimation and testing. Students do learn how these tests are calculated; the tests aren’t treated just as “black boxes” with inputs and outputs. And the course also covers the usual related material like how to test compliance with assumptions of normality and homoscedasticity, transformations to normalize residuals, classical nonparametric alternatives like Spearman rank correlation and Kruskal-Wallis one-way ANOVA, etc. There’s an emphasis on P-values, and not much on R² values, effect sizes, confidence intervals, statistical vs. biological significance, etc.
Here are some of my thoughts on how I personally would like to see the course change. I emphasize that the committee hasn’t even met yet and so I’m only speaking for myself here. My hope is that someone can suggest a textbook that would fit this vision.
- I’d like to see general concepts like maximum likelihood estimation and the notion of a likelihood ratio test foregrounded, but in a non-technical way. This course obviously can’t become a calculus-based mathematical statistics class! But I do think it is possible to foreground the general concepts. What you want to get across is that all statistics is about two complementary tasks. One is making our best guess about how the world actually is, based on the data we have. The other is evaluating how good those best guesses are (both in an absolute sense, and compared to other less-good-but-still-possible guesses), in light of the fact that our data are incomplete and if we did the same study again we might well get different data. In an upper-level quantitative methods class I teach to ecology and environmental science students, I cover ideas like likelihood in an informal way, as fundamental principles that unify all the specific statistical techniques that the students have been taught previously. I do this in part by using analogies to the sort of informal reasoning that people do all the time in everyday life. So foregrounding these fundamental ideas could just be a matter of rearranging the order of material that some of our students already get. I think the single most important thing students could take away from this course is a really solid, gut-level feel for how to think statistically. Obviously, there is the challenge of making abstract concepts like likelihood seem concrete, relevant, interesting, and understandable to first- and second-year biology undergraduates. I think that challenge can be met. But I could be wrong; the counterargument would be that this material is too advanced for an introductory course, which necessarily has to take a somewhat less unified, “here’s your menu of options” approach to statistical methods.
- I’d like to see more coverage of exploratory data analysis: graphing your data and getting a feel for it before you dive into running statistical tests.
- The course still needs to have (and perhaps even needs to strengthen) its emphasis on what experiments are, how they’re designed, etc. Only a minority of students seem to come in with a good sense of what a manipulative, controlled experiment is and how it’s different from just going out and measuring things. It’s very important that students come out of this course with a rock-solid ability to think experimentally (e.g., they should be comfortable answering questions like “Design an experiment or series of experiments to test scientific hypothesis X”).
- I’d like to see a more modern approach to distributional assumptions, rather than the current heavy emphasis on starting from an assumption of normality and then adjusting if it’s not met. I’d like the students to know something about why we might expect certain data to be distributed in a certain way (e.g., count data, proportion data, etc.), and then for that material to inform their approach to statistical testing. It’s been suggested that we just skip t-tests, ANOVA, regression, and even general linear models, and go straight to generalized linear models, but I wonder if that’s too radical or advanced a move to make.
- I’d like to see confidence intervals, effect sizes, statistical vs. biological significance, etc. taught in addition to (but not instead of!) P-values.
- I suspect that also trying to cover Bayesian methods is likely to be too ambitious given the level of the students and the already-ambitious amount of other things I’d like to see covered. But I’m open to being convinced otherwise (e.g., if you teach or know of a similar course, aimed at similar students, that covers both Bayesian and frequentist techniques).
- As noted above, the course needs to incorporate a wide range of biological examples, not just ecological ones.
So, if you can suggest any textbooks that would fit the bill, I’d love to hear from you!
*The course actually isn’t required of many of the students who take it. It’s only required of ecology, environmental science, and zoology majors, and any biology majors who choose to focus their studies on ecology and evolution. It’s a bit unclear why many students who don’t have to take it are taking it. But obviously we can’t ignore them when revamping the course.