Rather than giving advice, this post is me asking for advice. I am chairing a committee to completely revamp my department’s introductory biostatistics course. I would appreciate suggestions on what textbook to use for the revamped course. There are of course many introductory stats textbooks out there. But none of the ones with which I’m familiar seem suitable to me, hence my request for suggestions, which I will bring to the committee.

Before you start chiming in with suggestions, please read the rest of this post so that you understand the context and what sort of textbook we might need.

This is a largish lecture course (100+ students each term), aimed at first- and second-year biology undergraduates at a large public university, with the wide range of abilities and pre-existing interest in the material that that implies. One issue the committee will need to consider is how ambitious we can be in terms of the depth of coverage of fundamental concepts, and the range of specific techniques, to include in the revamped course. There are weekly computer-based labs, which use R.

Only a minority of the students are majoring or plan to major in ecology, environmental science, or a related field; the majority will go on to focus on sub-organismal fields of biology. Historically, the course has been taught by ecologists and has used exclusively ecological examples; that’s one aspect of the course that probably needs to change.* So a textbook aimed specifically at ecology students wouldn’t work.

Currently, the content of the course is what I think of as traditional, classical biostatistics. I believe similar courses are taught at many universities. The course focuses on classical frequentist methods of null-hypothesis testing: t-tests, single-factor ANOVA, linear regression, correlation, and chi-squared tests. These various methods are basically presented as distinct and applicable in different situations, rather than as different manifestations of underlying general principles of estimation and testing. Students do learn how these tests are calculated; the tests aren’t treated just as “black boxes” with inputs and outputs. And the course also covers the usual related material like how to test compliance with assumptions of normality and homoscedasticity, transformations to normalize residuals, classical nonparametric alternatives like Spearman rank correlation and Kruskal-Wallis one-way ANOVA, etc. There’s an emphasis on P-values, not much on R² values, effect sizes, confidence intervals, statistical vs. biological significance, etc.

Here are some of my thoughts on how I personally would like to see the course change. I emphasize that the committee hasn’t even met yet and so I’m only speaking for myself here. My hope is that someone can suggest a textbook that would fit this vision.

- I’d like to see general concepts like maximum likelihood estimation and the notion of a likelihood ratio test foregrounded, but in a non-technical way. This course obviously can’t become a calculus-based mathematical statistics class! But I do think it is possible to foreground the general concepts. What you want to get across is that all statistics is about two complementary tasks. One is making our best guess about how the world actually is, based on the data we have. The other is evaluating how good those best guesses are (both in an absolute sense, and compared to other less-good-but-still-possible guesses), in light of the fact that our data are incomplete and if we did the same study again we might well get different data. In an upper-level quantitative methods class I teach to ecology and environmental science students, I cover ideas like likelihood in an informal way, as fundamental principles that unify all the specific statistical techniques that the students have been taught previously. I do this in part by using analogies to the sort of informal reasoning that people do all the time in everyday life. So foregrounding these fundamental ideas could just be a matter of rearranging the order of material that some of our students already get. I think the single most important thing students should come away with from this course is a really solid, gut-level feel for how to think statistically. Obviously, there is the challenge of making abstract concepts like likelihood seem concrete, relevant, interesting, and understandable to first- and second-year biology undergraduates. I think that challenge can be met. But I could be wrong; the counterargument would be that that stuff is too advanced for an introductory course, which necessarily has to take a bit of a less-unified, “here’s your menu of options” approach to statistical methods.
- I’d like to see more coverage of exploratory data analysis, graphing your data and getting a feel for it before you dive into running statistical tests.
- The course still needs to have (and perhaps even needs to strengthen) its emphasis on what experiments are, how they’re designed, etc. Only a minority of students seem to come in with a good sense of what a manipulative, controlled experiment is and how it’s different from just going out and measuring things. It’s very important that students come out of this course with a rock-solid ability to think experimentally (e.g., they should be comfortable answering questions like “Design an experiment or series of experiments to test scientific hypothesis X”).
- I’d like to see a more modern approach to distributional assumptions, rather than the current heavy emphasis on starting from an assumption of normality and then adjusting if it’s not met. I’d like the students to know something about why we might expect certain data to be distributed in a certain way (e.g., count data, proportion data, etc.), and then for that material to inform their approach to statistical testing. It’s been suggested that we just skip t-tests, ANOVA, regression, and even general linear models, and go straight to generalized linear models, but I wonder if that’s too radical or advanced a move to make.
- I’d like to see confidence intervals, effect sizes, statistical vs. biological significance, etc. taught in addition to (but *not* instead of!) P-values.
- I suspect that also trying to cover Bayesian methods is likely to be too ambitious given the level of the students and the already-ambitious amount of other things I’d like to see covered. But I’m open to being convinced otherwise (e.g., if you teach or know of a similar course, aimed at similar students, that covers both Bayesian and frequentist techniques).
- As noted above, the course needs to incorporate a wide range of biological examples, not just ecological ones.
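
To make the two tasks of "best guess" and "how good is that guess" concrete, and to tie them to the point about count data, here is a minimal sketch of a likelihood ratio test for whether two groups of counts share a single Poisson rate. It is written in Python with only the standard library; the counts, the group names, and the function names are hypothetical, and the P-value uses the usual large-sample chi-squared approximation with 1 degree of freedom. For a Poisson model the maximum likelihood estimate of the rate is simply the sample mean, so the estimation step stays transparent.

```python
import math


def poisson_loglik(counts, rate):
    """Log-likelihood of independent Poisson counts sharing one rate (rate > 0)."""
    # log P(k | rate) = k * log(rate) - rate - log(k!)
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)


def poisson_lrt(group_a, group_b):
    """Compare a one-rate model (null) with a two-rate model (alternative)."""
    # Best guesses: the maximum likelihood estimate of a Poisson rate is the mean.
    rate_a = sum(group_a) / len(group_a)
    rate_b = sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    rate_0 = sum(pooled) / len(pooled)

    loglik_null = poisson_loglik(pooled, rate_0)
    loglik_alt = poisson_loglik(group_a, rate_a) + poisson_loglik(group_b, rate_b)

    # How good are the guesses? Twice the log-likelihood ratio is approximately
    # chi-squared with 1 df, since the alternative has one extra parameter.
    # Clamp at zero to guard against tiny negative values from rounding.
    stat = max(0.0, 2 * (loglik_alt - loglik_null))
    # Upper tail of chi-squared(1): P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return rate_a, rate_b, stat, p_value


# Hypothetical counts (e.g., colonies per plate) for a control and a treatment.
control = [3, 5, 4, 6, 2, 4]
treated = [7, 9, 6, 8, 10, 7]
rate_c, rate_t, lr_stat, p = poisson_lrt(control, treated)
print(f"rates {rate_c:.2f} vs {rate_t:.2f}; LR statistic = {lr_stat:.2f}; P = {p:.4f}")
```

The same compare-two-likelihoods logic carries over to t-tests, ANOVA, and generalized linear models, which is one way the unifying principle could be shown to students without calculus.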

So, if you can suggest any textbooks that would fit the bill, I’d love to hear from you!

*The course actually isn’t required of many of the students who take it. It’s only required of ecology, environmental science, and zoology majors, and any biology majors who choose to focus their studies on ecology and evolution. It’s a bit unclear why many students who don’t have to take it are taking it. But obviously we can’t ignore them when revamping the course.

Quinn and Keough all the way for me (http://www.amazon.co.uk/Experimental-Design-Data-Analysis-Biologists/dp/0521009766). It’s a well-written book that is great as an introduction to biological statistics. It’s great for troubleshooting data that break the assumptions of the common tests, i.e., just about all biological data. It’s still my go-to book.

Thanks, had a glance at it, looks promising, will have a closer look.

Quinn & Keough is great; I wish I’d had it as a textbook when I was an undergrad. I think the upcoming 2nd edition will actually cover most of your wishes. Until then, the first edition, I believe, is good enough. I had Underwood’s Experiments in Ecology in my undergrad stats course, which is good but a bit heavy to read and almost entirely focused on ANOVA designs. I’m currently a teaching assistant in biostatistics and we use Hawkins’ Biomeasurement. Had it been my choice, we would not use this book. Although it is easy to read and covers most basic methods, it does not cover more than that, and it contains a few errors (e.g. in the generalised t-test formula). I think a good book should cover more than what is needed for a ground-level course in statistics.

I’d second Quinn & Keough, which is what I used in my graduate-level GLM-based course.

I’ve found that when I introduce or explain generalized linear models (GzLMs) to someone who started with t-tests, ANOVA, regression, etc., they often didn’t know that these were all just nested special cases: GzLM(GLM(ANOVA(t-test))). The spark of amazement on their face is a great “educator moment” for me. If one breaks it down, it can really become a flow chart: 1. What error distribution? 2. How many response variables, and what kind are they (ratio, categorical, ordinal, etc.)? 3. How many predictor variables, and what kind are they? And so on; you can even introduce mixed models. Using R really lends itself to this approach, I think.
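The "t-test is a special case of the linear model" point is easy to demonstrate numerically. Below is a minimal Python sketch, standard library only, with invented measurements and hypothetical function names, showing that the classical pooled-variance two-sample t statistic equals the t statistic for the slope when the response is regressed on a 0/1 group indicator. In R the analogous comparison would be `t.test(..., var.equal = TRUE)` against the coefficient table from `lm()`; note the sign of the t statistic depends on which group is the reference level.

```python
import math
from statistics import mean


def pooled_t(x, y):
    """Classical two-sample t statistic assuming equal variances."""
    nx, ny = len(x), len(y)
    ss_x = sum((v - mean(x)) ** 2 for v in x)
    ss_y = sum((v - mean(y)) ** 2 for v in y)
    pooled_var = (ss_x + ss_y) / (nx + ny - 2)
    return (mean(y) - mean(x)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))


def regression_t(x, y):
    """The same comparison as a linear model: response = b0 + b1 * group,
    with group coded 0 for x and 1 for y. Returns b1 / SE(b1)."""
    group = [0] * len(x) + [1] * len(y)
    response = list(x) + list(y)
    n = len(response)
    g_bar, r_bar = mean(group), mean(response)
    s_gg = sum((g - g_bar) ** 2 for g in group)
    s_gr = sum((g - g_bar) * (r - r_bar) for g, r in zip(group, response))
    slope = s_gr / s_gg                  # equals mean(y) - mean(x)
    intercept = r_bar - slope * g_bar    # equals mean(x)
    rss = sum((r - (intercept + slope * g)) ** 2 for g, r in zip(group, response))
    se_slope = math.sqrt(rss / (n - 2) / s_gg)
    return slope / se_slope


# Hypothetical measurements (e.g., enzyme activity) for two genotypes.
wild_type = [12.1, 11.4, 13.0, 12.6, 11.8]
mutant = [14.2, 13.5, 15.1, 14.8, 13.9]
print(pooled_t(wild_type, mutant), regression_t(wild_type, mutant))
```

The two numbers agree because the residual sum of squares of the dummy-variable regression is exactly the pooled within-group sum of squares; ANOVA extends the same idea to more than two groups.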

My first stats course (in 2nd year undergrad) used “The Cartoon Guide to Statistics” which used the approach of treating each test as a separate entity, and it left me woefully unprepared.

I still love that book. It has the perfect balance of showing the basics of how the stats work (enabling informed choices) whilst still being a ‘cookbook’ for those that aren’t particularly mathematically minded. Best of all, it brings it home that statistical analysis doesn’t need to be hard.

Via Twitter, Freya Harrison says Ruxton & Colegrave’s “Experimental Design for the Life Sciences.” Says UG biology students get on well with it.

Modern Statistics for the Life Sciences by Alan Grafen & Rosie Hails

Of the three biostatistics classes I’ve taken at two different universities, only one has required a textbook. If you are confident enough in the clarity of your lectures, I’d suggest not even using a textbook. Students are going to google something when they don’t understand, anyway. Or see if your library offers a statistics textbook for free as an e-book and suggest that for students who do want a textbook.

I actually have operated on that theory in the past in my upper level quantitative methods class and my upper level population ecology class. In both, there are some relevant texts on reserve at the library for students who want them, but I tell them there’s no required text, they can succeed if they just come to lectures and pay attention. The students mostly don’t like it. So yeah, I think we’re going to need a textbook–the students will want us to have one.

I took a biostats course as a 2nd year undergrad that was based on Jerrold Zar’s ‘Biostatistical Analysis’. It is a nice thorough introduction to classical stats concepts and it is geared to a general biology audience (diverse examples in the text). It introduces distributions and then focuses heavily on hypothesis testing.

Hmm, Zar is probably a bit too old-school for what we’re looking for. But we’ll look into it.

Yes, that is the caveat that I forgot. Very old school… probably wouldn’t meet your hopes of some of the modern ideas and methods.

This post reads like a description of Whitlock and Schluter, with the exception of generalized linear models.

It’s funny, I glanced at Whitlock and Schluter a while back, and I was struck by how it suddenly jumped from quite a basic level to quite an advanced level a couple of chapters from the end. I think that put me off. But it’ll be easy enough to have another look, one of my colleagues here uses it in one of his courses.

I second Whitlock and Schluter. It was used for a biometry course I took in undergrad and has served me well into graduate school. Teaching Bayesian methods is likely a bit much. I’ve heard some folks say teaching R and biostats at the same time is almost too much, though I would disagree.

Depends what you mean by “teaching R”. We give the students R command cheat sheets and don’t routinely demand that they know things like all the options for a given R function.

That’s clever. Plus, how many of these students will actually be in positions where they will need to use R? The Whitlock and Schluter book does have some weird “stuck-in” chapters at the end. If I recall correctly, a chapter on meta-analysis?

Well, the students majoring in ecology and environmental science will need R for their upper level courses. So we do need to base the labs in this course on R. And we don’t want the students to approach R in a totally “black box” way, so we don’t tell them about the existence of the Rcmdr package. We’re not out to teach them heavy duty R programming, but we do want them to have some sense of “how the engine works” as well as “how to drive the car”.

We used Whitlock and Schluter together with a lab using R commander for a pilot biostats course aimed at your audience. I found it worked quite well. We didn’t get anywhere near the GLM part at the end, or the other advanced chapters that concerned you. I particularly liked the broad range of examples from many fields of biology. I needed to tweak some of the exercises so they could be done in R commander, but generally those parts of the book were great.

I really like “A Primer of Ecological Statistics” by Gotelli & Ellison. It’s a very accessible textbook that does a great job covering all the basic concepts in statistics and some of the more advanced ones, including Bayesian statistics, maximum likelihood, and multivariate statistics.

I really like how the book builds up a very easily understandable, but sophisticated base for the students in probability and experimental design before getting into the different statistical tests.

All the examples are ecological, but this is a really good textbook for students who don’t have a stats background. I think that a savvy instructor could easily develop some more “medical” or “molecular biology” type examples in lecture to supplement the book.

Its major weakness is that it does not have canned problem sets included for using R. But I’d still use this textbook to teach an introduction to stats course and then create the R exercises myself.

For an introductory course focused on the concepts and not the math, which aims at giving the students a unified overview, as you described, I would suggest “Statistics without Math” by Magnusson & Mourão. But I must confess that I also like the good, old Zar for its simplicity and objectiveness.

I have not actually used this book, but I am taking a very close look at Statistical Modeling: A Fresh Approach by Danny Kaplan from Macalester College. It is a general introductory statistical modeling text, not biostats per se, but it takes a very interesting approach and seems to fit many of the priorities you mention. Relatively cheap too.

http://www.mosaic-web.org/go/StatisticalModeling/

From my experience, I learned very little in my undergrad stats classes, partly because the textbooks we had were too advanced and dry for most of us to understand! I have since discovered these 2 books, which have helped me immensely:

– ‘Biomeasurement’, by Dawn Hawkins

– ‘Choosing and using statistics: a biologist’s guide’, by Calvin Dytham

There’s also a great online text by John McDonald (http://udel.edu/~mcdonald/statintro.html) that covers the basics of biostats.

I like both Quinn & Keough and Whitlock & Schluter. Either is really just a reference to the lectures. Zar is old school. I learned from Sokal & Rohlf, which has lots of strengths but also some old school weaknesses. Gotelli & Ellison is missing many, many elements or includes them only superficially. I don’t get at all why so many ecologists gave it rave reviews. I’d say your list is ambitious. I personally think undergrads (and grads and many faculty) have a very superficial understanding of many elementary concepts (like sample standard deviation v. standard error of the mean, P values, confidence intervals, etc. etc.) because superficial breadth is substituted for depth and understanding. Exploratory data analysis is an entire course. Understanding statistics and using something like R are two different courses. Maximum likelihood is its own course. Introducing all this is likely going to sacrifice time that could be spent hammering in the fundamentals, which, like classic xc skiing, are superficially easy but take lots of work to really master. I use R extensively in class, but more to check intuition or understand concepts. (If I were to resample 10, 100, or 1000 individuals, how would I expect the sample standard deviation to change? What about the standard error of the mean?) Were I to teach a bigger undergrad class, I would have students purchase a 1-year student license to JMP, which does the exploratory stats and the easy stuff very well and very easily. There is zero learning curve. My 2 cents! Good luck – write a follow-up post!
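The resampling question above can be answered in a few lines of simulation. This Python sketch (standard library only) assumes an arbitrary hypothetical population with mean 50 and SD 10, draws samples of 10, 100, and 1000 individuals, and reports the sample standard deviation alongside the standard error of the mean. The SD should hover near the population value of 10 regardless of sample size, while the SEM shrinks roughly in proportion to 1/√n.

```python
import math
import random
import statistics


def sd_and_sem(sample):
    """Return the sample standard deviation and the standard error of the mean."""
    sd = statistics.stdev(sample)        # sample SD, uses the n - 1 denominator
    sem = sd / math.sqrt(len(sample))    # SEM = SD / sqrt(n)
    return sd, sem


rng = random.Random(2013)  # fixed seed so reruns give the same numbers
results = {}
for n in (10, 100, 1000):
    # Hypothetical population: trait values ~ Normal(mean = 50, SD = 10).
    sample = [rng.gauss(50, 10) for _ in range(n)]
    results[n] = sd_and_sem(sample)
    sd, sem = results[n]
    print(f"n = {n:4d}: SD = {sd:5.2f}, SEM = {sem:5.2f}")
```

Rerunning with different seeds, or wrapping the loop in many replicates, shows students that the SD describes spread among individuals while the SEM describes uncertainty in the estimated mean.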

I took an introductory biostats course at Princeton last spring that was very similar to the vision you describe for your course. We used Whitlock & Schluter and I loved it – it was one of the most straightforward and engaging textbooks I’ve ever used. Especially if you’re interested in 1) presenting relevant and interesting examples from many fields of biology, and 2) building a strong conceptual understanding of statistics, I highly recommend it.

I have to add my vote for Gotelli and Ellison. Obviously I’m biased though 🙂

I’ve heard good things about Crawley’s Statistics: An Introduction Using R. Having never used it myself, I’m not sure if it covers all of your needs, though.

I do not have a favourite book but I find Quinn & Keough useful. A new book by Beckerman and Petchey (http://www.amazon.co.uk/Getting-Started-R-introduction-biologists/dp/0199601623) could also be worth considering, especially as an introduction to data exploration in R.

Yes, Owen Petchey and Andrew Beckerman are friends of mine. Their book is a very good starting point for complete R neophytes; I’ve reviewed it for Dynamic Ecology in the past.

Currently, we give our students ‘cheat sheets’ for R. We’re not really out to teach them R programming. We want to teach them stats; R is just a means to that end.

While I really like Quinn & Keough’s book, I’m not sure that as a student I would gain much from owning the book if the lectures are already based on it. It doesn’t have additional exercises, although the example data sets are in general well chosen to drive the key points. I think it is also important to choose a book that isn’t an R book, but that emphasizes concepts; today I’m using R but before I used SAS and Statistica and tomorrow who knows.

Absolutely re: concepts over programming. R is purely a means to an end for us. There’s no way we’re going to go with a text that’s basically an R book, but with a bit of background on the conceptual side thrown in.

I would strongly endorse using Whitlock and Schluter. I used this text as an undergrad in a 3rd-year biology department stats/study design class, and then as a TA for a similar biostats course at a different institution. I find it so easy to read; it is really approachable to students ‘scared’ of stats, as most are. This book is written for biology undergrads. The chapters are relevant, easy to get through, and fun. In my mind this book comes pretty close to a perfect intro stats book for biologists. There are also supporting online resources at http://www.zoology.ubc.ca/~whitlock/ABD/teaching/index.html

Depending on how the topics you cover in the course shake out, you may need to teach a few lectures not covered in Whitlock and Schluter.

I am now using Quinn and Keough for my second grad-level stats class (which also suggests Whitlock and Schluter; I am at yet another university, the third one now that uses Whitlock and Schluter). I also think highly of this book, but I really think it would have been far too heavy for an undergraduate-level stats/study design class for biologists.

Thanks clarrien, good to have a perspective from someone who’s used one of these texts as an undergrad.

Undergrad here. Quinn & Keough was the text for a class I took last fall that sounds somewhat similar to what you have in mind, though mostly aimed at incoming grad students with limited stats backgrounds. I would actually disagree with clarrien; I found the book easy to work with despite having forgotten most of my first-year math classes the morning after exams. In particular I like that it’s extremely well-referenced, so on the few occasions where I found it vague I had no trouble finding other texts to compare. The “issues and hints for analysis” notes at the end of each chapter are fantastic and have spared me many hours of frustration while analysing data as an RA.

Gotelli & Ellison is quite approachable and was my go-to reference in first- and second-year whenever I was unfamiliar with an ecology paper’s analysis, but I’d second Jeff Walker — it really is missing a lot, and I’m not sure it would have been that helpful as a course textbook.

Again, good to have an undergrad perspective! Although I suspect that the undergrads who read this blog are a highly non-random sample of all biology undergrads… 😉

For those of you who like Quinn and Keough but want some help with its implementation, Murray Logan taught a 3rd-year biostatistics course at Monash University (Melbourne, Australia) based on Quinn and Keough for several years. (He took the class over from Gerry Quinn after Quinn moved to a different university.) He’s recently published Biostatistical Design and Analysis Using R: A Practical Guide, which was based on the teaching materials he developed for this class. It covers most of what you’re after, Jeremy, and provides R code for analysis of many of the Q & K datasets. Having been road-tested on 5 years × 125 students, it is very well-suited to teaching.

Thanks Patrick, that’s very helpful!

A colleague who wishes to remain anonymous emailed to say that she teaches a stats course much like I describe, minus the likelihood analyses and any mention of Bayesianism. She goes on to say:

“For that class, I am a big fan of many of the textbooks authored by David S. Moore, who led the charge to a more ‘modern’ teaching of statistics in the past couple of decades. The emphasis is on GRAPHING before analyzing with tests, checking assumptions before looking at P values, and understanding the underlying concepts of each test if not the full mathematics. Most of his books are now packaged with StatsPortal, which includes videos that can replace some of the lecture material in order to free up class time to work through problems in small groups.

Brigitte Baldi adapted his generalist text to biology about 3-4 years ago, resulting in the textbook The Practice of Statistics in the Life Sciences. In its online incarnation, there is some pretty detailed info on R programming to accompany each chapter that I find to be pretty good as a non-fluent R user.”

Hi Jeremy, I realize this post is old by now, but I’m curious what text book you ended up choosing and how it ended up working for your course. I’m currently developing a biostats course and would love any input you may have. Many thanks!

We went with Whitlock & Schluter: https://dynamicecology.wordpress.com/2014/11/04/anybody-out-there-teaching-a-successful-intro-biostats-course-tell-us-about-it/

Jeremy, I just revisited this blog as I think about my MS-level biostats course offered in spring semesters. The needs of first-year master’s students are different from those of undergraduate students in that the ability to summarize and interpret data, then present that summary, is needed much more immediately: in their second semester, most are actively working on their thesis proposals or already in the process of collecting data. I’ve been using Whitlock & Schluter for the last couple of years, but I’m seriously considering going back to Quinn and Keough. I love the way Whitlock and Schluter cover the basics, but real-life research for biologists, particularly ecologists, typically begins with one-way ANOVA rather than ending there. I’ve often joked that the reason there are so many biostat textbooks is the general dissatisfaction that each of us who tries to teach it feels with existing texts.

For R, I’ve been using Deducer/JGR. The combo of drop-down menus and the nice editor in JGR helps students migrate from point-and-click to programming, plus drop-down menus really help with learning ggplot. It does come with Java baggage, and we spend a 2-hour install-fest the first week of class getting it working properly on my 10 or 15 MS students’ laptops. Getting it working on 130 or 150 students’ computers would be daunting. RStudio is a lovely environment, and much easier to install, but of course it doesn’t play nicely with Deducer; some day, hopefully.

Hmm. Whitlock and Schluter does jump to more advanced topics at the end of the book. I take it you don’t like those bits?

Just curious for your view. I agree with you that Whitlock and Schluter is too basic for grad students or advanced undergrads who’ve already had a biostats course. Or even for a first biostats course for advanced students capable of moving through the material quickly.

You picked up on the crux of the problem in your last sentence. Because of the small class size, I can usually move pretty quickly through the basic material and get to factorial treatment designs and RCB experimental designs by mid-term. We then move on to talking about using generalized linear models for mixed models, how to deal with repeated measures because they frequently run into the issue in their thesis research, and depending on the class, to ordination topics (I often have plant community ecology grad students in the class). So, while it’s a bit older, Quinn and Keough (2002) fits the topics I cover a little better.