Ask us anything: resources to teach yourself stats and mathematical modeling

A little while back we invited you to ask us anything. Here are the next questions, and our answers. Questions have been edited for clarity and brevity; see the comments on the linked post for the original questions.

What text/online resources do you recommend for self-teaching intro stats (up through ANOVA)? (from Peter, @onepintmore)

Jeremy: I teach intro biostats from Whitlock & Schluter. I like it a lot, and its flaws are shared by every intro biostats textbook of which I’m aware. For historical reasons, it has separate chapters on t-tests, regression, ANOVA, etc. I think it would be a big improvement, conceptually and pedagogically, to just teach general linear models from the start, noting in passing that for historical reasons, various special cases have their own names. But as far as I know, there’s no GLM-based biostats text written at the low level of Whitlock & Schluter, and with Whitlock & Schluter’s other virtues (e.g., tons of concrete examples and practice problems).
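To make the "special cases" point concrete, here is a minimal sketch in plain Python (standard library only). It computes a pooled-variance two-sample t statistic, the t statistic for the slope of a linear model with a 0/1 dummy predictor, and a one-way ANOVA F statistic, all on the same data. The data and the function names (`pooled_t`, `ols_slope_t`, `anova_f`) are made up for illustration and are not taken from any textbook mentioned here.

```python
from math import sqrt
from statistics import mean


def pooled_t(a, b):
    """Classic two-sample t statistic with pooled variance (mean of b minus mean of a)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((v - ma) ** 2 for v in a)
    ss_b = sum((v - mb) ** 2 for v in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mb - ma) / sqrt(pooled_var * (1 / na + 1 / nb))


def ols_slope_t(x, y):
    """t statistic for the slope b1 in the linear model y = b0 + b1*x + error."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = sqrt(rss / (n - 2) / sxx)  # residual variance divided by Sxx
    return b1 / se_b1


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_vals = [v for g in groups for v in g]
    grand = mean(all_vals)
    k, n = len(groups), len(all_vals)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements for two groups (made-up numbers).
control = [4.1, 5.0, 4.6, 5.3, 4.8]
treated = [5.9, 6.4, 5.2, 6.8, 6.1, 5.7]

# Dummy-code group membership: 0 = control, 1 = treated.
x = [0] * len(control) + [1] * len(treated)
y = control + treated

t_test = pooled_t(control, treated)
t_lm = ols_slope_t(x, y)
f_anova = anova_f([control, treated])

print(f"t from t-test:       {t_test:.6f}")
print(f"t from linear model: {t_lm:.6f}")
print(f"F from ANOVA:        {f_anova:.6f}  (t squared: {t_test ** 2:.6f})")
```

The two t statistics agree, and the ANOVA F equals t squared, because all three are the same linear model fit to the same data. Note that this holds for the equal-variance (pooled) t-test; Welch's unequal-variance version is not the same model.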

The free online resources of which I’m aware aren’t nearly as good. Sorry. For instance, many of the most popular YouTube videos on basic stats concepts contain serious mistakes. And based on an admittedly small and old sample, I find Wikipedia’s stats pages are very hit and miss (the one time I tried to edit Wikipedia, many years ago, it was to correct howlers in the page on the Mann-Whitney U test, a nonparametric test sometimes taught in intro biostats courses). I know of good free resources on specific topics, but nothing comprehensive. But it’s possible there are good free comprehensive resources out there that I’m not aware of.

Statistics Done Wrong is a good free website (and now a book, I believe) on common statistical mistakes, including some important ones that aren’t widely covered in introductory textbooks (e.g., researcher degrees of freedom, stopping rules, regression to the mean). Good for both students learning statistics, and people who just want to avoid common fallacies of quantitative reasoning in their everyday lives. It’s a good complement to a standard introductory stats textbook.

In terms of learning statistical software, if you’re an absolute beginner I recommend Beckerman & Petchey’s Getting Started with R (full disclosure: the authors are friends of mine).

Brian: I guess it depends whether you’re talking undergrad or graduate. It’s been a long time since I taught undergraduate statistics, so I’m not sure I have a great recommendation. Beckerman & Petchey’s Getting Started with R and Zuur et al.’s A Beginner’s Guide to R are both very well written. They both have a pretty clear spin of introducing R, but they have a lot of good introductory statistics along the way, and really, can you say you’re learning stats these days if you’re not learning R along the way? (And R is what will give most people the most grief, so having a really well-done intro to R is probably key to success anyway.) For graduate students, I’ve taught a lot of stats courses and used a lot of books. I have absolutely no hesitation saying that Gotelli & Ellison’s A Primer of Ecological Statistics and Zuur et al.’s Analysing Ecological Data are by far the most clearly written, educational books out there. They both go further than ANOVA, but I still think they’re the best books even for people looking just to work up through ANOVA.

What courses/books/papers/exercises/etc. would you recommend to early career ecologists with an applied orientation and limited mathematical chops who want to make use of theory in their work? And to what extent is the needed theoretical background subdiscipline-specific? (from Dunbar Carpenter)

Jeremy: I recommend Ted Case’s An Illustrated Guide to Theoretical Ecology. I’ve taught undergrads from it for years. Does what it says on the tin, as the British say. Emphasizes graphs illustrating the math, e.g., plots of nonlinear functions with handy little textboxes and arrows calling your attention to key features. Completely unlike any other intro theory textbook out there, and by far the best starting point for someone with little mathematical background.

After that, I’d recommend working your way through Otto & Day’s A Biologist’s Guide to Mathematical Modeling in Ecology and Evolution. Or else focusing on specific bits of math that are common in your subdiscipline.

Brian: My recommendation is exactly the same as Jeremy’s. Both superlative books. Beyond that, I would say start reading theory papers in the area of ecology you are interested in and really spend time understanding what they’re doing. Reread the equations five times. Look things up when you’re stuck. Ask for help when you’re stuck. There is no substitute for learning by example from others. Reading an intro book and then continuing to just skim the equations in the papers in your field is not going to get you anywhere. It will take a lot of time in the beginning, but it will get faster if you are persistent. (Jeremy adds: Brian’s suggestion is great, I should’ve remembered to say it myself. That’s how I’ve learned most of the theory I know.)

7 thoughts on “Ask us anything: resources to teach yourself stats and mathematical modeling”

  1. I still like Grafen & Hails: Modern Statistics for the Life Sciences, which teaches statistical analysis as modeling. The explanations are graphical and elementary (enough for any first year biology undergraduate). The book is very focussed (on GLM) and is the perfect number of chapters for a course (it was a course, and maybe still is). It is very applied. I really think it should be the first book for an undergraduate biology major. A new book is Andy Hector’s The New Statistics with R: An Introduction for Biologists. It also uses a modeling approach and emphasizes estimation over hypothesis testing, and of course it is tied to R, which is both a positive and a negative. I think the text (the explanation of modeling and general linear models) does not flow as well as Grafen and Hails, but it’s nice to have the R with it. I do think the question “up to ANOVA” reflects an outdated way of teaching and thinking about statistics. If one starts with statistics as modeling, then a simple one-way ANOVA is at the very beginning. T-tests are a special case that, if taught, should be taught after ANOVA.

    • My one beef with Whitlock and Schluter is that it teaches t-tests, ANOVA, regression, etc. as separate tests (at the beginning; there’s an advanced chapter at the end revealing that they’re all just GLMs). But I found Grafen’s book too advanced for the intro biostats course I teach.

      Haven’t looked at Andy’s book, but I definitely need to (full disclosure: Andy’s a friend).

  2. “there’s no GLM-based biostats text written at the low level of Whitlock & Schluter”

    I highly recommend Judd, McClelland, and Ryan’s Data Analysis: A Model Comparison Approach. I took a 2-semester course from the first two authors at U Colorado and it was incredible. The pros are 1) very accessible mathematically and highly readable, 2) very in depth with OLS modelling, including fundamentals*, 3) uses a GLM approach, rather than “biometry”, and 4) it’s short. Cons are 1) uses psychology examples**, 2) very few, B&W graphics (it’s mostly text, equations, and tables), and 3) of limited scope*** — it takes 4/5 of the book to get to “ANCOVA”, leaving the remaining 1/5 for non-independence, outliers, and “ill-mannered” error.

    The chapters are titled “ANOVA”, “ANCOVA”, etc., but only to indicate the contents of the chapter cover the concepts of categorical predictors, both categorical and continuous, etc. The book demonstrates an integrated GLM approach to all of it.

    You often hear “when I include this interaction, the main effect is significant”, or alternatively “you can’t/shouldn’t interpret the main effects in a model with interactions”. To both of which I say, “you’re wrong, read this book” ****

    Somebody should write a book like this one for ecology [finger to nose].

    * Why normally distributed error? Why squared, not absolute error?
    ** The book is written by psychologists for social scientists. I don’t want to judge, but perhaps quantitatively-minded social scientists are less indignant than their ecological counterparts that many of their peers aren’t good with math and don’t care to be.
    *** But the concepts learned in detail here are applicable at least through GLMMs.
    **** Actually, I try to explain and help them out!

    • Nice lead, Timothy. I find there are many wonderful sources on data analysis and modeling from the human sciences. The need for scientists to see the methods presented with familiar examples is a major impediment to information flow. I think it is illuminating to look at a Science Citation Map* to help understand how isolated ecologists are from the variety of quantitative traditions. There is a tremendous payoff for those willing to learn methods through the examples of other disciplines. I have a commentary paper in press related to this issue if anyone is interested.

      Jim Grace


  3. I’m still working on finding time to read through it, and I’m not a modeler much yet myself, but Kokko’s “Modelling for Field Biologists” is a good read and a good intro to various kinds of models. Disclaimer – she’s one of the major modelers in behavioral ecology right now, and I’ve taken her birding.

  4. I would strongly recommend that empiricists who are interested in integrating modeling or theory into their research should consider COLLABORATING WITH MODELERS AND THEORISTS. Just like field, lab, or any other type of research, it can take years and years of focused study and experience to get good at modeling or theory. Sure, you may be able to read a book or two and then apply something that you read to some problem. However, it really does take a lot of training and experience to have a good grasp of the full range of modeling and theoretical tools available and to have good intuition about how to use them most appropriately to answer biological questions effectively. People with expertise in modeling or theory (particularly early-career types with fewer commitments but who might still have some useful training and experience under their belts) will in most cases be *ecstatic* to collaborate with an empiricist who has biologically relevant questions and either data or the ability to collect it. If you can identify such a person and establish a good collaborative relationship, then your (and their) research will end up being far better than it would have if you read a book or two and tried to spread yourself too thin. A collaboration like this early in your career also still leaves open the possibility that you can set out on your own with modeling or theory at some later time should you choose to do so.

  5. In the ‘learning from others’ category I would add: find a paper that does something you think is cool/interesting and whose authors have published their code (hopefully in R), and work through the code, using their data, your data, or just made-up data if you don’t have anything else. It can be a good way to really understand what the authors did and possibly develop your coding skills. It will also make you appreciate what clean, well-commented code looks like.
