As readers of this blog will be aware, Jeremy Fox of Dynamic Ecology got major play on social media for this recent piece on why E.O. Wilson’s folksy Wall Street Journal editorial on why/how you can be a great biologist without math is wrong. It broke pretty much every readership statistic this blog has.
As with all things internet, it appears the attention span has moved elsewhere. But I wanted to lay out my case for why math is an important skill for biologists in a calmer context, quite independent of and not in reaction to Wilson’s folksy rhetoric.
Before I do that, I want to make clear that my answer to whether empirical work or theory is more important is an emphatic “both”. Ecologists love to frame things as either/or debates (density dependent or independent, competition or predation, …) and have wars, but “both” is a safe bet in most of these cases and it is here too. The theory vs field debate is just another flavor of this. As Terry noted, this debate has been going on for a long time. Sharon Kingsland, in her history of ecology, uses this debate as her central organizing theme. (This debate is actually as old as science itself – see footnote*.) But I just want to be clear that I am not in an either/or frame, and I will return to this at the end.
My central thesis here is that math is important in science (I choose the word “important” carefully – I don’t want to go so far as to say “necessary” but “useful” is rather weaker than what I want to claim). Although there are more reasons, I want to give two arguments for why math is important: modelling and variance.
Argument 1 – Math and modelling
So first let me give my view of what modelling is. Here is my definition of modelling:
An abstraction of the real world into a domain where logical inference (deduction) can be applied to transform explicit assumptions into new predictions
Which I demonstrate with the following figure:
First, I would like to make the case that modelling in this sense is central to the scientific method. I am not a big fan of the four-step scientific method taught in grade school. But producing predictions, bringing them into reality and testing them is clearly part of that. I think this view of models also fits into more complex, realistic views of how science works. But as long as you buy that science has anything to do with predictions OR general principles, you either have to become a very pure sensu stricto empiricist (i.e. no thinking or processing involved – not the more inclusive branches of empiricism like pragmatism) or you have to give this very sensu lato definition of modelling a role in science.
Note that this definition of modelling heavily emphasizes a domain of logic separate from reality but does NOT say what it is. And in fact there are many domains of logic including verbal, pictorial, mathematical, scaled (think of an architectural scale model, a map, or a scaled engineering prototype), etc.
OK, you say, enough philosophy, talk about math. Well, I absolutely think modelling includes verbal models, and I would never say “you aren’t doing modelling” or “you aren’t doing science” if you have a verbal model. But I think ecology is replete with stories where verbal models are just a tad too fuzzy and get us into trouble. There are two main benefits to having a precise language in the right hand (abstract domain) portion of the figure. One is that it documents the assumptions (including how variables are measured) in a way that leaves no disagreement about what we’re talking about (which may in turn inspire disagreement about what we should be talking about). The second is that it allows us to apply rules of logic to deduce new statements (aka predictions or hypotheses). Both kinds of precision advance science.
One example is Elton’s hypothesis that increased diversity increases the stability of a system. A nice intuitive idea argued verbally by Elton. Except then May gave a counterexample. A decade of confusion ensued, and eventually we ended up with Pimm’s paper pointing out that diversity and stability are both vague concepts (we have this habit in human languages of overloading a word with many meanings) that need precise quantitative definitions before you can even empirically measure and test the idea (an example of having precisely stated assumptions, even if only to let us start debating which assumptions are right). Since that paper, the world has largely focused on the CV (coefficient of variation) of total community biomass, and a productive conversation has achieved real progress (with more work to go). May’s model is also an example of the second benefit of moving into the abstract world – using logic to deduce implications. May took a very specific definition of stability (based on a steady equilibrium in a quadratic differential equation system using eigenvalues) and showed that this stability depended not just on diversity (i.e. just the number of species) but on the number of interactions (something that had not until then been conceived from empirical observation), and thereby broadened the debate to complexity instead of just diversity. A clear example of a precise logical deduction that was useful. (UPDATE: Jeremy has an old post reviewing the many different things ecologists have meant by “stability”, which emphasizes the importance of mathematically precise definitions, independent of their empirical tractability.)
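To make the "precise quantitative definition" point concrete, here is a minimal sketch of the CV of total community biomass. The function name and the biomass numbers are hypothetical, invented purely for illustration, and the sample standard deviation is one reasonable choice among several.

```python
from statistics import mean, stdev

def total_biomass_cv(biomass_by_species):
    """CV (sd / mean) of summed community biomass over time.

    biomass_by_species: one time series per species, all the same length.
    """
    # Sum across species within each time step to get total community biomass.
    totals = [sum(step) for step in zip(*biomass_by_species)]
    return stdev(totals) / mean(totals)

# Hypothetical data: every species fluctuates by the same amount in both
# communities, but the fluctuations cancel in one and reinforce in the other.
compensating = [[10, 14, 10, 14], [14, 10, 14, 10]]
synchronous = [[10, 14, 10, 14], [10, 14, 10, 14]]

print(total_biomass_cv(compensating))  # totals never change, so the CV is 0
print(total_biomass_cv(synchronous))   # totals swing between 20 and 28
```

Two communities with identical species-level variability come out very differently under this definition, which is exactly the kind of distinction a verbal notion of "stability" tends to blur.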
Jeremy’s favorite zombie idea (the intermediate disturbance hypothesis, IDH) is another great example. The IDH was based on verbal intuition that, with frequent enough disturbance, competition would not run to completion (assumed to be competitive exclusion). However, rigorous mathematical models show that “running to completion” is not really the important point for competitive exclusion; what matters more is whether or not one species gains competitively relative to another over time. Again, people could and did have these ideas verbally and intuitively, but the math just made things clearer.
So to recap – modelling is critical to science, and a precise language of abstraction is critical to modelling. You can see where this is heading … mathematics is a heck of a precise language of abstraction. It’s not the only one, but it is a heck of a good one. And here I define math rather broadly to include pictures, simulations, etc.
Argument 2 – variance
Let me completely leave modelling for a minute and give another argument for math – variance.
Ecological data is full of variance. Indeed variability makes ecology exciting. But it also makes ecology challenging. The human mind is not well adapted to reasoning about variability. You can cite Kahneman on this, or the fact that almost as many people died in car accidents because of an irrational avoidance of flying after 9/11 as died in the 9/11 terrorist attack. This is a well-documented phenomenon in risk science. Personally, I like to cite the birthday problem. It asks how many people you need to have in a room to have at least a 50% chance that two of them have the same birthday (day of year, not necessarily same year). I regularly pose this to my statistics class, and they are way off, as indeed is almost everybody who hasn’t heard this problem before. What is your guess? The correct answer is below**. Even when I show my students the probability argument they are disbelieving. Often they want to put it to an empirical test. Even though my class is always a bit below the 50% threshold, and it’s only a 50% chance anyway, I’ve tried it, and three times in a row it has worked – two people in the room have shared a birthday! Bottom line point – people are terrible at reasoning about variability and probability.
More relevant to ecology, is the data in the picture below inconsistent with the idea that the x variable is causing the y-variable?
It is noisy. But it is definitely not inconsistent with the x causes y hypothesis (indeed, this was my first and only attempt at generating random, noisy data assuming the hypothesis was true). Of course this data is also not inconsistent with a null hypothesis of a flat line, so it is not statistically significant (I didn’t ask that question because most of us have asked that question often enough to have trained our intuition). But I wager you had no real thought process to answer the question of whether the data was inconsistent with the hypothesis.
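The same point can be sketched in a few lines of simulation. This is not the code behind the figure; the sample size, slope, noise level and seed below are assumptions chosen only to give weak signal in strong noise. It generates data in which x really does drive y, fits an ordinary least-squares line, and counts how often the 95% confidence interval for the slope contains the true slope and how often it also contains zero.

```python
import math
import random

# Two-sided 95% critical value of Student's t with 18 degrees of freedom;
# it matches the n = 20 points per data set used below (df = n - 2).
T_CRIT_18_DF = 2.101

def simulate(n, slope, noise_sd, rng):
    """Noisy data in which x truly causes y: y = slope * x + Gaussian noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def slope_interval(xs, ys, t_crit):
    """Least-squares slope and its confidence interval (estimate, low, high)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b, b - t_crit * se, b + t_crit * se

def summarize(n_sims=2000, n=20, slope=1.0, noise_sd=1.0, seed=1):
    """Fraction of simulated data sets whose interval covers the true slope,
    and the fraction whose interval covers zero."""
    rng = random.Random(seed)
    covers_truth = covers_zero = 0
    for _ in range(n_sims):
        xs, ys = simulate(n, slope, noise_sd, rng)
        _, lo, hi = slope_interval(xs, ys, T_CRIT_18_DF)
        covers_truth += lo <= slope <= hi
        covers_zero += lo <= 0.0 <= hi
    return covers_truth / n_sims, covers_zero / n_sims

frac_truth, frac_zero = summarize()
print(frac_truth, frac_zero)
```

Under these assumed settings the interval should cover the true slope in roughly 95% of data sets while also covering zero in most of them: data like this is consistent with "x causes y" and with a flat line at the same time, which is hard to see by eye.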
So what is one to do in a field that is full of variability and stochasticity when the human mind is terrible at reasoning about probability? It will not surprise you that my answer is … duh, duh, duh … mathematics. This would be why the one required math course in many graduate ecology programs is statistics. And every graduate student I know wants to learn more statistics.
Implications for individual scientists
So, I have argued that mathematics is important for science in part because of its utility in modelling and its ability to counteract our poor reasoning skills around variance. I have not anywhere argued that it is not science without mathematics, that the essence of science is mathematical description or any such. I’ve just argued that mathematics is really, really useful to science to the point of being more than just useful but important to science. What does this mean for the individual and for the social structure of science?
It seems rather obvious that one can use a Venn diagram to locate work – namely pure empirical, pure mathematical and an intersection or blend (you can use a continuum along an arrow if you want to be less binary). The first point I want to make is that this diagram is scale dependent. One paper that is purely mathematical but then followed by a test is blended at the scale of two papers (or perhaps one career).
But leaving aside the issue of scale for a minute, first I want to note that there is a lot of work outside the intersection, i.e. in just the empirical or just the mathematical. There are whole journals full of purely mathematical work that is just cloaking itself in biology. I recall interviewing for graduate school with a famous evolutionist who was bragging to me about how he had solved an equation that even people in the math department thought was hard – but he never mentioned why it was biologically interesting. I didn’t go to graduate school there. Although I am in the minority given the impact it had, I would argue May’s aforementioned diversity-stability work is also in the purely mathematical box – he had no data and his definition of stability was mathematically convenient but biologically unrealistic. I am asked to review papers like this all the time. Lest I be perceived as the outsider criticizing, I won’t take the pure empirical to task, but it definitely exists. So a choice of where to do work in this Venn diagram does exist.
Given you have a choice, I want to make the point that by far the most influential bodies of work are in the center of the Venn diagram (check out the list of ESA’s Mercer award winning papers). And the most influential ecologists were in the middle (check out the list of ecologists honored in the National Academy of Sciences). There are plenty who lean more empirical (Gene Likens) or theoretical (Simon Levin) but all have managed to avoid the extremes. And more particularly, all have had collaborations to move to the center. Or they publicized and translated their work well enough that the other side picked up their work from the literature. So in other words, they found ways that at larger scales their work was in the center. And I’m pretty sure they didn’t have successful collaborations by telling their partners that the partner’s half of the story was trivial and that they could be replaced easily! And a surprising number of members of the National Academy are in that rare but lucky category of people who are able to move fluently between theory and field by themselves.
So this is my bottom line. Both horizontal arrows in my first diagram involve linking math and the real world. Dealing with variability is also about linking math and data. Good science is about the fusion of the two! It is not about one. It is not about the other. It is about both! So this choose-one/which-is-better debate is entirely off target.
What I conclude is that if you are innately more mathematical, then if you want to do great science, you will spend your whole career finding collaborations, graduate students/postdocs, inspiring papers and self-educating to add in the real world component so as to move to the center. If you are on the innately more empirical side, then, if you want to do great science, you will spend your whole career finding collaborations, graduate students/postdocs, inspiring papers, and self-educating to add in math so as to move to the center. To say that you have to be great at math to be a great scientist is wrong, just as it is wrong to say you have to be great at field work to be a great scientist. But anybody who tells you that you can hang out at one extreme and ignore the other side, or trivially fill in the other side, is doing you a disservice (or giving you a formula for doing less than great science). If you are driven to do great science, you will likely spend your whole life working (and I do mean working) to get to the center***.
*It is worth noting this debate is as old as science itself. The Greeks favored a purely deductive (theory) approach based on pure logic rather than the dirty real world. Many don’t realize it, but Euclid’s geometry based on postulates and proofs was seen as the best approach for all science, not just math. Plato’s cave visually captured the Greek world view that the real world was just an ugly distortion of the underlying perfect beauty (I mean that quite literally) accessible only through the logic of the mind. Then of course in the early 1600s people started waking up and realizing this was just a formula for experts to claim they were right. Bacon published Novum Organum arguing for empirical tests as the core of science, and the Enlightenment was begun. If you think this is oversimplified (it is), go look at Newton’s 1687 Principia – it is completely framed as a Euclidean postulate-proof deductive (pure theory) work, even though if you read a biography it is clear he was highly empirical in his approach (e.g. his pursuit of the inverse square distance law to explain Kepler’s empirically derived elliptical orbits). But since that time empiricism has had a clear ascendancy. No Nobel prize has been awarded for purely theoretical work (even Einstein’s great theory of general relativity was never recognized – his much more empirical work on the photoelectric effect is what he won a Nobel prize for). Big data is only exacerbating this. The days of pure theory being acceptable in mainstream science are gone, and they should be. So ecologists are not unique in this debate – it’s really the central scientific methodological debate.
** 23 or more people in a room means that you have a >50% chance of some two people sharing a birthday.
*** you might ask what about those individuals who can naturally do both well. First, in my experience they’re rare. Second, the appropriate center shifts from problem to problem. So even somebody like myself who is fairly mathematical needs to go to more mathematical people on occasion (and vice versa). It is always a constant tuning problem.
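For anyone who, like my students, disbelieves the birthday answer in footnote **, here is a minimal sketch of the standard probability argument. It assumes 365 equally likely birthdays and ignores leap years; the function names are just illustrative.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming all `days` birthdays are equally likely."""
    p_all_distinct = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

def smallest_group(threshold=0.5, days=365):
    """Smallest group size whose shared-birthday probability reaches threshold."""
    n = 1
    while p_shared_birthday(n, days) < threshold:
        n += 1
    return n

print(smallest_group())       # 23
print(p_shared_birthday(22))  # just under 0.5
print(p_shared_birthday(23))  # just over 0.5
```

The trick is to compute the chance that everyone's birthday is different and subtract it from one; that product shrinks much faster than intuition suggests.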