.. a myth up there with the unicorn and guaranteed ways to lose weight without eating less or exercising more. But that doesn’t stop people from espousing their version of this myth (Peters 1991, Paine 2010, Likens and Lindenmayer 2011). The usual form is to attack some now-trendy but supposedly horrendous version of science and then mildly conclude that the way the author does science is the only really good way to do science. In the latest version of this archetype, two esteemed ecologists, David Lindenmayer and Gene Likens (hereafter L&L), penned an almost vitriolic piece attacking “Open-Access Science”, “Big Science” and I don’t know what all else (which I’m going to call “new-fangled ecology” for short)*. Now, I respect the work of both of these scientists. Gene Likens is deservedly a member of the US National Academy of Sciences. Likewise, Lindenmayer is a fellow of the Australian Academy of Science, and if he were American and the US NAS did not have a bias against conservation biology, you could make a good argument that he would be in the US NAS. These are people who have earned the right to be listened to. So I don’t want to engage in just a slam down on their slam down.
But they did call me a “parasite” and implied that I am likely doing “context-free, junk science”, which I feel at least entitles me to a response :) And their arguments are oft-repeated (although usually in a politer fashion), so the polite versions of the claims do deserve a thoughtful response. Stripped of vitriol, some of their arguments are even made by my own fellow blogger (but of course there the dialogue was civil even when I disagreed in the comments) (UPDATE from Jeremy: I think Brian actually meant this old, not very good post of mine). So let me briefly respond to three points they raise in their piece, and then use their piece as a launching point to talk about larger issues.
Three commonly repeated concerns about “new-fangled ecology” that are raised in especially vivid terms by L&L are:
- People who analyze data without collecting data aren’t pulling their fair share (or in L&L terms “There is also the emerging issue of a generation of what we term here as ‘parasitic’ scientists who will never be motivated to go and gather data because it takes real effort and time and it is simply easier to use data gathered by others.”) – Ouch. Full confession – by their definition I am a parasite. I have never published on a dataset I personally collected. But I respectfully disagree. The usual counterarguments are valid, and they usually note that ecology is at the extreme end of the spectrum in believing in individual ownership of data. Meteorology, astronomy, particle physics, even economics all see data as a public good to a much greater degree than ecology does. Of course in their cases, the collection of the data is funded by the government, wholly unlike in ecology :) And taxonomy didn’t exactly go down the tubes when journals required depositing sequences in GenBank before you could publish. Notwithstanding all of that, I am sympathetic to the plight of somebody who has spent years collecting a dataset. I personally don’t think compelling an individual to share is the right path (as distinct from institutions like LTERs or NEON, which are and should be compelled to share their data). So to me the most compelling reason I am not a parasite is that students and faculty (from my institution and literally from across the world) email me and come to my office to ask what kind of statistics to use on their hard-won empirical data, how to set up a simulation, or what theoretical context they can place their data in. It happens almost daily, certainly multiple times a week. Sometimes I can answer in 5 minutes, but often it’s an hour. Occasionally I am formally on their committee and thus in some sense obligated and compensated for doing it.
And sometimes it turns into a full-blown collaboration where I am a coauthor. But the vast majority of the time, neither of these applies. Should I start calling my friends, colleagues and students parasites? Hardly! This is called being a good scientific citizen. If you want to spend time setting up a formal credit system where every hour of my consulting time earns me 100 lines of a dataset, go ahead. Personally, I’m pretty happy with the current system. Academics don’t live in a world where everything can be put into one unit of currency and traded. We work much better in a spirit of generosity and openness, freely giving (and yes, taking) and building synergies across our distinct skill sets and circumstances.
- Data before questions (or in L&L terms “do[ing] science backwards … now that we have all the data, what question shall we ask? … junk science”) – From the tone of the article one envisions a couple of troll-like ecoinformaticians skulking into a secret room at ESA, chuckling about the data they’ve stolen from the poor field ecologists, and then one saying “now that we’ve captured the data, what do you think we should do with it?” and the other troll replying “Gee, questions are hard – we should have stolen the questions from the field scientists too”. OK, a flight of fancy there. In all seriousness, I don’t know what secret rooms of troll-like ecoinformaticians L&L hang around in, but I’ve been in a lot of those rooms at ESA, NCEAS, etc., and I have NEVER heard a conversation that went like “here’s some data, what question can I ask?”. If it happened, I would agree that it was junk science, but it doesn’t happen. I have also edited at least 100 ecoinformatic papers, and I’ve never seen a hint of this thought process in those papers either. The conversations I hear go much more like “I am asking question X and I just cannot for the life of me find the appropriate data – do you have any suggestions?” (proving exactly that even ecoinformaticians put primacy on the question). But “what kind of question can I ask now that I have data X?” Literally never heard this among ecoinformaticians. There’s really nothing left to say on this topic except: show me some hard facts and stop criticizing how you imagine other people do science. On the other hand, I have had a few (just a few) students show up in my office with a dataset they collected after two years in the field who didn’t seem too clear on which question motivated them to collect the data … :) (I’m just saying!)
- Using data without detailed knowledge of how it was collected and of the ecology of the organisms is dangerous (or in L&L terms “Our extensive experience from a combined 80 years of collecting empirical data is that large data sets are often nuanced and complex, and appropriate analysis of them requires intimate knowledge of their context and substance to avoid making serious mistakes in interpretation.”) – As you might expect from the more moderate tone of L&L’s language here, this is probably the most reasonable concern. Who would be opposed to users of data having more detailed knowledge of the data collection and the organisms? No one. But L&L go on to say “There is an increasing number of examples where increased knowledge is missed or even where substantially flawed papers are being published, in part because authors had limited or no understanding of the data sets they were using, nor any experience of the ecosystems or other entities about which they have written.” Sadly they don’t provide any citations to support this claim, which makes it hard to refute. Surely if this were a scourge of ecology there would be a few dozen examples? Certainly there have been some meta-analyses published where people dispute the authors’ interpretations (L&L discuss one such example later but fail to recognize that the discussion is still ongoing and has not been decisively settled as flawed). But such disputed interpretations are found in all areas of ecology. And if you talk to an experienced ecoinformatician, you will find they take great care to know the data. I can guarantee that you don’t want to be part of some of the conversations I have had about the details of datasets like the BBS, the US Forest Inventory or even the Barro Colorado 50 ha tropical tree data. They get into incredibly dry, boring detail about survey methods, variations between years, spatial heterogeneity, species that are well or poorly sampled, etc.
Ethan White has even set up a wiki to capture such knowledge in public form. A lack of knowledge about the context of data is simply not in evidence! But ultimately, I think this point is misguided because it treats knowledge as a one-directional goal (“more knowledge=better”) when it is really part of a trade-off – more knowledge=smaller spatiotemporal scales and fewer parts of the world and taxonomic groups covered. If I am comparing 10 regions (or 10 orders of organisms), it is unavoidable that I will know less about each specific dataset. First, because it is really unlikely that one person could collect all that data. And even if they did, they’ve probably forgotten quite a lot about the first dataset by the time they’ve collected the 10th dataset 12 years later. Such cross-region and cross-taxa comparisons are obviously important for the advancement of science. Does somebody who spent 1000 hours collecting a dataset really want to argue that no general principles can be drawn from it and that it cannot meaningfully be compared and contrasted with a dataset from another part of the world or another part of the taxonomic realm? Down this road ultimately lies a Simberloffian (2004) view that there are no general principles in ecology and the best we can do is spend our whole lives studying one place. That may work for some people (and more power to them – we need that view), but it’s not why I got into science (and I doubt it is why agencies are giving me funding).
So this brings me to the larger point I want to make that goes beyond the L&L piece, beyond the handful of papers I cited in the beginning, to what I perceive as an unfortunately all too common attitude in ecology. I call it the “not my kind of science=bad science” attitude. The bottom line is we throw around the “bad science” label at other ecologists way too often.
Try this thought experiment. Imagine a congressional staffer (or worse, a congressperson) reading the L&L piece. What do you think their reaction is? Do you think it made them more or less likely to increase funding for ecology? Nobody knows for sure. There might be a few percent who actually thought “I don’t know if those L&L guys are right or wrong, but at least they’re policing each other and having a strong internal debate about what is good science over in ecology”. But I’m pretty sure that would be an exceedingly rare response. I’m pretty sure much more common responses are “Scientists always say they have special methods for finding truth, but they cannot even agree amongst themselves” or “Those ecologists are always bickering amongst themselves over petty philosophical disagreements and never stepping up, all hands on deck, to solve the problems society needs to solve”.
I have been told that in the 1970s, during the strong environmental movement in the US (when the Clean Air Act, the Clean Water Act and the Endangered Species Act were passed), there was a move afoot in Congress to create an NIE (National Institute of the Environment) similar to the NIH (National Institutes of Health, which by the way is the major funding agency for medical research in the US). But during congressional hearings, different ecologists started showing up and arguing about whose type of ecology was rigorous or not. This story is hearsay, but it sounds very credible to me.
Truly good science is rare everywhere and junk science happens everywhere, so being able to find not-good science in a field is a poor reason to label the whole field “bad science”. Not only is there no one true route to good science, but good science inherently involves many independent routes converging. We as ecologists need to stop shooting ourselves in the foot by pulling out the mantra of “bad science” as a way to put down the other side in our divides (theoretical vs empirical, ecosystem vs population, animal vs plant, etc.). It might help win a battle, but it is losing the war (for funding, for respect, for scientific progress). Fields like physics and astronomy have been vastly more successful at attracting funding, even though at this point they are probably of less immediate urgency to society than ecology. There are many reasons, but at least one of them is that they work together. Beyond the prosaic matter of funding, the healthiest branches of science, the ones making the most progress, are those where people reach across diverse fields and value multiple perspectives and approaches, using each to its strength. I would like to see ecology become such a field. But it’s not going to be as long as we keep pulling out the “bad science” card every time we see a few extra dollars or pages in a good journal going in a direction different from our own.
* So here goes the world’s longest footnote – feel free to skip it if you aren’t interested:
As I mentioned, I am not clear exactly what it is that L&L are critiquing (they mention open data and big science, but some of their critiques seem not relevant to either of those). It does seem to me that several distinct ideas have been conflated into some sort of “new-fangled ecology” that L&L and others have been criticizing of late. So let me unpack “new-fangled ecology” into 5 distinct ideas. Each can be done alone or in any combination with one or more of the other ideas.
- Big-science – This is a project that requires many people of diverse skills to perform. “Big” here refers to the number of participants. Physics with >1000 PhDs searching for the Higgs boson is the best example, but NEON is no shrinking violet either. Big science is an inexorable trend in all fields of science, and this is probably a good thing. The days of Einstein dreaming up 3 Nobel-worthy papers while working as a patent clerk, or MacArthur reinventing theoretical ecology while bored in the army, are over.
- Big-data – This involves really large datasets. As our capacity to store and process data has grown exponentially, so has our capacity to fill such storage. However, I would argue that ecology does not have any truly big data. Big data is measured in terabytes, petabytes and exabytes; gigabytes barely qualify. Yet most “big” datasets in ecology are under 100 MB. They fit in memory. Ecologists are, however, collecting data at increasingly large spatiotemporal scales, and this is noteworthy, but it probably needs its own name (Big-scale?) distinct from Big-data, which is already well claimed by the computer scientists.
- Data-mining – Using machine learning to find patterns in the data. This is the source of the “data before questions” idea. But genuine data-mining in ecology is exceedingly rare. Despite my post praising exploratory statistics, I see data-mining as one step further, and one step too far. Good exploratory statistics still starts with clear questions and even tentative theories (that will not be formally tested) in mind. The one place you find data-mining in ecology is in applied, purely predictive contexts, e.g. what will the malaria-carrying mosquito population be next year? I have no problem with saying that data-mining should stay in this limited domain. And it is seriously misguided to think that all (or most) data-oriented ecologists are doing true context-free data-mining.
- Open-data/Metadata – The push to have datasets be: 1) clearly documented (aka metadata) and 2) available for public access (e.g. downloadable on the internet or in journal archives). This is not black and white – one can argue for more or less stringent open-data requirements. It would be hard to argue that ecology wouldn’t benefit from more open data, but that doesn’t have to mean every single dataset must be published on the internet the day after it is collected. Also, most proponents of open data are strong advocates of appropriate methods for giving credit to data collectors.
- Meta-analysis/synthesis – The push to do NCEAS-style analysis across many datasets to assess the generality of many individual research projects. This goes all the way back to some of the first meta-analyses of competition experiments by Gurevitch 1992 and Goldberg and Barton 1992. Again, nobody is saying all science should be synthetic. On the contrary, meta-analysis implicitly assumes the need for individual data-collection experiments. But it’s hard to argue that it’s not good to stop once in a while and summarize all the data we’ve collectively gathered in a formal, rigorous, quantitative way.
To repeat – you can have any one of these alone or in any combination. Which of these involve good science? All of them (even data-mining in the right context). Which of these involve junk science? All of them. In what way is this different from experimental ecology, observational ecology, phylogenies, analytical model development, etc.? It’s not! All subdisciplines of ecology (and of science) and all distinct methodologies involve good science and junk science.