I use R. I like it. I especially like the versatility and convenience it gets from add-on packages. I use R packages to do some fairly nonstandard things like fit vector generalized additive models, and simulate ordinary differential equations and fit them to data.
You can probably tell there’s a “but” coming.
There are a lot of R packages now.* And in ecology, I’ve noticed that many of them are very specialized. I’d say overspecialized. For instance, there is now an R package to fit a small number of functional response models to predator feeding rate data. As another example, there’s now a package to simulate the dynamics of the Yodzis-Innes food web model. Many other examples could be given. (An aside: I emphasize that I’m not writing this post to pick on the authors of any particular package. I’m interested in what seems to me to be a broad-based trend.)
I call these packages overspecialized because they do only a narrow subset of the things that existing, broader packages can do. For instance, fitting functional response models to predator feeding data is just a special case of nonlinear regression, and R already has packages for parametric, nonparametric, and semiparametric nonlinear regression. If you want to fit a nonlinear regression to your functional response data in R, you should use a nonlinear regression package. The convenience you gain from using a highly specialized package is a false savings. Instead of having to think about which functional response model(s) you might want to fit, learn how to specify the nonlinear regression(s), and then evaluate the fit(s), you just let the package authors do your thinking for you: you restrict yourself to the limited range of options they offer and stick with their (often debatable) choices of defaults. Which isn’t how you learn. As another example, simulating the Yodzis-Innes food web model is just a special case of simulating ordinary differential equations, a task for which R packages already exist.
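To make that concrete, here’s a sketch of the “just use nonlinear regression” approach. The data are simulated, and the variable names and parameter values are my own illustrative choices; the point is only that a Holling type II functional response is an ordinary nonlinear model that base R’s `nls()` fits directly, no dedicated package required.

```r
# Simulate some predator feeding-rate data (illustrative values only).
set.seed(1)
prey   <- rep(c(2, 4, 8, 16, 32, 64), each = 5)  # prey densities offered
a_true <- 0.8    # attack rate used to simulate the data
h_true <- 0.05   # handling time used to simulate the data
mean_eaten <- a_true * prey / (1 + a_true * h_true * prey)
eaten <- rpois(length(prey), mean_eaten)         # observed numbers eaten

# A Holling type II response is just a nonlinear regression model,
# so base R's nls() fits it without any specialized package:
fit <- nls(eaten ~ a * prey / (1 + a * h * prey),
           start = list(a = 0.5, h = 0.1))
coef(fit)  # estimated attack rate and handling time
```

And having written the model formula yourself, you know exactly what was fit, what the starting values were, and which alternative models (type I, type III, …) you chose not to fit.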
I also worry about the often-debatable choices of package authors becoming field-wide defaults, purely by virtue of the package’s convenience. For instance, the Yodzis-Innes model is a perfectly good food web model for many purposes–but so are lots of other food web models. If you need to simulate a food web model as part of whatever project you’re doing, you should think about which one you want and why. You shouldn’t just pick whichever one is most convenient to simulate because it happens to have a dedicated R package.
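For concreteness, here is roughly what “simulating an ODE food web model with a general-purpose package” looks like. This is a generic consumer-resource (Rosenzweig-MacArthur-style) model, not the Yodzis-Innes model; the parameter values are my own illustrative choices, and the sketch assumes the widely used deSolve package is installed.

```r
library(deSolve)  # general-purpose ODE solvers; install.packages("deSolve") if needed

# A generic consumer-resource model (illustrative, not Yodzis-Innes):
cr_model <- function(t, state, pars) {
  with(as.list(c(state, pars)), {
    dR <- r * R * (1 - R / K) - a * R * C / (1 + a * h * R)   # resource
    dC <- e * a * R * C / (1 + a * h * R) - m * C             # consumer
    list(c(dR, dC))
  })
}

pars  <- c(r = 1, K = 10, a = 0.5, h = 0.2, e = 0.5, m = 0.3)
state <- c(R = 5, C = 1)
out <- ode(y = state, times = seq(0, 100, by = 0.1),
           func = cr_model, parms = pars)
tail(out, 1)  # state of the system at t = 100
```

Swapping in a different food web model means editing a dozen lines of derivatives, which is exactly the step that forces you to decide which model you actually want.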
More broadly, package authors often say that they pick defaults so as to prevent or discourage inexperienced users from making technical mistakes. But what if the package authors themselves are the ones who are mistaken? The dream of reducing the rate of technical mistakes in the literature by imposing default choices on inexperienced statisticians and modelers is a false dream, I think (see here for further discussion). You reduce errors by teaching users good judgement. I don’t know if R packages can be written so as to help teach users good judgement–but I’m pretty sure that writing highly specialized packages with debatable defaults doesn’t help.
By the way, I say this as someone who’s written code that could (I assume) be converted into a highly specialized R package. For instance, I have a bunch of code I wrote for simulating various standard simple discrete-time metapopulation models. I could convert that code into an R package called DiscrMpop. And these days, I could even get a paper out of it too, in Methods in Ecology and Evolution. But I confess that I have no urge to do so. Because I think the people who really would benefit from the slight convenience of being able to use that hypothetical package are vastly outnumbered by the people who would be better off having to think about which metapopulation model they want and then code it up themselves.
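And “code it up themselves” really isn’t asking much. Here’s one standard example: a discrete-time Levins-style patch-occupancy model in a few lines of base R (parameter values are made up for illustration). This is the kind of thing a hypothetical DiscrMpop package would wrap, and the kind of thing I’d rather see people write themselves.

```r
# Levins-style patch-occupancy dynamics: p is the fraction of occupied
# patches, c the colonization rate, e the extinction rate.
levins_step <- function(p, c, e) {
  p + c * p * (1 - p) - e * p   # colonization gain minus extinction loss
}

simulate_levins <- function(p0, c, e, steps) {
  p <- numeric(steps + 1)
  p[1] <- p0
  for (t in seq_len(steps)) p[t + 1] <- levins_step(p[t], c, e)
  p
}

traj <- simulate_levins(p0 = 0.1, c = 0.4, e = 0.1, steps = 200)
tail(traj, 1)  # approaches the Levins equilibrium 1 - e/c = 0.75
```

Writing those dozen lines forces exactly the decisions a package would hide: which model, which update rule, which parameters.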
Convenience is great. So is relying on code written by people who write much more reliable code than you ever could. I’m certainly not arguing that everyone should stop using R and its packages and go write their own code in assembly language! But the gains in convenience that come from using some highly specialized package that only does a subset of the jobs of some more general-purpose package are pretty minor or even nonexistent, I think. And I’m not sure there are any gains in reliability either. Indeed, I suspect just the opposite, because I bet R’s popular general-purpose packages are among its most reliable.
My worry here is related to, but slightly different from, Ben Bolker’s worry about whether statistical software is harmful. Ben was worried about people treating powerful, flexible software as a black box without really knowing what it’s doing under the hood. My worry is about people treating inflexible, highly specialized software as a black box without really knowing what it’s doing under the hood.
I am aware that I have probably just annoyed some large fraction of you. Looking forward to learning why I’m wrong in the comments.
*Thanks Captain Obvious!