Last week the 2015 ISI Impact Factors were announced. Hopefully this was not a date circled on your calendar. But if you were on an editorial board you could not escape a quick announcement of your journal’s new impact factor, whether it gained or lost in rank relative to other journals, and cheers and (email) back-slaps all around or solemn faces and vows to do better. And in my experience authors will now switch allegiance in which journals they submit to so as to follow those ranked highest in impact factor. Is this justified?
First the obligatory disclaimer about impact factors. First and foremost they are metrics of a journal and not metrics of a paper or a scientist. We all know they get used to judge individual scientists or papers (e.g. “so and so published in a journal with an impact factor over 30 so they’re sure to get a job”). But we also all know this is wrong. Here is why it’s wrong. The number of citations received by different papers within a journal is very skewed (lognormal or power-law like) with a lot of variance. Thus it is quite possible (actually probable) that a paper published in a journal with an impact factor of 30 will have an individual impact much less than 30. If you want to evaluate a scientist, look at their citation rates or plain old read their papers. Impact factors don’t tell you much.
That said, impact factors are one valid measure of journal quality. An impact factor is a one-dimensional representation of a high-dimensional concept (journal quality), with the many limits innate in that fact. In particular impact is focused on raw number of citations – not citations in influential journals nor citations in diverse journals (eigenvector statistics do these better). And it uses a very specific time window. The 2015 impact factors are based on the number of times papers published in 2013 or 2014 are cited in 2015. Thus it is really tuned to measuring citations 1-2 years after publication. It does not measure citations 6 months after publication, which are a good signal of really hot papers. Nor does it measure citations over longer windows, which capture papers that take time to be appreciated or that are in slower moving fields (you can find 5 or 10 year impact factors but everybody focuses on the two year ones).
But here is my biggest complaint about impact factors. And it’s not really a complaint about impact factors, but about how we use them. It seems to me scientists behave very unscientifically around impact factors. Any number we report in our science has a limited number of decimal places with error bars to represent our ignorance. It’s downright hard to get a number past reviewers without reporting error bars. But when we look at impact factors we report them to two or three decimal places with no error bars.
This is wholly unjustified. The error bars are very large. The one paper I know of that seriously looks at error bars, by Stern (Uncertainty measures for economics journal impact factors), is quite revealing. The 95% confidence intervals are large. A journal with an impact factor of 9.281 has a 95% confidence interval of (6.576, 11.986), while 3.443 gives (2.467, 4.419) and 5.676 gives (5.056, 6.403). This was done using an asymptotic method, not bootstrapping. Given the finite samples and skewness I would expect the real confidence intervals to be even larger than this method suggests. And these are 5 year impact factors – the 2 year impact factors should be noisier still (i.e. larger confidence intervals). And note, this does not include year-to-year variability. A perfect design would probably resample citation rates across years, not just within the two year (or 5 year) window of interest. I haven’t done this and am not aware of anybody else who has, but suffice it to say that the intervals reported above are quite conservative (i.e. probably too narrow), especially once you take into account year-to-year variability where the set of articles is allowed to vary.
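For readers who want to see what a bootstrapped interval might look like, here is a minimal sketch in Python. It is not the asymptotic method used in the paper cited above, and the citation counts are simulated from an arbitrary lognormal distribution rather than taken from any real journal. The function name bootstrap_ci and all parameter values are assumptions made for illustration. It only resamples articles within a single window, so it ignores the year-to-year variability discussed above.

```python
import random

def bootstrap_ci(citations, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean citations per article."""
    rng = random.Random(seed)
    n = len(citations)
    means = []
    for _ in range(n_boot):
        # Resample articles with replacement and recompute the mean
        resample = rng.choices(citations, k=n)
        means.append(sum(resample) / n)
    means.sort()
    lower = means[int(n_boot * alpha / 2)]
    upper = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

# Simulated (not real) citation counts for 200 articles; the lognormal
# parameters are arbitrary assumptions chosen to give a skewed distribution.
sim = random.Random(42)
citations = [int(sim.lognormvariate(0.5, 1.2)) for _ in range(200)]

impact_factor = sum(citations) / len(citations)
lower, upper = bootstrap_ci(citations)
print(f"impact factor {impact_factor:.2f}, 95% interval ({lower:.2f}, {upper:.2f})")
```

Changing the seed passed to random.Random(42) gives a different simulated journal, which is a quick way to see how much both the impact factor and its interval move around for the same underlying distribution.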
Why are the confidence intervals so large? As a colleague, Janet Franklin, is fond of saying, impact factors are means not medians. But we all know that when the data is highly skewed, means poorly represent the data. This is why income distributions are reported as medians. Superstar performers and athletes making over $100,000,000/year skew the means too much to be representative. The same with house prices. Or to be even more direct, recall the formula for the 2015 impact factor is:
2015 impact factor = (number of citations in 2015 to articles published in 2013 and 2014) / (number of articles published in 2013 and 2014)
Now let’s look at a typical strong ecology journal with an impact factor around 3-5. It probably has about 80-150 articles published each year. Let’s say 100. So the denominator is 200. The numerator is the sum of all the 2015 citations to 2013 and 2014 articles. These citation counts follow a highly skewed, lognormal-like distribution. This means most of the articles have been cited zero or one times (HT Jeff Ollerton). And there are a handful of articles that have been cited 50 times. And if you’re super lucky there will be one or two cited over 100 times. Maybe even 500 times. Now think about that for a second. If you get one of the papers that goes “viral” and is cited 500 times, that goes into the numerator along with all of those papers cited zero times or once. And the denominator is 200. That means that one paper is driving 2.5 points (not percent, points!) of your impact factor. You might think this is a made up example, but it’s highly realistic. Amongst the high end ecology journals, those that saw their impact factors go up in 2015 are those that had a paper with 300-400 cites, or a couple with 100+ cites, published in 2013 or 2014. Journals that went down often had such a paper (or set of a few papers) published in 2012 that dropped out of the 2-year window this year. I’m not going to pick on specific journals and give specific examples, but I could – I have specific papers and journals in mind when I say this.
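The arithmetic above is easy to check with a few lines of Python. The citation counts below are invented to match the verbal description (mostly zeros and ones, a handful at 50, one “viral” paper at 500); they are not data from any real journal, and the variable names are my own.

```python
import statistics

# Hypothetical journal: 200 articles in the two year window.
# 199 "ordinary" articles plus one viral article cited 500 times.
ordinary = [0] * 80 + [1] * 60 + [3] * 40 + [10] * 15 + [50] * 4
viral = 500
all_articles = ordinary + [viral]

n_articles = len(all_articles)                    # denominator: 200
impact_factor = sum(all_articles) / n_articles    # mean citations per article
viral_share = viral / n_articles                  # points contributed by one paper
median_cites = statistics.median(all_articles)    # what a typical article gets

print(f"articles: {n_articles}")
print(f"impact factor: {impact_factor:.2f}")
print(f"points from the viral paper alone: {viral_share:.2f}")
print(f"median citations per article: {median_cites}")
```

In this made-up journal the single viral paper supplies 2.5 of the 5.15 impact factor points, while the median article is cited once.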
What kind of papers get those super-high citation counts? Usually they are reviews, often of methods, and often of common statistical issues. And often they are opinion pieces by well-known ecologists. Those are important papers that deserve to be well-cited. But given that they are one or two papers a year, they probably have undue influence on a journal’s impact factor. Science and Nature are really no different – most of their impact factor is driven by the handful of papers that contain whole species genomes or talk about a method that is generic across fields, like network theory, and are cited 1000+ times.
So I started by saying impact factors are limited but useful in evaluating journals. Now I seem to be saying they are very noisy. What gives? They’re both true. We’re ecologists; we should be able to deal with noisy data. But it seems we don’t with impact factors. Every year you hear about a journal that fell 3 places or rose 3 places in the rankings when the difference is often just 0.5 of an impact factor point, and that almost certainly comes down to a single paper or two – i.e. is well within the error bars, especially when year-to-year variability is taken into account. Yet authors duly switch which journals to prioritize each year based on this. To be honest, this is just silly.
Impact factors give, very broadly, a crude, coarse-grained ranking, but nothing more. Personally, I take impact factors, round to the nearest integer, and assume error bars are about + or – 1 impact point. If two journals aren’t separated after doing that you’re probably focusing on noise rather than signal. And you might realistically want to use +/- 2 impact points to have much confidence. And it probably takes 4 or 5 years to count as a trend up or down. Used that way, they have some value. Reporting two decimal places and obsessing over whether a journal slipped up or down one place in the rankings or deciding which journal to submit to based on a single year’s impact factors* is a misuse.
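That rule of thumb can be written down as a tiny Python function. This is one possible reading of it, not a formal test: it assumes “separated” means the two rounded values, each with its error bar, do not overlap or touch. The function names and the example impact factors are hypothetical.

```python
import math

def round_half_up(x):
    # Python's built-in round() rounds halves to the nearest even integer,
    # so use floor(x + 0.5) for conventional rounding of positive values.
    return math.floor(x + 0.5)

def separated(if_a, if_b, margin=1):
    """True if the rounded impact factors, each +/- margin, do not overlap."""
    a, b = round_half_up(if_a), round_half_up(if_b)
    return abs(a - b) > 2 * margin

print(separated(4.2, 4.7))            # False: 4 +/- 1 and 5 +/- 1 overlap
print(separated(3.4, 9.3))            # True: 3 +/- 1 and 9 +/- 1 are far apart
print(separated(3.4, 6.6, margin=2))  # False under the stricter +/- 2 rule
```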
* You could make a pretty good argument that deciding which journal to submit to based on impact factor at all is a misuse, but that is a topic for another day and another post.