Many retracted papers continue to be cited years after they’re retracted. Some are even cited about as often after retraction as before; i.e., the retraction appears to have had no effect on how often they’re cited. See here for discussion and links to some data. That’s a pretty depressing commentary on scientists’ citation practices.
But #pruittdata is an unusual case. Approximately a year ago, very serious concerns were raised about anomalies in the data underpinning dozens of papers co-authored by Jonathan Pruitt. Those concerns were widely reported in the scientific press and even made it into some newspapers, and were widely discussed by scientists on social media. Numerous papers of Pruitt’s have now been retracted, corrected, or subjected to Expressions of Concern (EoC), and various investigations are still ongoing. How have citing authors reacted to #pruittdata? Are papers co-authored by Jonathan Pruitt still being cited? And are citing authors differentiating between papers for which Jonathan Pruitt collected data, and papers for which he did not collect data (e.g., review papers)?
I was prompted to look into this a bit after seeing a tweet noting that, according to Google Scholar, Pruitt has been cited much less often in 2020 than he was in 2019. It’s very unusual for an active researcher who publishes regularly in selective journals to be cited much less often one year than in the previous year. That certainly suggests that the #pruittdata news caused many people in Pruitt’s field to stop citing his papers. But of course, looking at Google Scholar citation counts is only a crude first pass. It doesn’t differentiate self-citations from others. It doesn’t differentiate review papers from others. It counts citations from sources other than peer-reviewed papers. And it doesn’t differentiate citing papers that were already in review or in press when the #pruittdata news broke from citing papers that were submitted after the news broke.
So I did some digging on Web of Science. I looked at how often each of Jonathan Pruitt’s 20 most-cited papers (i.e. most cited all time) was cited in 2019 vs. 2020. Those 20 included 4 review/perspectives papers and 16 research papers. I also looked at whether the 2020 citations were from early or late in the year, and whether or not they were self-citations. I also spot-checked 8 other haphazardly-chosen papers of Pruitt’s from 2016-19 (1 review paper, 7 research papers), since his 20 most-cited papers were all from 2015 or earlier. I didn’t count any citations from notices of retraction/correction/EoC. And I didn’t count two citations from a book review. I recorded citation dates the way WoS records them: by the date the peer-reviewed paper first appeared online and was added to the WoS database, even if that was before it appeared in a paginated journal issue.
Here’s what I found:
- Citations to review papers co-authored by Pruitt haven’t dropped much. Well, citations to a couple of them dropped somewhat from 2019 to 2020, citations to a couple of them were about the same, and citations to one of them (the most recent one) increased. So, no pattern to speak of. Further, citations to review papers were an appreciable fraction of all of Pruitt’s 2020 citations. So if you’re wondering why his citations didn’t drop even more from 2019 to 2020, that’s the main reason why. Because…
- Citations to previously well-cited research papers co-authored by Pruitt cratered as soon as the #pruittdata news broke. Pruitt’s 16 most-cited research papers got 150 citations in 2019 vs. just 64 in 2020. Further, most of those 2020 citations were from papers that were published in the first half of 2020 and so likely were submitted before the #pruittdata news broke. Those 16 research papers only received 13 citations from July 2020 on, much lower than the 32 you’d have expected if the 2020 citations had been distributed evenly throughout the year. Further, those 13 late-in-2020 citations included 1 from a philosophy journal. Philosophy papers often go years from submission to publication (yes really). So really, it’s only 12ish citations that seem like they might date from post-#pruittdata. (I say “12ish” because of course it’s possible that some papers published in the first half of 2020 were submitted after #pruittdata started, and that some papers published in the second half of 2020 were submitted before #pruittdata started). Further still, even the 51 citations these 16 papers received in the first half of 2020 are well down from the 75ish you’d have expected if these 16 papers had continued to be cited as often as they were in 2019. Finally, if you eyeball the pre-2019 citation data, these 16 papers mostly looked to be on flat citation trajectories in the years before 2020. So I don’t think these results are even partially attributable to a long-term trend of decreasing annual citation counts for Pruitt’s older papers. But just to be sure, I checked some recent papers (see following bullet).
- Citations to Pruitt’s more recent research papers also dropped when the #pruittdata news broke, but not quite as much. The 7 recent research papers I haphazardly checked were cited a total of 41 times in 2019 vs. just 29 times in 2020. Of those 29 citations in 2020, only 10 were from the second half of the year. Further, several of these recently-published papers looked to have been on an upward citation trajectory prior to 2020. In the absence of #pruittdata, you’d have expected them to be cited more often in 2020 than in 2019. So it looks to me like citations of Pruitt’s recent papers were down just as much as citations of his older papers, relative to how often you’d have expected them to be cited in 2020.
- I didn’t see any obvious temporal pattern in the 2020 self-citations that distinguished them from the other 2020 citations. And just offhand, it didn’t look to me like self-citations were an appreciably larger, or smaller, fraction of the 2020 citations than they were of the 2019 citations.
- There was no big obvious difference in citations between papers subject to retraction/correction/EoC and other papers.
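The pro-rating arithmetic behind the “expected” figures in the bullets above is simple; here’s a minimal sketch (the citation counts come from the bullets above, and the expectations assume citations accrue roughly uniformly through a year):

```python
# Pro-rating the citation counts reported above for Pruitt's 16
# most-cited research papers, assuming citations accrue roughly
# uniformly across a year.

citations_2019 = 150   # total 2019 citations to the 16 papers
citations_2020 = 64    # total 2020 citations to the same papers
second_half_2020 = 13  # 2020 citations dated July onward

# If the 2020 citations were spread evenly, half would fall in
# each half of the year.
expected_second_half = citations_2020 / 2  # 32.0

# If the 2019 citation rate had simply continued into 2020, the
# first half of 2020 alone would have produced about half the
# 2019 total.
expected_first_half_at_2019_rate = citations_2019 / 2  # 75.0

# Observed first-half 2020 citations, by subtraction.
observed_first_half = citations_2020 - second_half_2020  # 51

print(expected_second_half)            # 32.0
print(expected_first_half_at_2019_rate)  # 75.0
print(observed_first_half)             # 51
```

The comparison of the observed 13 late-2020 citations against the pro-rated 32, and of the observed 51 early-2020 citations against the pro-rated 75, is what the bullets above are doing.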
In conclusion, it looks to me like most people in Pruitt’s field stopped citing Pruitt’s research papers as soon as the #pruittdata news broke in late January 2020. That’s a measure of just how widely and rapidly the news spread, and just how seriously the news was taken by people in Pruitt’s field.
Further, it looks to me like most people in Pruitt’s field have adopted the heuristic “don’t cite any research papers co-authored by Jonathan Pruitt”. Rather than, say, only avoiding citation of papers that have been formally subjected to retraction/correction/EoC. And rather than distinguishing between papers by Pruitt for which Pruitt collected data, and papers by Pruitt for which he did not collect data. None of which surprises me. Anecdotally, I’ve heard from several ecologists who said (in so many words) “Pruitt has over 100 research papers. I don’t have the time or ability to keep track of which ones are subject to what level of concern. So I’m just going to err on the side of caution and not cite any of Pruitt’s research papers.” I think that’s a totally understandable approach for citing authors to take. Especially if you’re just looking for a citation in support of some broad point for which you could cite many papers, or a citation in support of some passing remark.

Of course, not citing any of Pruitt’s research papers has the unfortunate side effect of cutting citations for some papers co-authored by Pruitt that remain reliable. I sympathize with Pruitt collaborators who will lose a few citations for that reason. But I don’t see any realistic way of avoiding that side effect. After all, people use all sorts of heuristics all the time when choosing papers to cite or not cite. It’s not realistic to expect them to not use a heuristic in this very unusual situation. Fortunately, I doubt anyone is going to suffer any career consequences just from losing a few citations. Nobody’s career outcome is so sensitive to the exact frequency with which they’re cited that a few lost citations will make any appreciable difference at the margin.
There absolutely are substantial negative consequences to Pruitt’s collaborators and trainees from #pruittdata, but those negative consequences mostly aren’t to do with a few lost citations to still-reliable papers that were co-authored by Jonathan Pruitt but for which Pruitt didn’t collect any data.
So if you’re concerned that journals have yet to retract some #pruittdata papers about which serious concerns have been raised (which I am), or that some journals have allowed corrections in cases where they arguably should’ve retracted (which I am), well, try to keep your concerns in perspective. Because all signs are that, whatever formal decisions journals make, most authors have already stopped citing Jonathan Pruitt’s research papers. And I wouldn’t be surprised if the rate at which they’re cited continues to drop (asymptotically) in future.
I’m more concerned about a possibility this post doesn’t address: that some people will stop citing papers by Pruitt’s collaborators and trainees that weren’t co-authored with Jonathan Pruitt. Nick Keiser says he’s seeing this happening to his papers, which is really unfortunate. That’s something that actually could have career consequences, not just because of the lost citations themselves but because those lost citations are a symptom of lost reputation. It would be really perverse for the reputations of Pruitt’s collaborators to suffer because of #pruittdata. Many of Pruitt’s collaborators and trainees have been at the forefront of identifying and addressing anomalies in papers for which Pruitt collected the data, at considerable cost to themselves in terms of time and emotional well-being. They’re good scientists who ended up in a terrible situation through no fault of their own, and who’ve been going above and beyond to do the right thing. If anything, I think you should trust their non-Pruitt-co-authored papers more than you would have before #pruittdata broke, not less. As I said above, I totally get that people need heuristics to decide what papers to cite. But “Don’t cite anything written by anybody who’s ever worked with Jonathan Pruitt” is a bad heuristic. Nobody should adopt that heuristic, no matter how cautious they want to be about only citing reliable work.