Citation concentration, filtering, incentives, and green beards

Scientific citations are highly concentrated (Ioannidis 2006). In any field, a small fraction of papers garner a large fraction of the citations, and a small fraction of journals publish a large fraction of those highly-cited papers.

Why is that? It could be because the distribution of paper quality is highly skewed. Only a small fraction of papers are really good and really important, and those papers garner many citations. But although I do think the distribution of paper quality is skewed, I doubt that’s mainly what drives citation concentration. I think we’d see high citation concentration even if there weren’t much variation in paper quality. It’s all to do with filters and incentives.

There are far more papers published than anyone can read, even if you’re just reading the abstracts (heck, even if you’re just reading the titles!). So everyone needs “filters”–some way of deciding what small fraction of the literature to pay attention to. In practice, one very common filter is “read what’s in the leading journals”.* Now, you can argue about whether or not that’s a good filter, in the sense of a good way to identify high-quality papers. But frankly, I don’t know that it matters if that filter is good or not. Because here’s the thing: the fact that it’s a common filter means that everybody has a strong incentive to use it. Even if you think Nature and Science (and in ecology, Ecology Letters) are just “tabloids” and that the ecology papers in those journals aren’t any better than those in any decent ecology journal, you have to pay attention to Nature or Science, because everybody else does. And you also have strong incentives to try to publish in those journals, because they’re leading journals. And what’s a “leading journal” but a journal that everyone reads and tries to publish in? And papers in those journals get cited more, because everyone reads them.

I often hear people bemoan this state of affairs. They complain that it makes too much of our science a crapshoot. Too much about what research topics become “hot”, who gets hired for faculty positions, who gets grants, who becomes famous, etc., depends on who gets lucky enough to publish in a very short list of journals. And while I completely agree that a lot does depend on publishing in a very short list of journals, and even agree (to an extent) that what gets published in those journals is a crapshoot, I don’t complain about it (well, at least not much). Because I don’t see how it could be any different. So long as much more stuff is being published than any one person can read, everyone is going to need to use filters. And further, everyone will always have a strong incentive to use the same filters as everyone else. You say you only want to read the “best” stuff? Well, in science, the “best” stuff is in part defined as “the stuff everybody else is reading”. As a scientist, you can’t just ignore what all your colleagues are reading about, thinking about, talking about, working on, and citing, not unless you’re ok with everyone ignoring what you read about, think about, talk about, and work on. And insofar as there are other attributes that define what’s “best” besides “what everybody else is reading”, how do you plan on picking out those other attributes except by using “what everybody else is reading” as a surrogate?

Even if we did away with journals, and just published all our work in PLoS ONE or on ArXiv or on our own blogs, we’d still have high citation concentration (or its equivalent–high incoming link concentration or high article-level metric concentration or whatever). Because we’d all still have filters of some sort, we’d all still have strong incentive to use the same filters, and we still wouldn’t know any other way to filter out the “best” stuff except to use “what everybody else is paying attention to” as a surrogate.
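Here’s a minimal sketch of that claim, under assumptions that are entirely mine: every paper has identical quality, citations are handed out one at a time, and a “shared filter” is modeled as citing a paper with probability proportional to one plus the citations it already has. The function names and parameters are hypothetical, chosen just for illustration. The comparison case hands out citations uniformly at random.

```python
import random


def simulate_citations(n_papers, n_citations, copy_others, seed=0):
    """Hand out citations one at a time among papers of identical quality."""
    rng = random.Random(seed)
    counts = [0] * n_papers
    # Each paper starts with one "ticket"; every citation it receives adds
    # another, so drawing a ticket picks a paper with probability
    # proportional to (1 + citations so far).
    tickets = list(range(n_papers))
    for _ in range(n_citations):
        if copy_others:
            # Shared filter (assumption): read and cite what others cite.
            paper = rng.choice(tickets)
            tickets.append(paper)
        else:
            # No shared filter: every paper is equally likely to be cited.
            paper = rng.randrange(n_papers)
        counts[paper] += 1
    return counts


def top_share(counts, fraction=0.1):
    """Fraction of all citations held by the most-cited `fraction` of papers."""
    ranked = sorted(counts, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


if __name__ == "__main__":
    independent = simulate_citations(1000, 20000, copy_others=False)
    copying = simulate_citations(1000, 20000, copy_others=True)
    print("top 10% share, independent readers:", round(top_share(independent), 3))
    print("top 10% share, shared filter:", round(top_share(copying), 3))
```

Nothing in the sketch distinguishes one paper from another, so any extra concentration in the shared-filter run comes from the copying rule alone. It says nothing about how strong that effect is in the real literature.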

Duncan Watts’ Everything is Obvious (Once You Know the Answer) is a fine book which includes lengthy discussions of the same phenomenon in all kinds of contexts. This isn’t just a scientific publishing thing. So if you think getting rid of “citation concentration” or its equivalent could be accomplished just by changing the way in which we all publish, think again.

My original draft of this post stopped there. But then last night I thought of what might be a half-baked evolutionary analogy for this. I haven’t thought it through, so I could be way off. But it’s too fun not to share:

I think citation concentration is a green beard effect.

If you don’t know, the “green beard effect” was first proposed by Bill Hamilton, but Richard Dawkins gave it its name. A green beard gene, or cluster of linked genes, has three effects:

1. produces a perceptible trait, such as a green beard

2. causes the bearer to recognize that trait in others

3. causes the bearer to direct preferential treatment towards others exhibiting the perceptible trait

The neat thing about a green beard is that it’s not a signal of intrinsic “quality” or “fitness”. A green beard doesn’t make you more fecund or long-lived or etc., nor is it a signal that you have other traits making you more fecund or long-lived or etc. A green beard is an arbitrary signal, and is only favored because everybody else with green beards favors it.

Filtering the literature by only reading and seeking to publish in the same few places as everyone else is a sort of green beard, I think. It produces a perceptible trait, namely papers in Nature and Science. It also causes recognition of, and preferential treatment of, those bearing that perceptible trait. I admit I’m still fuzzy on whether the “traits” here are attributes of individuals or journals or both, so maybe the analogy can’t actually be made all that precise. But don’t let that stop you from showing up to the Ecology Letters reception at the ESA meeting dressed like this. 😉

*Another fairly common filter is “read stuff written by famous people”. Much the same argument applies to this filter. Everyone has a strong incentive to use this filter, because it’s a common filter. You have an incentive to read what everyone else is reading–and everyone else is probably reading Dr. Famous’ latest paper.

p.s. Before anyone points it out, yes, I am aware of Vince Jansen’s paper on beard chromodynamics (coexistence of beards of various colors, with bearers directing preferential treatment towards others with the same color beard). That paper could be interpreted to mean either that my analogy to green beards isn’t a good one, or else that the analogy is good but that the green beard effect need not lead inevitably to high citation concentration because different filtering rules could coexist.

20 thoughts on “Citation concentration, filtering, incentives, and green beards”

  1. This is why I like Stefano Allesina’s proposal Accelerating the pace of discovery by changing the peer review algorithm. The basic idea is that authors submit to an arXiv-like preprint server, get reviews, then journals act to ‘bid’ on content. Basically, they become the ultra-high quality filters. His simulations show a system with better outcomes overall – except editors have to put in more work. To quote from the abstract “Manuscripts’ evaluation is faster, authors publish more and in better journals, and reviewers’ effort is optimally utilized. However, more work is required from editors.”

    • Yes, I’ve seen that paper. Owen Petchey, Stefano and I once chatted about using Stefano’s framework to model the effects of PubCreds, but it never happened.

      IIRC, Stefano’s model does assume that papers vary in quality (I could be misremembering, been a while since I’ve read his paper). Someone like Duncan Watts would argue that it’s not at all clear that one can assume that. Do we think Shakespeare is the greatest writer in history because his work is intrinsically the greatest? Or has what it means to produce great writing come to be *defined* as “Shakespeare”?

      I’d be curious how Stefano’s model would behave if there were no parameter for intrinsic merit, merely various random properties of papers, with editors and readers having randomly-chosen preferences for different properties, and with the preferences being allowed to change over time as editors and readers try to maximize the number of citations their papers receive. I’ll bet you end up with a world with high citation concentration, in which everybody has roughly the same preferences.
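      A very rough sketch of that thought experiment, and emphatically not Stefano’s model: I’m assuming editors and readers collapse into one kind of agent, that each paper has a single arbitrary property, and that “trying to maximize citations” can be stood in for by adopting whichever preference is most common in a small random sample of colleagues. All names and parameter values are hypothetical.

```python
import random
from collections import Counter


def simulate_preferences(n_agents=200, n_properties=5, n_rounds=50,
                         sample_size=10, seed=0):
    """Track how arbitrary preferences drift when agents chase citations."""
    rng = random.Random(seed)
    # Preferences start out random; no property is intrinsically better.
    prefs = [rng.randrange(n_properties) for _ in range(n_agents)]
    # history[t] = share of agents holding the most popular preference.
    history = [max(Counter(prefs).values()) / n_agents]
    for _ in range(n_rounds):
        new_prefs = []
        for _agent in range(n_agents):
            # Assumption: a paper is cited by agents who prefer its property,
            # so the property favored by most sampled colleagues is the one
            # expected to draw the most citations.
            sample = rng.sample(prefs, sample_size)
            new_prefs.append(Counter(sample).most_common(1)[0][0])
        prefs = new_prefs
        history.append(max(Counter(prefs).values()) / n_agents)
    return prefs, history


if __name__ == "__main__":
    prefs, history = simulate_preferences()
    print("share holding the top preference, start:", history[0])
    print("share holding the top preference, end:", history[-1])
```

      The sketch only tracks whether preferences converge. It doesn’t count citations, so it speaks to the “everybody has roughly the same preferences” half of the bet, not the concentration half.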

      • Um, PeerJ has pre-publication peer review, unlike arXiv, and IIRC the financial model is rather different in terms of who pays for it. Or were you just referring to the fact that PeerJ includes a preprint archive in its membership fees?

  2. Take the Pepsi challenge: remove the distinctive formatting from ten papers from top journals and ten from lesser journals and read them. I bet you’d be able to correctly identify the source of at least 80%. Journal isn’t a completely reliable signal of paper quality, but it’s definitely got some predictive power.

    I agree there are positive feedback loops in publishing, but I think a reliable hierarchy emerges. It’s more complicated than people reading papers that others read based on the journal. Here’s my model:

    People read high-impact journals because they have good papers. Everyone wants to publish in these journals because they have high impact. Because they get more submissions, high-impact journals are more selective in the review process, and therefore publish better papers.

    On the other hand, few people read low-impact journals because they have more, uh, “quality challenged” papers. No one really wants to publish there, but sometimes it’s there or the circular file; these journals don’t get too many stellar papers, so they have looser editorial standards and will accept these slightly blemished papers.

    Notice the key factor is the peer-review process. As long as that’s working well, everyone wins. Good papers get the wide readership they deserve, while the others still get published but are probably read only by the die-hard fans. Luckily, in most cases the peer-review process does its job. Sure, some stinkers slip through the cracks of Nature/Science and lots of good papers are published in lower impact journals, but overall the system works.

    • I actually agree with that characterization of how the system works. It’s just that I also think that, even if the peer review system didn’t work at all–or even if there was no “work” for it to do because every paper was of equal “quality”–we’d still end up with high citation concentration.

    • Take the Pepsi challenge: remove the distinctive formatting from ten papers from top journals and ten from lesser journals and read them. I bet you’d be able to correctly identify the source of at least 80%.

      I bet you’re right. But the reason would have nothing to do with the quality of the science, and everything to do with the distinctively compressed, corner-cutting style that is necessary to wedge papers into the “top” journals, with their super-strict space requirements. Other factors that make me classify papers as “probably from a ‘top’ journal” would include a self-promoting quality in discussing the significance of the results, and the use of a new technique to discover something sexy before it’s been proven to work on a control group.

      These are not the properties of good science, and I am keen not to do anything to encourage people to work this way. That such papers make such a contribution to career success seems to me to indicate something very profoundly broken about how science is done; and, worse, how it’s marketed.

  3. Hi all. May I firstly say, congratulations to Jeremy Fox on a great blog. Being a young scientist (currently doing my PhD in ecology) I have never really been one to follow blogs. But I must admit, I’m hooked, and that I really do look forward to each new installment of this blog on a daily basis. Jeremy, well done, and keep it up!

    Right, back to what I was going to say. For many years, my colleagues and I have argued as to what makes a good journal, or for that matter a good paper. I agree with follower “lowendtheory”, in that there is definitely a positive feedback loop. However, I have a different approach to it, whereby I believe that there is a critical citation point. By this I mean, a good paper with good scientific merit gets a lot of citations for the obvious reason(s). But at what point (i.e. citation number) does that paper start to be cited by new papers purely because it has ALREADY been cited numerous times? Another way to look at it: is there a citation number that a paper needs to reach in order for it to be considered a “good” paper?

    I might be completely off the mark here, but I believe that in some instances this might well be the case.

    keep up the great work!

    • Hi Jeff,

      Glad you like the blog. Re: how high quality work can start to get cited purely because it’s been cited a lot previously, I certainly agree. I have an old post on this, it’s called the bandwagon effect.

  4. Pingback: Friday Coffee Break « Nothing in biology makes sense!

  5. Pingback: Friday links: citation inequality, gender inequality, and more | Dynamic Ecology

  6. Pingback: How random are referee decisions? (UPDATED) | Dynamic Ecology

  7. Sorry to spam your blog so heavily. I’ll try to lay off after this one.

    Everyone needs “filters” — some way of deciding what small fraction of the literature to pay attention to. In practice, one very common filter is “read what’s in the leading journals”.

    Is that really true? Does anyone sit down with a copy of Nature and skim it cover to cover looking for relevant articles? Surely not: what’s of interest to you in such journals will certainly make its way to you in other ways (subject-specific TOC alerts, RSS feeds, tweets, mailing lists, word of mouth).

    I think read-what’s-in-the-leading-journals is a seductive idea that’s not at all borne out by the reality.

    You say you only want to read the “best” stuff? Well, in science, the “best” stuff is in part defined as “the stuff everybody else is reading”.

    We are scientists. We have to be better than that. In popular music, “best” is defined as “the stuff everyone else is listening to”, and that’s what’s given us Britney Spears and Justin Bieber. It’s incumbent on us to find ways to promote the science that is not equivalent to BS and JB. That means judging by actual qualities: was the work done carefully, is it explained clearly, has the technique been shown to work, do the results follow, are they statistically significant, are the conclusions merited, is the experiment reproducible, does it open up new pathways of research?

    If we’re not judging by merits, we’re not scientists.

    While I completely agree that a lot does depend on publishing in a very short list of journals, and even agree (to an extent) that what gets published in those journals is a crapshoot, I don’t complain about it (well, at least not much). Because I don’t see how it could be any different.

    This seems like a failure of imagination.

    • No apology needed Mike, and please don’t feel you have to lay off just because you’ve commented a lot recently. By all means, if you still have things to say that haven’t already been said, say them! That’s what keeps good conversations going.

      Re: “read what’s in leading journals” as a filter, yes, I actually do just skim Nature (online) looking for relevant articles (and interesting looking news, book reviews, etc.). Same for Science, PNAS, and leading journals in ecology and evolution. Call me old-fashioned. Subject-specific TOC alerts could well replace my approach for Nature, Science, and PNAS, but they wouldn’t be any quicker, and they wouldn’t work for other journals since the subject-specific TOC would just be the TOC. Similarly, subscribing to journal TOCs via RSS would not fundamentally change my filters. Twitter would be far too scattershot, and word of mouth isn’t feasible (and would be too scattershot even if it were) as I’m the only person at my university interested in most of the things I’m interested in. I do use Google Scholar’s recommendations for me on the suggestion of a younger and more web-savvy colleague, and that seems to be a modestly useful supplement to my current filters, but very far from an adequate replacement, in part because there are various literatures I like to keep up with that I’ve never published on or cited.

      Re: the best stuff being defined as “the stuff everyone else is reading”, sorry, I think that’s the way things are going to be whether you like it or not. You seem to want science to be judged largely, though perhaps not entirely, on the basis of technical soundness. That’s not going to fly, not for people like me or anyone I know. There’s far too much technically sound work published for me to even read the titles of all of it, and the same is true for anyone who has more than the narrowest interests. So when I’m reading just to keep up with the broader literature and just because I’m interested, I don’t just want to read technically-sound stuff. I want to read the best technically-sound stuff. Where “best” doesn’t mean “most technically sound”, it means things like “most interesting, important, novel, incisive, creative, surprising, general…” and all sorts of other attributes that I freely admit are somewhat subjective (though not totally subjective I don’t think, unlike a preference for Britney Spears over Maroon 5). In practice, I’ve found that a pretty good way to identify work with those attributes is “keep track of what’s in leading journals”. I emphasize that I do not simply assume that whatever’s in the leading journals is interesting, important, etc. But that’s a key first-pass filter for me, and it seems to work. If it didn’t, I’d change it appropriately. For instance, when the journal Ecology Letters was first founded, it wasn’t yet a leading journal as it was brand new. But I’ve followed it from the get-go, since from the very first issue I found that doing so was a good way to find what I consider to be good papers. But I don’t pay much attention to PLoS ONE, because frankly I find that good papers in PLoS ONE are far too rare relative to what I consider to be technically-sound-but-really-boring papers.

      And yes, I also feel like I need to keep up with what my colleagues are keeping up with. Even if I don’t like, say, phylogenetic community ecology (and I don’t like a lot of it), I can’t just ignore it, even though I don’t do it myself–it’s one of the hottest areas in my field right now. To ignore it would not be analogous to me exhibiting my own independent taste in music, it would amount to me being willfully ignorant of everything that’s going on in ecology except what I myself am doing.

      For more on my views here, see this old related post.

      As to whether I’m exhibiting a failure of imagination here, all I can say is that it’s not just the failure of my own imagination. Nobody has yet given me a compelling argument as to how things could be different, or why we’d want them to be. We all need filters, and most of us want those filters to work on the basis of semi-subjective things like “interest”, “importance”, not just “technical soundness”.

  8. Pingback: Friday links: transparency in research, and more | Dynamic Ecology

  9. Pingback: Selective journals vs. social networks: alternative ways of filtering the literature, or po-tay-to, po-tah-to? | Dynamic Ecology

  10. Pingback: Friday links: women in academia (then & now), cicada personal ads, and more | Dynamic Ecology

  11. Pingback: Post-publication review is here to stay–for the scientific 1% | Dynamic Ecology
