Citizen science and data quality (guest post)

Note from Jeremy: This is a guest post by Margaret Kosmala, a postdoc in Organismic and Evolutionary Biology at Harvard University and co-founder of citizen science projects Snapshot Serengeti and Season Spotter.

There’s no doubt about it: citizen science is a growing field. In the past two years, three major citizen science associations have been founded, an international citizen science conference was held, a new citizen science journal is on the horizon, and a new cross-disciplinary online citizen science journal has launched. Aggregator SciStarter and citizen science platform Zooniverse have recorded a linear – or faster than linear – increase in the number of citizen science projects and participants.

But before we go any further, a little pre-survey, if you don’t mind:

I ask these questions because while those of us directly involved in citizen science are excited about the potential of citizen science for conducting science, the broader ecology field seems a bit skeptical. And I wonder how extensive this skepticism is and how it varies with career stage. (With the caveat, of course, that readers of Dynamic Ecology are not guaranteed to be a representative sample of any group.)

A couple years ago, I blogged here that a project I was involved with had gotten the following feedback from a proposal looking for continued funding:

The use of citizen-scientists to provide meaningful and accurate data will depend on their training. It’s unclear how quality of data generated from the citizen-identified animals will be ensured.

Since then, I have heard this skepticism over and over – from colleagues and from strangers who have contacted me to ask how to launch a citizen science project. An NSF officer is on record as saying citizen science is more valuable as an education tool than for actually doing science. I heard the frustrations of citizen science project managers again most recently at this summer’s ESA meeting. And it’s starting to get a bit old. Why are reviewers so against citizen science as a method for doing science?

I think the proximate cause is a distrust of citizen science data. But I think the ultimate cause is unconscious biases held by scientists about who we (scientists) are and what we can do and who they (the “general public”) are and what they can do. Here are some things that I think cause professional scientists to distrust citizen science data:

Belief: Professional ecologists collect high-quality data. After all, we’ve all had to go through many years of education, have had the experience of conducting research under the watchful eye of established scientists, and are deeply invested in our data.

Truth: On average, I think professional ecologists do collect high-quality data. But, let’s not confuse high-quality with perfect. Like all humans, scientists are error-prone. Sometimes we get tired, sometimes we lose focus, and sometimes we simply press the wrong button. Additionally, many types of ecological data require a fair amount of judgment that varies among data-collectors. Visually conducting percentage cover estimates is one example. In wildlife studies, using a pair of observers to count wildlife over transects is the norm, as it’s recognized that there will be variation among observers. And the statistics we use in ecology accept that there will be some degree of measurement error, because expert-collected data is variable and imperfect.

Belief: Data in scientific studies is collected by professional scientists. Think about it: you agree to review a paper, and the methods state that “we measured diameter at breast height”. Do you question who “we” is? Probably not. Do you ask what training the measurers had? Almost certainly not.

Truth: A lot of ecological data used in scientific studies is collected by undergrads, graduate students, and technicians. These data collectors have some amount of experience and training, but if the assumption is that ecological data quality is high because professional scientists have collected it, then this assumption is violated on a regular basis.

Belief: Members of the general public (hereafter “volunteers”) cannot collect data as well as professional scientists can. Well, maybe they can come close if they have lots and lots of training. But most don’t have that training. Also, they aren’t invested in the project the way the professionals are. And gosh darn it, I spent 7 years getting my Ph.D. so that has to be worth something, right?

Truth: Ecological data collection often involves very simple measurements that can be accurately performed by schoolchildren. Things like counting and using a tape measure are not expert-level skills. There are dozens of citizen science case studies that measure volunteer data accuracy against expert data accuracy. And for simple tasks, volunteers perform as well as experts (and neither is perfect). Importantly, volunteers may need training on how to record their observations in the context of a particular project. But there is no skills training needed for many types of data collection and analysis. There is a point where training starts to matter, though. Asking untrained volunteers to differentiate among well-known organisms (e.g. elephant vs. giraffe; needle-leaf tree vs. broadleaf tree; bee vs. butterfly) yields good data. Asking volunteers to differentiate among less-well-known or cryptic organisms requires some training to get good data. And yet this training need not be extensive. A targeted guide to the differences between, say, Grant’s gazelle vs. Thomson’s gazelle or red maple vs. sugar maple or honey bee vs. carpenter bee may be all that is required. That is, much ecological research involves collecting simple measurements, and much of the rest involves collecting easily-learned measurements.

Belief: Volunteers are members of the general public.

Truth: Citizen science volunteers come from a highly skewed fraction of the general public. They tend to be younger (because many projects target K12), better-educated, and better off (they have free time) than the general public. Except for kids trapped in classrooms, citizen scientists already have a predilection for science. And although they are technically laymen, many of them are expert non-professionals. Consider the many highly skilled birders who take part in avian citizen science projects. We have also found a hard-core following of African mammal enthusiasts in our Snapshot Serengeti project, many of whom have been to the Serengeti in real life or have grown up in eastern Africa and find the animals in our project to be as familiar as many North Americans find squirrels and raccoons.

Belief: Volunteers are not as motivated to pay attention to data quality as scientists and so are sloppy.

Truth: Volunteers are often highly motivated. After all, if you are doing something in your spare time, you probably care about it. A survey of volunteer motivations in an astronomy project found that contribution was the main motivator for people to do the project. That is, volunteers want to work on citizen science projects because they want to contribute to scientific knowledge. The awesome eBird project out of the Cornell Lab of Ornithology allows birders to collect data in two ways: recording a few target species or completing entire checklists. They found that when they informed birders that the checklists method yielded better and more usable data than recording target species, a large number of birders switched to the better protocol. The also-awesome Monarch Larva Monitoring Project based out of the University of Minnesota found that highly trained and engaged volunteers collected better data than paid technicians. There is additional data from the social sciences supporting the idea that paying people to perform a charitable task reduces people’s interest in doing that task. Our best data gatherers may, in fact, be unpaid volunteers devoted to the progression of science.

Belief: Because of all the above reasons, data collected by volunteers has high variability and bias. So we can’t use it for science.

Truth: While some projects do, in fact, collect uncontrolled data, many citizen science projects have procedures in place to record factors that influence accuracy and bias. The fact is that a lot of ecological data is riddled with unexplained variation and bias, no matter who collects it. This has led to a great toolbox of statistical techniques that allow us to measure and correct for such variation and bias. Problems such as non-standard effort, unbalanced designs, under-detection of organisms, and so forth are not unique to citizen science and can all be addressed statistically. Tomas Bird and colleagues have a great paper on this if you want to read more.

So next time you are reviewing a citizen science proposal or manuscript, pause a moment and consider that the volunteers creating the data may be quite skilled. Or perhaps the tasks the volunteers perform are easy and therefore there is unlikely to be much error in the data. Or perhaps the tasks are not so simple and the volunteers were trained. If so, look to see what training the volunteers got; there’s no reason to think that trained volunteers will do a worse job than trained student interns or paid technicians. Or perhaps the project managers are using statistical techniques to control for quality and account for bias in the data. There are many ways to ensure the integrity of data collected by volunteers. A little more respect for citizen science data overall could go a long way towards realizing the potential of this field beyond education and outreach.

26 thoughts on “Citizen science and data quality (guest post)”

    • If you would like to fill out the survey, I suggest just picking whichever career stage seems most appropriate. It’s not a scientific survey. And the most important bits are your views on citizen science.

    • Ha! I had the opposite reaction to that question. Past Dynamic Ecology polls have asked you if you were a grad student, post doc or Faculty. This poll seemed pretty inclusive by comparison. (To be fair, past polls offered an “other” choice, which may be what you were looking for).

  1. I selected that I’ve never been involved with citizen science, but that I’ve read about it. The truth is that I’ve never led or volunteered for a citizen science project, but I’m a huge fan of and have used data from that source in publications.

    I like this as a nice, clear demarcation between people’s perceptions of how data is collected and how it actually IS collected. The point that people doing it in their spare time are motivated is a good one. I’d probably trust a layperson volunteering for a citizen science project to collect data more than I would an undergrad motivated only by credit (distinct from an undergrad motivated by wanting to do science, either to pursue as a field or because they love it; undergrads can be awesome).

  2. Connected to the belief that “Data in scientific studies is collected by professional scientists” is the fact that citizen science has been going on for far longer than the use of the term citizen science. For example, if you look back at many biogeographical or macroecological studies on British plants in the 80s or 90s, many of those data will have been collected by expert amateurs working with the main British botanical recording scheme. Thus, many influential studies have actually used what is now known as ‘citizen science’ data without actually noting the fact, or indeed many other people noticing.

  3. Just happened to stumble across this: giving people a smartphone app to help find a rare insect in the UK:

    One unusual (?) aspect of this particular project is that the citizens don’t really have to do anything. They just have to run the app; the app does all the work. Sounds like it’s not designed that way because the scientists didn’t trust the citizens, though. It was just sensible for this particular project to write an app that would do all the work. A bit reminiscent of SETI@Home.

    • That’s cool. Yeah, distributed volunteer computing projects are also on the increase. It’s neat to see mobile device sensors being used this way.

  4. I’m surprised that some scientists still question the quality of the data, especially species IDs, since there are many expert non-scientist enthusiasts who do this sort of stuff for fun.

    I’m betting too that some laymen actually have more hours in the field doing, say, bird IDs, because they don’t have to bother with all the other work that goes into a research project.

  5. Question, Margaret: As someone experienced with citizen science, what are the biggest things *you* worry about? What do *you* think are the most common or serious problems or limitations of citizen science as an approach?

    I ask just because I’m always interested in what the people using any given approach see as the most common or biggest problems with their approach, vs. what “outsiders” see as the most common or biggest problems. Often (not always, but often), the concerns of outsiders reflect ignorance or misunderstanding of the approach, and so aren’t real concerns. But that same ignorance or misunderstanding also means that outsiders often aren’t aware of the *real* issues with the approach.

    With microcosms for instance, I can tell you that outsiders’ blanket concerns about the approach (as distinct from valid but narrow concerns about specific microcosm experiments) are all easily rebutted by anyone who does microcosm work for a living. Having to rebut those groundless concerns (again!) is boring. But I’ve had very interesting discussions with other microcosmologists about the most common *real* (not purported) problems with microcosm studies.

    • That’s a really good question. I think my biggest concern is the large increase in projects and the lack of good information about *all the things* that are needed to make it a good project. Right now you really ought to have at least (1) a domain expert; (2) a data/statistics/coding person; and (3) a person person (education, social sci or psych sci). Very few projects do this. It’s hard. It’s interdisciplinary. It’s expensive. So I think a lot of projects really struggle, because the learning curve is steep if you’re missing one of these components. What worries me the most is citizen science getting the reputation among potential volunteers for not actually doing *science*. If a project isn’t well-designed, then the data coming out of it will be unusable. And if the data can’t be used, you’ve just wasted a lot of people’s time and destroyed their good will. I think this is true whether project objectives are science-oriented or education/outreach-oriented.

      As for the technique itself, I really see citizen science as a tool rather than an approach. So I don’t have a lot of concern with the method itself. (Do you have concerns about pipettes or scales or microscopes?) But I think people using citizen science data (which is not limited to the projects producing the data; see above comment) need to be very careful to think about how the data was gathered and what biases it might have. There have been some nice papers about these sorts of things out of the Cornell Lab of Ornithology, using the very large avian datasets that come out of their projects (e.g. bias of birders tending to sample close to roads). But I haven’t seen as much care taken with some other datasets. I’ll point out, though, that this concern isn’t unique to citizen science. Rather, it’s about using other people’s data and large datasets more generally. See, for example, the no-biodiversity-loss studies and the criticism that sampling sites were heavily biased and therefore the inference invalid. (Did you go to these talks this summer?)

      • “As for the technique itself, I really see citizen science as a tool rather than an approach. So I don’t have a lot of concern with the method itself. (Do you have concerns about pipettes or scales or microscopes?) ”

        Interesting analogy! Will have to mull that over.

        ” See, for example, the no-biodiversity-loss studies and the criticism that sampling sites were heavily biased and therefore the inference invalid ”

        I absolutely see what you mean, but that particular example isn’t a good one, in my view. The folks who did those studies were very careful, and are getting ripped for mostly political reasons. See here, and the associated comment thread:

        That thread isn’t one of my finest hours (I got upset and let it show), but I think it makes it clear enough that the fight here isn’t really a conventional scientific discussion of a technical issue like sampling bias.

      • A couple of further thoughts re: having concerns about one’s instruments:

        -recall that when the telescope and microscope were invented, some people at the time thought they were showing false images. A good example of people doubting instruments because they didn’t understand them, or because of some overarching philosophical stance (in the case of microscopes and telescopes, those who dismissed them often took a philosophical stance favoring “reason” over fallible, imperfect “observation”).

        -people *do* often have concerns about instruments like pipettes and scales. That’s why they get them calibrated periodically. But of course, you calibrate your scale not to assuage some kind of blanket concern about whether scales can ever work or ever contribute to “real” science or whatever. You calibrate your scale because you have a quite narrow concern about specific factors that would reduce the scale’s accuracy. The analogy to at least some citizen science projects is obvious. I’m thinking for instance of Snapshot Serengeti calibrating citizen ID’s of photos with expert ID’s.

      • > I absolutely see what you mean, but that particular example isn’t a good one, in my view.

        Fair enough. Seems you’re way deeper into this particular example than I am; I just saw the talks at ESA and missed this particular DE post (due to end-of-pregnancy exhaustion, judging from the date on it). Glad you got my point, regardless of how appropriate the example was or not.

        > A couple of further thoughts re: having concerns about one’s instruments:

        Yes, I absolutely agree with your points. I’m writing a paper just now, in fact, that talks about getting good quality data from citizen science through calibration and validation (and other methods). I like your telescope and microscope examples as a way for me to think about how other people might be thinking about citizen science from a tool perspective. Thanks!

  6. I don’t doubt the quality of the data collected. I do think there has been a shift in citizen sci (ok, several) recently though, where professional scientists view volunteers as necessary but uncompensated parts of their research methods (very different from outreach to citizens, or from opportunistically using data that citizens would otherwise be collecting as hobbies – like birding or wildflower drawings/presses that have been done for centuries).

    So a few questions (was actually debating this with STS scholar partner on the long drive back from Thanksgiving):
    – If scientists rely on this data and “citizens” are actually a small pool of qualified people, isn’t this just not compensating technicians? If not, where is the line?
    – How do these kind of projects (where citizens collect data but are otherwise not involved or communicated with) count as “broader impacts?” Sounds like cost-cutting, not outreach.
    – Is relying on trained (but not by you/your grant $) unpaid labor for data collection somehow better than unpaid interns (they are highly motivated for job experience and skill development)?

    I’m coming from a perspective of trying to think about training the next generation of ecology and STEM technicians, researchers, and staff — and making that inclusive. Is citizen science helping that or hindering (or neither)?

    • These are really good questions, and there’s a lot to be said about probing the ethics involved in citizen science, crowdsourcing, and frankly much of how science is done in general.

      > If scientists rely on this data and “citizens” are actually a small pool of qualified people, isn’t this just not compensating technicians? If not, where is the line?

      I think that there’s a solid line between paid and unpaid. If people are voluntarily saying, ‘yes, I’d like to participate,’ then it’s an altruistic act, like giving money to a charity. Technicians are paid; it’s a job, with the concurrent expectations and responsibilities on both sides of the deal. I see these as two very different types of acts. (Questions back at you: is there a moral problem in letting your mom/dad/spouse/sibling/friend help out with field work, unpaid? I know many, many folks who have done this, myself included. Is it different if it’s someone you know vs. someone you don’t?)

      Blurrier line: what about forcing K12 students to do work on your project? (Many citizen science projects engage K12 students.) These kids don’t have a choice. The presumption is that doing the work is educational. But if it’s also contributing to your project, is that good (kids get to do *real* science!) or bad (kids are being exploited)? I’m uncomfortable with it, but see it as fairly benign, since projects are generally wrapped in educational materials and the benefit is *mostly* for the kids.

      Blurrier line still: what if you pay people a very small amount (e.g. Mechanical Turk) to do citizen science tasks? I think you get into some murky ethics here, and others have written about it. But it’s also a fact of life that the dynamics around pay and services are changing in all sorts of fields: art and design, journalism (including science journalism), hospitality, transportation. The internet is transforming how people trade money for services. There are winners and losers in massive transformations like these, and which side you fall on tints your view of whether the transformation is “good” or “bad”. My opinions on all this haven’t firmed up yet.

      > How do these kind of projects (where citizens collect data but are otherwise not involved or communicated with) count as “broader impacts?” Sounds like cost-cutting, not outreach.

      To be quite honest, I personally think most “broader impacts” are fairly worthless, at least in our field. The concept came about because the government wants a more scientifically literate public and plenty of people going into science (but not really ‘science’ generally — really engineering and computer science). Most broader impacts engage people (both students and adults) who already like science. What you really want to do is to reach and engage people with little exposure to science — but that is a much harder task. So I don’t think that citizen science does a worse job at “broader impacts” than speaking to classes at your (usually privileged) local school or mentoring undergraduates and grad students. That said, many many citizen science projects are initiated with educational goals in mind, either as a primary objective or as a co-objective. I think even when education is the primary objective, the datasets produced should be high-quality, otherwise you’ve wasted the volunteers’ time. As for ‘otherwise not … communicated with’ — the projects that I know of that have failed to communicate anything back to volunteers lose their volunteers and ultimately fail.

      > Is relying on trained (but not by you/your grant $) unpaid labor for data collection somehow better than unpaid interns (they are highly motivated for job experience and skill development)?

      Hmm… I have strong feelings about the unpaid interns thing. (It’s bad, bad! As you no doubt know, it’s bad, because it advantages people who are already privileged and disadvantages others.) I think it is (or can be) different, though. The unpaid interns are often motivated, yes, by the experience itself, but I’d suggest perhaps more so by the “proving” potential of the experience. That is, unpaid interns typically want (A) to be able to put the experience on their resume/CV and (B) a letter of reference. Large-scale citizen science projects offer neither opportunity. I’m not entirely convinced this is the right way to think about it, but I haven’t actually thought about this specific issue before. In the project I’ve been involved in longest, our volunteers tend to be older, many of them retired. So it’s not young people looking to gain skills for a competitive edge. I’ll think about it some more.

      > Is citizen science helping [inclusion] or hindering (or neither)?

      I’d say that the online nature of citizen science opens some doors that are otherwise not opened. In particular, I know of a couple volunteers who are older and housebound. They get a lot out of being able to contribute to science and socialize with the community surrounding the project. These people would not be able to be ‘technicians’ or ‘staff’ of any sort. So I’d say that citizen science can increase diversity in the age / stage-of-life / abledness / type-and-amount-of-contribution sorts of dimensions. I don’t think that most projects hit a lovely cross-section of society. Some may have specific reaches that I’m not aware of, though.

      • Thanks for replying to them all! Great things to keep thinking through – I hadn’t thought about Mechanical Turk for science (or the problems with it).

  7. I recall reading a paper that used citizen science data (bird counts) where they automatically excluded the people who reported the highest counts. (I don’t remember the amount, but it was something like the top 1%.) Something I’ve wondered since then but never looked into: is that sort of thing common?

    • Hard to comment on that example specifically, without knowing the details. Two of the big bird projects — eBird and Project FeederWatch — have automatic filters that alert volunteers if their counts seem high and ask them to verify (to avoid data entry errors like ’55 chickadees’ instead of ‘5 chickadees’). If the volunteers do verify, then the info goes to a regional expert who can ask for supporting evidence and/or determines the plausibility of the sighting for that place and time of year.
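The "ask the volunteer to verify" step described above can be illustrated with a minimal Python sketch. This is not the actual eBird or Project FeederWatch filter; the function name `needs_verification`, the use of a median of past counts, and the `factor` threshold are all assumptions made purely for illustration.

```python
from statistics import median

def needs_verification(past_counts, new_count, factor=5.0):
    """Return True if new_count looks unusually high compared with past counts.

    Hypothetical rule (an assumption, not any project's real filter):
    flag a count when it exceeds `factor` times the median of earlier
    counts for the same species, place, and season.
    """
    if not past_counts:
        # No history to compare against, so nothing can be flagged.
        return False
    typical = max(median(past_counts), 1)
    return new_count > factor * typical

# A likely typo ("55 chickadees" instead of "5") is flagged for a second look.
chickadee_history = [4, 5, 6, 5]
print(needs_verification(chickadee_history, 55))
print(needs_verification(chickadee_history, 5))
```

A flagged count is not discarded here; as in the workflow above, it would simply be sent back to the volunteer and, if confirmed, on to a reviewer.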

      More generally, in order to vet data, successful projects generally do some sort of quality control on their data, just as you would with your lab data. What do you do when you have a weird outlier in your data? Probably examine it as best you can. What would you do if you had 5,000 weird outliers in a dataset with a million data points? It’s a big(gish) data problem. With Snapshot Serengeti, we use summary statistics to gauge which images have the most uncertain identifications (based on disagreement among volunteers), and exclude those images in many analyses.
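As a rough illustration of scoring "disagreement among volunteers", here is a small Python sketch. It assumes each image has a list of volunteer labels and uses normalized Shannon entropy (an evenness index) as the disagreement score. The function names, the 0.5 cutoff, and the example data are hypothetical, and this is not presented as the project's actual pipeline.

```python
from collections import Counter
from math import log

def consensus(votes):
    """Return (plurality_label, evenness) for one image's volunteer labels.

    Evenness is Shannon entropy divided by its maximum for the number of
    distinct labels: 0.0 means everyone agreed, 1.0 means an even split.
    """
    counts = Counter(votes)
    total = len(votes)
    label, _ = counts.most_common(1)[0]
    if len(counts) == 1:
        return label, 0.0
    entropy = -sum((n / total) * log(n / total) for n in counts.values())
    return label, entropy / log(len(counts))

def confident_images(votes_by_image, max_evenness=0.5):
    """Keep images whose disagreement score is at or below the cutoff."""
    kept = {}
    for image_id, votes in votes_by_image.items():
        label, evenness = consensus(votes)
        if evenness <= max_evenness:
            kept[image_id] = label
    return kept

# Hypothetical classifications: img1 is nearly unanimous, img2 is a 50/50 split.
votes_by_image = {
    "img1": ["zebra"] * 9 + ["wildebeest"],
    "img2": ["grants_gazelle", "thomsons_gazelle"] * 5,
}
print(confident_images(votes_by_image))
```

The cutoff is a judgment call: a stricter value drops more images but leaves fewer uncertain identifications in the analysis.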

      I guess to answer your question, yes, there is additional post-processing that can increase data quality. And I’d like to think it’s typically done thoughtfully and reasonably, just as I like to think that data cleaning is properly done for all published ecology papers. (Note that we professionals don’t always report the sorts of data cleaning we do on self-gathered data, though we should.)

  8. One paper from Applied Ecology in 2014 used citizen science in a particularly interesting way. As an undergrad working on the same system, I was incredibly jealous of the vast amounts of data they were able to gather. After working in the field for 6+ hours a day all summer, we were hard pressed to cover a county-sized area with one of the bigger labs (personnel-wise) at the university. This paper had data for an entire state!

    The point is that citizen science is great for scaling your project when other options are not available. Would the authors have preferred to have teams of highly trained specialists in each zip code? Probably. Was that going to happen for a project on a specific weed-insect interaction that only exists in one region of North America? Probably not.

    People make sacrifices when collecting their data all the time. No sampling transect can create a complete picture of what is happening. Until a better (read: cheaper) alternative becomes available (and who knows, these drones sure seem to be getting a lot of press lately) citizen science remains the best option for scaling up your project.

  9. Pingback: Is citizen science about science or outreach? | Dynamic Ecology
