Probably most instructors have to deal with plagiarism at some point, and all of them have to be prepared to do so, especially nowadays: technological advances have made it easy for students to copy the work of others. But those same advances have also made it easier to detect potential cases of plagiarism. I’ve been doing a bit of background research on text matching software and its use in detecting and deterring plagiarism. I wanted to share what I’ve found so far, and invite you to share your own experiences with such software.
What’s text matching software?
Text matching software compares electronically submitted assignments against a database of documents. Depending on the software, this database might include some or all of: other assignments, webpages, and other documents like scientific journal articles. Note that I said “text matching” and not “plagiarism detection”, since no software can distinguish between, e.g., properly cited quotations and plagiarism. Some packages can be customized in various ways, for instance by letting you choose the parameters governing the behavior of the text matching algorithm.
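For the curious, here’s a toy Python sketch of the core idea behind such software: break each document into short overlapping word phrases (“shingles”) and count how many the documents share. Real packages use far more sophisticated algorithms and much larger databases; all names and numbers below are my own illustrative choices, not taken from any actual package.

```python
# Toy sketch of the core of text matching software: compare the
# 4-word phrases ("shingles") of one document against another.
# Illustrative only; real packages are far more sophisticated.

def shingles(text, n=4):
    """Return the set of all n-word phrases in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def match_score(submitted, source, n=4):
    """Fraction of the submitted document's phrases also found in the source."""
    a, b = shingles(submitted, n), shingles(source, n)
    return len(a & b) / len(a) if a else 0.0

submitted = "the quick brown fox jumps over the lazy dog today"
source = "note that the quick brown fox jumps over a lazy dog"
print(round(match_score(submitted, source), 2))  # 0.43 on this pair
```

A real package would compute something like this score for a submitted paper against every document in its database, then show the instructor the highest-scoring matches with the shared passages highlighted.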
What text matching software is available?
Some commonly used text matching software packages (not an exhaustive list):
- Turnitin. Proprietary, costs money. Compares submitted papers to a large (many billions of items) database of webpages and other documents, including all assignments previously submitted for checking.
- SafeAssign. Proprietary, costs money. Offered by course management software company Blackboard. Compares submitted papers to a large database of webpages, the ProQuest ABI/Inform database of scholarly papers, and assignments previously submitted for checking.
- Wcopyfind. Free, open source software from Louis Bloomfield, a physicist at the University of Virginia. Runs locally on your own computer rather than as a web service. Compares submitted papers to one another, and to any URLs you specify. You can adjust all of the (many) parameters controlling the matching algorithm. It’s been around for several years, is very fast, and is frequently updated. Bloomfield was prompted to write it when he encountered high levels of plagiarism in his intro physics course.
- Viper. Proprietary, free. Says it compares submitted papers to a large database of documents, which doesn’t seem to include documents previously submitted for checking.
- eTBLAST. Free web-based service from the Virginia Bioinformatics Institute. Compares a submitted chunk of text to your choice of one of several mostly-biomedical databases (e.g., MEDLINE), plus a few publicly-accessible websites (e.g., ArXiv, Wikipedia).
- MOSS (Measure of Software Similarity). Free web-based service for detecting similarities among submitted computer programs. Compares the submitted programs to one another, not to a database (that’s my understanding, anyway). Registration required. Developed by a Stanford prof in 1994, continually updated since. Works with many different programming languages, though not R. Here is a list of other computer program matching tools (not sure if it’s up to date).
- Google and other general-purpose search engines. Plugging chunks of text into a search engine sometimes will identify webpages or online documents containing the same or similar text.
- Here’s a list of various other text matching tools. Not sure if it’s up to date.
I’m an instructor. What issues should I be aware of if I’m planning to use text matching software?
- Does your department or university have a policy on the use of such software? Universities that have a site license for Turnitin or some other proprietary system generally have policies on its use. Google “text matching software policy”, without the quotes, to find various examples. Obviously, if your university has a policy, you should follow it. Universities that don’t have site licenses for a specific software package usually don’t have any policies on text matching software. That doesn’t mean you should feel free to use whatever software package you want, however you want. I suggest at least asking the advice of colleagues, and probably of an administrator responsible for dealing with student academic misconduct. You need to make sure that whatever you do is consistent with your university’s existing policies and procedures regarding academic misconduct.
- Different software packages do different things, which affects their appropriate use. For instance, software like Turnitin uploads submitted documents to a database that’s owned by the software company, and that might be located in a different jurisdiction than yours, which may raise issues of privacy and intellectual property. That’s why universities that use such software often have policies allowing students to opt out in favor of some alternative means of demonstrating that their work wasn’t plagiarized (and not such an onerous alternative that it effectively forces the students to “consent” to use of the software). In contrast, if you use Wcopyfind or MOSS to compare submitted assignments to one another, that doesn’t raise any privacy or intellectual property issues that I can see. All you’re doing in that case is speeding up comparisons that you could in principle do by hand.
- Text matching software isn’t a substitute for making sure students know what constitutes plagiarism, why it’s wrong, and that you take it seriously.
- You need to make sure you know how to use the software and interpret its output (which usually is very easy, from what I understand).
- It’s a bad idea to just blindly rely on any software package rather than using it to flag potential cases of plagiarism for your inspection.
- Text matching software can struggle with paraphrased material. Though from what I’ve heard anecdotally, students who plagiarize their assignments rarely go to the trouble of paraphrasing the copied material so extensively that the text matching software can’t detect the copying. If you can customize the search parameters, as with Wcopyfind, you can set them so as to maximize your chances of detecting paraphrased material, perhaps at some cost to speed.
- Text matching software won’t detect if a student had someone else do their assignment for them. (As an aside, it’s my anecdotal impression that essay writing services produce poor essays, but the students purchasing them mostly don’t care because their only goal is to pass the class, not to get a high mark.)
- In order to decide which software to use, you might want to think about how students are most likely to plagiarize in the courses you teach. If they’re most likely to copy one another, then arguably there’s not much value in paying for something like Turnitin, as compared to just using something like Wcopyfind.
- Do you plan to use the software as a deterrent? For instance, by announcing to the class that you’ll be using text matching software, or even announcing what software you’ll be using and how? Note that it’s not clear that text matching software is effective as a deterrent. Rigorous independent studies seem to be scarce, and I haven’t found any that report big drops in plagiarism following adoption of text matching software. That could be for various reasons. Anecdotally, many cases of plagiarism (quite possibly the majority) are from panicked students, or students who aren’t clear on what constitutes plagiarism. Others are from students who figure they have little to lose because they think (possibly incorrectly) that the penalty for plagiarism is fairly minor (say, just a zero on the assignment, with no indication of misconduct on their transcript). None of those categories of students will be deterred by text matching software. And unless the software is used university-wide (and I mean actually used, not just that there’s an option for instructors to use it), students who plan to plagiarize may just drop classes that use text matching software in favor of classes that don’t. Don’t laugh: the University of Alberta surveyed their students on academic integrity matters a few years ago, and students reported that this is common (though of course, whether students answer such surveys honestly is a good question).
- Do you plan to routinely check all student assignments? Routinely checking all assignments arguably is fair: there’s no risk that anyone will feel singled out. It minimizes the odds that you’ll miss any cases of plagiarism. And it maximizes the deterrent if you’re using the software as a deterrent. But it might make all students feel like they’re under suspicion, and that they’re at risk of being charged with plagiarism even if just a few stray words coincidentally match some other document. And if your university has an honor code, routine use of text matching software risks undermining the honor code, because what’s the point of an honor code if you’re not going to trust the students to obey it? (On the other hand, if the honor code is already being widely violated, then arguably there’s nothing left for text matching software to undermine.) One way to deal with this is to emphasize to the students that routine use of text matching software is there to protect the large majority of students who are honest, and to keep the rare dishonest students from getting a leg up on the many honest ones.
- Alternatively, if you’re not going to routinely check all student assignments, how will you decide which assignments to check? A random sample? Only if there’s some other grounds for suspicion? And if so, what grounds?
- What are the pluses and minuses of other ways of achieving the same goals? For instance, one way to minimize plagiarism is to write new assignments every time you teach the course. Unfortunately, that’s time-consuming, and doesn’t address the common problem of students in the same course copying from one another. Another approach is to only mark the students on exams and other assignments that they complete in class with you or another observer present. But that rules out many pedagogically-valuable assignments. Or you could go exclusively with project-type assignments that can’t easily be plagiarized, like having students give in-class presentations. But that’s not feasible except in small classes.
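To illustrate the earlier point about tuning matching parameters to catch paraphrased material: in a shingle-based matcher, shortening the phrase length catches more lightly paraphrased copying, at the cost of more coincidental matches (and, potentially, speed). This is a toy sketch of that trade-off, not Wcopyfind’s actual algorithm or parameter names.

```python
# Toy illustration of why matching parameters matter for paraphrase
# detection. Shorter phrase lengths catch more paraphrased copying,
# at the cost of more coincidental matches. Illustrative only.

def shingles(text, n):
    """Return the set of all n-word phrases in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def match_score(submitted, source, n):
    """Fraction of the submitted document's phrases also found in the source."""
    a, b = shingles(submitted, n), shingles(source, n)
    return len(a & b) / len(a) if a else 0.0

original = "the rate of photosynthesis increases with light intensity up to a saturation point"
paraphrased = "photosynthesis rate rises with light intensity until reaching a saturation point"

print(round(match_score(paraphrased, original, n=6), 2))  # 0.0: long phrases miss the paraphrase
print(round(match_score(paraphrased, original, n=3), 2))  # 0.22: short phrases still flag shared wording
```

The same trade-off is why packages that expose their parameters let you loosen the matching when you suspect paraphrased copying, and tighten it when you only care about verbatim copying.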
In the comments, please share your own experiences with text matching software (as both student and instructor) and relevant links.