Academics generate a lot of intellectual property (IP for short). Arguably it is the main thing we do aside from teaching. And the IP landscape is changing rapidly both in and out of academia. This is yet-another-thing academics are supposed to be excellent at without any formal training. I don’t have extensive training, but I spent 10 years working in the software world and often was the lead business person working with lawyers to negotiate software contracts. So I have thought about these topics and how they are evolving. They seem to be evolving in some directions that don’t make sense to me. So I thought I would write a brief guide to the issues and raise some of the concerns I have.
NB: I am not a lawyer and have no formal training. Anybody who has questions of real consequence should consult a professional. I take no responsibility for people who substitute my opinions in this blog for professional advice.
Broadly, intellectual property is any idea that is the fruit of one’s (an individual’s or group’s) labor and that gives its creators some rights in controlling its future use. Society clearly benefits from innovation and creativity, and therefore has a vested interest in ensuring that such behaviors are rewarded while not stifling future innovation. Laws provide multiple ways to protect different types of intellectual property.
First, a side note. I want to recognize that many people think academics shouldn’t do anything to protect their intellectual property. They sometimes say it should be “free as in beer”, meaning it doesn’t cost anything, and “free as in speech”, meaning it doesn’t have any restrictions on it. That’s a valid point of view. It’s not really my point to discuss whether that view is right or wrong. My point here is that, recognizing we are far from a world where all academics and the institutions they interact with believe in completely free, academics need to know and use the legal tools to the ends they desire. And that includes to the ends of freeing data. In a world where writing something gives you a copyright on it (we live in that world), even if all you want to do is share that property with no limits, you need to know how to do that.
So with that out of the way, we can return to the types of intellectual property protection (for whatever end goal). We can construct a simple binary key to the major types of intellectual property protection:
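1a) Ethical – based in scientific codes of ethics rather than in laws.
1b) Legal – based in the laws of countries. Can be used to sue for financial remuneration if violated.
2a) Secret – protected by keeping it private (trade secrets)
2b) Public/registered – protected by making public with notice that it is protected (and in some cases registration with a government body)
3a) Names and logos – trademark law
3b) Tangible items – patent law
3c) Creative works – copyright law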
Non-copyright forms of IP
Let me deal first with the two cases that don’t apply to academia often. Trade secrets involve protecting IP by keeping it secret. This normally requires contracts like non-disclosure and non-compete agreements to keep employees from sharing the secrets. The recipe for Coca-Cola is the most famous example. I suppose one could argue that research goes through a trade-secret phase when it is in development (although we rarely use formal legal contracts to enforce the secret nature), but the ultimate goal of academia is to make things public. Trademarks (names and logos of products) are symbols under which one does business. One could imagine wanting the name of a software package to be trademarked. In theory, simply doing business under a name creates somewhat of a legally enforceable trademark. In practice, trademarks really need to be registered with the government. And trademark registration has to be done with each country. To get a world-wide trademark would require filing with >100 countries, with fees and paperwork at every stage, costing thousands of dollars. Not really worth it.
Next, the case that only applies occasionally to academia: patents. Patents were originally limited to physically tangible inventions. The notion was that one would file a patent claim with the patent office of a country. And if the patent office agreed it was suitably novel, it would grant exclusive rights to the invention for some period of time, often 20 years. Anybody else who wants to use it has to pay licensing fees. The notion of what can be patented (the physical tangible part) has been stretched (by additions to the laws) in recent years to include software concepts (especially algorithms and user interface features) and genes. Patents in one country extend to other countries more than trademarks do, but the law does still vary from country to country. And even if your national patent office gives you a patent, anybody who wants to use that idea can challenge whether it is genuinely novel and have the patent thrown out rather than pay you a fee. So although in theory anybody can file a patent, in practice most successful patents are filed by lawyers and cost a lot of money. This is especially murky for software. I am listed as an inventor on a patent that was awarded, although I have to tell you I’m pretty sure other people could have thought to combine the two already existing pieces we combined, but it’s never been tested. So patents really only apply to areas where substantial money can be made (e.g. biomedical or computer). If you think you have an idea that could benefit from a patent, talk to your university IP office.
This leaves the one category that most academics deal with: copyrights. A copyright applies to the expression of an original creative work. As such, it does not apply to the idea underlying a creative work. Anybody can write a poem about love. The notion of love is not copyrightable. The exact specific choice and sequence of words is what is copyrightable. And they have to be original. “Roses are red, violets are blue” is not copyrightable. This has some important limitations. It is why a song title is not copyrightable, only the words of the song. And in general, ideas CANNOT be copyrighted (that is what patents are for). Only creative expressions are copyrighted.
How do you get a copyright? Technically all you need to do is create a work and express it in tangible form (written, recorded, performed, etc.). Noticeably better and not much more work, you can put it in the public eye with the copyright symbol, your name and a year. Just putting “(c) 2018 Brian McGill” at the top of your computer code is all it takes for pretty strong rights. If you really want, you can register it, but unlike trademarks, you don’t really need to register it. Of course, even when you have a copyright, you still have to protect those rights.
If you have a copyright, you retain most rights to the reproduction of that work for a long time (exact length varies by country but often 50+ years from the death of the author). This means nobody can reproduce that work without getting your permission (for which you might ask for money among other considerations). There are a few exceptions known as fair use (people can excerpt a couple of hundred words for critique and review, and the copyrighted work can be used in satire and cultural commentary).
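As a minimal sketch of the notice described above, here is what it might look like at the top of a Python file. The helper names `copyright_notice` and `add_notice` are hypothetical, invented only for illustration; the notice format is the one quoted above.

```python
# (c) 2018 Brian McGill
# The line above is the whole notice: symbol, year, and name.


def copyright_notice(holder: str, year: int) -> str:
    """Return a one-line notice such as '(c) 2018 Brian McGill'."""
    return f"(c) {year} {holder}"


def add_notice(source: str, holder: str, year: int) -> str:
    """Prepend the notice as a comment line unless it is already the first line."""
    header = "# " + copyright_notice(holder, year)
    if source.startswith(header):
        return source  # already stamped; do not duplicate the notice
    return header + "\n" + source


if __name__ == "__main__":
    # Stamp a tiny (hypothetical) script with the notice and show the result.
    print(add_notice("print('hello')\n", "Brian McGill", 2018))
```

Nothing here is legally special; the point is only that the notice is a single comment line, so there is little excuse to leave it out.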
So how does this apply to academia:
- Writing – everything you write is clearly original and copyrightable. Although if you are going to publish it with a non-open-access journal you will usually have to transfer the copyright to the journal (but see e.g. Evolutionary Ecology Research). Details at open-access journals vary, but at a minimum you have to grant some license to make the work freely available.
- Photographs – photographs are equally clearly and strongly protected by copyrights
- Line drawings – a gray area – the expression of the drawing is protected, but the idea of a drawing is not. Thus textbook publishers often choose to redraw figures published under copyright rather than seek and pay for permissions for them. You would be surprised how much of a graph counts as “idea” rather than expression. You can trace a graph, change some line styles & colors, tweak a scale or labels and that is not violating the expression of the original figure.
- Computer code – expression is important in code (“if x<y then A else B” is really importantly different from “if x<=y then B else A”) so code is well protected by copyright. What that code does, including its algorithms and user interface aspects, is not. Whether an API is copyrightable is a gray area on which some really big cases have hinged. Most commercial software you use is primarily protected by copyright (whether it is Microsoft or Apple or an open license like Linux).
- Data – a big enough can of worms I have a separate section on it.
So just by creating the work and putting a copyright line in it you have a copyright. Actually you have a copyright even if you don’t do anything to claim it. So what do you do with that? Well, if you want to publish it in a journal you will often transfer your copyright to the publisher. Otherwise, you can sell copies if anybody will buy copies. But most often academics want to get their work out there (we’re more in pursuit of fame/higher knowledge than money). And if it is copyrighted, nobody can copy it without your permission. Those restrictions rest on the original copyright which pretty much lets you do anything you want. And if what you want is to share it and let others share it but place restrictions on their use, then you can do that. It is called a license. It is really no different than the End-User License Agreement (EULA) that you implicitly or explicitly agree to when you install commercial software. Every time you open Microsoft Windows or Apple’s MacOS or even a version of Linux you are using copyrighted software under a user license. Of course a key distinction is that Windows and MacOS are not open licenses, while Linux is. There are a number of versions of open licenses out there, but I am going to walk through what has quickly emerged as the most common open license – creative commons (CC) licenses:
- CC0 – this basically grants people the right to do anything they want with your piece of work. So why bother with CC0? Well, for one thing, under copyright law nobody has a right to make copies unless you grant that right. This is an easy way to do that. Additionally and importantly, CC0 contains legally enforceable language that there is no warranty (guarantee of accuracy or adequacy) of the material provided. Moreover, CC0 leaves a few small rights, like the right to create patents off the work, with the author. The Berkeley BSD and MIT licenses are similarly unrestricted licenses used in the software world.
- CC-BY – this is the next level up – it lets anybody do anything they want with it SO LONG AS they acknowledge you as the creator. To make this enforceable in the 2nd generation (somebody who takes a copy from somebody who took a copy from you) everybody who copies the work must also keep the CC-BY license in the work. But they can do anything else. If somebody modifies your work they have to distinguish the changes they added (that is part of the attribution).
- CC-ND (No-derivative) – here you grant the right to only make verbatim copies – people cannot copy and modify the work. If somebody wants to share a modified version (with a few minor exceptions like change of size or typeface allowed), they have to ask your permission. This is really most important to artists who are attached to the exact expression and want to protect the integrity of their work. While I can understand why this could be tempting to a scientist, CC-BY already forces users to attribute what is yours and what is theirs (they cannot attribute their own introduced mistakes to you). And the inability to create derivative works is quite limiting to scientific advance. I recommend against the ND tag in science.
- CC-SA (share-alike) – this is the Creative Commons version of a “copyleft” license. This is a play on the word copyright. It is like Kurt Vonnegut’s Ice-Nine, which is a form of ice that causes any water it touches to immediately freeze. Or King Midas, who causes anything he touches to turn to gold. A copyleft license means that anything it touches turns to open access. One can use the copyrighted work to do anything one wants, but only if one then reshares it again under a CC-SA license. Most especially, note that this applies when your work is included in a larger work. The larger work must also be shared under CC-SA. Hence the Ice-Nine-like properties. It strongly enforces openness. But it prevents a lot of scientifically valid uses of your work. In particular, if you have a figure shared under SA, nobody can include even one figure from your work in another work they are unable to put under a CC-SA license (you have just prohibited your figure from being used in any review article or textbook that is not open access). Arguably it even prevents posting your work on a class website if that website is behind a password login. This is a very potent restriction with far-reaching consequences. Use it intentionally and thoughtfully.
- CC-NC (no-commercial) – this prohibits primarily commercial uses. What are commercial uses? Well, they involve money and your work changing hands. But beyond that nobody knows. A lot of people think this means it can be used for educational and non-profit purposes or that it only prohibits reuse by for-profit corporations. None of those statements are true. Worse, CC intentionally makes this clause vague, and even conducted a survey to find out what people think it means. Here is an example I am running into. I am writing a textbook on macroecology. As author I am responsible for obtaining use permissions on all figures. It is under contract with Oxford University Press, which is non-profit and educationally oriented. But any figure I use will go in a book that is sold for money and I will be paid (albeit a pittance in hourly rates even for the whole book and for sure for the fraction attributable to one figure in the book). But getting rich is not the primary purpose for either me or OUP. It is secondary to covering costs of sharing. Is that a commercial use? Nobody knows. CC doesn’t say. And this has never been meaningfully tested in a court of law to date. While the appeal is obvious, the NC clause is vague and probably doesn’t do what you think, and I recommend against it.
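Note that there are various combinations of these, like CC-BY-NC-SA. Not all combinations are possible (for example, -SA and -ND cannot be combined, since share-alike only makes sense when derivatives are allowed). Check the Creative Commons website for details.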
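To make the point about combinations concrete, here is a small sketch that checks a license name against the six combinations that, to my understanding, the current Creative Commons suite offers (plus CC0). The function name `is_valid_cc` is hypothetical, and the list is an assumption to verify against the Creative Commons website rather than an authoritative source. Note that in the current suite every license other than CC0 includes BY, so what I call CC-SA, CC-NC and CC-ND above appear in practice as CC-BY-SA, CC-BY-NC and CC-BY-ND.

```python
# The six element combinations assumed to exist in the current CC suite.
# Every one includes BY; SA and ND never appear together.
VALID_CC_ELEMENTS = {
    frozenset({"BY"}),
    frozenset({"BY", "SA"}),
    frozenset({"BY", "NC"}),
    frozenset({"BY", "ND"}),
    frozenset({"BY", "NC", "SA"}),
    frozenset({"BY", "NC", "ND"}),
}


def is_valid_cc(name: str) -> bool:
    """Return True if `name` (e.g. 'CC-BY-NC-SA') is an offered combination.

    The check ignores case and the order of the elements.
    """
    name = name.strip().upper()
    if name == "CC0":
        return True  # CC0 stands alone and takes no extra elements
    parts = name.split("-")
    if parts[0] != "CC":
        return False
    elements = parts[1:]
    if len(elements) != len(set(elements)):
        return False  # reject repeated elements such as CC-BY-BY
    return frozenset(elements) in VALID_CC_ELEMENTS


if __name__ == "__main__":
    for candidate in ["CC0", "CC-BY", "CC-BY-NC-SA", "CC-BY-ND-SA", "CC-NC"]:
        print(candidate, is_valid_cc(candidate))
```

The lookup table is deliberately explicit rather than rule-based: with only six entries, listing them is easier to check against the official list than encoding the rules.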
All of these licenses are grants of rights to others. You as the author retain all the rights of the original copyright (unless you have assigned the copyright to somebody else). So in particular, if you publish something under CC-SA and somebody else wants to use it without the SA restriction, they can come back to you and you can grant them a different license (e.g. permission to reproduce a single figure in a copyrighted work, or reproduce the whole work for a fee). By the same token, the CC license does not remove the original fair-use permissions (e.g. quoting small sections) under the original copyright. The only way you lose your rights is if you: a) outright transfer the copyright (which you do when you publish in most closed journals and often when you sign an employment contract, although not at many universities) or b) offer an exclusive right to somebody else (e.g. in my book contract I retain the copyright but I give Oxford University Press an exclusive right to publish in exchange for their investment in typesetting etc).
The bottom line is that if you want to share something, it is important to make that permission explicit. And CC0 and CC-BY are great ways to do this. The other restrictions (-SA, -ND, -NC) are a little more problematic in the scientific world in my opinion. If you want to be a radically open sharer you can put an -SA on, but don’t be surprised if you have to deal with lots of requests for a license without that restriction (assuming your work hits it big and lots of people want to reuse it). The -NC and -ND are problematic for science and should be used with caution in my opinion.
Copyrights work great for prose, computer code and photos. The arrangement of parts is integral to the expression. But as the line-drawing example shows, copyrights can be surprisingly limited in other contexts (copyrights don’t apply to ideas). What about datasets or databases? The answer is really complicated. You should probably hire a lawyer if you want to pursue this seriously. The short answer is that facts are not copyrightable. Compilations of facts may be protected under copyright to the degree their arrangement is novel, non-obvious and required investment. There are some famous cases that show the bar is pretty high. One is a company that copied a telephone directory published by somebody else. They changed the font size, and, well, alphabetical is an obvious, non-novel way to arrange them, and the actual name/number combos are just facts. Courts decided there was no copyright violation for somebody who copied a whole phone book! A list of references used in a paper is a database that is fairly non-obvious (it is specific to your intellectual line of argument). But a list of every reference published since 1990 with the keyword “coexistence” is probably not creative. In numerical datasets, the database schema and maybe the field names might be copyrightable. But not the contents in them – numbers are facts not creative expressions (the mathematician in me wants to argue …). This also differs drastically between the US and the EU. The EU recognizes “sui generis” database rights that last for 15 years (where sui generis means “of its own kind”, i.e. a right that stands on its own and is not derived from copyright law). The US only recognizes copyright law and interprets copyright strictly so it applies to very little about a database. The EU “sui generis” rights don’t apply outside the EU and have yet to be really strongly staked down by test cases. So to say the law on protecting databases is vague and inconsistent and underwhelming is an understatement.
If you want to protect something, you might be able to do so depending on what it is and where you live. But without consulting a lawyer, your best bet is that it is not protectable, and almost certainly not protectable world-wide.
And everything I said about the -NC, -ND and -SA restrictions for creative works applies in spades to a database. If you limit scientific reuse of facts, you are really hampering scientific progress. Which is why there is so little legal support for copyrights for databases even if you want them.
I often see people apply CC licenses to databases. Since a CC license is embedded in copyright law and databases are not copyrightable in much of the world, this is not a great idea. The newest CC 4.0 versions try to address databases, but given that it starts as a copyright license and the copyright status of databases is vague and nationally variable, CC themselves are not really very clear on what their license says about databases. And CC themselves outright recommend against -NC and -ND restrictions on databases.
My own suggestion is that if you want to put a license on a dataset or database, use the Open Database License (ODbL), not CC. ODbL is also based on copyrights but is at least intended for databases. But given the limits to copyrights for data anyway and that those limits exist for the good reason that “locking up” a fact is detrimental to society and science, consider just going with an unrestricted license like CC0, MIT, Berkeley, ODbL or CC-BY.
Everything above is about legal approaches. Which means they are founded in national laws. It also means that they are enforced in the courts via lawsuits. All academic work exists in another domain too – the one that is based on a scientific code of ethics. The ethical property rights, if you will, in contrast to the more legally based intellectual property rights. The ethical and legal rights are a Venn diagram of two overlapping circles. Often the two line up (e.g. a CC-BY license of openness with attribution closely matches many scientific code of ethics situations). But there are actions that are legal but not ethical in academic circles (taking something published under a CC-BY license by somebody else and then selling it for money) and actions that are ethical in science but possibly not legal (e.g. reusing a figure with attribution released under CC-BY-NC or CC-BY-SA in a textbook).
There is also the interesting dilemma that anything stronger than CC0 (i.e. including CC-BY and stronger licenses) has a “no additional restrictions” clause (and CC0 implicitly has this by giving away all rights). This means you cannot place additional legal restrictions on anybody you are granting rights to via the CC license. So what is the legal status of something I see all the time? – “this data is made available under a CC-BY license. Anybody who uses this data is required to contact the owner of the data and offer co-authorship”. Better hire a lawyer for that one! Of course the ethical implications are clear. I might not like their restriction, but as an ethical scientist I wouldn’t violate it. But I might wonder why they bothered putting the CC-BY in there. Were they trying to have their cake and eat it too? Or just unaware of the legal complexities? Or maybe it’s OK to put out there something that is not legally enforceable but ethically “enforceable”? More than anything, to me it just seems like a logical contradiction – the ethical requirement just completely undid everything the CC-BY put legally in place. It just went from legally precise to legally vague. Am I allowed to make copies without offering co-authorship? To do analyses without co-authorship? Embed it in a larger dataset without co-authorship? And if I copy it, do I have to copy the CC-BY license? Do I have to copy the ethical request if I copy the data? If you want co-authorship for people who do more than read your data (i.e. publish themselves with it), why not go find (or write) a license that grants usage conditional on co-authorship and be clean and direct and precise about your intentions? But other reasonable people will see it differently.
Following on from the last point, it is worth noting that it is not really that hard to write your own license. If you don’t intend to follow the terms of a particular CC license, it is cleaner to write your own custom license. Just write your own license that claims your copyright and then says exactly what rights you grant to whom under what conditions and for how long (and probably that you want attribution and that you want your license terms copied with the data if you are allowing copying of the data). Plain English is still legally binding! The license on using the BCI 50 ha data is a good example of this. The notice at the bottom of every Dynamic Ecology blog post is also a custom license. Writing your own license is a great way of collapsing the legal and ethical Venn diagrams into one circle (for at least your data or other creation). To an ethical scientist it won’t really matter if your terms are legally binding. You’ve just been clear on what you expect, and that is ethically “enforceable”. And to the extent you are entitled to a copyright, it is legally enforceable. Of course some journals expect standard open licenses (some even name a specific CC license). Dryad requires CC0. GitHub lets users pick from several very open licenses: MIT, GPL, or Apache (GitHub even has a license chooser tool). If you put a product in one of these mandatory open license locations (using other people’s resources to share it) and then tack on a non-legally binding ethical requirement, is that request ethically binding? That will keep scientists debating for a long time. But I am seeing it happen a lot right now.
Of course enforcement in the scientific ethics sphere is notoriously imperfect. But so is enforcement in the legal sphere, and enforcement in the legal sphere usually involves the deepest pockets with the most lawyers winning. So I am going to make a controversial statement. If you are looking for an open license, I would strongly encourage you to do one of the following:
- Just share your scientific work by CC-BY and be done with it. Be open and don’t try to put any restrictions on it (PLOS does this)
- Share your work under a CC0 or MIT license with no legal restrictions. Then spell out any ethical restrictions you consider to apply (and clearly state them as ethical requests – don’t phrase them like legal requirements because you can’t add legal requirements to a CC license). People may agree or disagree with your ethical restrictions. And if they disagree with them and ignore them, you cannot sue them. Of course you probably couldn’t sue them anyway given the resources available to most academics. But at least your usage is more legally and logically consistent and transparent about what is legal and what is ethical.
- Write your own license.
What do you think (keeping in mind the point here is not to debate open vs restricted license but rather the path to using licenses to do whatever you want in a sensible fashion)? Should there be restrictions beyond attribution (i.e. -SA, -NC, -ND) placed in science? Should there be ethical restrictions (like co-authorship) placed on licenses that are legally open (e.g. CC-BY)? (I’m not interested in debating whether there should ever be ethical restrictions on data – reasonable people disagree). If somebody puts work in a third-party repository that has a mandatory CC0 license and then puts ethical requests on usage of the data are those requests ethically binding? Anybody have a different take on the legality of copyrights and databases from what I gave?