
The Complexity of Sharing Scientific Databases
Ethan Zuckerman, 16 Jul 08


Creative Commons is a clever use of the copyright system, intended to make it easier for people to share their work with others if they want to. Jonathan Coulton has used Creative Commons to enable an army of remixers and videomakers to produce promotional materials for his songs and albums. Authors like Dan Gillmor and Cory Doctorow have used Creative Commons to let people download, translate and make audio versions of their books. And Global Voices uses Creative Commons so that blogs and news sites can use our content without asking us for permission.

What about scientists?

That’s the research interest of my colleague Melanie Dulong de Rosnay. She’s using her time as a Berkman fellow to study alternative copyright systems and their usage and relevance within academic and library communities. Yesterday, Melanie presented research on the licensing of scientific databases and the obstacles such licensing presents to collaboration between scientists around the world.

Under US law, pretty much anything you write down is copyrighted. Scrawl an original note on a napkin and it’s protected until 70 years after your death. Facts, however, are another matter - they can’t be copyrighted. So while trivial but creative scribblings are copyrighted unless you choose to release them into the public domain, the information painstakingly discovered about the human genome - DNA sequences, for instance - isn’t. But the containers that information is stored in - the databases - can be copyrighted.

If I sound confused about this stuff, that’s because I am. And so were the folks at Science Commons, the project that spun off from Creative Commons to focus on open publishing of scientific information. For a couple of years, they offered a wonderfully complex FAQ on applying Creative Commons licenses to databases - the first question read “Can a Creative Commons license be applied to a database?” After a six-paragraph answer to that question, the third question read, “So, a Creative Commons license can be applied to a database?”

Science Commons now takes a different approach: it recommends a protocol that specifies how data can be made Open Access. The FAQ on that protocol explains that the complexities of asking scientists to release their data under Creative Commons licenses were so severe that Science Commons has ended up advocating for data to be released into the public domain, under the auspices of the protocol, instead.

This question of complexity is what Melanie’s research has focused on. She looked at the terms of use for roughly 200 databases necessary for work in the life sciences. Evaluating the terms on all those databases, she discovered that only seven met her stringent definition of Open Access to data: these databases could be accessed without registration; they could be downloaded for local use; they could be incorporated into other works; and they had clear, understandable terms of use. This last factor proved to be the most challenging. She spent hours reading these terms with other experts in the field and discovered that, a great deal of the time, the experts disagreed on what was permitted under a specific agreement.
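
For the programmers in the audience, the four-part test described above can be sketched as a simple checklist. This is only an illustration of the criteria as I've summarized them, not Melanie's actual methodology; the class name, field names and the two sample databases are all hypothetical. Note that the last field, whether the terms are clear, is exactly the judgment call the experts disagreed on, so reducing it to a yes/no is a big simplification.

```python
from dataclasses import dataclass


@dataclass
class DatabaseTerms:
    """Hypothetical summary of one database's terms of use."""
    name: str
    open_without_registration: bool  # can be accessed without registering
    downloadable: bool               # can be downloaded for local use
    reusable_in_other_works: bool    # can be incorporated into other works
    clear_terms: bool                # terms of use are clear and understandable


def meets_open_access(terms: DatabaseTerms) -> bool:
    """Return True only if all four criteria from the article are satisfied."""
    return all((
        terms.open_without_registration,
        terms.downloadable,
        terms.reusable_in_other_works,
        terms.clear_terms,
    ))


# Made-up examples, not databases from the actual study.
examples = [
    DatabaseTerms("Hypothetical DB A", True, True, True, True),
    DatabaseTerms("Hypothetical DB B", True, True, True, False),
]

open_dbs = [t.name for t in examples if meets_open_access(t)]
print(open_dbs)
```

The point of the sketch is how strict the test is: a database that fails any single criterion, including the fuzzy "clear terms" one, drops out of the Open Access set.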

The reason this is important, Melanie explains, is that scientific research proceeds more quickly when researchers can share resources. But when databases are encumbered by different, confusing legal protections, complex work can become a legal nightmare - building a new tool that combines information from two databases in a novel way, for instance. And databases that are protected by access restrictions can be out of reach to scientists in developing nations who might not have the financial or technical resources to access them.

I was particularly intrigued by a comment from John Wilbanks, who runs the Science Commons project. He points out that a project like the database work Science Commons and Melanie are undertaking is basically one that seeks to make a cultural change, encouraging scientists to share data while retaining citation credit. In some scientific communities - particle physics, for instance - this is standard practice. In others - microbiology - it’s quite uncommon. Wilbanks suggests that this has something to do with the economics of the fields. There are only a few supercolliders, and physicists have to share them, while there are lots of bacteria out there.

I’m glad that researchers like Melanie are digging into these issues. I have a great deal of respect for anyone willing to take on the task of understanding these labyrinthine, illogical and extremely important systems… and a great deal of gratitude that I don’t do research in these areas myself… :-)



It's great to encourage organizations to make their data more freely available -- or available at all!

However, I'm not sure unclear licensing and copyright situations are such a huge problem at the moment: Regardless of how clear things are, for example, people tend to either ignore the license, or invest a few minutes to ask the data provider ("this is what I wanted to do with your data, is that fine, and how do I give credit"). Making sure that your interpretation of a license matches that of the data provider may anyway be more important than the "correct" interpretation of the license by both parties (if minimizing legal conflicts is a goal).

In the long term this approach may not scale (need to automate), but note that having all data freely available under the same terms does not remove the onerous and complicated requirement (that just about any scientific application will have) to track the source of any given piece of data.

btw are there plans to publish the results of the review of the licenses, or is this proprietary information? :-)

Posted by: Eric Jain on 17 Jul 08


