What Does Open-Access Data Really Mean?

I’ve recently been digging through the internet looking for data on plant traits (things like seed mass and specific leaf area, or SLA). I thought this would be relatively simple given the recent push towards open-access data repositories and the arrival of ‘big data ecology’. Data repositories like Dryad or The Knowledge Network for Biocomplexity (KNB) aim to make finding raw data easy. Other websites, like TraitNet, LEDA, or TRY, aim to compile all of the existing data on a particular subject (like plant traits) into a single location, vet the data, standardize collection practices, and provide the data for use. Other institutions, like NCEAS, appear to require that data be made publicly available and maintain their own public, searchable repository.

My question, after trying unsuccessfully for several days to actually get any of this data, is: what does open-access data really mean? Personally, I pictured a searchable database that would pull up data related to your search terms and provide metadata so you can decide whether the data is useful; then you click download and *poof*, data on your hard drive. This turned out to be pretty rare.

Some of those databases contain no data; they simply provide links to other databases. Data from Dryad or KNB is spotty: sometimes you can download it, sometimes it’s listed but not publicly available, and sometimes it takes you to another website that asks for a login ID and requires author permission to use. That last bit is common: the data is posted, but you are redirected to a secondary website that requires a login, possibly requires author permission (which means emailing the author and getting a response), and sometimes requires uploading data of your own (e.g., TRY), which is hard for people concentrating on meta-analyses.

I guess my complaint is just that, when I saw open-access databases, I imagined searching, browsing, and downloading. It really seems like we’re not quite there yet: there are still dozens of hoops to jump through that make actually getting your hands on the data difficult. One could argue that the author of the data should always have the final say as to whether the data is downloaded. In effect, they do. By uploading the data, the authors are implicitly acknowledging that they have gotten their publications from it and that, while they might still make use of it, it’s available for anyone else. If the data is still being worked on, the authors can always choose not to upload it.

I like the premise of open-access data a lot (it’s why I make all my data and code available on my website), but I don’t think we’ve quite gotten the spirit of it yet.


One thought on “What Does Open-Access Data Really Mean?”

  1. It sounds like you’re describing DataONE. It (in theory) provides exactly what you’re looking for – a distributed network of data providers (including KNB and hopefully soon Dryad) that can be searched from a central location to get the data and metadata easily onto your hard drive. But you’re right, in practice we’re not quite there yet – for some types of data, this is a reality right now, but others are still a few years off. Plant trait data is a particularly unfortunate example where a lot of the data just isn’t being shared right now by those who “own” it. Still, I think we’re headed in the right direction.

    I know a lot of people are interested in a truly open, comprehensive plant trait database. Reach out on Twitter and I’m sure you’ll find many others struggling with the same problem.

