On Jul 23, 2014, at 5:29 PM, Kyle Banerjee wrote:
> We've been facing increasing requests to help researchers publish datasets.
> There are many dimensions to this problem, but one of them is applying
> appropriate metadata and mounting them so they can be explored with a
> regular web browser or downloaded by expert users using specialized tools.
> Datasets often are large. One that we used for a pilot project contained
> well over 10,000 objects with a total size of about 1 TB. We've been asked
> to help with much larger and more complex datasets.
> The pilot was successful but our current process is neither scalable nor
> sustainable. We have some ideas on how to proceed, but we're mostly making
> things up. Are there methods/tools/etc you've found helpful? Also, where
> should we look for ideas? Thanks,
The tools I use are too customized for our field to be of much use to anyone else, so I can't help on that part of the question.
I'd really recommend trying to reach out to someone working in data informatics in the field that the data is from, as they would have recommendations on specific metadata that should be captured.
For the general 'data publication' community, it's coalescing, but still a bit all over the place. Here are some of the ones that I know about:
JISC has a 'Data Publication' mailing list:
ASIS&T runs a 'Research Data Access & Preservation' conference and mailing list:
... and they put most of the presentations up on slideshare:
The Research Data Alliance has two working groups on the topic, Publishing Services and Publishing Data Workflows:
I'm also one of the moderators of the Open Data site on Stack Exchange, which has some questions that might be relevant:
Let's suppose I have potentially interesting data. How to distribute?
Benefits of using CC0 over CC-BY for data
... or just ask a new question.
I'd also recommend that when you catalog your data, you consider adding DataCite metadata, so that we can try to make it easier for others to cite your data. (Specific implementation recommendations for data citation are still evolving, but general principles have been released; if you have questions, feel free to ask me, as I think we need to clarify what we mean on some of the items.)
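As a rough sketch of what capturing citation metadata at cataloging time could look like, the Python below checks a record for the core properties that DataCite treats as mandatory (identifier, creator, title, publisher, publication year) and builds a citation string in the "Creator (Year): Title. Publisher. Identifier" pattern. The dictionary layout, the function names, and the sample values (including the DOI) are assumptions made up for illustration; they are not part of any DataCite tool, so check the current schema documentation before relying on the field list.

```python
# Hypothetical field names modeled on DataCite's core mandatory properties.
REQUIRED = ("identifier", "creators", "title", "publisher", "publication_year")


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED if not record.get(field)]


def format_citation(record):
    """Build a 'Creator (Year): Title. Publisher. Identifier' citation string."""
    creators = "; ".join(record["creators"])
    return "{} ({}): {}. {}. {}".format(
        creators,
        record["publication_year"],
        record["title"],
        record["publisher"],
        record["identifier"],
    )


# Placeholder record; every value here is invented for the example.
record = {
    "identifier": "doi:10.1234/example",
    "creators": ["Doe, Jane"],
    "title": "Example Pilot Dataset",
    "publisher": "Example University Library",
    "publication_year": 2014,
}

problems = missing_fields(record)
if problems:
    print("Missing:", ", ".join(problems))
else:
    print(format_citation(record))
```

Running a check like this over each dataset before deposit would flag records that can't yet be cited cleanly, which matters more as the number of objects grows.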
As I see it, you're dealing with data that's in the problem range: if it were larger, the department collecting the data would have a system in place already; if it were smaller, it would be easier to manage as a single item for deposit.