
Dear Eric,

As a data librarian, what I see most often is different parts of a dataset deposited in different repositories. For example, let's say you are doing a study that combines some kind of genetic sequencing with MRI of the brain. The genetic sequencing data would go into a repository like Gene Expression Omnibus (GEO), and the MRIs would go to a brain imaging repository.

If you wanted to deposit your data in both an institutional repository (IR) and a generalist repository (like Zenodo), I would not say it is a faux pas, exactly, but it's probably unnecessary. Reputable repositories should have a succession plan in place in case they have to close, as well as a formalized retention plan.

As for link rot, repositories have implemented DOIs and accession numbers to mitigate it. If you are writing a data availability statement for a paper or otherwise need to indicate where the dataset is, best practice is to include the accession number or DOI rather than a direct link.

Overall, I would not recommend duplicative depositing of data. However, I would still recommend that my patrons keep multiple copies of their data in internal storage (rather than sharing multiple copies), if possible. Some of the researchers I work with have such large datasets that this would not be feasible.

Hopefully this helps.

Lena

Lena Bohman
Senior Data Management and Research Impact Librarian
Long Island Jewish - Forest Hills Liaison
Donald and Barbara Zucker School of Medicine at Hofstra/Northwell
________________________________
From: Code for Libraries <[log in to unmask]> on behalf of Eric Lease Morgan <[log in to unmask]>
Sent: Monday, March 11, 2024 9:02 AM
To: [log in to unmask] <[log in to unmask]>
Subject: [CODE4LIB] data sets in multiple respositories


To what degree is it unethical or unprofessional to deposit data sets in multiple repositories?

A long time ago, in a galaxy far, far away, the preservation of books and journals was ensured when multiple libraries included those books and journals in their collections. This philosophy of preservation was well articulated with the advent of LOCKSS, when they said, "Lots of copies keep stuff safe." See: https://www.lockss.org/

Nowadays, we relegate the preservation of the scholarly record (whether that be books, journals, or data sets) to centralized networked services. Hmmm.

For decades I have been using the Internet to provide access to library collections and services, and one of the things this experience has taught me is that links WILL break. Thus, if I deposit my data sets in multiple Internet locations, the probability of losing access to them decreases. Yet, just as publishing the same article in multiple journals is seen as unethical, would publishing data sets in multiple locations be seen in the same light? One problem with multiple deposits would be the generation of multiple DOIs, which raises the question, "Which DOI is the authoritative one?"

Put more simply, is it okay for me to deposit my data sets in my university's institutional repository as well as in something like Zenodo?

--
Eric Morgan <[log in to unmask]>
Navari Family Center for Digital Scholarship
University of Notre Dame