At the most recent DSpace developer meeting, we discussed the need to share
meaningful test data resources for system testing.
We have created a workflow to publish Docker images for each active branch
of the project. As folks begin to make use of these images, it would be
good to offer a standard set of test data to ingest into new instances of
the software.
A DSpace instance can be populated from AIP files which contain user
information, hierarchy information, metadata, and digital assets. Based on
some existing test resources, I presume that these test data sets could be
as large as 1-2 GB. Eventually, I could imagine dozens to a couple hundred
developers making use of these assets.
There are a dozen ways that we could distribute these assets.
At a minimum, I am confident that we could find a place to store a static
set of assets at a published URL.
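
To make the static-URL option concrete, here is a minimal sketch of how a developer might fetch one such archive and confirm it arrived intact. It assumes we would publish a SHA-256 checksum alongside each file; the function names and the example URL are hypothetical and not part of any existing DSpace tooling.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a 1-2 GB archive never sits fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_and_verify(url: str, dest: Path, expected_sha256: str) -> Path:
    """Download url to dest and check it against the published checksum.

    A previously downloaded copy that already matches is reused as-is.
    """
    if dest.exists() and sha256_of(dest) == expected_sha256:
        return dest
    # Stream the response to disk rather than reading it all at once.
    with urllib.request.urlopen(url) as response, dest.open("wb") as out:
        shutil.copyfileobj(response, out)
    actual = sha256_of(dest)
    if actual != expected_sha256:
        dest.unlink()  # do not leave a corrupt archive behind
        raise ValueError(f"checksum mismatch for {url}: got {actual}")
    return dest


# Hypothetical usage; the URL and checksum are placeholders:
# fetch_and_verify("https://example.org/dspace-test-aips.zip",
#                  Path("dspace-test-aips.zip"), "<published sha256>")
```

This only covers the download side; it does nothing for the shared-ownership question below.
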
Ideally, I would like for project contributors to have shared ownership of
these assets just like there is shared ownership of code files on GitHub.
Looking at the GitHub help files
(https://help.github.com/articles/working-with-large-files/), it sounds
like a regular GitHub repo is not suitable for distributing files of this
size. GitHub apparently offers a large-file solution (Git LFS), but I have
not compared its pricing with other services.
Are you aware of any projects that have implemented a good solution for
sharing and collaborating on large (1-2 GB) test data files? Are you aware
of any tools or services that might be helpful?
Thanks, Terry
--
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://github.com/terrywbrady/info
425-298-5498 (Seattle, WA)