The S3 option seems reasonable, lends itself to the permissions sharing you
need, and is easily automatable.
This solution has the added benefit that developers can simplify transfer
by working directly in AWS if they like. In the case at hand, 1-2 GB is
downloadable. For stuff that's too big to download, it's often easier to
take computing to the data rather than the other way around.
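As a sketch of what "easily automatable" could look like in practice: a CI job could publish each AIP test-data set with a single `aws s3 sync` call. The bucket name and paths below are hypothetical, and this assumes the AWS CLI is installed wherever the job runs.

```python
import shlex

# Hypothetical bucket for the shared DSpace test data sets.
BUCKET = "s3://dspace-test-data"

def sync_command(local_dir: str, prefix: str) -> list[str]:
    """Build the `aws s3 sync` invocation that publishes a set of AIP files,
    using the Infrequent Access storage class mentioned in the thread."""
    return ["aws", "s3", "sync", local_dir, f"{BUCKET}/{prefix}",
            "--storage-class", "STANDARD_IA"]

cmd = sync_command("./aip-test-data", "aip/latest")
print(shlex.join(cmd))
# A CI job would hand this list to subprocess.run(cmd, check=True);
# developers pulling test data would run the same sync in the other direction.
```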
On Wed, Dec 12, 2018 at 11:05 AM Cary Gordon <[log in to unmask]> wrote:
> It would be pennies a month to put this on AWS S3. Infrequent access
> service runs $0.0125 per GB/mo.
> Transfer charges would be inconsequential.
> On Wed, Dec 12, 2018 at 10:53 AM Terry Brady <[log in to unmask]>
> > At the most recent DSpace developer meeting, we discussed the need for
> > meaningful test data resources for system testing.
> > We have created a workflow to publish Docker images for each active branch
> > of the project. As folks begin to make use of these images, it would be
> > good to offer a standard set of test data to ingest into new instances of
> > the software.
> > A DSpace instance can be populated from AIP files which contain user
> > information, hierarchy information, metadata, and digital assets. Based on
> > some existing test resources, I presume that these test data sets could be
> > as large as 1-2 GB. Eventually, I could imagine dozens to a couple hundred
> > developers making use of these assets.
> > There are a dozen ways that we could distribute these assets.
> > At a minimum, I am confident that we could find a place to store a static
> > set of assets at a published URL.
> > Ideally, I would like for project contributors to have shared ownership
> > these assets just like there is shared ownership of code files on GitHub.
> > Looking at GitHub help files, it sounds like a GitHub repo is not suitable
> > for distributing files of this size.
> > https://help.github.com/articles/working-with-large-files/ Apparently,
> > GitHub offers a solution in this area (Git LFS), but I have not compared
> > its pricing with other services.
> > Are you aware of any projects that have implemented a good solution for
> > sharing and collaborating on large (1-2 GB) test data files? Are you aware
> > of any tools or services that might be helpful?
> > Thanks, Terry
> > --
> > Terry Brady
> > Applications Programmer Analyst
> > Georgetown University Library Information Technology
> > https://github.com/terrywbrady/info
> > 425-298-5498 (Seattle, WA)
> Cary Gordon
> The Cherry Hill Company