This is a good question! Thanks for pointing to Scholars Portal, Kari. I think that the hospitable concept of the SIP/AIP/DIP sort of precludes a perfect or ideal version in real life: the decision is always made in relation to the content and the preserving organization’s context, resources and needs. I agree with Kari that for many contexts, SIPs/AIPs are best aggregated on some functional basis. For us, the consideration is definitely related to how we provide access to DIPs and how we want to manage our metadata.
We’re still very much in the pilot phase regarding using Archivematica for processing larger numbers of files. There’s definitely lots of interest in this out there, and the Archivematica forum<https://groups.google.com/forum/?fromgroups#!forum/archivematica> is a good place to start. I’d also point to institutions such as Simon Fraser University and the University of British Columbia (and there are definitely others that I’m forgetting!) that have developed great automated workflows for the ingest of theses and other materials through Archivematica.
For context: Scholars Portal<http://scholarsportal.info/> hosts a number of content platforms consisting of shared collections of published research materials and data (geospatial, microdata, etc.) that we load from multiple sources to serve to the members of the Ontario Council of University Libraries and the general public. Our preservation services are most fully developed for our journals platform<https://journals.scholarsportal.info/>, and the books project is in progress. All of our documentation for the preservation of journals is online here: https://spotdocs.scholarsportal.info/display/OAIS/Home and you might find some of the definitions of SIP/AIP/DIP and details on fixity checking relevant. An important thing to note for us at this scale is that the DIP is generated on demand from the stored AIP. This is also why there’s a 1 article = 1 SIP/AIP model. For our books platform, we’re looking at a different workflow where the AIPs and DIPs are created separately, so the Archivematica workflow concentrates on creating AIPs only, with metadata that links back to the access platform. It’s also notable that in both of these cases, the SIP/AIP is not just the article or book: it can also include supplementary files and materials such as images, so it’s not as atomistic as it initially appears.
Hope this helps!
Grant Hurley | MA, MAS, MLIS
Digital Preservation Librarian
Scholars Portal, Ontario Council of University Libraries
416-978-5648 | [log in to unmask] | www.scholarsportal.info | www.granthurley.ca
On 2017-06-02, 10:25 AM, "Code for Libraries on behalf of Andrew Weidner" <[log in to unmask] on behalf of [log in to unmask]> wrote:
Thanks for the advice, Kari. I'll take a look at those resources.
"It's the management burden of having thousands upon thousands of AIPs
that will become the bottleneck / digital management problem in the future."
This is the pushback that I often hear from digital preservationists when
talking about the single object AIP approach, and I'd like to learn more
about these concerns. Any clarification of the long term management burden
posed by single object AIPs would be most helpful.
On Fri, Jun 2, 2017 at 9:17 AM, Kari R Smith <[log in to unmask]> wrote:
I suggest that you post this question to digipres list (ALA list) where
many digital archivists and digital preservation folks will see your message
and can respond from that perspective. Also, don't forget the concept of
the AIC (archival information collection) which is an aggregate of AIPs.
I would recommend connecting with Scholars Portal in Canada. They have
an Archivematica workflow that is very item level (article = 1 SIP) and
have experience and many lessons learned about the issues regarding
performance, management, and scale of having one item in a SIP/AIP. Grant
and Kate from SP recently did a presentation on their approach:
Also keep in mind that the purpose of the AIP is for long-term
preservation that accumulates both metadata and changes to file formats
over time. The DIP (Dissemination Information Package) is the xIP for
which you may want to have a 1 xIP = 1 item relationship. The DIP is
created from that AIP (in the ideal workflow) so you can have 100 DIPs
generated from a single AIP (that contains the 100 image files, etc.)
It's the management burden of having thousands upon thousands of AIPs that
will become the bottleneck / digital management problem in the future.
Aggregate solutions for digital files, even and especially for digitized
material, are more the norm than individual xIPs.
Kari R. Smith
Digital Archivist and Program Head for Born-digital Archives
Institute Archives and Special Collections
Massachusetts Institute of Technology Libraries, Cambridge, Massachusetts
617.253.5690 smithkr at mit.edu http://libraries.mit.edu/archives/
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
Sent: Friday, June 02, 2017 9:57 AM
To: [log in to unmask]
Subject: [CODE4LIB] SIP/AIP Content Guidelines
Can anyone point me to guidelines or best practices documentation around
creating SIPs for transfer to archival storage? What does an ideal AIP look
like for digitized cultural heritage materials?
I'd like to set up a pipeline that sends single object (e.g. one
photograph, one book) SIPs from our digitization workflow to Archivematica
for automated transfer to archival storage. Here's a brief slide deck
outlining the approach I'm envisioning:
I welcome any thoughts that you all may have on this, especially about
pitfalls to avoid.