Thank you Lydia and Grant. I'm impressed with the thorough documentation
that you all have produced for your preservation services.
I've posted a similar inquiry to the Archivematica forum that explains in a
bit more detail how this 1 object : 1 SIP/AIP preservation model arose out
of some throughput problems we encountered with transfers to archival
storage, especially with AV resources that seem to require 1 object per SIP
because of the temporary storage needs for processing such large files.
We've also noticed some murky indexing issues related to file sizes and
number of files in a SIP that lead us to believe the only consistent rule
for automated SIP creation that accommodates all of our content types is 1
object = 1 SIP/AIP.
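To make that rule concrete, here is a rough Python sketch of what "1 object = 1 SIP" could look like on disk. It is only an illustration under stated assumptions: the staging layout (each top-level file or folder is one object), the function name `build_single_object_sips`, and the UUID used as a stand-in identifier are all hypothetical, and the code does not call any Archivematica API. The `objects/` subdirectory is assumed here as the place where the content goes inside each transfer.

```python
from pathlib import Path
import shutil
import uuid


def build_single_object_sips(staging_dir, transfer_dir):
    """Copy each top-level entry in staging_dir into its own SIP directory.

    Assumption: every file or folder directly under staging_dir is one
    object (one photograph, one book, ...). Returns a dict mapping a
    placeholder SIP identifier to the path of the SIP directory.
    """
    staging = Path(staging_dir)
    transfers = Path(transfer_dir)
    transfers.mkdir(parents=True, exist_ok=True)

    sips = {}
    for obj in sorted(staging.iterdir()):
        # Placeholder identifier; a real workflow would mint a persistent ID.
        sip_id = str(uuid.uuid4())
        objects_dir = transfers / sip_id / "objects"
        objects_dir.mkdir(parents=True)
        if obj.is_dir():
            shutil.copytree(obj, objects_dir / obj.name)
        else:
            shutil.copy2(obj, objects_dir / obj.name)
        sips[sip_id] = objects_dir.parent
    return sips
```

Because the rule never looks at file size or file count, the same loop applies to a single photograph and to a multi-gigabyte AV file, which is the consistency we are after.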
We could manually create each SIP/AIP with larger aggregations of content
based on what we think Archivematica can handle. That seems like a lot of
work up front with the potential for introducing human error. In a workflow
that splits AIP and DIP creation into separate streams at the start,
manually aggregating content also hampers our ability to assign persistent
identifiers for preservation packages that we can link to from other
I'm curious how Scholars Portal plans to handle AIP creation in the Books
workflow, where the AIP and the DIP are created separately, and how you're
creating the metadata linkage back to the access platform. That seems to be
similar to what we're working on.
On Fri, Jun 2, 2017 at 12:00 PM, Grant Hurley <[log in to unmask]> wrote:
> Hi Randy,
> This is a good question! Thanks for pointing to Scholars Portal, Kari.
> I think that the hospitable concept of the SIP/AIP/DIP sort of precludes a
> perfect or ideal version in real life – the decision is always made in relation to
> the content and the preserving organization’s context, resources and needs.
> I agree with Kari that for many contexts, SIPs/AIPs are best aggregated on
> some functional basis. For us, the consideration is definitely related to
> how we provide access to DIPs and want to manage our metadata.
> We’re still very much in the pilot phase regarding using Archivematica for
> processing larger numbers of files. There’s definitely lots of interest in
> this out there - consulting the Archivematica forum<https://groups.google.com/forum/?fromgroups#!forum/archivematica>
> is a good place to start. I’d
> also point to institutions such as Simon Fraser University and the
> University of British Columbia (and there are definitely others out there
> too I’m forgetting!) who have developed great automated workflows for the
> ingest of theses and other materials through Archivematica.
> For context: Scholars Portal<http://scholarsportal.info/> hosts a number
> of content platforms consisting of shared collections of published research
> materials and data (geospatial, micro-data etc.) that we load from multiple
> sources to serve to the members of the Ontario Council of University
> Libraries and the general public. Our preservation services are most fully
> developed for our journals platform<https://journals.scholarsportal.info/>
> and the books project is in progress. All of our documentation for the
> preservation of journals is online here:
> https://spotdocs.scholarsportal.info/display/OAIS/Home and you might find some of the
> definitions of SIP/AIP/DIP and details on fixity checking relevant. An
> important thing to note for us on this scale is that the DIP is generated
> on demand from the stored AIP. This is also why there’s a 1 article – 1
> SIP/AIP model. For our books platform, we’re looking at a different
> workflow where the AIPs and DIPs are created separately, so the
> Archivematica workflow is concentrated on creating AIPs only, but whose
> metadata links back to the access platform. It’s also notable that in both
> of these cases, the SIP/AIP is not just the article or book – it can also
> include supplementary files and materials such as images – so it’s not as
> atomistic as it initially appears.
> Hope this helps!
> Grant Hurley | MA, MAS, MLIS
> Digital Preservation Librarian
> Scholars Portal, Ontario Council of University Libraries
> 416-978-5648 | [log in to unmask]<mailto:[log in to unmask]>
> | www.scholarsportal.info<http://www.scholarsportal.info> |
> On 2017-06-02, 10:25 AM, "Code for Libraries on behalf of Andrew Weidner" <
> [log in to unmask]<mailto:[log in to unmask]> on behalf of
> [log in to unmask]<mailto:[log in to unmask]>> wrote:
> Thanks for the advice, Kari. I'll take a look at those resources.
> "It's the management burden of having thousands upon thousands of AIPs
> that will become the bottleneck / digital management problem in the future."
> This is the pushback that I often hear from digital preservationists when
> talking about the single object AIP approach, and I'd like to learn more
> about these concerns. Any clarification of the long term management burden
> posed by single object AIPs would be most helpful.
> On Fri, Jun 2, 2017 at 9:17 AM, Kari R Smith <[log in to unmask]<mailto:
> [log in to unmask]>> wrote:
> Hi Randy,
> I suggest that you post this question to digipres list (ALA list) where
> many digital archivists and digital preservation folks will see your message
> and can respond from that perspective. Also, don't forget the concept of
> the AIC (archival information collection) which is an aggregate of AIPs.
> I would recommend connecting with Scholars Portal in Canada. They have
> an Archivematica workflow that is very item level (article = 1 SIP) and
> have experience and many lessons learned about the issues regarding
> performance, management, and scale of having one item in a SIP/AIP. Grant
> and Kate from SP recently did a presentation on their approach:
> Also keep in mind that the purpose of the AIP is for long-term
> preservation that accumulates both metadata and changes to file formats
> over time. The DIP (Dissemination information package) is the xIP for
> which you may want to have a 1 xIP = 1 item relationship. The DIP is
> created from that AIP (in the ideal workflow) so you can have 100 DIPs
> generated from a single AIP (that contains the 100 image files, etc.)
> It's the management burden of having thousands upon thousands of AIPs that
> will become the bottleneck / digital management problem in the future.
> Aggregate solutions for digital files, even and especially for digitized
> material, are more the norm than individual xIPs.
> Good luck,
> Kari R. Smith
> Digital Archivist and Program Head for Born-digital Archives
> Institute Archives and Special Collections
> Massachusetts Institute of Technology Libraries, Cambridge, Massachusetts
> 617.253.5690 smithkr at mit.edu http://libraries.mit.edu/archives/
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Andrew Weidner
> Sent: Friday, June 02, 2017 9:57 AM
> To: [log in to unmask]<mailto:[log in to unmask]>
> Subject: [CODE4LIB] SIP/AIP Content Guidelines
> Hi all,
> Can anyone point me to guidelines or best practices documentation around
> creating SIPs for transfer to archival storage? What does an ideal AIP look
> like for digitized cultural heritage materials?
> I'd like to set up a pipeline that sends single object (e.g. one
> photograph, one book) SIPs from our digitization workflow to Archivematica
> for automated transfer to archival storage. Here's a brief slide deck
> outlining the approach I'm envisioning:
> I welcome any thoughts that you all may have on this, especially about
> pitfalls to avoid.