LISTSERV 16.5 - CODE4LIB Archives

I'd like to share an alternative approach that we're pursuing here at UVa. It doesn't speak directly to operations on finding aids by themselves (that is, with no attention to representing on-line the collection so described). It applies more to situations where you attempt a full digital surrogate for a collection, using repository machinery. I hope, though, that it might be useful to hear about. We started from a few principles, as follows. (All of them have exceptions, of course. {grin})

1) EAD is a wonderful markup language, but not always an optimal metadata standard.

2) XML is for serializing, not for storage.

3) Solr is a fantastic indexing tool, but it's neither a datastore nor a database.

4) Collections do not have an absolutely correct structure. Archivists and scholars disagree sometimes.

5) The best ways to describe an individual entity are not necessarily the best ways to describe the relationships between entities.

We build digital surrogates for archival collections as assemblages of Fedora objects linked together by RDF. When we start with a finding aid, we disassemble the EAD to develop a graph of documents, containers, series, etc. in Fedora, with RDF predicates along the lines of "isConstituentOf", "hasCollectionMember", etc. When we haven't got a finding aid, we build up the graph from annotations on the physical objects (boxes, folders, etc.) as they are processed for scanning. Obviously, we get a much simpler graph that way, because no claims have been made by archivists about the structure of the collection.

Descriptive and other metadata is stored with each object in MODS and other good *metadata* formats. A document object has metadata that pertain only to the document (along with any data that permits us to represent the document on-line, e.g. a scanned image or TEI text), a folder object has metadata for that folder, etc.

Since we want to offer EAD for a collection (or any piece thereof), we supply a Fedora behavior (dissemination) that can be invoked against any object. That behavior assembles a collection structure as "seen" from that object (by following the RDF graph), then recursively assembles the appropriate metadata and transforms it to produce EAD.
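To make the idea of assembling EAD "as seen from" an object a little more concrete, here is a minimal sketch in Python. It is not our Fedora dissemination code. The object identifiers, the in-memory OBJECTS and TRIPLES tables, and the function names are all hypothetical stand-ins for repository objects, their per-object metadata, and the RDF relationships between them. Only the "isConstituentOf" predicate and the general EAD component shape (c / did / unittitle) come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for per-object metadata: id -> (EAD level, title).
OBJECTS = {
    "coll-1": ("collection", "Papers of an Example Family"),
    "series-1": ("series", "Correspondence"),
    "folder-1": ("file", "Letters, 1850-1855"),
    "doc-1": ("item", "Letter to a cousin"),
    "doc-2": ("item", "Letter to a merchant"),
}

# Hypothetical stand-in for the RDF graph: (subject, predicate, object).
TRIPLES = [
    ("series-1", "isConstituentOf", "coll-1"),
    ("folder-1", "isConstituentOf", "series-1"),
    ("doc-1", "isConstituentOf", "folder-1"),
    ("doc-2", "isConstituentOf", "folder-1"),
]


def constituents(parent_id):
    """Return ids of objects that claim to be constituents of parent_id."""
    return [s for s, p, o in TRIPLES
            if p == "isConstituentOf" and o == parent_id]


def component(obj_id):
    """Recursively build an EAD-style <c> element rooted at obj_id."""
    level, title = OBJECTS[obj_id]
    c = ET.Element("c", level=level)
    did = ET.SubElement(c, "did")
    ET.SubElement(did, "unittitle").text = title
    # Follow the graph downward and nest each constituent's component.
    for child_id in constituents(obj_id):
        c.append(component(child_id))
    return c


if __name__ == "__main__":
    # The view "as seen from" the series: the series, its folder, its documents.
    print(ET.tostring(component("series-1"), encoding="unicode"))
```

Because the structure is computed from the graph on each request, asking for the view from "folder-1" instead of "series-1" yields a smaller fragment with no change to the stored objects.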

We like this approach because it offers a great deal of extensibility (we could imagine using more sophisticated RDF to account for different opinions about a collection, or offering a METS or other structured view as well) and it keeps the repository contents "idiomatic". We haven't yet figured out entirely how we'll bring this kind of content to Blacklight, but we'll be aided by the fact that we have appropriately-attached metadata for anything that should appear as a record in our indexes.

We're bringing the first part of this scheme (the assembly of object graphs) to production in the next fortnight or so. We've got the code ready and tested and are now enjoying the really fun stuff: moving servers around and tinkering with clustering and the like. The second part (producing EAD "live") can't go to production until our cataloging department, who have assigned some staff to polish up the mappings involved, finish their work. We have very simple mappings in place now, but not ones good enough to publish. They're working away, and we hope to see something in production later this fall.

As for how we provide discoverability, we'll start simply by indexing all these objects into our local Blacklight instance. There's no need to consider how to index highly-structured XML because we're not storing it. From there, we can move on to providing special views for records, with awareness of the relationships that Fedora has recorded on those objects and tools for discovering, visualizing, and following them. Unfortunately, our one Blacklight developer has plenty on her plate already, so I don't know how quickly we'll be able to look at that. In the meantime, we can simply style out the dynamically-constructed EAD as part of a Blacklight view for a given record, which isn't particularly exciting, but is useful.
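As a rough illustration of why indexing is simple when each object carries its own metadata, the following Python sketch turns each object into one flat record. Everything here is an assumption made for illustration: the field names ("id", "title", "kind", "is_constituent_of"), the sample objects, and the function name are hypothetical, and no actual Blacklight or Solr configuration is implied.

```python
import json

# Hypothetical per-object metadata, as it might be read from each object.
OBJECTS = {
    "folder-1": {"kind": "folder", "title": "Letters, 1850-1855"},
    "doc-1": {"kind": "document", "title": "Letter to a cousin"},
    "doc-2": {"kind": "document", "title": "Letter to a merchant"},
}

# Hypothetical relationships, in (subject, predicate, object) form.
TRIPLES = [
    ("doc-1", "isConstituentOf", "folder-1"),
    ("doc-2", "isConstituentOf", "folder-1"),
]


def index_record(obj_id):
    """Build one flat index record from a single object's own metadata."""
    meta = OBJECTS[obj_id]
    return {
        "id": obj_id,
        "title": meta["title"],
        "kind": meta["kind"],
        # Keep relationship targets as plain ids so a view could follow them.
        "is_constituent_of": [o for s, p, o in TRIPLES
                              if s == obj_id and p == "isConstituentOf"],
    }


if __name__ == "__main__":
    records = [index_record(obj_id) for obj_id in sorted(OBJECTS)]
    print(json.dumps(records, indent=2))
```

Each record is self-contained, so no nested XML needs to be parsed at indexing time. The relationship field is what a later "special view" could use to walk from a document to its folder.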

---
A. Soroka
Digital Research and Scholarship R & D
the University of Virginia Library