Hey Eric,
Sounds like a good plan, but I wanted to throw in my two cents on your
workflow. <unitid> is intended to be an optional element that describes an
actual unique identifier the hosting institution has given the object or
collection, for example an accession number. <unitid> isn't necessarily
intended to be a machine-readable value (for example, an xml:id), though
it could be. I think what you want to do is populate the id
attribute for each component with a unique xml:id. This way you can make
all your components have a machine-readable identifier while preserving any
actual unique identifiers that describe the component.
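Here is a rough sketch of what I mean, in Python with the standard library's
ElementTree. It is only an illustration under my own assumptions: components
are <c> or <c01> through <c12>, the "comp" prefix and zero-padded counter are
made-up naming choices, and the sample EAD and its accession number are
invented. Existing id attributes and <unitid> values are left untouched.

```python
import re
import xml.etree.ElementTree as ET

# Matches EAD component elements: <c> and <c01> through <c12>.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def assign_component_ids(root, prefix="comp"):
    """Give every component that lacks an id attribute a generated one.

    Existing id attributes and all <unitid> content are left alone.
    The prefix and the zero-padded counter are hypothetical choices.
    """
    # Collect ids already in use so generated ones never collide with them.
    used = {el.get("id") for el in root.iter() if el.get("id")}
    counter = 0
    for el in root.iter():
        # Skip comments/processing instructions and non-component elements.
        if not isinstance(el.tag, str) or not COMPONENT.match(local_name(el.tag)):
            continue
        if el.get("id"):
            continue
        counter += 1
        new_id = f"{prefix}{counter:05d}"
        while new_id in used:
            counter += 1
            new_id = f"{prefix}{counter:05d}"
        el.set("id", new_id)
        used.add(new_id)
    return root


# Invented sample: one component has a real <unitid>, one already has an id.
sample = """<ead><archdesc level="collection"><dsc>
  <c01 level="series"><did><unittitle>Correspondence</unittitle></did>
    <c02 level="file"><did><unitid>ACC-1999-12</unitid>
      <unittitle>Letters</unittitle></did></c02>
  </c01>
  <c01 id="existing" level="series"><did><unittitle>Photographs</unittitle></did></c01>
</dsc></archdesc></ead>"""

root = assign_component_ids(ET.fromstring(sample))
print(ET.tostring(root, encoding="unicode"))
```

The generated ids give you something stable to key Solr documents on, while
the <unitid> in the second component still carries the institution's own
identifier.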
I think you've made the right choice in separating delivery from the search
interface. People can find your EAD guides in VuFind, but Archon's
interface is used for displaying the guides and, presumably, provides a very
EAD-specific interface for searching and sorting the EAD collection (I have
to admit I don't know much about Archon's public interface).
Ethan
On Mon, Aug 9, 2010 at 7:17 PM, Eric Lease Morgan <[log in to unmask]> wrote:
> To index and display did-level EAD elements,
> Or to index finding aids as a whole.
> That is the question.
>
> Seriously, this discussion surrounding the indexing and display of EAD
> files is extraordinarily timely, but the loose consensus on how to do it
> does not jibe with my experience. In short, I have been told by my archivist
> friends that I need to index and display each and every did-level element in
> my EAD files, and then provide a link to the finding aid as a whole. Let me
> explain.
>
> Here at Notre Dame we are leading an effort we colloquially call the
> "Catholic Portal". [1] We use VUFind as our "discovery system" and thus Solr
> as the underlying indexer. Much of the metadata I index is MARC-based, but
> increasingly it is and will be EAD-based. Using VUFind to index MARC records
> is well understood. Only very recently has it become truly feasible to
> index content other than MARC, such as EAD. A few months ago time was spent
> parsing EAD and stuffing it into the underlying Solr index. We took metadata
> from the EAD header and mapped it to Solr fields. We then free text indexed
> the balance. Thus searches for anything found in the EAD returned results
> complete with EAD title, author, etc. Links to the original EAD were then
> provided. The process functioned, but it was not deemed good enough by the
> archivists in the crowd.
>
> As you know, EAD files are not structured like most MARC records. An EAD
> file represents an entire collection. Within that collection there may be
> sub-collections upon sub-collections. While the EAD's header and archdesc
> element may describe the collection as a whole, the sub-level and nested did
> elements are the real meat of the matter. Free text searches over the entire
> EAD that only return the over-arching metadata do not put search results in
> context, even if one does provide links out to the full finding aid. Instead
> (ideally), each and every did needs to be indexed and displayed in search
> results. Moreover (ideally), these search results need to be displayed in
> their hierarchical relationship with the balance of the EAD file.
>
> We began work to implement this (ideal) solution [2], but the developer
> went on to a more permanent job here on campus.
>
> Here is what I plan to do:
>
> 1. acquire EAD files from "Catholic Portal" participants
> 2. cache them locally
> 3. pre-process each EAD, making sure it has an eadid element
> 4. pre-process each EAD, making sure each did element contains
> a unitid element, and if one doesn't, assign it one
> 5. store and index each EAD file in Archon [3]
> 6. parse each did from each EAD file and integrate the result
> into the VUFind/Solr index along with the MARC metadata
> 7. use VUFind as the primary interface to the "Catholic Portal"
> 8. use Archon as the means for displaying and navigating EAD files
> 9. go to Step #1
>
> Actually, my plan is not very much different from everybody else's plan.
> I'm using Solr as my indexer but the VUFind/Solr schema instead of
> Blacklight's. For simplicity's sake, I'm using Archon for storing/displaying
> my EAD instead of Fedora. (You say tomāto. I say tomäto. [4]) The most
> significant difference is the level at which I am expected to index and
> display the EAD files. I see a whole lot of XPath queries in my future.
>
>
> [1] Catholic Portal - http://www.catholicresearch.net
> [2] indexing EAD -
> http://serials.infomotions.com/code4lib/archive/2010/201007/1957.html
> [3] Archon - http://www.archon.org/
> [4] (Don't ya just gotta love Unicode.)
>
> --
> Eric Lease Morgan
>