LISTSERV 16.5 - CODE4LIB Archives

Ethan,
The plugin I wrote for Blacklight is just a start, meant as a proof of
concept and a template. Having said that, it is basically code I
extracted from another application I have in production. That
application didn't need to display every detail in the EAD, so the
plugin really only gives a short view.

The plugin does some very basic indexing of the EAD to conform to the
default Blacklight Solr schema. It could certainly be expanded to get
better faceting and fielded search in a customized Blacklight. Lots of
possibilities for expansion. The indexing also takes the simple
approach of one EAD XML document being one Solr document. Other folks
have played around with splitting an EAD doc into different Solr
documents, but I haven't been satisfied with either the display of the
search results or show views, which have seemed too fragmented to me.
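
A minimal sketch of the one-EAD-document-to-one-Solr-document approach
described above. This is not the plugin's actual code. It uses Ruby's
bundled REXML rather than Nokogiri so that it runs without extra gems,
it assumes an EAD file without an XML namespace, and the field names
(title_t, component_titles_t, xml_display) are hypothetical
placeholders rather than the real Blacklight schema.

```ruby
require 'rexml/document'

# Return the stripped text of the first node matching xpath, or nil.
def first_text(doc, xpath)
  node = REXML::XPath.first(doc, xpath)
  node && node.text.to_s.strip
end

# Build ONE Solr document (a plain Hash) from ONE EAD XML string.
# Field names are assumptions for illustration only.
def ead_to_solr_doc(ead_xml)
  doc = REXML::Document.new(ead_xml)
  components = REXML::XPath.match(doc, '//dsc//unittitle')
  {
    'id'                 => first_text(doc, '//eadid'),
    'title_t'            => first_text(doc, '//archdesc/did/unittitle'),
    'component_titles_t' => components.map { |el| el.text.to_s.strip },
    # The whole finding aid is kept in a stored field for display.
    'xml_display'        => ead_xml
  }
end
```

The resulting hash would then be sent to Solr as a single document, so
each finding aid shows up as exactly one search result.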

The display in the plugin is one page for the whole finding aid. The
display is concise, but that's not the biggest problem with it. The
EAD XML is stored as a Solr field. I've heard conflicting information
about this, but retrieving large stored fields from Solr may be slow.
(Anyone want to put that idea to rest?) The biggest problem with this
implementation, though, is that the XML parsing is done with the
Nokogiri DOM parser. Nokogiri itself is fast, but loading the whole
DOM into memory and looping through a long container list can still
take a very long time. I've worked around that with partial caching in
my applications.
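
One possible shape for that kind of partial caching, as a rough sketch
only. The FragmentCache class and the cache key format are
hypothetical and are not taken from the applications mentioned above.
In a Rails app, the framework's own fragment caching would normally
play this role.

```ruby
# Hypothetical in-memory cache for expensive rendered fragments,
# such as the HTML for a long container list.
class FragmentCache
  def initialize
    @store = {}
  end

  # Return the cached value for key; otherwise run the block once,
  # remember its result, and return it.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

# Usage: the block (the slow DOM walk) only runs on a cache miss.
#   cache = FragmentCache.new
#   html  = cache.fetch("ead/#{id}/container_list") { render_container_list(xml) }
```

Here render_container_list stands in for whatever slow parsing and
rendering step is being avoided on repeat views.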

If you want to see it in action, it is very easy to set up if you
already have Ruby installed. It takes just one template command to
build the Rails app; then answer yes to all the questions. Remember to
start Jetty before trying to index.
http://github.com/jronallo/blacklight_ext_ead_simple

I have been fooling around with creating a new library that uses
Nokogiri's SAX parser. This makes parsing on the fly much faster. I'm
also attempting to deal with more of the content as found in a basic
Archivists' Toolkit EAD XML doc. The problem with the SAX parsing is
that you have to deal with all the craziness of EAD as it is streaming
at you. I have something basically working, if messy, which I hope to
have up on github soon.
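
To illustrate what stream parsing of a container list involves, here
is a small sketch. It is not the library mentioned above. It uses
REXML's stream listener from Ruby's bundled libraries as a stand-in
for Nokogiri's SAX parser, and the class and method names are
hypothetical. The listener has to track its own position in the
document with a stack of open elements, which is the bookkeeping a DOM
parser would otherwise do for you.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Collects the unittitle text of every component (c, c01, c02, ...)
# while the document streams past, without building a DOM.
class ComponentTitleListener
  include REXML::StreamListener
  attr_reader :titles

  def initialize
    @titles = []
    @stack  = []   # names of currently open elements
    @buffer = nil  # non-nil only while inside a component unittitle
  end

  def tag_start(name, _attrs)
    @stack.push(name)
    @buffer = +'' if name == 'unittitle' && in_component?
  end

  # Text inside nested markup (e.g. unitdate) is appended as well.
  def text(data)
    @buffer << data if @buffer
  end

  def tag_end(name)
    if name == 'unittitle' && @buffer
      @titles << @buffer.strip
      @buffer = nil
    end
    @stack.pop
  end

  private

  def in_component?
    @stack.any? { |n| n =~ /\Ac(\d\d)?\z/ }
  end
end

def component_titles(ead_xml)
  listener = ComponentTitleListener.new
  REXML::Document.parse_stream(ead_xml, listener)
  listener.titles
end
```

The collection-level unittitle is skipped because no component element
is open when it streams by.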

Please let me know if you have any other questions about this.

Jason

On Fri, Jul 30, 2010 at 11:17 AM, Ethan Gruber <[log in to unmask]> wrote:
> By "displays it", do you mean there is a view for displaying some metadata
> about the EAD guide in the blacklight search results or that the entire
> guide is rendered out in blacklight somehow?  Hopefully Jason is on the
> list.  I'm curious about this.
>
> Thanks,
> Ethan
>
> On Fri, Jul 30, 2010 at 11:06 AM, Adam Wead <[log in to unmask]> wrote:
>
>> Takes an ead doc, indexes it in solr, and displays it via blacklight.  I
>> think Jason's on this list, so he could tell you more about it.  I took it
>> and modified the display a bit.  It's available via git:
>>
>> http://github.com/jronallo/blacklight_ext_ead_simple
>>
>>
>>
>> -----Original Message-----
>> From: Code for Libraries on behalf of Ethan Gruber
>> Sent: Fri 7/30/2010 10:06 AM
>> To: [log in to unmask]
>> Subject: Re: [CODE4LIB] Batch loading in fedora
>>
>> What does the EAD plugin do?  I haven't heard much about it.
>>
>> Ethan
>>
>> On Fri, Jul 30, 2010 at 10:03 AM, Adam Wead <[log in to unmask]> wrote:
>>
>> > Hardy,
>> >
>> > Here's the task:
>> >
>> > http://github.com/awead/rocklight/blob/master/lib/tasks/fedora.rake
>> >
>> > I just threw up the project on git, so there's not much explanation of
>> > anything.  It's very much a work-in-progress.  It uses blacklight, an ead
>> > plugin that Jason Ronallo wrote, and a bunch of active-fedora/hydrangea
>> > code.  The image ingest process is designed to attach an image pid to an
>> > existing pid in fedora that is the archival collection.  I've been only
>> > testing this, so right now it ingests some jpg files and uses ImageMagick
>> > to resize them into a thumbnail and access version.  In "real life" the
>> > preservation stream would be tiff and the thumbnail and access version
>> > would be jpegs.  I also threw in a jhove datastream for fun, but I'm not
>> > doing anything with it at this point other than just storing it.
>> >
>> > The three descriptive metadata streams are from the active-fedora model.
>> > Ideally, we'd use a mods schema for all the descriptive data instead of
>> > these three different ones, but that'll be the next step.
>> >
>> > let me know if you have comments or questions.  Again, it's a
>> > work-in-progress.  I only started coding in Ruby/rails a couple of months
>> > ago, so there might be some quirky things to it.
>> >
>> > ...adam
>> >
>> >
>> > -----Original Message-----
>> > From: Code for Libraries on behalf of Pottinger, Hardy J.
>> > Sent: Thu 7/29/2010 11:26 PM
>> > To: [log in to unmask]
>> > Subject: Re: [CODE4LIB] Batch loading in fedora
>> >
>> > > Following along the Ruby thread, I've got some rake task that will
>> > > ingest images.  Let me know if you want to take a look at that.
>> >
>> > Well, this may come as no surprise :-) but I for one would love to see
>> > that rake task for image ingest.
>> >
>> > --Hardy
>> >
>> >
>>
>