Jason,

Thanks for the info.  Nokogiri is alright, but I've found that, as far as
XML processing goes, Saxon is far and away the best.  Is it possible to
fire off a Java call from Ruby to have Saxon handle it, or not?  Are you
using Nokogiri to call an XSLT process or using Ruby to generate the view?

I've also heard about scalability issues with Solr and large XML documents,
but I've never seen benchmarks.

Ethan

On Mon, Aug 2, 2010 at 8:46 AM, Jason Ronallo <[log in to unmask]> wrote:

> Ethan,
> The plugin I wrote for Blacklight is just a start and was a proof of
> concept/template. Having said that, this is basically code I extracted
> from another application I have in production. In that case it wasn't
> necessary to display every detail in the EAD, so it is really just a
> short view.
>
> The plugin does some very basic indexing of the EAD to conform to the
> default Blacklight Solr schema. It could certainly be expanded to get
> better faceting and fielded search in a customized Blacklight. Lots of
> possibilities for expansion. The indexing also takes the simple
> approach of one EAD XML document being one Solr document. Other folks
> have played around with splitting an EAD doc into different Solr
> documents, but I haven't been satisfied with either the display of the
> search results or show views, which have seemed too fragmented to me.
>
> The display in the plugin is one page for the whole finding aid. The
> display is concise, but that's not the biggest problem with it. The
> EAD XML is stored as a Solr field. I've heard conflicting information
> about this, but it may be slow to retrieve large fields from Solr.
> (Anyone want to put that idea to rest?) The biggest problem with this
> implementation, though, is that the XML parsing is done using the
> Nokogiri DOM parser. Nokogiri is fast enough, but still loading up the
> whole DOM into memory and looping through a long container list can
> take a very long time. I've worked around that with partial caching in
> my applications.
>
> If you want to see it in action, it is very easy to set up if you
> already have Ruby installed. Just one template command to build the
> Rails app and then answer yes to all the questions. Remember to start
> jetty before trying to index.
> http://github.com/jronallo/blacklight_ext_ead_simple
>
> I have been fooling around with creating a new library that uses
> Nokogiri's SAX parser. This makes parsing on the fly much faster. I'm
> also attempting to deal with more of the content as found in a basic
> Archivists' Toolkit EAD XML doc. The problem with the SAX parsing is
> that you have to deal with all the craziness of EAD as it is streaming
> at you. I have something basically working, if messy, which I hope to
> have up on github soon.
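
A minimal sketch of the streaming approach described above, not the library Jason mentions. It assumes REXML's stream parser (which ships with Ruby) as a stand-in for Nokogiri's SAX parser; Nokogiri's SAX document class offers analogous start/characters/end callbacks. The names ComponentTitleListener, component_titles, and SAMPLE_EAD are hypothetical, and the EAD fragment is heavily simplified.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Collects the text of every <unittitle> that appears inside a <c01>
# component while the document streams by. No DOM tree is built, so
# the listener has to track its own position in the document.
class ComponentTitleListener
  include REXML::StreamListener

  attr_reader :titles

  def initialize
    @titles = []
    @in_c01 = false
    @in_unittitle = false
    @buffer = String.new
  end

  def tag_start(name, _attrs)
    @in_c01 = true if name == 'c01'
    if @in_c01 && name == 'unittitle'
      @in_unittitle = true
      @buffer = String.new
    end
  end

  # Text may arrive in several chunks, so accumulate until the tag closes.
  def text(data)
    @buffer << data if @in_unittitle
  end

  def tag_end(name)
    if name == 'unittitle' && @in_unittitle
      @titles << @buffer.strip
      @in_unittitle = false
    end
    @in_c01 = false if name == 'c01'
  end
end

# Hypothetical, heavily simplified EAD fragment for illustration only.
SAMPLE_EAD = <<~XML
  <ead>
    <archdesc>
      <did><unittitle>Sample Collection</unittitle></did>
      <dsc>
        <c01><did><unittitle>Series 1: Correspondence</unittitle></did></c01>
        <c01><did><unittitle>Series 2: Photographs</unittitle></did></c01>
      </dsc>
    </archdesc>
  </ead>
XML

def component_titles(xml)
  listener = ComponentTitleListener.new
  REXML::Document.parse_stream(xml, listener)
  listener.titles
end

puts component_titles(SAMPLE_EAD)
```

The collection-level title is skipped because it sits outside any c01. The state flags are the "craziness" in miniature: every structural question a DOM query would answer has to be tracked by hand as events arrive.
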
>
> Please let me know if you have any other questions about this.
>
> Jason
>
> On Fri, Jul 30, 2010 at 11:17 AM, Ethan Gruber <[log in to unmask]> wrote:
> > By "displays it", do you mean there is a view for displaying some metadata
> > about the EAD guide in the blacklight search results or that the entire
> > guide is rendered out in blacklight somehow?  Hopefully Jason is on the
> > list.  I'm curious about this.
> >
> > Thanks,
> > Ethan
> >
> > On Fri, Jul 30, 2010 at 11:06 AM, Adam Wead <[log in to unmask]> wrote:
> >
> >> Takes an ead doc, indexes it in solr, and displays it via blacklight.  I think
> >> Jason's on this list, so he could tell you more about it.  I took it and
> >> modified the display a bit.  It's available via git:
> >>
> >> http://github.com/jronallo/blacklight_ext_ead_simple
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Code for Libraries on behalf of Ethan Gruber
> >> Sent: Fri 7/30/2010 10:06 AM
> >> To: [log in to unmask]
> >> Subject: Re: [CODE4LIB] Batch loading in fedora
> >>
> >> What does the EAD plugin do?  I haven't heard much about it.
> >>
> >> Ethan
> >>
> >> On Fri, Jul 30, 2010 at 10:03 AM, Adam Wead <[log in to unmask]> wrote:
> >>
> >> > Hardy,
> >> >
> >> > Here's the task:
> >> >
> >> > http://github.com/awead/rocklight/blob/master/lib/tasks/fedora.rake
> >> >
> >> > I just threw up the project on git, so there's not much explanation of
> >> > anything.  It's very much a work-in-progress.  It uses blacklight, an ead
> >> > plugin that Jason Ronallo wrote, and a bunch of active-fedora/hydrangea
> >> > code.  The image ingest process is designed to attach an image pid to an
> >> > existing pid in fedora that is the archival collection.  I've only been
> >> > testing this, so right now it ingests some jpg files and uses image magick
> >> > to resize them into a thumbnail and access version.  In "real life" the
> >> > preservation stream would be tiff and the thumbnail and access version would
> >> > be jpegs.  I also threw in a jhove datastream for fun, but I'm not doing
> >> > anything with it at this point other than just storing it.
> >> >
> >> > The three descriptive metadata streams are from the active-fedora model.
> >> > Ideally, we'd use a mods schema for all the descriptive data instead of
> >> > these three different ones, but that'll be the next step.
> >> >
> >> > Let me know if you have comments or questions.  Again, it's a
> >> > work-in-progress.  I only started coding in Ruby/rails a couple of months
> >> > ago, so there might be some quirky things to it.
> >> >
> >> > ...adam
> >> >
> >> >
> >> > -----Original Message-----
> >> > From: Code for Libraries on behalf of Pottinger, Hardy J.
> >> > Sent: Thu 7/29/2010 11:26 PM
> >> > To: [log in to unmask]
> >> > Subject: Re: [CODE4LIB] Batch loading in fedora
> >> >
> >> > > Following along the Ruby thread, I've got some rake task that will ingest
> >> > > images.  Let me know if you want to take a look at that.
> >> >
> >> > Well, this may come as no surprise :-) but I for one would love to see that
> >> > rake task for image ingest.
> >> >
> >> > --Hardy
> >> >
> >> >
> >> >
> >> >
> >>
> >
>