As we move towards experimenting with a Solr-based OPAC, I'm hoping to
persuade everyone involved that MODS is sufficient to drive the search
interface. Let MARC abide in the ILS, and become a mere spirit of malice
that gnaws itself in the shadows, but cannot again grow or take shape.
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
Sent: Wednesday, November 29, 2006 8:14 AM
To: [log in to unmask]
Subject: Re: [CODE4LIB] code4lib lucene pre-conference
Clay Redding wrote:
> Hi Andrew (or anyone else that cares to answer),
> I've missed out on hearing about incompatibilities between MARCXML and
> NXDBs. Can you explain? Is this just eXist and Sleepycat, or are
> there others? I seem to recall putting a few records in X-Hive with
> no problems, but I didn't put it through any paces.
Yes, I have only done my testing with eXist and Sleepycat, but I also
have an implementation of MarkLogic that I would like to test out. I
imagine though that all NXDBs will have the same problem. This is the
heart of my proposed talk. It has to do with the layout of MARCXML.
Adding a few records to any NXDB will work like a charm; do your testing
with 250,000+ records and then you will begin to see the true spirit of
> Also, if there was a cure to the problems with MARCXML (I'm sure we
> can all think of some), what would you suggest to help alleviate the
Sure, I know of a cure! I have come up with a modified MARCXML schema,
but as I am investigating Solr further, I think the Solr schema is also
The problem with MARCXML is that all of the elements have the same name
and use attributes to differentiate them (excuse me while I barf). This
makes indexing at the XML level very difficult, especially for NXDBs. I
got agreement from the main developers of both packages (eXist,
Berkeley) on this front. My schema just puts each MARC field into its
own element. Instead of <datafield tag="245">, I created an element
called <T245>, and instead of spreading the subfields across multiple
tags, I just put all of the subfields into one element. No one needs to
search (from my perspective) the subtitle ("b") separately from the main
("a") title, so I just made a really simple XML document that is 1/4 the
size. By doing this I was able to take a 45-minute search of MARCXML
records and reduce it down to results in 1 second. The main boost was
not the reduction in file size, but the way the indexing works.
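To make the idea concrete, here is a rough Python sketch of the kind of
flattening described above. It is only an illustration, not the actual
schema or conversion code: the function name flatten_record, the "T" +
tag naming for control fields, joining subfields with a single space,
and dropping the leader and indicators are all assumptions made for the
example. It expects a single <record> element in the standard MARCXML
namespace.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML ("MARC21 slim") namespace, in ElementTree's {uri} form.
MARC_NS = "{http://www.loc.gov/MARC21/slim}"


def flatten_record(marcxml: str) -> str:
    """Turn one MARCXML <record> into a flat record of <Tnnn> elements.

    Hypothetical helper: each control field or data field becomes one
    element named after its tag, and all subfields of a data field are
    joined into a single text value. Leader and indicators are dropped.
    """
    record = ET.fromstring(marcxml)
    flat = ET.Element("record")
    for field in record:
        if field.tag == MARC_NS + "controlfield":
            text = (field.text or "").strip()
        elif field.tag == MARC_NS + "datafield":
            # Collapse every subfield into one space-separated string.
            text = " ".join(
                (sf.text or "").strip()
                for sf in field.findall(MARC_NS + "subfield")
            )
        else:
            continue  # skip the leader and anything unexpected
        ET.SubElement(flat, "T" + field.get("tag")).text = text
    return ET.tostring(flat, encoding="unicode")


SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Main title :</subfield>
    <subfield code="b">a subtitle /</subfield>
  </datafield>
</record>"""

print(flatten_record(SAMPLE))
```

With a layout like this, a title query can address the element by name
(something like //T245) instead of filtering every datafield on an
attribute (//datafield[@tag="245"]), which is presumably where an
element-name index starts to pay off.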
Give it a shot, I promise better results!