Clay Redding wrote:

> Hi Andrew (or anyone else that cares to answer),
> I've missed out on hearing about incompatibilities between MARCXML and
> NXDBs.  Can you explain?  Is this just eXist and Sleepycat, or are
> there others?  I seem to recall putting a few records in X-Hive with no
> problems, but I didn't put it through any paces.

Yes, I have only done my testing with eXist and Sleepycat, but I also
have an implementation of MarkLogic that I would like to test out.  I
imagine, though, that all NXDBs will have the same problem.  This is the
heart of my proposed talk.  It has to do with the layout of MARCXML.
Adding a few records to any NXDB will work like a charm.  Do your testing
with 250,000+ records and you will begin to see the true spirit of
your NXDB.

> Also, if there was a cure to the problems with MARCXML (I'm sure we can
> all think of some), what would you suggest to help alleviate the
> problems?

Sure, I know of a cure!  I have come up with a modified MARCXML schema,
but as I am investigating Solr further, I think the Solr schema is also
a cure.

The problem with MARCXML is that all of the elements have the same name
and use attributes to differentiate them (excuse me while I barf).  This
makes indexing at the XML level very difficult, especially for NXDBs.
The main developers of both packages (eXist, Berkeley) agreed with me on
this point.  My schema just puts each MARC field into its own element.
Instead of <datafield tag="245">, I created an element called <T245>,
and instead of spreading the subfields across multiple tags, I just put
all of the subfields into one element.  No one needs to search (from my
perspective) the subtitle ("b") separately from the main title ("a"), so
I just made a really simple XML document that is 1/4 the size.  By doing
this I was able to take a 45-minute search of MARCXML records and cut it
down to results in 1 second.  The main boost came not from the reduction
in file size, but from the way the indexing works.
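
A minimal sketch of that kind of flattening, in Python.  This is an
illustration only, not the actual schema or converter: the function name
flatten_record, the sample record, and the choice to join subfields with
a single space are all assumptions, and control fields and indicators
are simply dropped here.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard MARCXML ("MARC21 slim") records.
MARC_NS = "http://www.loc.gov/MARC21/slim"

# Hypothetical sample record, made up for this example.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Moby Dick :</subfield>
    <subfield code="b">or, The whale /</subfield>
  </datafield>
</record>"""


def flatten_record(marcxml: str) -> str:
    """Turn each <datafield tag="NNN"> into a <TNNN> element whose text
    is all of its subfields joined by a space (an assumed convention)."""
    ns = {"m": MARC_NS}
    src = ET.fromstring(marcxml)
    out = ET.Element("record")
    for field in src.findall("m:datafield", ns):
        # The element name now carries what the tag attribute used to.
        elem = ET.SubElement(out, "T" + field.get("tag"))
        elem.text = " ".join(
            (sf.text or "").strip() for sf in field.findall("m:subfield", ns)
        )
    return ET.tostring(out, encoding="unicode")


if __name__ == "__main__":
    print(flatten_record(SAMPLE))
```

With records in this shape, a title lookup can address the element by
name (something like //T245) rather than filtering every datafield on an
attribute (//datafield[@tag='245']), which is the difference the
indexing point above turns on.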

Give it a shot, I promise better results!