As we move towards experimenting with a Solr-based OPAC, I'm hoping to persuade everyone involved that MODS is sufficient to drive the search interface. Let MARC abide in the ILS, and become a mere spirit of malice that gnaws itself in the shadows, but cannot again grow or take shape.

Peter

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Andrew Nagy
Sent: Wednesday, November 29, 2006 8:14 AM
To: [log in to unmask]
Subject: Re: [CODE4LIB] code4lib lucene pre-conference

Clay Redding wrote:
> Hi Andrew (or anyone else that cares to answer),
>
> I've missed out on hearing about incompatibilities between MARCXML and
> NXDBs. Can you explain? Is this just eXist and Sleepycat, or are
> there others? I seem to recall putting a few records in X-Hive with
> no problems, but I didn't put it through any paces.

Yes, I have only done my testing with eXist and Sleepycat, but I also have an implementation of MarkLogic that I would like to test out. I imagine, though, that all NXDBs will have the same problem. This is the heart of my proposed talk. It has to do with the layout of MARCXML. Adding a few records to any NXDB will work like a charm; do your testing with 250,000+ records and then you will begin to see the true spirit of your NXDB.

> Also, if there was a cure to the problems with MARCXML (I'm sure we
> can all think of some), what would you suggest to help alleviate the
> problems?

Sure, I know of a cure! I have come up with a modified MARCXML schema, but as I am investigating Solr further, I think the Solr schema is also a cure. The problem with MARCXML is that all of the elements have the same name and use attributes to differentiate them (excuse me while I barf). This makes indexing at the XML level very difficult, especially for NXDBs. The main developers of both packages (eXist, Berkeley) agreed with me on this. My schema just puts each MARC field into its own element.
Instead of <datafield tag="245">, I created an element called <T245>, and instead of splitting the subfields across multiple tags, I just put all of the subfields into one element. No one needs to search (from my perspective) the subtitle ("b") separately from the main title ("a"), so I just made a really simple XML document that is 1/4 the size. By doing this I was able to take a 45-minute search of MARCXML records and get results in 1 second. The main boost came not from the reduction in file size, but from the way the indexing works. Give it a shot, I promise better results!

Andrew
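
To make the flattening described above concrete, here is a minimal sketch in Python. Only the <T245> naming comes from the message. The function name, the sample record, joining subfields with a single space, dropping the indicators and leader, and giving control fields the same T-prefix are all assumptions for illustration, not the actual modified schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record in the standard MARCXML ("slim") namespace.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example title :</subfield>
    <subfield code="b">an example subtitle.</subfield>
  </datafield>
</record>"""


def flatten_record(marcxml: str) -> ET.Element:
    """Turn one MARCXML record into one element per MARC field.

    Assumption: each field becomes <T###> and its subfields are
    concatenated into a single text value, separated by spaces.
    """
    src = ET.fromstring(marcxml)
    out = ET.Element("record")
    for field in src:
        tag = field.get("tag")
        if tag is None:
            continue  # e.g. <leader>, which carries no tag attribute
        local = field.tag.rsplit("}", 1)[-1]  # strip the namespace prefix
        el = ET.SubElement(out, "T" + tag)
        if local == "datafield":
            # Merge all subfields into one searchable string.
            el.text = " ".join((sf.text or "").strip() for sf in field)
        else:
            # Control fields have no subfields, only text.
            el.text = (field.text or "").strip()
    return out


flat = flatten_record(SAMPLE)
print(ET.tostring(flat, encoding="unicode"))
```

With this shape, an index on the element name T245 covers the whole title, rather than an index on a generic datafield element that has to be filtered by attribute value for every lookup.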