On 11/29/06, Andrew Nagy <[log in to unmask]> wrote:
>
> So ... while we are on this topic. You wouldn't want to index marcxml
> records in lucene, you would use marc21, right? Why deal with the
> overhead of xml if it is not necessary. We have to format our data no
> matter what to best fit our storage/search system.
This seems like six of one, half a dozen of the other to me. I
don't think Lucene cares which you use. In my mind, it is
just a matter of preference: do I want to use XML tools (SAX, XOM,
REXML) or MARC-specific tools (marc4j, pymarc, ruby-marc)? Any of
them could be used to build Lucene indices.
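To make the XML route concrete, here is a rough sketch using only the Python standard library. The sample record, the FIELD_MAP tag-to-name mapping, and the marcxml_to_doc helper are all made up for illustration; the resulting dict is just a stand-in for whatever document object your indexer actually takes.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") records.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Made-up sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Melville, Herman</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Moby Dick /</subfield>
    <subfield code="c">Herman Melville.</subfield>
  </datafield>
</record>"""

# Hypothetical mapping from MARC tags to index field names.
FIELD_MAP = {"100": "author", "245": "title"}


def marcxml_to_doc(xml_text):
    """Flatten one MARCXML record into a dict of index fields."""
    record = ET.fromstring(xml_text)
    doc = {}

    # The 001 control field serves as the record identifier.
    ctrl = record.find("marc:controlfield[@tag='001']", NS)
    if ctrl is not None:
        doc["id"] = ctrl.text

    for field in record.findall("marc:datafield", NS):
        name = FIELD_MAP.get(field.get("tag"))
        if name is None:
            continue  # tag is not one we chose to index
        # Join all subfields of the field into one searchable string.
        text = " ".join(
            sf.text for sf in field.findall("marc:subfield", NS) if sf.text
        )
        doc.setdefault(name, []).append(text)
    return doc


doc = marcxml_to_doc(SAMPLE)
```

The MARC21 route would look much the same with pymarc or marc4j doing the parsing; only the reader changes, and the flattened fields handed to the indexer do not.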
On the other hand, what do I want to do with the data after it is
indexed? Do I want to be able to display a whole record (versus just
the little bit I might have stored in the Lucene index)? If so, I'd
rather be working with XML. If I'm just pointing them back to my
OPAC, though, I don't see much difference (other than personal
preference) in the tool choice.
Kevin