On May 9, 2008, at 1:42 PM, Jonathan Rochkind wrote:
> The Blacklight code is not currently using XML or XSLT. It's indexing
> binary MARC files. I don't know its speed, but I hear it's pretty
> fast.
Right, I'm talking about the Java indexer we're working on, which
we're hoping to turn into a plugin contrib module for Solr. It
processes binary MARC files. We're getting times of about 150
records / second, but that's on an unfortunately throttled server and
we're munging each record significantly (replacing musical instrument
and language codes with their English language equivalents,
calculating composition era, etc).
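
To give a rough idea of what that kind of munging involves, here is a
minimal Python sketch of a language-code lookup. The table contents and
the function name are made up for illustration; this is not code from
our Java indexer, and the real mapping covers far more codes.

```python
# Hypothetical lookup table for illustration only. The three-letter codes
# follow the MARC language code list, but this is not the indexer's table.
LANGUAGE_NAMES = {
    "eng": "English",
    "fre": "French",
    "ger": "German",
    "spa": "Spanish",
}


def language_name(code):
    """Return the English name for a MARC language code.

    Unknown codes are returned unchanged so that no data is lost
    when the table has no entry for them.
    """
    return LANGUAGE_NAMES.get(code.strip().lower(), code)


if __name__ == "__main__":
    for code in ("eng", "fre", "zxx"):
        print(code, "->", language_name(code))
```

Every lookup like this is extra work per record, which is part of why
raw records-per-second figures are hard to compare between indexers.
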
Casey, you say you're getting indexing times of 1000 records /
second? That's amazing! I really have to take a closer look at
MarcThing. Could pymarc really be that much faster than marc4j? Or
are we comparing apples to oranges since we haven't normalized for
the kinds of mapping we're doing and the hardware it's running on?
Bess
Elizabeth (Bess) Sadler
Research and Development Librarian
Digital Scholarship Services
Box 400129
Alderman Library
University of Virginia
Charlottesville, VA 22904
(434) 243-2305