I saw a similar speedup when I switched from an OO approach to a more
functional style.
Using MARC::Record, my script was taking a lot longer to process some
data than I wanted, so I rewrote it with ad-hoc functional code. I
can't give a real speedup figure, because I never bothered to wait for
the OO version to finish, but I can say that the run went from hours
to minutes.
I didn't compare against the filter capability of MARC::File::USMARC;
maybe that would have been fast enough for my needs. Ultimately,
though, I was just dumping these fields into a file, and didn't need
any objects for that.
The speed increase I saw was made possible by the MARC directory,
which gives the starting position and length of every field, so I
could pull out just the fields I wanted without parsing whole records.
I wouldn't even have been able to try that with the XML version of the
data.
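To give a rough idea of what I mean, here's an illustrative sketch in
Python (not my actual Perl script; the 245 tag and the file argument
are just placeholders), pulling fields straight out of binary MARC
using only the leader and the directory:

    import sys

    LEADER_LEN = 24      # ISO 2709 leader is 24 bytes
    DIR_ENTRY_LEN = 12   # entry = 3-byte tag, 4-byte length, 5-byte offset
    FT = b'\x1e'         # field terminator
    RT = b'\x1d'         # record terminator

    def fields(raw, wanted=b'245'):
        """Yield the raw data of every 'wanted' field in one record."""
        base = int(raw[12:17])                # base address of data (leader 12-16)
        directory = raw[LEADER_LEN:base - 1]  # directory, minus its terminator
        for i in range(0, len(directory), DIR_ENTRY_LEN):
            entry = directory[i:i + DIR_ENTRY_LEN]
            tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
            if tag == wanted:
                yield raw[base + start:base + start + length].rstrip(FT)

    with open(sys.argv[1], 'rb') as f:
        for raw in f.read().split(RT):        # one chunk per record
            if raw.strip():
                for field in fields(raw):
                    sys.stdout.buffer.write(field + b'\n')

No record objects and no XML parsing: the directory tells you exactly
where each field starts and how long it is, so you only ever touch the
bytes you care about.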
/dev
--
Devon Smith
Consulting Software Engineer
OCLC Research
http://www.oclc.org/research/people/smith.htm
-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
Nate Vack
Sent: Friday, November 19, 2010 12:34 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] MARCXML - What is it for?
On Mon, Oct 25, 2010 at 2:22 PM, Eric Hellman <[log in to unmask]> wrote:
> I think you'd have a very hard time demonstrating any speed advantage
> to MARC over MARCXML.
Not to bring up this old topic again, but I'm just finishing up a
conversion from "parse this text structure" to "blit this binary data
structure into memory." Both written in Python.
The text parsing is indeed fast -- tens of milliseconds to parse 100k
or so of data on my laptop.
The binary code, though, is literally 1,000 times faster -- tens of
*microseconds* to read the same data. (And in this application, yeah,
it'll matter.)
Blitting is much, much, much faster than lexing and parsing, or even
running a regexp over the data.
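As a toy illustration (not the actual code; the record layout here is
made up), the difference is roughly:

    import struct

    def parse_text(line):
        # "parse this text structure": split, then convert field by field
        a, b, c = line.split('\t')
        return int(a), int(b), float(c)

    RECORD = struct.Struct('<iid')  # assumed layout: two ints and a double

    def blit_binary(buf, offset=0):
        # "blit this binary data structure into memory": one unpack_from call
        return RECORD.unpack_from(buf, offset)

The text version builds the result field by field in Python; the
binary version is a single C-level call against the buffer.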
Cheers,
-Nate