I saw a similar speedup when I switched from an OO approach to a more
functional style.

Using MARC::Record, processing some data was taking much longer than I
wanted, so I rewrote my script with ad hoc functional code. I can't give
a precise speedup, because I never waited for the OO version to finish,
but the run went from hours to minutes. I didn't compare against the
filtering capability of MARC::File::USMARC, though; maybe that would
have been fast enough for my needs. Ultimately, though, I was just
dumping these fields into a file, and didn't need any objects for that.

The speed increase I saw was made possible by the MARC directory, which
lists each field's tag, length, and offset up front. I wouldn't even
have been able to try that with the XML version of the data.
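To make the directory trick concrete, here is a minimal sketch (in
Python, not the Perl I actually used, and with a hypothetical function
name) of pulling a field's raw bytes straight out of a MARC21 binary
record via the directory, with no record objects built at all:

```python
def get_fields(record: bytes, want_tag: str) -> list:
    """Return the raw bytes of every occurrence of want_tag,
    using only the leader and directory -- no full parse."""
    # Leader bytes 12-16 hold the base address of the data portion.
    base = int(record[12:17].decode('ascii'))
    # The directory starts at byte 24 and runs until the first
    # field terminator (0x1E); each entry is 12 bytes:
    # 3-byte tag, 4-byte field length, 5-byte starting offset.
    dir_end = record.index(b'\x1e')
    out = []
    for i in range(24, dir_end, 12):
        tag = record[i:i + 3].decode('ascii')
        length = int(record[i + 3:i + 7].decode('ascii'))
        start = int(record[i + 7:i + 12].decode('ascii'))
        if tag == want_tag:
            # Field data (indicators, subfields, terminator) is
            # sliced directly out of the buffer.
            out.append(record[base + start:base + start + length])
    return out
```

Because the directory gives offsets up front, you can seek straight to
the fields you want; MARCXML has no equivalent, so you must parse the
whole document.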


Devon Smith
Consulting Software Engineer
OCLC Research

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
Nate Vack
Sent: Friday, November 19, 2010 12:34 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] MARCXML - What is it for?

On Mon, Oct 25, 2010 at 2:22 PM, Eric Hellman <[log in to unmask]> wrote:
> I think you'd have a very hard time demonstrating any speed advantage

Not to bring up this old topic again, but I'm just finishing up a
conversion from "parse this text structure" to "blit this binary data
structure into memory." Both written in Python.

The text parsing is indeed fast -- tens of milliseconds to parse 100k
or so of data on my laptop.

The binary code, though, is literally 1,000 times faster -- tens of
*microseconds* to read the same data. (And in this application, yeah,
it'll matter.)

Blitting is much, much, much faster than lexing and parsing, or even
running a regexp over the data.
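A minimal illustration of the difference (my own sketch, not Nate's
actual code): the text version lexes and converts each token, while the
binary version hands a fixed-layout buffer to `struct.unpack` in one
call.

```python
import struct

def parse_text(s: str) -> list:
    """Text route: split on delimiters, convert each token."""
    return [int(tok) for tok in s.split(',')]

def parse_binary(buf: bytes) -> list:
    """Binary route: blit n little-endian int32s out in one call."""
    n = len(buf) // 4
    return list(struct.unpack('<%di' % n, buf))

values = [1, 2, 3, 4]
text_form = ','.join(map(str, values))       # "1,2,3,4"
binary_form = struct.pack('<4i', *values)    # 16 bytes, fixed layout

assert parse_text(text_form) == parse_binary(binary_form) == values
```

The binary route does no lexing at all; the layout itself is the
parser, which is where the orders-of-magnitude difference comes from.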