I saw a similar speedup when I switched from an OO approach to a more functional style. Using MARC::Record, it was taking a lot longer to run some data than I wanted. I rewrote my script with ad hoc functional code, and though I can't give a real rate of increase, because I never bothered to wait for the OO version to finish, I can say that it went from hours to minutes.

I didn't compare against the filter capability of MARC::File::USMARC, though. Maybe that would have been fast enough for my needs. Ultimately, though, I was just dumping these fields into a file, and didn't need any objects for that.

The speed increase I saw was made possible by the directory. I wouldn't even have been able to try that with the XML version of the data.

/dev

--
Devon Smith
Consulting Software Engineer
OCLC Research
http://www.oclc.org/research/people/smith.htm

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Nate Vack
Sent: Friday, November 19, 2010 12:34 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] MARCXML - What is it for?

On Mon, Oct 25, 2010 at 2:22 PM, Eric Hellman <[log in to unmask]> wrote:
> I think you'd have a very hard time demonstrating any speed advantage to MARC over MARCXML.

Not to bring up this old topic again, but I'm just finishing up a conversion from "parse this text structure" to "blit this binary data structure into memory." Both were written in Python.

The text parsing is indeed fast -- tens of milliseconds to parse 100k or so of data on my laptop. The binary code, though, is literally 1,000 times faster -- tens of *microseconds* to read the same data. (And in this application, yeah, it'll matter.)

Blitting is much, much, much faster than lexing and parsing, or even running a regexp over the data.

Cheers,
-Nate
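The directory-based access Devon describes is what binary MARC (ISO 2709) makes possible and MARCXML doesn't: the 12-byte directory entries give each field's length and offset, so you can seek straight to a field without parsing the rest of the record. A rough Python sketch of the idea -- the sample record and its tags are invented for illustration, not from either poster's data:

```python
FT = b"\x1e"   # field terminator
RT = b"\x1d"   # record terminator

def build_record(fields):
    """Assemble a minimal binary MARC record from (tag, data) pairs."""
    body = b""
    directory = b""
    for tag, data in fields:
        entry = data + FT
        directory += tag.encode() + b"%04d%05d" % (len(entry), len(body))
        body += entry
    base = 24 + len(directory) + 1           # leader + directory + FT
    total = base + len(body) + 1             # + body + RT
    # 24-byte leader: length (00-04), fixed values, base address (12-16)
    leader = b"%05d" % total + b"nam a22" + b"%05d" % base + b" a 4500"
    return leader + directory + FT + body + RT

def get_field(record, want_tag):
    """Jump straight to one field via the directory; no full parse."""
    base = int(record[12:17])                # base address of data, from leader
    directory = record[24:record.index(FT)]  # directory ends at first FT
    for i in range(0, len(directory), 12):   # entries: tag(3) length(4) start(5)
        tag = directory[i:i + 3].decode()
        length = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])
        if tag == want_tag:
            # slice the field out of the data area, dropping its trailing FT
            return record[base + start:base + start + length - 1]
    return None

rec = build_record([("245", b"  \x1faA made-up title"),
                    ("260", b"  \x1fbA made-up publisher")])
print(get_field(rec, "245"))  # → b'  \x1faA made-up title'
```

With XML, by contrast, there are no byte offsets to exploit: you must lex the markup from the top to find any field.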
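Nate's contrast between "parse this text structure" and "blit this binary data structure into memory" can be sketched with Python's struct module. The record layout here is hypothetical (his actual format isn't given); the point is that the binary path reinterprets bytes at fixed offsets instead of lexing and converting each field:

```python
import struct

# Invented fixed layout: uint32 id, uint16 flags, float64 value,
# little-endian, 14 bytes per record.
REC = struct.Struct("<IHd")

def parse_text(blob):
    """Text path: split lines, split fields, convert each one."""
    out = []
    for line in blob.splitlines():
        i, f, v = line.split(b",")
        out.append((int(i), int(f), float(v)))
    return out

def blit_binary(blob):
    """Binary path: no lexing, just reinterpret bytes at fixed offsets."""
    return list(REC.iter_unpack(blob))

# Build the same data in both representations and check they agree.
records = [(n, n % 4, n * 0.5) for n in range(10_000)]
text = b"\n".join(b"%d,%d,%f" % r for r in records)
binary = b"".join(REC.pack(*r) for r in records)

assert parse_text(text) == blit_binary(binary)
```

Timing the two functions (e.g. with timeit) on your own data is the honest way to see the gap; the magnitude will depend on the format and the machine.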