Actually, the Python code is all home-grown. Its main virtue is speed,
since we regularly pass 50+ million records through it. It avoids
user-defined classes for just about everything but the main record class.
I've generalized it a bit lately to handle OAI-harvested DC records, etc.
--Th
-----Original Message-----
From: Ed Summers [mailto:[log in to unmask]]
Sent: Tuesday, May 25, 2004 2:26 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Linfeed Trick
On Tue, May 25, 2004 at 02:08:17PM -0400, Hickey,Thom wrote:
> Maybe others are doing this (or is everyone using XML?), but it's new to us
> here. Maybe this would even work with MARC-XML if you restricted linefeeds
> to the end of record.
This is how MARC is read by MARC::File::USMARC in the MARC::Record CPAN
module :)
> On my workstation, grep can plow through 50 million Unicode MARC-21 records
> in less than 15 minutes. The best time our C software can do is more than
> half an hour and our Python code could take several hours.
Cool! I've been working off and on on a Python port for MARC::Record,
and wasn't able to find an equivalent to $/ in Python. But I'm a Python
newbie, so perhaps I overlooked something?
//Ed
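
For anyone following the $/ question: Python has no single variable that
plays the role of Perl's input record separator, but a small generator can
approximate it. The sketch below is only an illustration, not code from
either project mentioned in this thread. The name read_records and its
parameters are made up for the example, and the default separator assumes
the standard MARC record terminator byte (0x1D).

```python
import io


def read_records(stream, separator=b"\x1d", chunk_size=65536):
    """Yield separator-terminated records from a binary stream.

    Hypothetical helper: roughly what setting $/ gives you in Perl.
    """
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last separator is complete; the tail
        # (possibly empty) stays in the buffer for the next chunk.
        *complete, buffer = buffer.split(separator)
        for record in complete:
            yield record + separator
    if buffer:
        # Trailing data with no terminator is passed through as-is.
        yield buffer


if __name__ == "__main__":
    sample = io.BytesIO(b"first\x1dsecond\x1dthird\x1d")
    for record in read_records(sample):
        print(len(record))
```

If the records also end in a linefeed, as in the trick discussed above,
passing separator=b"\n" or simply iterating over the file object line by
line should give the same result with less code.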