Hi!

Using MARC::Batch I wrote a script to migrate thousands of records to MARC.
To test the results on different samples of the complete set I usually skip
over an offset of some thousands of records. I am trying to improve
performance by profiling with NYTProf. The most time-consuming subs seem to
be decode
<https://metacpan.org/pod/MARC::File::USMARC#decode(-$string-%5B,-%5C&filter_func-%5D-)>
and MARC::Field::new. I found the *skip()* function in MARC::File::USMARC,
which MARC::Batch (even though I thought it was just a wrapper) doesn't
expose; it only has *next()*. Using skip() for the offset reduces the real
time by ~25%:
Sample 5000 recs, offset 10000: before: real 0m23.254s, after: real
0m17.413s.
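
For reference, this is roughly how I am using skip() to jump over the
offset with MARC::File::USMARC directly (a minimal sketch; the file name
and the counts are just placeholders):

    use strict;
    use warnings;
    use MARC::File::USMARC;

    my $file   = MARC::File::USMARC->in('records.mrc');  # placeholder file name
    my $offset = 10_000;   # records to jump over without decoding
    my $sample = 5_000;    # records to actually process

    # skip() reads past a record without building a MARC::Record object
    $file->skip() for 1 .. $offset;

    my $count = 0;
    while ( my $record = $file->next() ) {
        # ... migration work on $record ...
        last if ++$count >= $sample;
    }
    $file->close();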

Besides skip(), using MARC::File::USMARC directly seems ~15% faster:
sample 10000 recs, offset 0: before: real 0m36.621s, after: real 0m31.167s.
For a simple record counter, MARC::File::USMARC->skip() vs MARC::Batch->next():
136195 recs/sec vs 2609 recs/sec, so skip() is roughly 52x faster!

Without MARC::Batch I get skip(), but I lose some functions such as
strict_off() and warnings_off(), which I'll probably need. Either way I
also test with yaz-marcdump, but it's not the same.
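
One workaround I'm considering, though I'm not sure it is really equivalent
to strict_off()/warnings_off(): wrap next() in eval so a bad record warns
instead of aborting the run (just a sketch under that assumption, with a
placeholder file name):

    use strict;
    use warnings;
    use MARC::File::USMARC;

    my $file = MARC::File::USMARC->in('records.mrc');  # placeholder file name

    while (1) {
        my $record = eval { $file->next() };
        if ($@) {
            warn "Bad record, skipping: $@";   # tolerate it, roughly like strict_off()
            next;
        }
        last unless defined $record;           # end of file
        # ... process $record ...
    }
    $file->close();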

Why doesn't MARC::Batch expose skip()?
Is there a way to have both skip() and strict_off()/warnings_off()?
Is an hour normal to translate one million records, on an i7 with 12 GB of
RAM and an SSD?
Any performance recommendations for walking over thousands of records?
Has anyone used Parallel::ForkManager? (A rough sketch of what I have in
mind is below.)
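
What I have in mind for Parallel::ForkManager is splitting the input into
offset ranges and letting each forked worker handle one chunk. A rough
sketch; the file name, chunk size, and worker count are made up, and each
child would need to write to its own output file:

    use strict;
    use warnings;
    use MARC::File::USMARC;
    use Parallel::ForkManager;

    my $input      = 'records.mrc';   # placeholder file name
    my $chunk_size = 50_000;          # records per worker (made up)
    my $n_chunks   = 20;              # e.g. ~1M records / 50k
    my $pm         = Parallel::ForkManager->new(4);   # 4 workers on the i7

    for my $chunk ( 0 .. $n_chunks - 1 ) {
        $pm->start and next;          # parent: spawn child, move on

        my $file = MARC::File::USMARC->in($input);
        $file->skip() for 1 .. $chunk * $chunk_size;   # jump to this chunk's offset

        my $done = 0;
        while ( my $record = $file->next() ) {
            # ... migrate $record, writing to a per-chunk output file ...
            last if ++$done >= $chunk_size;
        }
        $file->close();

        $pm->finish;                  # child exits
    }
    $pm->wait_all_children;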

Thanks and best regards,
Pablo