Here's another simple Perl script that works for me:
#!/usr/bin/perl
use strict;
use warnings;

my $file = shift or die "\nUsage: msplit filename [num of records] [new file name]\n\n";
my $s    = shift || 1000;    # records per output file
my $of   = shift || $file;   # prefix for the output file names
$/ = chr(29);                # MARC record terminator (0x1D) as input record separator
my $i   = 0;
my $tot = 0;
my ($out, $fout);
open IN, '<', $file or die "Can't open input file '$file'!\n";
while (<IN>) {
    $i++;
    if ($i == $s || $tot == 0) {    # start a new output file every $s records
        $i = 0;
        $out++;
        $out =~ s/^(\d)$/0$1/;      # zero-pad single-digit file numbers
        $fout = "$of$out";
        open OUT, '>', $fout or die "Can't write output file '$fout'!\n";
    }
    $tot++;
    print OUT;
}
print "\n$tot Marc records written to $out files\n\n";
--Charles Ledvina
infosoup.org
On Tue, 26 Jan 2010 08:16:56 -0600, Tod Olson <[log in to unmask]> wrote:
> The yaz-marcdump utility[1], included in the YAZ toolkit[2], should work
> and I've found it to be blindingly fast.
>
> -Tod
>
> [1] http://www.indexdata.com/yaz/doc/yaz-marcdump.html
> [2] http://www.indexdata.com/yaz
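(If I'm reading the yaz-marcdump documentation right, it can also do the
split itself: -s sets a prefix for the output files and -C the number of
records per chunk, so something like

  yaz-marcdump -s chunk -C 1000 bigfile.mrc > /dev/null

should write numbered files starting with "chunk". Option names from memory;
check them against the manual linked above.)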
>
> Tod Olson <[log in to unmask]>
> Systems Librarian
> University of Chicago Library
>
> On Jan 26, 2010, at 2:34 AM, Marc Chantreux wrote:
>
>> On Mon, Jan 25, 2010 at 11:48:47PM +0530, Saiful Amin wrote:
>>> I also recommend using MARC::Batch. Attached is a simple script I
>>> wrote for myself.
>>
>> I think MARC::Batch would be very slow for splitting a lot of records. As
>> 0x1d is your record separator, a Perl one-liner can do the job:
>>
>> http://www.tinybox.net/2009/10/12/perl-onliners-vim-and-iso2709/
>>
>> perl -0x1d -wnE '
>>   # open a new file every 1000 records
>>   ($. % 1000) == 1
>>   # with the record number zero-padded to 5 digits
>>   and open F, sprintf ">records_%.5d.mrc", $.;
>>   # actually print the record
>>   print F
>> ' bigfile.mrc
>>
>> If your file is UTF-8 encoded, add the -CSD flags:
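That is, something like

  perl -CSD -0x1d -wnE '...the one-liner above...' bigfile.mrc

where, per perlrun, the S applies UTF-8 to the standard streams and the D
makes UTF-8 the default layer for other input and output handles, including
the split files.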
>>
>> hope it helps
>> regards
>>
>>
>> --
>> Marc Chantreux
>> BibLibre, experts in free software for information and documentation
>> http://biblibre.com