Here's another simple Perl script that works for me:

#!/usr/bin/perl
# msplit: split a file of MARC records into chunks of N records each
$file = shift or die "\nUsage: msplit filename [num of records] [new file name]\n\n";
$s   = shift || 1000;    # records per output file
$of  = shift || $file;   # output file name prefix
$/   = chr(29);          # ISO 2709 record terminator (0x1D)
$i   = 0;
$tot = 0;
open IN, $file or die "Can't find input file '$file'!\n";
while (<IN>) {
	$i++;
	if ($i == $s || $tot == 0) {   # time to start a new output file
		$i = 0;
		$out++;
		$out =~ s/^(\d)$/0$1/;     # zero-pad single-digit file numbers
		$fout = "$of$out";
		unlink $fout;
		open OUT, ">>$fout" or die "Can't open output file '$fout'!\n";
	}
	$tot++;
	print OUT $_;
}
close IN;
close OUT;
print "\n$tot MARC records written to $out files\n\n";

--Charles Ledvina
infosoup.org



On Tue, 26 Jan 2010 08:16:56 -0600, Tod Olson <[log in to unmask]> wrote:
> The yaz-marcdump utility[1], included in the YAZ toolkit[2], should work
> and I've found it to be blindingly fast.
> 
> -Tod
> 
> [1] http://www.indexdata.com/yaz/doc/yaz-marcdump.html
> [2] http://www.indexdata.com/yaz
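[For reference: as I read the yaz-marcdump documentation, the split is driven by the -s (output file prefix) and -C (records per chunk) options, so an invocation along these lines should work; the input and prefix names here are made up, and it's worth checking yaz-marcdump's help output for the exact flags in your version. --ed.]

```shell
# split big.mrc into chunks of 1000 ISO 2709 records each,
# written to numbered files starting with the prefix "out"
# (yaz-marcdump also dumps records to stdout; discard that)
yaz-marcdump -s out -C 1000 big.mrc > /dev/null
```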
> 
> Tod Olson <[log in to unmask]>
> Systems Librarian
> University of Chicago Library
> 
> On Jan 26, 2010, at 2:34 AM, Marc Chantreux wrote:
> 
>> On Mon, Jan 25, 2010 at 11:48:47PM +0530, Saiful Amin wrote:
>>> I also recommend using MARC::Batch. Attached is a simple script I wrote
>>> for myself.
>> 
>> I think MARC::Batch would be very slow for splitting a lot of records. As
>> 0x1d is your record separator, a Perl one-liner can do the job:
>> 
>> http://www.tinybox.net/2009/10/12/perl-onliners-vim-and-iso2709/
>> 
>> perl -0x1d -wnE '
>>     # new file every 1000 records
>>     $. == 1 || !($. % 1000)
>>         # with the record number padded to 5 digits
>>         and open F, sprintf ">records_%.5d.mrc", $.;
>>     # actually print the record
>>     print F
>> ' bigfile.mrc
>> 
>> If your file is UTF-8 encoded, use the -CSD flags.
>> 
>> hope it helps
>> regards
>> 
>> 
>> -- 
>> Marc Chantreux
>> BibLibre, expert en logiciels libres pour l'info-doc
>> http://biblibre.com