I'd suggest contacting the vendor. Whoever handles your electronic subscriptions will have some relationship with a customer rep. Explain what you're trying to do and you might get the customer rep to advocate for you.

-Tod

On Nov 14, 2013, at 9:30 AM, Eric Lease Morgan <[log in to unmask]> wrote:

> Thank you for the replies, and after a bit of investigation I learned that I don’t need to do authentication because the vendor does IP authentication. Nice! On the other hand, I was still not able to resolve my original problem. 
> 
> I needed/wanted to download tens of thousands, if not hundreds of thousands, of citations for text mining analysis. The Web interface to the database/index limits output to 4,000 items, and selecting the set of these items is beyond tedious; it is cruel and unusual punishment. I then got the idea of using EndNote’s z39.50 client, and after a bit of back & forth I got it working, but the downloading process was too slow. I then got the bright idea of writing my own z39.50 client (below). Unfortunately, I learned that the 4,000-record limit is not just a limit of the Web interface: a person can only download the first 4,000 records in a found set. Requests for record 4001, 4002, etc. fail. This is true in my locally written client as well as in EndNote.
> 
> Alas, it looks as if I am unable to download the data I need/require, unless somebody at the vendor gives me a data dump. On the other hand, since my locally written client is so short and simple, I think I can create a Web-based interface to query many different z39.50 targets and provide on-the-fly text mining analysis against the results.
> 
> In short, I learned a great many things.
> 
> —
> Eric Lease Morgan
> University of Notre Dame
> 
> 
> #!/usr/bin/perl
> 
> # nytimes-search.pl - rudimentary z39.50 client to query the NY Times
> 
> # Eric Lease Morgan <[log in to unmask]>
> # November 13, 2013 - first cut; "Happy Birthday, Steve!"
> 
> # usage: ./nytimes-search.pl > nytimes.marc
> 
> 
> # configure
> use constant DB     => 'hnpnewyorktimes';
> use constant HOST   => 'fedsearch.proquest.com';
> use constant PORT   => 210;
> use constant QUERY  => '@attr 1=1016 "trade or tariff"';
> use constant SYNTAX => 'usmarc';
> 
> # require
> use strict;
> use ZOOM;
> 
> # do the work
> eval {
> 
> 	# connect; configure; search
> 	my $conn = new ZOOM::Connection( HOST, PORT, databaseName => DB );
> 	$conn->option( preferredRecordSyntax => SYNTAX );
> 	my $rs = $conn->search_pqf( QUERY );
> 
> 	# requests > 4000 return errors
> 	# print $rs->record( 4001 )->raw;
> 			
> 	# retrieve; offsets are zero-based, so the last valid one is size - 1;
> 	# will break at record 4,000 because of vendor limitations
> 	for my $i ( 0 .. $rs->size - 1 ) {
> 	
> 		print STDERR "\tRetrieving record #$i\r";
> 		print $rs->record( $i )->raw;
> 		
> 	}
> 		
> };
> 
> # report errors
> if ( $@ ) { print STDERR "Error ", $@->code, ": ", $@->message, "\n" }
> 
> # done
> exit;
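
One possible way around the 4,000-record cap described above is to split the search into smaller found sets, for example one per year, and fetch each set separately. The sketch below only builds the PQF query strings and makes no network calls. It assumes the target honors Bib-1 use attribute 31 (date of publication), which has not been checked against this vendor's server. The script name, the queries_by_year helper, and the 1913-1915 range are all hypothetical.

```perl
#!/usr/bin/perl

# split-by-year.pl - hypothetical sketch: build one PQF query per year

use strict;
use warnings;

# base query copied from nytimes-search.pl
use constant BASE => '@attr 1=1016 "trade or tariff"';

# ASSUMPTION: the target supports Bib-1 use attribute 31 (date of
# publication); this is not confirmed for the vendor's z39.50 server
use constant DATE_ATTR => 31;

# return a list of PQF queries, one for each year in the inclusive range
sub queries_by_year {
	my ( $base, $first, $last ) = @_;
	my @queries;
	for my $year ( $first .. $last ) {
		push @queries, sprintf( '@and %s @attr 1=%d %d', $base, DATE_ATTR, $year );
	}
	return @queries;
}

# print the queries; each one could be handed to $conn->search_pqf in turn
print "$_\n" for queries_by_year( BASE, 1913, 1915 );
```

If a single year still matches more than 4,000 records, the same idea would need a finer split, and whether the server allows that is likewise an open question.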