> 3) Did they purchase the XML Server product [3] when it was available?
>

XML Server is not required to get bib records in XML; that can simply be
enabled in WWWOPTIONS. XML Server is a separate product that contains
significant additional features, but any system can display certain record
types in XML.


> 4) Can they send you a list of bib record numbers? If so, you can get the
> XML records [4].
>

As is the case with many systems, III bib numbers are issued sequentially, so
you can simply guess them. It's even easier because when you pull them out of
the webopac, you leave off the check digit.

This means that to learn all the bib numbers in a system:

   1. Look at a new books list or simply find something that was recently
   added to the catalog
   2. Look at the source HTML for that bib record. You'll see the bib record
   number right there. This will give you an idea of how long your process
   needs to run
   3. Have a program systematically guess bib numbers; a minimal sketch
   follows this list. The system seems to handle this well, but keep sys
   and network admins in the loop and design your crawler to play nicely --
   especially since robots.txt specifically disallows the searches you'd
   be doing.
   4. Periodically, have the program look ahead for new numbers
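
Here's a minimal sketch of step 3 in Python. The hostname, the
/xrecord=b<number> URL form, and the "short response means the number isn't
in use" test are all assumptions about your target system -- pull a couple
of records by hand and adjust before letting it run.

#!/usr/bin/env python3
"""Rough sketch of a bib-number crawler for a III webopac."""

import time
import urllib.request
import urllib.error

BASE_URL = "http://catalog.example.edu"   # hypothetical hostname
START = 1000000                           # first bib number to try (see step 2)
END = 1005000                             # keep each run's range modest
DELAY = 0.5                               # seconds between requests -- play nicely

def fetch_xrecord(bibnum):
    """Return the XML for bib number bibnum, or None if it looks unused."""
    url = "%s/xrecord=b%d" % (BASE_URL, bibnum)
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            data = resp.read()
    except urllib.error.URLError:
        return None
    # Heuristic: an unused number tends to come back as a short error page
    # rather than a full XML record.  Tune this for your site.
    if len(data) < 200 or b"<" not in data:
        return None
    return data

def main():
    for bibnum in range(START, END):
        xml = fetch_xrecord(bibnum)
        if xml is not None:
            with open("b%d.xml" % bibnum, "wb") as out:
                out.write(xml)
        time.sleep(DELAY)   # throttle so sys and network admins stay happy

if __name__ == "__main__":
    main()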

This process tells you which record numbers are in use and which aren't (and
you may as well harvest the data while you're in the record). Incidentally,
this is one of the times that accessing the XML record is useful, since it
contains the creation date of the record as well as the internal codes. BTW,
using XRECORD rather than a regular search is better if there are lots of
items, because the system doesn't need to combine info from multiple record
types into each display.
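
If you're saving the XML anyway, pulling the creation date and internal codes
back out is straightforward. The element names below (CREATEDATE, FIXFLD,
FIELDLABEL, FIELDVALUE) are placeholders -- dump one record and adjust the
paths to match whatever your release actually emits.

import xml.etree.ElementTree as ET

def record_summary(path):
    """Pull a couple of fields out of a saved XRECORD file."""
    root = ET.parse(path).getroot()
    created = root.findtext(".//CREATEDATE")   # placeholder element name
    codes = {
        ff.findtext("FIELDLABEL"): ff.findtext("FIELDVALUE")   # placeholder layout
        for ff in root.iter("FIXFLD")
    }
    return created, codes

if __name__ == "__main__":
    created, codes = record_summary("b1000000.xml")
    print("created:", created)
    print("codes:", codes)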

I have used this method on millions of records at a time without any
noticeable impact on performance. As sucky as it is, this is sometimes your
best option, and it's definitely better than trying to work through the GUI.
One advantage it has over using expect with a MARC export is that you can
harvest huge databases without having to deal with limits on Create Lists
sizes or the system load that running such large lists imposes. Note that
periodic crawls are necessary to identify deleted/changed records.
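
One way to handle that, assuming each crawl also writes out the list of bib
numbers it found: diff the lists between runs (spotting changed records would
mean comparing the saved XML itself).

def load_numbers(path):
    """Read one bib number per line from a crawl's output file."""
    with open(path) as f:
        return set(int(line.strip()) for line in f if line.strip())

old = load_numbers("crawl_old.txt")   # hypothetical file names
new = load_numbers("crawl_new.txt")

print("deleted since last crawl:", sorted(old - new))
print("added since last crawl:", sorted(new - old))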

kyle