On Jun 17, 2008, at 12:06 PM, siznax wrote:
> the Zebralist appears to be relatively inactive.
> anyone here have experience indexing MARC binaries
> with zebra?
I will try to outline here how to index (and search) MARC records
using Zebra, but tweaking the indexing process is a bit trickier than
I know how to do.
1. Install yaz, zebra, and all of their friends. I have found that
the "standard" make process works pretty well, but let yaz and
zebra decide where they put their various configuration files.
Specifying custom locations is not worth the effort.
2. Save your MARC records someplace on your file system. By
"binary" MARC records, I suppose you mean "real" MARC records -- MARC
records in communications format -- the type of records fed to
traditional integrated library systems. This is as opposed to some
flavor of XML or the "tagged format" often used for display.
3. Create a zebra.cfg file, and have it look something like this:
# global paths
# turn ranking on
# define a database of marc records called opac
4. Index your MARC records with the following command. You should
see lots of great stuff sent to STDOUT.
zebraidx -g opac update <path to MARC records>
You have now created your index. Once you get this far with indexing,
you will want to tweak various .abs files (I think) to enhance the
indexing process. This particular thing is not my forte. It seems
like black magic to most of us. This is not a Zebra-specific problem;
this is a problem with Z39.50.
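Before indexing, it can help to confirm that the file from step 2 really is in communications format. What follows is only a rough sketch in Python, not anything Zebra itself provides. It relies on the standard MARC layout: a 24-byte leader whose first five digits give the record length and whose bytes 12-16 give the base address of the data, followed by a directory of 12-byte entries. The function names are my own invention.

```python
from typing import Iterator, List

LEADER_LENGTH = 24      # every MARC record starts with a 24-byte leader
DIRECTORY_ENTRY = 12    # tag (3) + field length (4) + start offset (5)


def iter_marc_records(data: bytes) -> Iterator[bytes]:
    """Yield each raw record, trusting the 5-digit length in its leader."""
    offset = 0
    while offset < len(data):
        # int() raises ValueError if these bytes are not digits,
        # which is a good hint the file is not communications format.
        length = int(data[offset:offset + 5])
        if length < LEADER_LENGTH:
            raise ValueError("record length too small to be MARC")
        yield data[offset:offset + length]
        offset += length


def directory_tags(record: bytes) -> List[str]:
    """Return the field tags listed in one record's directory."""
    base_address = int(record[12:17])
    # The directory sits between the leader and the field terminator
    # that immediately precedes the base address.
    directory = record[LEADER_LENGTH:base_address - 1]
    return [
        directory[i:i + 3].decode("ascii")
        for i in range(0, len(directory), DIRECTORY_ENTRY)
    ]
```

To use it, read the file in binary mode, pass the bytes to iter_marc_records, and print directory_tags for the first few records. If you see tags like 001, 245, and so on, you very likely have "real" MARC.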
Next, you need to implement the client/server end of things:
5. Start your server. This will be a Z39.50 server -- a "kewl"
library-centric protocol that existed before the Internet got hot:
zebrasrv localhost:9999 &
6. Use yaz-client to search your index:
Z> open localhost:9999/opac
Z> find origami
Z> show 1
Using the yaz-client almost requires a knowledge of Z39.50. Attached
should be a Perl script that allows you to search your server in a
bit more user-friendly way. To use it you will need to install a few
Perl modules and then edit the constant called DATABASE.
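By default, yaz-client's find command takes queries in Prefix Query Format (PQF), so a title-only search looks like "find @attr 1=4 origami". Here is a small Python sketch of how friendlier field names could be turned into PQF. The numbers are standard Bib-1 use attributes, but the function and dictionary names are hypothetical, and whether a given attribute is actually searchable depends on how your Zebra index is configured.

```python
# Standard Bib-1 "use" attribute numbers for a few common access points.
BIB1_USE = {
    "title": 4,
    "isbn": 7,
    "subject": 21,
    "author": 1003,
    "any": 1016,
}


def to_pqf(field: str, term: str) -> str:
    """Build a single-term PQF query such as: @attr 1=4 "origami"."""
    if field not in BIB1_USE:
        raise ValueError(f"unknown field: {field}")
    if '"' in term:
        # Keep the sketch simple: do not attempt to escape quotes.
        raise ValueError("terms containing double quotes are not handled")
    # Quoting the term keeps multi-word terms together as one phrase.
    return f'@attr 1={BIB1_USE[field]} "{term}"'
```

Calling to_pqf("title", "origami") gives a string you can paste after "find" at the Z> prompt.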
Even though Z39.50 is/was "kewl" it is still pretty icky. SRU is
better -- definitely a step in the right direction, and Zebra
supports SRU out of the box. 
7. Create an SRU configuration file looking something like this:
8. Acquire a "better" pqf.properties file. PQF is about querying
Z39.50 databases. It is ugly. It was designed in a non-Internet
world. Instead of knowing that 1=4 means search the title field, you
want to simply search the title. Attached is a "better"
pqf.properties file, and it is "better" because it maps things like
1=4 to Dublin Core equivalents. Save it in a directory called etc in
the same directory as your zebra.cfg file. (Notice how the zebra.cfg
file, above, denotes etc as being in zebra's path.)
9. Kill your presently running Z39.50 server.
10. Start up an SRU server:
zebrasrv -f sru.cfg localhost:9999 &
11. Use your HTTP client to search the SRU server. Queries will
look like this (with carriage returns added for readability):
The result should be a stream of XML ready for XSLT processing.
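If you would rather build the query URL in a script than type it by hand, something like the following Python sketch should work. The parameter names (version, operation, query, startRecord, maximumRecords) come from the SRU standard. The base URL and the dc.title index name are assumptions on my part: they presume the server from step 10, the opac database, and a pqf.properties file that defines Dublin Core index names.

```python
from urllib.parse import urlencode


def sru_search_url(base: str, query: str, start: int = 1, maximum: int = 10) -> str:
    """Return an SRU searchRetrieve URL for the given CQL query."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": query,               # CQL, e.g. dc.title=origami
        "startRecord": start,         # SRU record positions start at 1
        "maximumRecords": maximum,
    }
    # urlencode percent-encodes the query so "=" and spaces survive the trip.
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed host, port, and database name; adjust to your setup.
    print(sru_search_url("http://localhost:9999/opac", "dc.title=origami"))
```

Paste the printed URL into a browser, or fetch it with your favorite HTTP library, and feed the resulting XML to your stylesheet.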
All of the above is almost exactly what I did to create an index of
MARC records harvested from the Library of Congress and the
University of Michigan's OAI data repository (MBooks). Take a
look at the HTML source. Notice how the client in this case is only
one HTML file containing a form, one CSS file for style, and one XSL
file for XML to HTML transformation.
 SRU - http://www.loc.gov/standards/sru/
 Example SRU interface - http://infomotions.com/ii/
Eric Lease Morgan
Head, Digital Access and Information Architecture Department
Hesburgh Libraries, University of Notre Dame