Eric this is awesome.

It would probably be worthwhile to document all this somewhere
(outside this mailing list), maybe the code4lib wiki?

-Ross.

On Tue, Jun 17, 2008 at 4:29 PM, Eric Lease Morgan <[log in to unmask]> wrote:
>
> On Jun 17, 2008, at 12:06 PM, siznax wrote:
>
>> the Zebralist appears to be relatively inactive.
>> anyone here have experience indexing MARC binaries
>> with zebra?
>
>
> I will try to outline here how to index (and search) MARC records using
> Zebra, but tweaking the indexing process is a bit trickier than I know how
> to do.
>
>  1. Install yaz, zebra, and all of their friends. I have found that the
> "standard" make process works pretty well; let yaz and zebra decide for
> themselves where they put their various configuration files. Overriding
> those locations is not worth the effort.
>
>  2. Save your MARC records someplace on your file system. By "binary" MARC
> records, I suppose you mean "real" MARC records -- MARC records in
> communications format -- MARC records as the types of records fed to
> traditional integrated library systems. This is opposed to some flavor of
> XML or "tagged format" often used for display.
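>
> (If you are not sure whether a given file really is binary MARC, a few
> lines of Perl will tell you. The snippet below is just a sketch; the file
> name records.mrc is made up, and it assumes the MARC::Record distribution
> -- which provides MARC::Batch -- is installed.)
>
>      # loop through a file of binary (ISO 2709) MARC records
>      use MARC::Batch;
>      my $batch = MARC::Batch->new( 'USMARC', 'records.mrc' );
>      while ( my $record = $batch->next ) {
>          print $record->title, "\n";
>      }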
>
>  3. Create a zebra.cfg file, and have it look something like this:
>
>      # global paths
>      profilePath: .:./etc:/usr/local/share/idzebra-2.0/tab
>      modulePath: /usr/local/lib/idzebra-2.0/modules
>
>      # turn ranking on
>      rank: rank-1
>
>      # define a database of marc records called opac
>      opac.database: opac
>      opac.recordtype: grs.marcxml.marc21
>      attset: bib1.att
>      attset: explain.att
>
>  4. Index your MARC records with the following command. You should see lots
> of great stuff sent to STDOUT.
>
>      zebraidx -g opac update <path to MARC records>
>
>
> You have now created your index. Once you get this far with indexing, you
> will want to tweak various .abs files (I think) to enhance the indexing
> process. This particular thing is not my forte. It seems like black magic to
> most of us. This is not a Zebra-specific problem; this is a problem with
> Z39.50.
>
> Next, you need to implement the client/server end of things:
>
>  5. Start your server. This will be a Z39.50 server -- a "kewl"
> library-centric protocol that existed before the Internet got hot:
>
>      zebrasrv localhost:9999 &
>
>  6. Use yaz-client to search your index:
>
>      $ yaz-client
>      Z> open localhost:9999/opac
>      Z> find origami
>      Z> show 1
>      Z> quit
>
> Using the yaz-client almost requires a knowledge of Z39.50. Attached should
> be a Perl script that allows you to search your server in a bit more
> user-friendly way. To use it you will need to install a few Perl modules and
> then edit the constant called DATABASE.
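>
> For what it is worth, the heart of such a script is only a few lines long.
> The sketch below is not the attached script, just an illustration of the
> idea; it assumes the ZOOM-Perl module (Net::Z3950::ZOOM) is one of the
> modules you installed, and it searches the opac database defined above:
>
>      # search the index via Z39.50 and display the first hit
>      use ZOOM;
>      my $connection = ZOOM::Connection->new( 'localhost', 9999,
>                                              databaseName => 'opac' );
>      my $results = $connection->search_pqf( '@attr 1=1016 origami' );
>      print 'Hits: ', $results->size, "\n";
>      print $results->record( 0 )->render if $results->size;
>      $connection->destroy;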
>
> Even though Z39.50 is/was "kewl", it is still pretty icky. SRU is better --
> definitely a step in the right direction, and Zebra supports SRU out of the
> box. [1]
>
>  7. Create an SRU configuration file looking something like this:
>
>     <yazgfs>
>       <server>
>         <config>zebra.cfg</config>
>         <cql2rpn>pqf.properties</cql2rpn>
>       </server>
>     </yazgfs>
>
>  8. Acquire a "better" pqf.properties file. PQF is about querying Z39.50
> databases. It is ugly. It was designed in a non-Internet world. Instead of
> knowing that 1=4 means search the title field, you want to simply search the
> title. Attached is a "better" pqf.properties file, and it is "better"
> because it maps things like 1=4 to Dublin Core equivalents. Save it in a
> directory called etc in the same directory as your zebra.cfg file. (Notice
> how the zebra.cfg file, above, denotes etc as being in zebra's path.)
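>
> For what it's worth, the mapping lines in a pqf.properties file look
> roughly like the fragment below. This is only an illustration patterned on
> the stock file distributed with yaz, not the attachment itself, and a
> complete file needs a handful of set.* declarations as well; consult the
> yaz documentation for the details:
>
>      # map Dublin Core-ish index names to bib-1 use attributes
>      index.cql.serverChoice = 1=1016
>      index.dc.title         = 1=4
>      index.dc.creator       = 1=1003
>      index.dc.subject       = 1=21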
>
>  9. Kill your presently running Z39.50 server.
>
>  10. Start up an SRU server:
>
>      zebrasrv -f sru.cfg localhost:9999 &
>
>  11. Use your HTTP client to search the SRU server. Queries will look like
> this (with carriage returns added for readability):
>
>      http://localhost:9999/opac?
>       operation=searchRetrieve&
>       version=1.1&
>       query=origami&
>       maximumRecords=5
>
> The result should be a stream of XML ready for XSLT processing.
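>
> If you would rather script this last step than use a browser, a tiny bit
> of Perl will do. Again, only a sketch; it assumes LWP::Simple is installed
> and the SRU server from step 10 is still running:
>
>      # fetch an SRU response and dump the raw XML to STDOUT
>      use LWP::Simple;
>      my $url = 'http://localhost:9999/opac?'
>              . 'operation=searchRetrieve&'
>              . 'version=1.1&'
>              . 'query=origami&'
>              . 'maximumRecords=5';
>      print get( $url );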
>
> All of the above is almost exactly what I did to create an index of MARC
> records harvested from the Library of Congress and the University of
> Michigan's OAI data repository (MBooks). [2] Take a look at the HTML source.
> Notice how the client in this regard is only one HTML file containing a
> form, one CSS file for style, and one XSL file for XML to HTML
> transformation.
>
> HTH.
>
> [1] SRU - http://www.loc.gov/standards/sru/
> [2] Example SRU interface - http://infomotions.com/ii/
>
> --
> Eric Lease Morgan
> Head, Digital Access and Information Architecture Department
> Hesburgh Libraries, University of Notre Dame
>
> (574) 631-8604
>