thanks so much for your help Eric.

i was missing the attset statement in my zebra.cfg

 > attset: bib1.att

which is also missing from the idzebra-2.0 installed example,
/usr/share/idzebra-2.0-examples/oai-pmh/conf/zebra.cfg, but
perhaps is not necessary for the DOM XML record model.

i agree with Ross that your outline would be very
useful to have on the code4lib twiki.

also, my apologies, the zebralist _is_ active, and Adam
Dickmeiss provided an equally helpful answer just a bit
later.


Eric Lease Morgan wrote:
> 
> On Jun 17, 2008, at 12:06 PM, siznax wrote:
> 
>> the Zebralist appears to be relatively inactive.
>> anyone here have experience indexing MARC binaries
>> with zebra?
> 
> 
> I will try to outline here how to index (and search) MARC records using 
> Zebra, but tweaking the indexing process is a bit trickier than I know 
> how to do.
> 
>   1. Install yaz, zebra, and all of their friends. I have found that the
> "standard" make process works pretty well, but let yaz and zebra decide
> where they put their various configuration files. Specifying those
> locations yourself is not worth the effort.
> 
>   2. Save your MARC records someplace on your file system. By "binary"
> MARC records, I suppose you mean "real" MARC records: MARC records in
> communications format (ISO 2709), the kind of records fed to
> traditional integrated library systems, as opposed to some flavor
> of XML or the "tagged format" often used for display.
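If you are not sure whether a file really holds communications-format MARC, a quick sanity check helps. This is a minimal Python sketch, assuming the standard ISO 2709 layout: each record begins with a 24-byte leader whose first five digits give the total record length, and ends with the record terminator byte 0x1D. The function name `split_marc_records` and the file name `records.mrc` are hypothetical; this checks structure only and is not a full MARC parser.

```python
from __future__ import annotations

RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record byte
LEADER_LENGTH = 24           # every MARC record starts with a 24-byte leader


def split_marc_records(data: bytes) -> list[bytes]:
    """Split raw ISO 2709 bytes into records, validating each leader."""
    records = []
    pos = 0
    while pos < len(data):
        leader = data[pos:pos + LEADER_LENGTH]
        # Leader positions 00-04 hold the logical record length as digits.
        if len(leader) < LEADER_LENGTH or not leader[:5].isdigit():
            raise ValueError(f"no valid leader at byte offset {pos}")
        length = int(leader[:5])
        if length <= LEADER_LENGTH:
            raise ValueError(f"implausible record length {length} at offset {pos}")
        record = data[pos:pos + length]
        # A well-formed record is exactly `length` bytes and ends with 0x1D.
        if len(record) != length or not record.endswith(RECORD_TERMINATOR):
            raise ValueError(f"record at offset {pos} is truncated or malformed")
        records.append(record)
        pos += length
    return records
```

Usage: `with open("records.mrc", "rb") as f: print(len(split_marc_records(f.read())))`. If this raises on the first record, the file is probably MARCXML or a tagged display format rather than binary MARC.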
> 
>   3. Create a zebra.cfg file, and have it look something like this:
> 
>       # global paths
>       profilePath: .:./etc:/usr/local/share/idzebra-2.0/tab
>       modulePath: /usr/local/lib/idzebra-2.0/modules
> 
>       # turn ranking on
>       rank: rank-1
> 
>       # define a database of marc records called opac
>       opac.database: opac
>       opac.recordtype: grs.marcxml.marc21
>       attset: bib1.att
>       attset: explain.att
> 
>   4. Index your MARC records with the following command. You should see
> lots of great stuff sent to STDOUT.
> 
>       zebraidx -g opac update <path to MARC records>
> 
> 
> You have now created your index. Once you get this far with indexing, 
> you will want to tweak various .abs files (I think) to enhance the 
> indexing process. This particular thing is not my forte. It seems like 
> black magic to most of us. This is not a Zebra-specific problem; this is 
> a problem with Z39.50.
> 
> Next, you need to implement the client/server end of things:
> 
>   5. Start your server. This will be a Z39.50 server -- a "kewl" 
> library-centric protocol that existed before the Internet got hot:
> 
>       zebrasrv localhost:9999 &
> 
>   6. Use yaz-client to search your index:
> 
>       $ yaz-client
>       Z> open localhost:9999/opac
>       Z> find origami
>       Z> show 1
>       Z> quit
> 
> Using yaz-client practically requires a knowledge of Z39.50. Attached
> should be a Perl script that lets you search your server in a more
> user-friendly way. To use it you will need to install a few Perl
> modules and then edit the constant called DATABASE.
> 
> Even though Z39.50 is/was "kewl", it is still pretty icky. SRU is better
> -- definitely a step in the right direction, and Zebra supports SRU out
> of the box. [1]
> 
>   7. Create an SRU configuration file, sru.cfg, that looks something like this:
> 
>      <yazgfs>
>        <server>
>          <config>zebra.cfg</config>
>          <cql2rpn>pqf.properties</cql2rpn>
>        </server>
>      </yazgfs>
> 
>   8. Acquire a "better" pqf.properties file. PQF (Prefix Query Format)
> is the query notation for Z39.50 databases. It is ugly; it was designed
> in a non-Internet world. Instead of knowing that 1=4 means search the
> title field, you want to simply search the title. Attached is a
> "better" pqf.properties file, and it is "better" because it maps
> Dublin Core index names such as dc.title onto Z39.50 attributes such
> as 1=4. Save it in a directory called etc in the same directory as
> your zebra.cfg file. (Notice how the profilePath in the zebra.cfg
> file, above, includes ./etc.)
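To make the 1=4 business concrete, here is a toy Python sketch of what a pqf.properties file does: it maps a CQL index name to Bib-1 attributes (Use attribute 4 is title, 1003 is author, 1016 is "any"). The `index.dc.title = 1=4` line style mirrors the pqf.properties files shipped with YAZ as I understand them, but the three-line excerpt and the helpers `parse_index_map` and `cql_term_to_pqf` are hypothetical. They handle a single `index=term` clause and ignore relations, booleans and quoting; the real CQL-to-PQF translation is done by YAZ itself.

```python
from __future__ import annotations

# Hypothetical excerpt in the style of pqf.properties (CQL index -> Bib-1 attributes).
PROPERTIES = """
# Bib-1 Use attributes: 4 = title, 1003 = author, 1016 = any
index.dc.title = 1=4
index.dc.creator = 1=1003
index.cql.serverChoice = 1=1016
"""


def parse_index_map(text: str) -> dict[str, str]:
    """Collect 'index.<name> = <attributes>' lines into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only; the value itself contains '='.
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("index."):
            mapping[key[len("index."):]] = value.strip()
    return mapping


def cql_term_to_pqf(query: str, mapping: dict[str, str]) -> str:
    """Translate one 'index=term' clause (or a bare term) into PQF."""
    index, sep, term = query.partition("=")
    if not sep:  # bare term: fall back to the server-choice index
        index, term = "cql.serverChoice", query
    attrs = mapping[index.strip()]
    # Each space-separated attribute becomes its own @attr prefix.
    pqf_attrs = " ".join(f"@attr {a}" for a in attrs.split())
    return f"{pqf_attrs} {term.strip()}"
```

For example, `cql_term_to_pqf("dc.title=origami", parse_index_map(PROPERTIES))` yields `@attr 1=4 origami`, which is a PQF query you could type at the yaz-client prompt from step 6 as `find @attr 1=4 origami`.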
> 
>   9. Kill the Z39.50 server you started in step 5.
> 
>  10. Start up an SRU server:
> 
>       zebrasrv -f sru.cfg localhost:9999 &
> 
>  11. Use your HTTP client to search the SRU server. Queries will look
> like this (with line breaks added for readability):
> 
>       http://localhost:9999/opac?
>        operation=searchRetrieve&
>        version=1.1&
>        query=origami&
>        maximumRecords=5
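Typing that URL by hand is fine for a single word, but queries containing spaces, quotes or = signs must be URL-escaped. A minimal Python sketch using only the standard library; `sru_search_url` is a hypothetical helper, and the base URL assumes the server from step 10 listening on localhost:9999 with the opac database.

```python
from urllib.parse import urlencode


def sru_search_url(base: str, query: str, maximum_records: int = 5) -> str:
    """Build an SRU 1.1 searchRetrieve URL with the query properly escaped."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,  # a CQL query such as: dc.title="paper folding"
        "maximumRecords": str(maximum_records),
    }
    # urlencode escapes '=', quotes and spaces inside the query value.
    return f"{base}?{urlencode(params)}"
```

With the server running, `urllib.request.urlopen(sru_search_url("http://localhost:9999/opac", "origami")).read()` should fetch the same response as the URL above.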
> 
> The result should be a stream of XML ready for XSLT processing.
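Besides XSLT, the response can be inspected from a script. This is a minimal Python sketch, assuming the SRU 1.1 response namespace `http://www.loc.gov/zing/srw/` and that records come back as MARCXML (namespace `http://www.loc.gov/MARC21/slim`); whether Zebra returns MARCXML depends on the record schema you request and on your configuration. The function `summarize_sru_response` is hypothetical; it reports the hit count and the 245 $a (title) of each returned record.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

NS = {
    "srw": "http://www.loc.gov/zing/srw/",     # SRU 1.1 response namespace
    "marc": "http://www.loc.gov/MARC21/slim",  # MARCXML namespace
}


def summarize_sru_response(xml_text: str | bytes) -> tuple[int, list[str]]:
    """Return (numberOfRecords, list of 245 $a titles) from an SRU response."""
    root = ET.fromstring(xml_text)
    hits = int(root.findtext("srw:numberOfRecords", default="0", namespaces=NS))
    titles = []
    # Walk every MARCXML title field anywhere inside the recordData elements.
    for field in root.iterfind(".//marc:datafield[@tag='245']", NS):
        title = field.findtext("marc:subfield[@code='a']", default="", namespaces=NS)
        titles.append(title.strip())
    return hits, titles
```

The bytes returned by `urlopen(...).read()` can be passed in directly, since `ET.fromstring` accepts bytes as well as strings.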
> 
> All of the above is almost exactly what I did to create an index of MARC
> records harvested from the Library of Congress and the University of
> Michigan's OAI data repository (MBooks). [2] Take a look at the HTML
> source. Notice how the client consists of only one HTML file
> containing a form, one CSS file for style, and one XSL file for the
> XML-to-HTML transformation.
> 
> HTH.
> 
> [1] SRU - http://www.loc.gov/standards/sru/
> [2] Example SRU interface - http://infomotions.com/ii/
> 