Thanks so much for your help, Eric. I was missing the attset statement in my zebra.cfg:

   attset: bib1.att

which is also missing from the installed idzebra-2.0 example, /usr/share/idzebra-2.0-examples/oai-pmh/conf/zebra.cfg, but perhaps it is not necessary for the DOM XML record model. I agree with Ross that your outline would be very useful to have on the code4lib twiki. Also, my apologies: the Zebra list _is_ active, and Adam Dickmeiss provided an equally helpful reply just a bit later.

Eric Lease Morgan wrote:
>
> On Jun 17, 2008, at 12:06 PM, siznax wrote:
>
>> the Zebralist appears to be relatively inactive.
>> anyone here have experience indexing MARC binaries
>> with zebra?
>
> I will try to outline here how to index (and search) MARC records using
> Zebra, but tweaking the indexing process is a bit trickier than I know
> how to do.
>
> 1. Install yaz, zebra, and all of their friends. I have found that the
> "standard" make process works pretty well, but allow yaz and zebra to
> specify where they put various configuration files. The extra
> specification is not worth the effort.
>
> 2. Save your MARC records someplace on your file system. By "binary"
> MARC records, I suppose you mean "real" MARC records -- MARC records in
> communications format -- the type of records fed to traditional
> integrated library systems. This is opposed to some flavor of XML or
> "tagged format" often used for display.
>
> 3. Create a zebra.cfg file, and have it look something like this:
>
>    # global paths
>    profilePath: .:./etc:/usr/local/share/idzebra-2.0/tab
>    modulePath: /usr/local/lib/idzebra-2.0/modules
>
>    # turn ranking on
>    rank: rank-1
>
>    # define a database of MARC records called opac
>    opac.database: opac
>    opac.recordtype: grs.marcxml.marc21
>    attset: bib1.att
>    attset: explain.att
>
> 4. Index your MARC records with the following command. You should see
> lots of great stuff sent to STDOUT.
>
>    zebraidx -g opac update <path to MARC records>
>
> You have now created your index. Once you get this far with indexing,
> you will want to tweak various .abs files (I think) to enhance the
> indexing process. This particular thing is not my forte. It seems like
> black magic to most of us. This is not a Zebra-specific problem; this
> is a problem with Z39.50.
>
> Next, you need to implement the client/server end of things:
>
> 5. Start your server. This will be a Z39.50 server -- a "kewl"
> library-centric protocol that existed before the Internet got hot:
>
>    zebrasrv localhost:9999 &
>
> 6. Use yaz-client to search your index:
>
>    $ yaz-client
>    Z> open localhost:9999/opac
>    Z> find origami
>    Z> show 1
>    Z> quit
>
> Using yaz-client almost requires a knowledge of Z39.50. Attached
> should be a Perl script that allows you to search your server in a bit
> more user-friendly way. To use it you will need to install a few Perl
> modules and then edit the constant called DATABASE.
>
> Even though Z39.50 is/was "kewl", it is still pretty icky. SRU is
> better -- definitely a step in the right direction, and Zebra supports
> SRU out of the box. [1]
>
> 7. Create an SRU configuration file looking something like this:
>
>    <yazgfs>
>      <server>
>        <config>zebra.cfg</config>
>        <cql2rpn>pqf.properties</cql2rpn>
>      </server>
>    </yazgfs>
>
> 8. Acquire a "better" pqf.properties file. PQF is about querying
> Z39.50 databases. It is ugly. It was designed in a non-Internet world.
> Instead of knowing that 1=4 means search the title field, you want to
> simply search the title. Attached is a "better" pqf.properties file,
> and it is "better" because it maps things like 1=4 to Dublin Core
> equivalents. Save it in a directory called etc in the same directory
> as your zebra.cfg file. (Notice how the zebra.cfg file, above, denotes
> etc as being in zebra's path.)
>
> 9. Kill your presently running Z39.50 server.
>
> 10.
> Start up an SRU server:
>
>    zebrasrv -f sru.cfg localhost:9999 &
>
> 11. Use your HTTP client to search the SRU server. Queries will look
> like this (with carriage returns added for readability):
>
>    http://localhost:9999/opac?
>      operation=searchRetrieve&
>      version=1.1&
>      query=origami&
>      maximumRecords=5
>
> The result should be a stream of XML ready for XSLT processing.
>
> All of the above is almost exactly what I did to create an index of
> MARC records harvested from the Library of Congress and the University
> of Michigan's OAI data repository (MBooks). [2] Take a look at the
> HTML source. Notice how the client in this regard is only one HTML
> file containing a form, one CSS file for style, and one XSL file for
> XML-to-HTML transformation.
>
> HTH.
>
> [1] SRU - http://www.loc.gov/standards/sru/
> [2] Example SRU interface - http://infomotions.com/ii/
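
For anyone who wants to script against the step 11 URL rather than type it into a browser, here is a small Python sketch that assembles the searchRetrieve query string and pulls numberOfRecords out of the response XML. The helper names (build_sru_url, number_of_records) are my own; only the host, port, database name (opac), and query parameters come from Eric's example, and the SRU 1.1 response namespace (http://www.loc.gov/zing/srw/) is my assumption about what Zebra returns.

```python
# Sketch only: build an SRU 1.1 searchRetrieve URL for the Zebra server
# described above, and read the hit count from a response document.
# Helper names are hypothetical, not part of Zebra or YAZ.
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed SRU/SRW 1.1 response namespace.
SRU_NS = "http://www.loc.gov/zing/srw/"

def build_sru_url(base, query, maximum_records=5):
    """Assemble a searchRetrieve URL like the step 11 example."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "maximumRecords": str(maximum_records),
    }
    return base + "?" + urllib.parse.urlencode(params)

def number_of_records(xml_text):
    """Read numberOfRecords from a searchRetrieveResponse document."""
    root = ET.fromstring(xml_text)
    node = root.find("{%s}numberOfRecords" % SRU_NS)
    return int(node.text) if node is not None else 0

if __name__ == "__main__":
    # Matches the example query against the "opac" database.
    print(build_sru_url("http://localhost:9999/opac", "origami"))
```

Fetching the URL (with urllib.request, curl, or anything else) and handing the result to number_of_records is left as an exercise; the point is just that the whole SRU exchange is ordinary HTTP plus XML.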