On May 13, 2004, at 10:23 AM, Walter Lewis wrote:

> For how many useful targets would it be possible to define a consistent
> intermediate layer structure that would:
>
>    - handle a SRU/SRW search
>    - transform it into a "native" database search
>    - transform the results into an SRU/SRW friendly result set
>
> and still return them in a reasonable time?
>
> I'm not (necessarily) suggesting a centralized service that would do
> this (a la OCLC) but rather a set of protocols that I could drop into a
> locally managed site for targets that we choose to address in this
> fashion.  Can the problem be abstracted sufficiently? Can we build in
> alerts to trigger actions when the structure of a given result doesn't
> match the pattern we've been expecting (i.e. site change alert)?

I am not able to answer the question of how many, but the algorithm
Walter outlines is exactly what SRW/U are designed to address.
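
To make the three steps concrete, here is a rough Python sketch of such
an intermediate layer. Everything in it is an assumption made for
illustration: the in-memory NATIVE_DB stands in for a real target, the
function names are hypothetical, the query is treated as a bare term
rather than parsed as full CQL, and the XML is a simplified stand-in
for a real SRU response. The check_structure function is one way the
"site change alert" might be triggered.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

# Hypothetical "native" database standing in for a real target.
NATIVE_DB = [
    {"id": "rec-1", "title": "Computers in Libraries"},
    {"id": "rec-2", "title": "Library Journal"},
]

# The fields we expect every native record to carry.
EXPECTED_FIELDS = {"id", "title"}


def native_search(term):
    """Stand-in for the target's own search: case-insensitive title match."""
    return [r for r in NATIVE_DB if term.lower() in r["title"].lower()]


def check_structure(record):
    """Raise an alert if a result no longer matches the expected pattern."""
    if set(record) != EXPECTED_FIELDS:
        raise RuntimeError(
            "site change alert: unexpected fields %s" % sorted(record)
        )


def handle_sru(query_string):
    """SRU request in, native search in the middle, SRU-style XML out."""
    params = parse_qs(query_string)
    term = params["query"][0]  # bare term only; no real CQL parsing here
    hits = native_search(term)
    records = []
    for hit in hits:
        check_structure(hit)
        records.append(
            "<record><id>%s</id><title>%s</title></record>"
            % (escape(hit["id"]), escape(hit["title"]))
        )
    return (
        "<searchRetrieveResponse>"
        "<numberOfRecords>%d</numberOfRecords>"
        "<records>%s</records>"
        "</searchRetrieveResponse>" % (len(hits), "".join(records))
    )


print(handle_sru("operation=searchRetrieve&version=1.1&query=computers"))
```

In a real gateway, native_search would be replaced by whatever the
target offers (SQL, a proprietary API, screen scraping), and that is
where response time would be won or lost.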

As many of you may or may not know, I've been playing with SRU
lately, and I've written the following text briefly describing it:


   SRW and SRU in Five Hundred Words or Less

   Introduction

   Search and Retrieve Web Service (SRW) and Search and Retrieve URL
   Service (SRU) are Web Services-based protocols for querying
   databases and returning search results. SRW and SRU requests and
   results are very similar. The difference between them lies in the
   ways the queries and results are encapsulated and transmitted
   between client and server applications. The canonical URL for SRW
   and SRU is:

     http://www.loc.gov/z3950/agency/zing/srw/


   Basic "operations"

   Both protocols define three and only three basic "operations":
   explain, scan, and searchRetrieve.

   * explain - Explain operations are requests sent by clients as a
        way of learning about the server's database. At a minimum,
        responses to explain operations return the location of the
        database, a description of what the database contains, and what
        features of the protocol the server supports.

   * scan - Scan operations are processes for enumerating the terms
        found in the remote database's index. Clients send scan requests
        and servers return lists of terms. The process is akin to
        browsing a back-of-the-book index where a person looks up a term
        in a book index and "scans" the entries surrounding the term.

   * searchRetrieve - SearchRetrieve operations are the heart of the
        matter. They provide the means to query the remote database and
        return search results. Queries must be articulated using the
        Common Query Language (CQL). CQL queries range from simple
        freetext searches to complex Boolean operations with nested
        queries and proximity qualifications. Servers do not have to
        implement every aspect of CQL, but they have to know how to
        return diagnostic messages when something is requested but not
        supported. The results of searchRetrieve operations can be
        returned in any number of formats, as specified via explain
        operations. Examples might include structured but plain text
        streams or data marked up in XML vocabularies such as Dublin
        Core, RDF, MARCXML, etc.
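
   As a rough illustration, the three operations might be expressed
   as SRU URLs like this. The base URL is a hypothetical placeholder,
   the parameter names (operation, version, scanClause, query,
   maximumRecords) are what I understand SRU version 1.1 to use, and
   the dc.title and dc.subject indexes are assumptions about what a
   given server supports; an explain request is how a client would
   find out.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical SRU server; substitute a real base URL.
BASE_URL = "http://sru.example.org/search"

# Assumption: the server speaks SRU version 1.1.
VERSION = "1.1"


def sru_url(operation, **params):
    """Encode an SRU operation as name/value pairs in a URL query string."""
    pairs = {"operation": operation, "version": VERSION}
    pairs.update(params)
    return BASE_URL + "?" + urlencode(pairs)


# explain: ask the server to describe itself
explain_url = sru_url("explain")

# scan: browse the index terms surrounding "computers"
scan_url = sru_url("scan", scanClause='dc.title = "computers"')

# searchRetrieve: a CQL query combining a phrase and a Boolean operator
search_url = sru_url(
    "searchRetrieve",
    query='dc.title = "computers in libraries" and dc.subject = serials',
    maximumRecords=10,
)

for url in (explain_url, scan_url, search_url):
    print(url)

# Decoding the query string recovers the original CQL, as a server would.
decoded = parse_qs(urlsplit(search_url).query)
print(decoded["query"][0])
```

   Note that the CQL query has to be URL-encoded before it is sent;
   urlencode takes care of the quotes, spaces, and equals signs.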


   Differences in operation

   The differences between SRW and SRU lie in the way operations are
   encapsulated and transmitted between client and server as well as
   how results are returned. SRW is essentially a SOAP-ful Web
   service. Operations are encapsulated by clients as SOAP requests
   and sent to the server. Likewise, responses by servers are
   encapsulated using SOAP and returned to clients. Since SOAP is
   used in SRW, HTTP is not a necessary transport protocol.

   On the other hand, SRU is essentially a REST-ful Web Service.
   Operations are encoded as name/value pairs in the query string of
   a URL. As such, operations sent by SRU clients can only be
   transmitted via HTTP GET requests. The results of SRU requests are
   XML streams, the same streams returned via SRW requests sans the
   SOAP envelope.
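
   The "sans the SOAP envelope" relationship can be sketched in a few
   lines of Python. The response below is hand-written and heavily
   abbreviated, not output from a real server, and the two namespace
   URIs are what I believe SOAP 1.1 and SRW 1.1 use; treat both as
   assumptions. The function name strip_soap_envelope is likewise
   made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumptions: SOAP 1.1 envelope namespace and SRW 1.1 namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SRW_NS = "http://www.loc.gov/zing/srw/"

# Hand-written, abbreviated SRW-style response (illustrative only).
SRW_RESPONSE = f"""<SOAP:Envelope xmlns:SOAP="{SOAP_NS}">
  <SOAP:Body>
    <srw:searchRetrieveResponse xmlns:srw="{SRW_NS}">
      <srw:version>1.1</srw:version>
      <srw:numberOfRecords>2</srw:numberOfRecords>
    </srw:searchRetrieveResponse>
  </SOAP:Body>
</SOAP:Envelope>"""


def strip_soap_envelope(soap_xml):
    """Return the payload element carried inside the SOAP Body."""
    envelope = ET.fromstring(soap_xml)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("no SOAP Body payload found")
    return body[0]


# What remains is the kind of XML stream an SRU server would send directly.
payload = strip_soap_envelope(SRW_RESPONSE)
print(payload.tag)
print(payload.findtext(f"{{{SRW_NS}}}numberOfRecords"))
```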


   Summary

   SRW and SRU are "brother and sister" standardized protocols for
   accomplishing the task of querying databases and returning search
   results. If index providers were to expose their services via SRW
   and/or SRU, then access to these services would become more
   ubiquitous.


I have also taken a stab at creating an SRU interface to a union list
of serials. It sports some fun features such as a Did You Mean? function
a la Google, as well as a suggestion function offering alternative
searches to try if you get too many hits. The underlying indexer is
swish-e. The interface to the index is written in Perl. Try searching
for 'computers in librariez' without the quotes:

   http://dewey.library.nd.edu/morgan/sru/search.cgi

P.S. You will need a very modern browser to get human-readable output
from the interface since the raw XML sent to the user agent is expected
to be transformed with XSLT for display.

--
Eric Lease Morgan
University Libraries of Notre Dame