On May 13, 2004, at 10:23 AM, Walter Lewis wrote:
> For how many useful targets would it be possible to define a consistent
> intermediate layer structure that would:
>
> - handle a SRU/SRW search
> - transform it into an "native" database search
> - transform the results into an SRU/SRW friendly result set
>
> and still return them in a reasonable time?
>
> I'm not (necessarily) suggesting a centralized service that would do
> this (a la OCLC) but rather a set of protocols that I could drop into a
> locally managed site for targets that we choose to address in this
> fashion. Can the problem be abstracted sufficiently? Can we build in
> alerts to trigger actions when the structure of a given result doesn't
> match the pattern we've been expecting (i.e. site change alert)?
I am not able to answer the question of how many, but the algorithm
Walter outlines is exactly what SRW/U are designed to address.
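For what it is worth, the three steps Walter lists can be sketched in a few lines of Python. This is only an illustration under loud assumptions: native_search, EXPECTED_FIELDS, and handle_sru are hypothetical names, the "database" is an in-memory list, the CQL query is treated as a single bare term rather than parsed, and the response omits the real SRW/U namespace and record packaging.

```python
from urllib.parse import parse_qs
import xml.etree.ElementTree as ET

# Hypothetical shape of one native result; a real target defines its own.
EXPECTED_FIELDS = {"id", "title"}


def native_search(term):
    """Stand-in for the target's own search (assumption: in-memory data)."""
    data = [
        {"id": "1", "title": "Computers in Libraries"},
        {"id": "2", "title": "Library Journal"},
    ]
    return [row for row in data if term.lower() in row["title"].lower()]


def handle_sru(query_string, search=native_search):
    """Read an SRU query string, search natively, return (XML root, alerts)."""
    # Step 1: handle the SRU request (the query is assumed to be a bare term).
    params = parse_qs(query_string)
    term = params["query"][0]

    # Step 2: transform it into a "native" search.
    hits = search(term)

    # Step 3: transform the results into an SRU-style result set.
    root = ET.Element("searchRetrieveResponse")
    count = ET.SubElement(root, "numberOfRecords")
    records = ET.SubElement(root, "records")
    alerts = []
    emitted = 0
    for hit in hits:
        # Site change alert: the result no longer matches the expected pattern.
        if set(hit) != EXPECTED_FIELDS:
            alerts.append("unexpected fields: " + ", ".join(sorted(hit)))
            continue
        record = ET.SubElement(records, "record")
        for name in sorted(hit):
            ET.SubElement(record, name).text = hit[name]
        emitted += 1
    count.text = str(emitted)
    return root, alerts


root, alerts = handle_sru("operation=searchRetrieve&version=1.1&query=computers")
print(ET.tostring(root, encoding="unicode"))
print(alerts)
```

Whether this stays fast enough depends almost entirely on step 2, which is the part the sketch fakes.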
As many of you may or may not know, I've been playing with SRU
lately, and I've written the following text briefly describing it:
SRW and SRU in Five Hundred Words or Less
Introduction
Search and Retrieve Web Service (SRW) and Search and Retrieve URL
Service (SRU) are Web Services-based protocols for querying
databases and returning search results. SRW and SRU requests and
results are very similar. The difference between them lies in the
ways the queries and results are encapsulated and transmitted
between client and server applications. The canonical URL for SRW
and SRU is:
http://www.loc.gov/z3950/agency/zing/srw/
Basic "operations"
Both protocols define three, and only three, basic "operations":
explain, scan, and searchRetrieve.
* explain - Explain operations are requests sent by clients as a
way of learning about the server's database. At a minimum,
responses to explain operations return the location of the
database, a description of what the database contains, and what
features of the protocol the server supports.
* scan - Scan operations are processes for enumerating the terms
found in the remote database's index. Clients send scan requests
and servers return lists of terms. The process is akin to
browsing a back-of-the-book index where a person looks up a term
in a book index and "scans" the entries surrounding the term.
* searchRetrieve - SearchRetrieve operations are the heart of the
matter. They provide the means to query the remote database and
return search results. Queries must be articulated using the
Common Query Language (CQL). CQL queries range from simple
freetext searches to complex Boolean operations with nested
queries and proximity qualifications. Servers do not have to
implement every aspect of CQL, but they have to know how to
return diagnostic messages when something is requested but not
supported. The results of searchRetrieve operations can be
returned in any number of formats, as specified via explain
operations. Examples might include structured but plain text
streams or data marked up in XML vocabularies such as Dublin
Core, RDF, MARCXML, etc.
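To make the three operations concrete, here is a minimal sketch of how they might look as SRU URLs. The base URL is a made-up placeholder, sru_url is a hypothetical helper, and the dc.title index is an assumption about what a given server supports; the parameter names (operation, version, query, scanClause, maximumRecords) are the ones I understand SRU 1.1 to use.

```python
from urllib.parse import urlencode

# Hypothetical server address; substitute a real SRU endpoint.
BASE_URL = "http://example.org/sru"


def sru_url(operation, **params):
    """Build an SRU request URL from an operation name and extra parameters."""
    query = {"operation": operation, "version": "1.1"}
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


# explain: ask the server to describe itself and its database.
explain_url = sru_url("explain")

# scan: browse the index terms around a starting point.
scan_url = sru_url("scan", scanClause="dc.title = computers")

# searchRetrieve: send a CQL query and ask for at most ten records.
search_url = sru_url(
    "searchRetrieve",
    query='dc.title = "computers in libraries"',
    maximumRecords=10,
)

for url in (explain_url, scan_url, search_url):
    print(url)
```

Each URL can be pasted into a browser or fetched with any HTTP client; the server's explain response says which indexes and record formats are actually available.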
Differences in operation
The differences between SRW and SRU lie in the way operations are
encapsulated and transmitted between client and server as well as
how results are returned. SRW is essentially a SOAP-ful Web
service. Operations are encapsulated by clients as SOAP requests
and sent to the server. Likewise, responses by servers are
encapsulated using SOAP and returned to clients. Since SOAP is
used in SRW, HTTP is not a necessary transport protocol.
On the other hand, SRU is essentially a REST-ful Web Service.
Operations are encoded as name/value pairs in the query string of
a URL. As such, operations sent by SRU clients can only be
transmitted via HTTP GET requests. The results of SRU requests are
XML streams, the same streams returned via SRW requests sans the
SOAP envelope.
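The "sans the SOAP envelope" relationship can be shown with a small sketch. The response below is a hand-built stub, not output from a real server, and wrap_in_soap and unwrap_soap are hypothetical helper names; the namespaces are the SOAP 1.1 envelope namespace and the SRW namespace as I understand them.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SRW_NS = "http://www.loc.gov/zing/srw/"


def wrap_in_soap(payload):
    """Place a payload element inside a SOAP Envelope/Body, as SRW would."""
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    body.append(payload)
    return envelope


def unwrap_soap(envelope):
    """Return the first element inside the SOAP Body, i.e. the SRU-style payload."""
    body = envelope.find("{%s}Body" % SOAP_NS)
    if body is None or len(body) == 0:
        raise ValueError("no SOAP Body payload found")
    return body[0]


# A stub searchRetrieve response with no records (illustrative only).
response = ET.Element("{%s}searchRetrieveResponse" % SRW_NS)
ET.SubElement(response, "{%s}version" % SRW_NS).text = "1.1"
ET.SubElement(response, "{%s}numberOfRecords" % SRW_NS).text = "0"

srw_message = wrap_in_soap(response)     # what an SRW server would send
sru_message = unwrap_soap(srw_message)   # what an SRU server would send

print(ET.tostring(srw_message, encoding="unicode"))
print(ET.tostring(sru_message, encoding="unicode"))
```

The payload is the same element in both cases; only the wrapper and the way the request reaches the server differ.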
Summary
SRW and SRU are "brother and sister" standardized protocols for
accomplishing the task of querying databases and returning search
results. If index providers were to expose their services via SRW
and/or SRU, then access to these services would become more
ubiquitous.
I have also taken a stab at creating an SRU interface to a union list
of serials. It sports some fun features such as a Did You Mean? function
a la Google, as well as a suggestion function offering alternative
searches to try if you get too many hits. The underlying indexer is
swish-e. The interface to the index is written in Perl. Try searching
for 'computers in librariez' without the quotes:
http://dewey.library.nd.edu/morgan/sru/search.cgi
P.S. You will need a very modern browser to get human-readable output
from the interface since the raw XML sent to the user agent is expected
to be transformed with XSLT for display.
--
Eric Lease Morgan
University Libraries of Notre Dame