On Oct 27, 2005, at 2:06 PM, Andrew Nagy wrote:
> I have been thinking of ways, similar to what you have done that you
> mentioned below with the Ockham project, to allow more modern day
> with our library catalog. I have been beginning to think about
> a way to index/harvest our entire catalog (and allow this indexing
> process to run every so often) to allow our own custom access methods.
> We could then generate our own custom RSS feeds of new books, allow
> efficient/enticing search interfaces, etc.
> Do you know of any existing software for indexing or harvesting a
> catalog into another datastore (SQL Database, XML Database, etc.)? I am
> sure I could fetch all of the records somehow through Z39.50 and dump
> them into a MySQL database, but maybe there is some better method?
I too have thought about harvesting content from my local catalog and
providing new interfaces to the content, and I might go about this in
a number of different ways.
1. I might use OAI to harvest the content, cache it locally, and
provide services against the cache. This cache might be saved on a
file system, but more likely into a relational database.
2. I might simply dump all the MARC records from my catalog,
transform them into something more readable, say sets of HTML/XML
records, and provide services against these files.
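To make the first option a bit more concrete, here is a minimal sketch in Python of the caching step. It assumes the catalog exposes an OAI-PMH interface and that one ListRecords response has already been fetched as a string. The function name cache_records and the table layout are hypothetical, chosen only for illustration. The caller would repeat the request with the returned resumption token until none comes back.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def cache_records(response_xml, conn):
    """Store the records of one ListRecords response in SQLite.

    Returns (number of records stored, resumption token or None).
    The table layout is an assumption made for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "identifier TEXT PRIMARY KEY, datestamp TEXT, metadata TEXT)"
    )
    root = ET.fromstring(response_xml)
    count = 0
    for record in root.iterfind(".//oai:record", OAI_NS):
        header = record.find("oai:header", OAI_NS)
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", namespaces=OAI_NS)
        # Deleted records carry a header but no metadata element.
        metadata = record.find("oai:metadata", OAI_NS)
        payload = ""
        if metadata is not None:
            payload = ET.tostring(metadata, encoding="unicode")
        conn.execute(
            "INSERT OR REPLACE INTO records VALUES (?, ?, ?)",
            (identifier, datestamp, payload),
        )
        count += 1
    conn.commit()
    # An absent or empty token means the harvest is complete.
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    return count, (token or None)
```

Keying the table on the OAI identifier means a periodic re-harvest overwrites changed records rather than duplicating them.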
The weakest link in my chain would be my indexer. Relational
databases are notoriously ill-equipped to handle free text searching.
Yes, you can implement it and you can use various database-specific
features to implement free text searching, but they still won't work
as well as an indexer. My only experience with indexers lies in
things like swish-e and Plucene. I sincerely wonder whether or not
these indexers would be up to the task.
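For readers wondering what an indexer does that a plain table scan does not, the core idea is an inverted index: a map from each term to the records that contain it. The following is only a toy sketch in Python, not how swish-e or Plucene are implemented; it has no stemming, ranking, or phrase support, and the function names are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

A query then costs a few set intersections instead of a scan of every record, which is the main reason a dedicated indexer tends to outperform substring matching in a database.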
Supposing I could find/use an indexer that was satisfactory, I would
then provide simple and advanced (SRU/OpenSearch) search features
against the index of holdings. Search results would then be enhanced
with features such as borrow, renew, review, put on reserve,
save as citation, email, "get it for me", put on hold, "what's new?",
view as RSS, etc. These services would require a list of authorized
users of the system -- a patron database.
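As a rough illustration of the SRU side, a search is just an HTTP GET with a handful of standard parameters and a CQL query. The sketch below, in Python, builds such a URL for an SRU 1.1 searchRetrieve request. The base URL and the function name are placeholders, and the index name dc.title assumes the server supports Dublin Core indexes.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, start=1, maximum=10):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start,       # SRU record positions start at 1
        "maximumRecords": maximum,  # page size requested from the server
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint; substitute the address of a real SRU server.
url = sru_search_url("http://catalog.example.org/sru", 'dc.title = "walden"')
```

The server answers with XML, so the same response can feed an HTML results page, an RSS view, or a citation export.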
In short, since I would have direct access to the data, and since I
would have direct access to the index, I would use my skills to provide
services against them. For the most part, I don't mind back-end,
administrative, data-entry interfaces to our various systems, but I
do have problems with the end-user interfaces. Let me use those back-
ends to create and store my data, then give me unfettered access to
the data and I will provide my own end-user interfaces. Another
alternative is to exploit (industry standard) Web Services computing
techniques against the existing integrated library system. In this
way you get XML data (information without presentation) back and you
can begin to do the same things.
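Once the data comes back without presentation attached, adding a new view is mostly a matter of reshaping it. As one small example, here is a sketch in Python that turns a list of records into an RSS 2.0 "what's new" feed. The record fields (title, link) and the function name are assumptions for illustration; real records would first have to be mapped from MARC or whatever XML the system returns.

```python
import xml.etree.ElementTree as ET


def new_books_rss(feed_title, feed_link, books):
    """Render a list of {'title', 'link'} dicts as an RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Recently added titles"
    for book in books:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = book["title"]
        ET.SubElement(item, "link").text = book["link"]
    return ET.tostring(rss, encoding="unicode")
```

Building the feed with an XML library rather than string concatenation means titles containing characters such as & or < are escaped automatically.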
Eric Lease Morgan
University Libraries of Notre Dame