On Mar 3, 2004, at 6:12 PM, Roy Tennant wrote:
> At first I was wondering why you were complaining about the lack of
> metadata, since you were providing full-text searching, but then when I
> saw the search results I felt your pain....
This is just the sort of feedback I was hoping for. There are lots o'
good ideas here. Thanks.
RSS & metadata - In very brief discussions with the folks from DOAJ,
there were hopes of RSS feeds, and better metadata inside the document.
I will take advantage of these things when they become available, but I
won't hold my breath. If librarian types won't put the metadata in,
then who will? To be fair, I have not rigorously looked for metadata,
and it is possible to index/display it when available, and ignore it
when not.
Lynch - Ironically, I was at that presentation by Mr. L, see:
http://infomotions.com/travel/gateways/.
Enhancing indexing - Yes, swish-e (my preferred indexer) can take input
in the form of an HTML-like stream. This functionality allows me to index
things like the output of complex SELECT statements, and this is how I
provide the Search This Site functionality in MyLibrary. Alternatively,
I could write a Perl script to read my cached data, summarize it, and
sort of create my own metadata on the fly.
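A rough sketch of the "create my own metadata" idea, in Python rather than Perl. It wraps one cached article in a small HTML document and prefixes it with headers an indexer could read from a stream. The function name, the sample URL, and the exact header names (Path-Name, Content-Length) are assumptions for illustration; check them against the swish-e documentation before relying on them.

```python
from html import escape


def to_index_record(path, title, body):
    """Wrap one cached article as an HTML document preceded by stream headers.

    The header names are assumed, not confirmed against a swish-e release.
    """
    html = (
        "<html><head><title>{}</title></head>"
        "<body>{}</body></html>\n"
    ).format(escape(title), escape(body))
    payload = html.encode("utf-8")
    # Content-Length counts bytes, not characters, so encode first.
    header = "Path-Name: {}\nContent-Length: {}\n\n".format(path, len(payload))
    return header.encode("utf-8") + payload


if __name__ == "__main__":
    import sys

    record = to_index_record(
        "http://example.org/article1",  # hypothetical URL
        "Metadata & Open Access",
        "Summary text pulled from the cached copy.",
    )
    sys.stdout.buffer.write(record)
```

The point is only that the title, or any other invented field, ends up in markup the indexer can treat as metadata.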
Enhancing search - Again, since swish-e is more like a tool
(programming library) as opposed to an application, it is possible to
capture a person's search before it is sent to the index. Upon
examining the query, we might be able to add some smarts to the query
and refine it. For example, we could just let the query go and return
plain ol' results. If the number of results is below a certain threshold,
we could munge the query to increase retrieval through alternative
spellings or thesauri. If the number of results is above a certain
threshold, we could limit results by reformulating the query as a
title, phrase, or Search In Results sort of search.
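The threshold idea might look something like the sketch below. Everything here is hypothetical: `search` stands in for whatever function actually queries the index, the `low`/`high` thresholds are arbitrary, the `alternatives` table is a stand-in for a real thesaurus or spelling list, and the `or` and quoted-phrase query syntax is assumed rather than taken from any particular indexer.

```python
def refine_search(query, search, low=5, high=200, alternatives=None):
    """Run a query, then broaden or narrow it based on the hit count.

    `search` is any callable taking a query string and returning a list
    of hits. Returns a (query_actually_used, hits) pair.
    """
    alternatives = alternatives or {}
    hits = search(query)

    if len(hits) < low:
        # Too few results: OR each word with its alternative spellings.
        parts = []
        for word in query.split():
            options = [word] + alternatives.get(word.lower(), [])
            if len(options) > 1:
                parts.append("(" + " or ".join(options) + ")")
            else:
                parts.append(word)
        broadened = " ".join(parts)
        if broadened != query:
            return broadened, search(broadened)

    elif len(hits) > high and " " in query:
        # Too many results: retry the words as an exact phrase.
        narrowed = '"{}"'.format(query)
        return narrowed, search(narrowed)

    # Result count is acceptable, or no refinement was possible.
    return query, hits
```

A title-only or Search In Results reformulation would slot into the same `elif` branch.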
Enhancing display - Google does not rely on metadata to display search
results. Instead all they have is a title, some words in context
(concordance snippet), a URL, and a date. Since the content is being
cached locally, just like Google does, it may very well be possible to
grep out words in context and display them in the output. Furthermore,
I could associate URLs with journal titles and display the titles as
well.
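Grepping words in context out of the cached text could be as simple as the sketch below. It is a minimal illustration, not the method any search engine actually uses; the function name and the default window width are made up for the example.

```python
import re


def words_in_context(text, term, width=40):
    """Return the first occurrence of `term` with `width` characters of
    context on each side, or None if the term does not occur."""
    match = re.search(re.escape(term), text, re.IGNORECASE)
    if match is None:
        return None
    start = max(match.start() - width, 0)
    end = min(match.end() + width, len(text))
    # Collapse newlines and runs of spaces so the snippet fits on one line.
    excerpt = " ".join(text[start:end].split())
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + excerpt + suffix


if __name__ == "__main__":
    cached = "Open access journals are indexed locally for searching."
    print(words_in_context(cached, "indexed", width=10))
```

The window is measured in characters, so snippets can start or end mid-word; trimming to word boundaries would be an easy refinement.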
Configuration - Finally, in the existing (barely functional) PHP
back-end, each title has a number of fields. Title. ISBN. URL. There
could just as easily be some sort of configuration section allowing the
indexing function (above) to know how to extract particular parts of
the content.
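One way such a configuration section could work is sketched below in Python (the existing back-end is PHP, so this is only an illustration of the shape). The journal name, URL, field names, and regular expressions are all hypothetical; a real configuration would need patterns written against each title's actual markup.

```python
import re

# Hypothetical per-title configuration: each title carries its own
# extraction rules alongside its ordinary fields.
TITLE_CONFIG = {
    "Example Journal": {
        "url": "http://example.org/journal/",
        "extract": {
            "article_title": r"<h1[^>]*>(.*?)</h1>",
            "author": r'<meta name="author" content="([^"]*)"',
        },
    },
}


def extract_fields(journal, html, config=TITLE_CONFIG):
    """Apply a journal's extraction rules to one cached page.

    Fields whose pattern does not match are simply left out, so missing
    metadata is ignored rather than treated as an error.
    """
    fields = {}
    for name, pattern in config[journal]["extract"].items():
        match = re.search(pattern, html, re.IGNORECASE | re.DOTALL)
        if match:
            fields[name] = match.group(1).strip()
    return fields
```

The extracted fields could then be handed to the indexing step as metadata.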
There are lots o' possibilities.
--
Eric