On Mar 3, 2004, at 6:12 PM, Roy Tennant wrote:

> At first I was wondering why you were complaining about the lack of
> metadata, since you were providing full-text searching, but then when I
> saw the search results I felt your pain....

This is just the sort of feedback I was hoping for. There are lots o' good ideas here. Thanks.

RSS & metadata - In very brief discussions with the folks from DOAJ, they expressed hopes for RSS feeds and for better metadata inside the documents. I will take advantage of these things when they become available, but I won't hold my breath. If librarian types won't put the metadata in, then who will? To be fair, I have not rigorously looked for metadata. It is possible to index and display it when it is available and ignore it when it is not.

Lynch - Ironically, I was at that presentation by Mr. L; see: http://infomotions.com/travel/gateways/.

Enhancing indexing - Yes, swish-e (my preferred indexer) can take input in the form of an HTML-like stream. This functionality allows me to index things like the output of complex SELECT statements, and it is how I provide the Search This Site functionality in MyLibrary. Alternatively, I could write a Perl script to read my cached data, summarize it, and in effect create my own metadata on the fly.

Enhancing search - Again, swish-e is more of a tool (a programming library) than an application. That makes it possible to capture a person's search before it is sent to the index. By examining the query, we might be able to add some smarts to it and refine it. For example, we could start by letting the query go as-is and returning plain ol' results.

- If the number of results is below a certain threshold, we could munge the query to increase retrieval through alternative spellings or thesauri.
- If the number of results is above a certain threshold, we could limit the results by reformulating the query as a title, phrase, or Search In Results sort of search.
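The threshold-driven query refinement described under "Enhancing search" might look roughly like the sketch below. It is not swish-e's API. Several pieces are hypothetical:

- `search` is any callable that stands in for the real index.
- The thresholds and the tiny thesaurus are made up.
- The query syntax assumes adjacent terms are ANDed, `(a OR b)` groups alternatives, and double quotes mark a phrase.

```python
from typing import Callable, Dict, List

# Hypothetical thresholds; real values would need tuning against the index.
MIN_HITS = 5
MAX_HITS = 500

# Hypothetical thesaurus of alternative spellings and synonyms.
THESAURUS: Dict[str, List[str]] = {
    "catalog": ["catalogue"],
    "color": ["colour"],
}


def broaden(query: str, thesaurus: Dict[str, List[str]]) -> str:
    """OR each term with its alternatives; return the query unchanged if none apply."""
    parts = []
    changed = False
    for term in query.split():
        alternatives = thesaurus.get(term.lower(), [])
        if alternatives:
            parts.append("(" + " OR ".join([term] + alternatives) + ")")
            changed = True
        else:
            parts.append(term)
    return " ".join(parts) if changed else query


def narrow(query: str) -> str:
    """Reformulate a multi-word query as a phrase search (quoting syntax assumed)."""
    return f'"{query}"'


def refine(query: str,
           search: Callable[[str], List[str]],
           min_hits: int = MIN_HITS,
           max_hits: int = MAX_HITS,
           thesaurus: Dict[str, List[str]] = THESAURUS) -> List[str]:
    """Run the query as-is, then broaden or narrow it based on the hit count."""
    hits = search(query)  # first, let the query go and return plain results
    if len(hits) < min_hits:
        broader = broaden(query, thesaurus)
        if broader != query:
            retry = search(broader)
            if len(retry) > len(hits):  # keep the broader set only if it helped
                return retry
    elif len(hits) > max_hits and len(query.split()) > 1:
        retry = search(narrow(query))
        if retry:  # fall back to the original hits if the phrase finds nothing
            return retry
    return hits
```

Passing `search` in as a function keeps the refinement logic separate from the indexer. The same wrapper could sit in front of swish-e, or in front of any other engine that returns a list of hits.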
Enhancing display - Google does not rely on metadata to display search results. Instead, all it shows is a title, some words in context (a concordance snippet), a URL, and a date. Since the content is being cached locally, just as Google does, it may very well be possible to grep out words in context and display them in the output. Furthermore, I could associate URLs with journal titles and display the titles as well.

Configuration - Finally, in the existing (barely functional) PHP back-end, each title has a number of fields: title, ISBN, and URL. There could just as easily be some sort of configuration section that tells the indexing function (above) how to extract particular parts of the content.

There are lots o' possibilities.

-- Eric
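The "grep out words in context" idea from the "Enhancing display" note could start as small as the sketch below. It works on locally cached text.

- The function name `snippets` is hypothetical.
- The window `width` and the `limit` on snippets per document are arbitrary defaults.
- Matching is a simple case-insensitive whole-word regular expression, not swish-e's own stemming or word rules.

```python
import re
from typing import List


def snippets(text: str, terms: List[str], width: int = 40, limit: int = 3) -> List[str]:
    """Return up to `limit` keyword-in-context windows of about `width` chars per side."""
    if not terms:
        return []
    # Whole-word, case-insensitive match on any of the search terms.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    results: List[str] = []
    last_end = -1
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        if start < last_end:  # skip matches overlapping the previous window
            continue
        window = " ".join(text[start:end].split())  # collapse runs of whitespace
        prefix = "... " if start > 0 else ""
        suffix = " ..." if end < len(text) else ""
        results.append(prefix + window + suffix)
        last_end = end
        if len(results) >= limit:
            break
    return results
```

Each snippet could be shown under the result's title and URL, much as Google does. A fancier version might also highlight the matched term.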