Roy Tennant wrote:
> [snip] There may be other ways to leverage more information out of
> what we're indexing. For example, a number of journals have sections,
> such as "In Brief" from D-Lib Magazine [snip] It would of course take
> more work to both setup and maintain, but the result would be better.
I am reminded of a piece of advice Cliff Lynch offered at an Access
conference I attended in the early days of the web ('95 in Fredericton),
where he talked about the fundamental fragility of programs that
supplied web content by screen-scraping VT100 interfaces. I've just
been looking at some commercial vendors who support federated searching
by plugging data into web forms and pulling the results into a frameset;
others parse the results and apply your own "branding". It looked
suspiciously similar in approach to the solutions that Cliff was
deprecating almost a decade ago (and I'm sure others were, besides). The
best federated search results, IMHO, hang on standard search-and-retrieval
protocols like Z39.50, where the underlying structure is abstracted into
standardized access points and a published record syntax.
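
To make the contrast concrete, here is a minimal sketch of the protocol
route using PHP's YAZ extension (a PECL module, so not installed
everywhere); the target server and the query are only illustrations:

<?php
// Z39.50 search via PHP/YAZ: access points are standardized Bib-1
// attributes and the record syntax is published, so nothing here
// depends on any vendor's HTML. Server address is illustrative
// (the Library of Congress's public Z39.50 server).
$id = yaz_connect('z3950.loc.gov:7090/Voyager');
yaz_syntax($id, 'usmarc');      // ask for a published record syntax
yaz_range($id, 1, 5);           // retrieve the first five records
// Bib-1 attribute 1=4 is the standardized "title" access point
yaz_search($id, 'rpn', '@attr 1=4 "digital libraries"');
yaz_wait();                     // execute the queued operations
echo yaz_hits($id), " hits\n";
for ($p = 1; $p <= min(5, yaz_hits($id)); $p++) {
    echo yaz_record($id, $p, 'string'), "\n";
}
?>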
<disclosure>I spent the weekend working on a PHP/cURL project that does
essentially the same thing. Sometimes the Wrong Way is the only way to
get something to work without waiting for either the second coming or
the semantic web.</disclosure>
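
For the curious, the Wrong Way boils down to something like the
following sketch; the form URL, field names, and markup pattern are
hypothetical stand-ins, since every target site differs:

<?php
// Screen-scraping a search form: POST a query, capture the HTML,
// and pull results out with a regex. The regex is exactly the part
// that breaks the next time the vendor touches their templates.
$ch = curl_init('http://search.example.org/results.cgi'); // hypothetical form target
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'query=digital+libraries&max=20');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page as a string
$html = curl_exec($ch);
curl_close($ch);

// Scrape whatever looks like a result link -- fragile by design
preg_match_all('#<a class="result" href="([^"]+)">([^<]+)</a>#', $html, $m);
foreach ($m[1] as $i => $url) {
    echo $m[2][$i], "\t", $url, "\n";
}
?>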
So what are the odds that the library literature will adopt a
standardized XML schema/DTD (or at least two or three) that will supply
some structure and context to the content? Is the answer indexing site
RSS feeds rather than the sites themselves -- and then bringing the
fulltext in behind the RSS? Obviously the range of metadata possibilities
is wider with some brands of RSS than with others.
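
If the RSS route were taken, the harvesting side is at least simple. A
sketch, assuming an RSS 2.0-style feed layout at a hypothetical URL,
with a namespace lookup for the Dublin Core fields that the richer
brands of RSS can carry:

<?php
// Harvest a feed, index its metadata, and only then fetch the
// fulltext behind each link. Feed URL is hypothetical.
$rss = simplexml_load_file('http://www.example.org/journal/rss.xml');
foreach ($rss->channel->item as $item) {
    $title = (string) $item->title;
    $link  = (string) $item->link;
    // Dublin Core fields, if the feed provides them (e.g. RSS 1.0 + DC)
    $dc = $item->children('http://purl.org/dc/elements/1.1/');
    $creator = isset($dc->creator) ? (string) $dc->creator : '';
    echo "$title | $creator | $link\n";
    // $fulltext = file_get_contents($link); // the fulltext behind the RSS
}
?>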
Walter Lewis
Halton Hills