Roy Tennant wrote:
> [snip] There may be other ways to leverage more information out of
> what we're indexing. For example, a number of journals have sections,
> such as "In Brief" from D-Lib Magazine [snip] It would of course take
> more work to both setup and maintain, but the result would be better.

I am reminded of a piece of advice Cliff Lynch offered at an Access conference I attended in the early days of the web ('95 in Fredericton), where he talked about the fundamental fragility of programs that supplied web content by screen scraping vt100 interfaces.

I've just been looking at some commercial vendors who support federated searching by plugging data into web forms and pulling the results into a frameset, while others parse the results and apply your own "branding". The approach looked suspiciously similar to the solutions that Cliff was deprecating almost a decade ago (and I'm sure others were besides). The best federated search results, IMHO, hang on standard search and retrieval protocols like Z39.50, where the underlying structure is abstracted into standardized access points and a published record syntax.

<disclosure>I spent the weekend working on a PHP/cURL project that does essentially the same thing. Sometimes the Wrong Way is the only way to get something to work without waiting for either the second coming or the semantic web.</disclosure>

So what are the odds that the library literature will adopt a standardized XML schema/DTD (or at least two or three) that will supply some structure and context to the content? Is the answer indexing site RSS feeds rather than the sites themselves -- and then bringing the fulltext in behind the RSS? Obviously the range of metadata possibilities is greater with some brands of RSS than with others.

Walter Lewis
Halton Hills
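P.S. A rough sketch of the "index the RSS, fetch the fulltext behind it" idea, in Python rather than PHP for brevity. The feed fragment and URLs below are invented for illustration; a real indexer would retrieve each item's <link> target and index that page alongside the feed-supplied metadata.

```python
# Sketch: harvest per-item metadata from an RSS 2.0 feed as index records.
# SAMPLE_FEED is a made-up fragment standing in for a journal's feed.
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Journal</title>
    <item>
      <title>In Brief: A Sample Item</title>
      <link>http://example.org/article1</link>
      <description>Short abstract supplied by the feed.</description>
    </item>
    <item>
      <title>Second Item</title>
      <link>http://example.org/article2</link>
      <description>Another abstract.</description>
    </item>
  </channel>
</rss>"""

def index_records(feed_xml):
    """Pull title/link/description out of each RSS 2.0 item.

    Richer feed flavours (e.g. RSS 1.0 carrying Dublin Core elements)
    would let this also return creator, date, subject, and so on --
    which is where the "brands of RSS" difference bites.
    """
    root = ET.fromstring(feed_xml)
    records = []
    for item in root.iter("item"):
        records.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),  # where the fulltext lives
            "description": item.findtext("description"),
        })
    return records

for rec in index_records(SAMPLE_FEED):
    print(rec["title"], "->", rec["link"])
```

The point of the sketch: the feed hands you stable, structured access points (title, link, abstract) without screen scraping, and the fulltext fetch happens behind that structure rather than in place of it.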