LISTSERV 16.5 - CODE4LIB Archives

Chuck Bearden wrote:

 > One could think of the XPath expressions pointing to retrievable
 > chunks of XML as analogous to database keys.  That's how I was viewing
 > them in my hypothetical (Lucene && (eXist || ThrowingBeans)) solution.

For one of my solutions using TEI documents and swish-e, the public
interface is designed to deliver "chapters" or "sections" of the work to
the public.  The "key" to the HTML page is the "ID" of the appropriate
<div> in TEI.

I wrote a routine in Perl that simply extracted that part of the XML
document and fed it to swish-e with the appropriate ID anchoring the
Swishpath/URL string. (There was, in fact, an XML config file that told
the routine where to find the documents in the file system and supplied
a couple of other useful values.)
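
A minimal sketch of that kind of routine, in Python rather than Perl.
Everything specific here is an assumption for illustration: the TEI is
taken to use un-namespaced <div> elements with a plain "id" attribute,
the function names are hypothetical, and the record layout assumes the
chunks are handed to swish-e through its "prog" input method (-S prog),
which reads a Path-Name and Content-Length header before each document.

```python
import xml.etree.ElementTree as ET


def extract_div(tei_xml: str, div_id: str) -> str:
    """Return the <div> whose id attribute equals div_id, serialized as XML.

    Assumes un-namespaced <div> elements carrying a plain "id" attribute.
    """
    root = ET.fromstring(tei_xml)
    for el in root.iter("div"):
        if el.get("id") == div_id:
            el.tail = None  # drop whitespace/text that follows the closing tag
            return ET.tostring(el, encoding="unicode")
    raise KeyError(f"no <div> with id={div_id!r}")


def swish_record(path: str, content: str) -> bytes:
    """Format one document for a swish-e "prog"-style feed.

    Content-Length is the size of the body in bytes, not characters.
    """
    body = content.encode("utf-8")
    header = f"Path-Name: {path}\nContent-Length: {len(body)}\n\n"
    return header.encode("utf-8") + body
```

Writing swish_record("http://example.org/work?id=ch2", extract_div(tei, "ch2"))
to standard output would then hand swish-e one chapter whose path carries
the ID. The URL pattern is invented for the example.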

If you
     give the paragraph tags an ID attribute,
     feed it to swish-e one unit at a time,
     set up the path as http://yoururl#[id attribute]
     add <a name="[id attribute]" /> inside your <p>s
then you should have a swish index with a path pointing to anchors
attached to individual paragraphs of the document.
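
The steps above might look roughly like this in Python. It is only a
sketch under assumptions: the paragraphs are un-namespaced <p> elements
with an "id" attribute, the function name and base URL are hypothetical,
and the URL fragment is taken to be exactly the anchor name so that a
browser can jump straight to the paragraph.

```python
import xml.etree.ElementTree as ET


def paragraph_units(xml_text: str, base_url: str) -> list[tuple[str, str]]:
    """Return (path, xml) pairs, one per <p> that carries an id attribute."""
    root = ET.fromstring(xml_text)
    units = []
    # Snapshot the paragraphs first, since the loop modifies the tree.
    for p in list(root.iter("p")):
        pid = p.get("id")
        if pid is None:
            continue  # paragraphs without an id cannot be addressed
        # Insert <a name="..."/> as the first child, keeping the leading text.
        anchor = ET.Element("a", {"name": pid})
        anchor.tail = p.text
        p.text = None
        p.insert(0, anchor)
        # Serialize the paragraph alone, without the text that follows it.
        saved_tail, p.tail = p.tail, None
        units.append((f"{base_url}#{pid}", ET.tostring(p, encoding="unicode")))
        p.tail = saved_tail
    return units
```

Each pair can then be fed to swish-e as its own document, with the first
element used as the path.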

One of the issues is that the weights for relevance will be different at
the paragraph level than at the article/chapter/section levels.  On the
other hand, this might not be a bad thing.
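
A toy illustration of why the weights shift, using a plain term
frequency (occurrences divided by word count). This is not swish-e's
actual ranking formula, only a stand-in to show the effect of the unit
size; the sample text is made up.

```python
def term_frequency(text: str, term: str) -> float:
    """Share of the words in text that equal term (case-insensitive)."""
    words = text.lower().split()
    return words.count(term.lower()) / len(words) if words else 0.0


paragraphs = [
    "swish indexes each unit it is given",
    "the weather was fine that day",
    "nothing else of note happened here",
]
chapter = " ".join(paragraphs)

# Indexed paragraph by paragraph, the one matching paragraph scores 1/7.
para_scores = [term_frequency(p, "swish") for p in paragraphs]
# Indexed as a single chapter, the same hit is diluted to 1/19.
chapter_score = term_frequency(chapter, "swish")
```

A term concentrated in one paragraph stands out when that paragraph is
the indexed unit and is diluted when the whole chapter is.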

Walter Lewis
Halton Hills