On Wed, 15 Dec 2004 14:28:51 -0500, Clay Redding <[log in to unmask]> wrote:

> Hi Eric,
>
> Not necessarily. If you're up for trying PostgreSQL, their XML
> functionality works *really* well.
>
> http://www.throwingbeans.org/tech/postgresql_and_xml.html

[...]

> Eric Lease Morgan wrote:
>
> > Again, thank you for the prompt reply.
> >
> > Actually, the implementation you suggest was the path I was
> > considering. Unfortunately, this means leaving some sort of text lying
> > around on my file system for importing. Similarly, it poses the problem
> > of editing; in order to edit data in such an implementation I will need
> > to export the big chunk, edit, and re-import. That is sort of klunky,
> > but still it is what I was considering.

Let me also mention eXist [1], an XML database that supports XPath and XQuery. Throwing Beans discusses some of eXist's advantages in a posting from September [2].

To my mind, searching large swaths of full text is rather different from searching structured metadata, especially on controlled-vocabulary index points. One idea that comes to mind is using either the Throwing Beans/PostgreSQL or eXist solution for storage and XPath-based access, and creating full-text indices with Lucene [3]. (I mention Lucene because I recall reading a few months back that eXist and Sleepycat were optimized for XPath queries and so might not perform adequately on full-text searches.)

In building a Lucene index of a corpus of TEI texts, I suspect that for each indexed chunk (chapter, section, paragraph) you could include the XPath expression pointing to that chunk as a field to be returned. Do full-text searches against Lucene, but pull the actual text to be transformed from eXist, using the XPath expressions returned from Lucene. Browsable representations of document structures would, I think, also be easier to get from eXist or PgSQL/Throwing Beans than from Lucene.
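To make the idea concrete, here is a rough, thoroughly hypothetical sketch of the "index chunks with their XPath, then fetch by XPath" flow. A plain dictionary stands in for the Lucene index and an in-memory ElementTree stands in for eXist; the sample document and element layout are invented for illustration, not taken from any real TEI corpus.

```python
# Sketch only: dict = "Lucene index", ElementTree = "eXist".
# The sample document and all names here are hypothetical.
import xml.etree.ElementTree as ET

SAMPLE_TEI = """<TEI>
  <text><body>
    <div><p>The whale is a mammal.</p><p>It lives in the sea.</p></div>
    <div><p>Philosophy begins in wonder.</p></div>
  </body></text>
</TEI>"""

def index_chunks(root):
    """Walk every <p>, recording a positional XPath and the chunk's text."""
    index = {}
    for d, div in enumerate(root.findall("./text/body/div"), start=1):
        for p, para in enumerate(div.findall("./p"), start=1):
            path = f"/TEI/text/body/div[{d}]/p[{p}]"
            index[path] = para.text or ""
    return index

def search(index, term):
    """Full-text search: return the XPaths of chunks containing the term."""
    return [path for path, text in index.items()
            if term.lower() in text.lower()]

def fetch(root, path):
    """Resolve a stored XPath against the XML store to pull the real text."""
    # ElementTree's limited XPath handles the positional predicates
    # in a relative path, so strip the document-root step first.
    node = root.find("." + path[len("/TEI"):])
    return node.text if node is not None else None

root = ET.fromstring(SAMPLE_TEI)
idx = index_chunks(root)
hits = search(idx, "whale")     # XPaths of matching chunks
text = fetch(root, hits[0])     # text pulled back from the "database"
```

In a real system the `search` step would be a Lucene query returning the stored XPath field, and `fetch` would be an XPath/XQuery call against eXist; the shape of the round trip is the same.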
You could probably just store the documents in eXist and spit them out temporarily to let Lucene index them. I am not sure that Lucene would permit you to selectively re-index corrected documents, though; you might have to spit out and re-index all the docs after posting a corrected version back to eXist.

I suspect that a system like this could also be made to work for collections of EAD finding aids, though perhaps they have less need of heavy-duty full-text searching than literary or philosophical texts.

Just a few thoroughly untried ideas!

Chuck

[1] <http://exist.sourceforge.net/>
[2] <http://www.throwingbeans.org/tech/xml_databases_with_exist_and_coldfusion.html#000048>
[3] <http://jakarta.apache.org/lucene/docs/index.html>