On 11/28/06, Erik Hatcher wrote:
> Is there a standard for specifying how textual analysis works as
> well, so that tokenization can be standardized across these XQuery
> engines as well?

Not that I know of. What I've seen so far is that tokenization is implementation-specific. Perhaps this is something that is configurable, so that implementations can be set up and then queried consistently. Any indexing engine worth its salt should be configurable, I'd think. There is nothing I'm aware of in the fulltext work, though, that defines how things are indexed.

> That's an easy bet... of course Lucene will be part of it. It's
> already implemented as extensions to XQuery engines (Nux, I know of,
> and surely others).

As you can tell, I'm not really a gambler :-) Our native XML database vendor has committed to the fulltext spec (once it becomes a spec), and since they are using Lucene already, I'd say I don't have anything to worry about.

Interestingly, as a side note, a quick search turned up an eXist presentation from Prague06 saying that eXist's text analysis classes would be replaced by a "modular analyzer provided by Apache's Lucene." Neat.

All this talk is just me looking forward (with optimism). It is possible to use fulltext with XQuery now, either through an intermediary layer like the one we currently have (the Lucene search is done first, and the results are passed to XQuery and our native XML database for retrieval and munging) or by creating fulltext extensions (as eXist db and our native XML database vendor have done).

Personally, I wish we had taken the extension route, but it was just quicker for me to do something in Java and have the search and XQ servlets chain, rather than adding the extra extension layer through our XQuery processor. Quicker isn't always better/cleaner/nicer though...

Kevin
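
A minimal sketch of the intermediary-layer approach described above (search first, then hand the hits to XQuery). This is not the actual servlet code. The FullTextSearch interface, the "articles" collection, the record element, and the @id attribute are all hypothetical names chosen for illustration, and the search step is stubbed out where a real setup would call Lucene.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SearchChain {

    // Hypothetical stand-in for the search step; a real setup would run a
    // Lucene query here and return the stored id of each hit.
    public interface FullTextSearch {
        List<String> search(String query);
    }

    // Turn a Java string into an XQuery string literal: escape "&" as an
    // entity reference and double any embedded double quotes.
    public static String quote(String s) {
        return "\"" + s.replace("&", "&amp;").replace("\"", "\"\"") + "\"";
    }

    // Build an XQuery that retrieves the records whose @id matches one of
    // the ids returned by the search step (assumed document layout).
    public static String buildXQuery(String collection, List<String> ids) {
        String idSeq = ids.stream()
                .map(SearchChain::quote)
                .collect(Collectors.joining(", "));
        return "for $d in collection(" + quote(collection) + ")/record"
                + "[@id = (" + idSeq + ")] return $d";
    }

    public static void main(String[] args) {
        // Stubbed search: pretend the index returned two hits.
        FullTextSearch search = q -> List.of("doc-12", "doc-7");
        List<String> hits = search.search("tokenization");

        // The generated query would then be sent to the XQuery processor.
        System.out.println(buildXQuery("articles", hits));
    }
}
```

One consequence of chaining this way: a path expression returns nodes in document order, so the relevance ranking from the search step is lost unless the ids are iterated in hit order instead.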