Godmar,
We're using eXist for a couple of apps here, and like it quite a bit.
The full text search extensions in the 1.4 release are backed by Lucene,
and it's pretty quick once you've tuned it and set up the indexing
properly (try some searches here: http://diglib.princeton.edu/ead/ --
this is running on a beta of 1.4). Performance will not be good until
you've configured some indexes and tweaked the JVM settings. There is a
bit of a learning curve involved here, but the documentation is decent,
and the community and developers are quite active and accessible.
You can GET, PUT, and DELETE documents very easily, or POST XQueries
to get fragments. You can also GET fragments or documents by supplying
parameters to an XQuery stored in the database--they call this their
"REST-style API"[1]. There are a few other ways to get content in and
out[2], and Java integration isn't a problem via the xml:db API[3]. You
can also write extension modules in Java.
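To make the REST calls above a bit more concrete, here is a rough Python sketch that only builds the requests (it does not send them). The base URL, the /db/ead paths, and the helper names are all made up for illustration; the `_query` and `_howmany` parameters and the `<query>` wrapper are what I recall from the REST guide [1], so check them against your version.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request
from xml.sax.saxutils import escape

# Assumption: a local eXist instance with the REST servlet at this path.
BASE = "http://localhost:8080/exist/rest"
# Namespace used by the REST guide for POSTed query documents.
EXIST_NS = "http://exist.sourceforge.net/NS/exist"


def doc_url(path):
    """Map a database path such as /db/ead/C0001.xml to its REST URL."""
    return BASE + quote(path)


def get_doc(path):
    """GET a whole document (or a collection listing)."""
    return Request(doc_url(path), method="GET")


def put_doc(path, xml_text):
    """PUT stores (or replaces) the document at the given path."""
    return Request(doc_url(path), data=xml_text.encode("utf-8"),
                   headers={"Content-Type": "application/xml"}, method="PUT")


def delete_doc(path):
    """DELETE removes the document at the given path."""
    return Request(doc_url(path), method="DELETE")


def get_fragments(path, xpath, how_many=10):
    """GET with a _query parameter returns matching fragments."""
    params = urlencode({"_query": xpath, "_howmany": how_many})
    return Request(doc_url(path) + "?" + params, method="GET")


def post_xquery(path, xquery):
    """POST a longer XQuery wrapped in eXist's <query> element."""
    body = '<query xmlns="%s"><text>%s</text></query>' % (EXIST_NS, escape(xquery))
    return Request(doc_url(path), data=body.encode("utf-8"),
                   headers={"Content-Type": "application/xml"}, method="POST")
```

Passing any of these to urllib.request.urlopen() would send it, e.g. urlopen(get_fragments("/db/ead", "//unittitle")). That needs a running server, and PUT and DELETE will normally need credentials as well.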
-Jon
1. http://exist.sourceforge.net/devguide_rest.html
2. http://exist.sourceforge.net/devguide.html
3. http://exist.sourceforge.net/devguide_xmldb.html
On 01/16/2010 11:15 AM, Godmar Back wrote:
> Hi,
>
> we're currently looking for an XML database to store a variety of
> small-to-medium sized XML documents. The XML documents are
> unstructured in the sense that they do not follow a schema or DTD, and
> that their structure will be changing over time. We'll need to do
> efficient searching based on elements, attributes, and full text
> within text content. More importantly, the documents are mutable.
> We'd like to bring documents or fragments into memory in a DOM
> representation, manipulate them, then put them back into the database.
> Ideally, this should be done in a transaction-like manner. We need to
> efficiently serve document fragments over HTTP, ideally in a manner
> that allows for scaling through replication. We would prefer strong
> support for Java integration, but it's not a must.
>
> Have others encountered similar problems, and what have you been using?
>
> So far, we're researching: eXist-DB (http://exist.sourceforge.net/ ),
> Base-X (http://www.basex.org/ ), MonetDB/XQuery
> (http://www.monetdb.nl/XQuery/ ), Sedna
> (http://modis.ispras.ru/sedna/index.html ). Wikipedia lists a few
> others here: http://en.wikipedia.org/wiki/XML_database
> I'm wondering to what extent systems such as Lucene, or even digital
> object repositories such as Fedora could be coaxed into this usage
> scenario.
>
> Thanks for any insight you have or experience you can share.
>
> - Godmar
>
--
Jon Stroop
Metadata Analyst
C-17-D2 Firestone Library
Princeton University
Princeton, NJ 08544
Phone: (609)258-0059
Fax: (609)258-0441
http://diglib.princeton.edu
http://diglib.princeton.edu/ead