LISTSERV 16.5 - CODE4LIB Archives

Eric, why not use an XML db?
http://www.sleepycat.com/products/xml.shtml

-Ross.


Eric Lease Morgan wrote:

> To database or not to database (2 db || ~2 db), that is the question.
> Put another way, I am having a difficult time deciding to what degree I
> should use a database application to manage a collection of electronic
> texts. Allow me to explain.
>
> I host a collection of electronic texts called the Alex Catalogue. It
> needs a facelift in terms of both aesthetics and functionality. The
> collection consists of "great" public domain texts from American and
> English literature as well as Western philosophy. The idea behind the
> Catalogue is, if you were to read and understand all of these 500 or so
> items, then you would have a pretty good understanding of Western
> culture. Here are links to what I have so far:
>
>   * http://infomotions.com/alex/
>   * http://infomotions.com/alex2/
>
>
> Functionality-wise, a future implementation of the Catalogue will:
>
> * be accessible via authors, titles, a set of controlled vocabulary
> terms, as well as free-text searching
>
> * searches will return not only authors, titles, and links, but also
> paragraph-level detail much like a concordance
>
> * search results will be sortable by author, title, date, rank,
> popularity, size, etc.
>
> * author names (the authority list) will be supplemented with
> rudimentary biographical information
>
> * controlled vocabulary terms will include things like subjects,
> literary form, genre, etc.
>
> * each document will ultimately be saved as a TEI/XML file, enabling me
> to transform the file(s) into a myriad of different forms such as HTML,
> "smart" HTML, plain text, PDF, PalmPilot, Rocket eBook, OEB, Newton
> Paperback, MARC, MARCXML, MODS, METS, etc.
>
> * provide a Search Inside The Book feature a la Amazon
>
> * provide a Did You Mean feature a la Google
>
> * allow harvesting via OAI
>
> * allow syndication of hand-selected and randomly-selected items
> through RSS
>
> * provide a MyAlex feature for customization/personalization
>
> * each item will be associated with one image to give the item
> graphic appeal
>
> * the entire corpus with much of its functionality will be
> distributable on a CD but require no program to use -- just the CD and
> the data
>
> * items will be printable in such a way that they can be bound in a
> pretty manner
>
>
> To what degree do I use a database to implement these features?
> Maintaining an authority list and sets of controlled vocabulary terms
> almost necessitates a database application. Fine. No problem. I can
> accept that. But do I create a database of the Catalogue's metadata and
> then point to the TEI files? Ick! That is too fragile, and IMHO not
> very elegant.
>
> Alternatively, I could store the entire TEI files in a database. It
> is not as if the database cannot handle the file size, but then the
> question is, "How do I do data-entry against the database?" Many of
> these texts are a few hundred K in size, and consequently not very
> amenable to CGI forms.
>
> Yet another approach would be to create my TEI files, use the
> filesystem as the database, and regularly crawl the filesystem to
> create indexes of various types. I suppose I could do this using XSL
> technology.
>
> What do you think? What parts of a full-text catalog would you
> implement as a database application, and what parts would you not?
>
> --
> Eric Lease Morgan
> University Libraries of Notre Dame
>
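
For what it is worth, the third approach in the quoted message (TEI
files on the filesystem, crawled regularly to build indexes) can be
sketched roughly as below. This is only an illustration under stated
assumptions, not anyone's actual implementation: it uses Python rather
than the XSL mentioned above, it assumes each file carries <title> and
<author> elements in its TEI header, and the names crawl, first_text,
and local_name are hypothetical helpers made up for this sketch.

```python
import os
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    """Strip any XML namespace so namespaced and plain TEI both work."""
    return tag.rsplit("}", 1)[-1]


def first_text(root, name):
    """Return the text of the first element with the given local name."""
    for elem in root.iter():
        if isinstance(elem.tag, str) and local_name(elem.tag) == name:
            return "".join(elem.itertext()).strip()
    return ""  # assumption: a missing element is indexed as an empty string


def crawl(root_dir):
    """Walk root_dir and build an author -> [(title, path)] index."""
    index = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root_dir):
        for filename in sorted(files):
            if not filename.endswith(".xml"):
                continue
            path = os.path.join(dirpath, filename)
            try:
                root = ET.parse(path).getroot()
            except ET.ParseError:
                continue  # skip malformed files rather than abort the crawl
            author = first_text(root, "author")
            title = first_text(root, "title")
            index[author].append((title, path))
    return index
```

Calling crawl() on the directory that holds the TEI files returns an
in-memory author index; a title or subject index would follow the same
pattern. The files stay the single source of truth, and the index can
be thrown away and rebuilt on every crawl, which avoids the fragility
of a separate metadata database that points at the files.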