Eric, why not use an XML db?

http://www.sleepycat.com/products/xml.shtml

-Ross.

Eric Lease Morgan wrote:

> To database or not to database (2 db || ~2 db), that is the question.
> Put another way, I am having a difficult time deciding to what degree I
> should use a database application to manage a collection of electronic
> texts. Allow me to explain.
>
> I host a collection of electronic texts called the Alex Catalogue. It
> needs a facelift in terms of both aesthetics and functionality. The
> collection consists of "great" public domain texts from American and
> English literature as well as Western philosophy. The idea behind the
> Catalogue is, if you were to read and understand all of these 500 or so
> items, then you would have a pretty good understanding of Western
> culture. Here are links to what I have so far:
>
>   * http://infomotions.com/alex/
>   * http://infomotions.com/alex2/
>
>
> Functionality-wise, a future implementation of the Catalogue will:
>
>   * be accessible via authors, titles, a set of controlled vocabulary
>     terms, as well as free-text searching
>
>   * searches will return not only authors, titles, and links, but also
>     paragraph-level detail much like a concordance
>
>   * search results will be sortable by author, title, date, rank,
>     popularity, size, etc.
>
>   * author names (the authority list) will be supplemented with
>     rudimentary biographical information
>
>   * controlled vocabulary terms will include things like subjects,
>     literary form, genre, etc.
>
>   * each document will ultimately be saved as a TEI/XML file, enabling me
>     to transform the file(s) into a myriad of different forms such as HTML,
>     "smart" HTML, plain text, PDF, PalmPilot, Rocket eBook, OEB, Newton
>     Paperback, MARC, MARCXML, MODS, METS, etc.
>
>   * provide a Search Inside The Book feature a la Amazon
>
>   * provide a Did You Mean feature a la Google
>
>   * allow harvesting via OAI
>
>   * allow syndication of hand-selected and randomly-selected items
>     through RSS
>
>   * provide a MyAlex feature for customization/personalization
>
>   * each item will be associated with one image to give the item
>     graphic appeal
>
>   * the entire corpus with much of its functionality will be
>     distributable on a CD but require no program to use -- just the CD and
>     the data
>
>   * items will be printable in such a way that they can be bound in a
>     pretty manner
>
>
> To what degree do I use a database to implement these features?
> Maintaining an authority list and sets of controlled vocabulary terms
> almost necessitates a database application. Fine. No problem. I can
> accept that. But do I create a database of the Catalogue's metadata and
> then point to the TEI files? Ick! That is too fragile, and IMHO not
> very elegant.
>
> Alternatively, I could store the entire TEI files in a database. It
> is not like the database cannot handle the file size, but then the
> question is, "How do I do data-entry against the database?" Many of
> these texts are a few hundred K in size, and consequently not very
> amenable to CGI forms.
>
> Yet another approach would be to create my TEI files, use the
> filesystem as the database, and regularly crawl the filesystem to
> create indexes of various types. I suppose I could do this using XSL
> technology.
>
> What do you think? What parts of a full-text catalog would you
> implement as a database application, and what parts would you not?
>
> --
> Eric Lease Morgan
> University Libraries of Notre Dame
>
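
For what it is worth, the "filesystem as the database" approach described in the quoted message could be sketched roughly as follows. This is only an illustration under several assumptions: it uses Python rather than the XSL mentioned in the message, the function names (`local_name`, `first_text`, `build_index`) are hypothetical, and it assumes each TEI file carries its title and author in a `teiHeader` that precedes the text body.

```python
import os
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any XML namespace so namespaced and plain TEI are treated alike."""
    return tag.rsplit("}", 1)[-1]


def first_text(root, name):
    """Return the text of the first element with the given local name, or ''."""
    for element in root.iter():
        if local_name(element.tag) == name:
            return "".join(element.itertext()).strip()
    return ""


def build_index(root_dir):
    """Crawl root_dir and collect author/title metadata from each .xml file."""
    records = []
    for folder, _subdirs, files in os.walk(root_dir):
        for filename in sorted(files):
            if not filename.endswith(".xml"):
                continue
            path = os.path.join(folder, filename)
            try:
                root = ET.parse(path).getroot()
            except ET.ParseError:
                continue  # skip files that are not well-formed XML
            records.append({
                "path": os.path.relpath(path, root_dir),
                # Assumption: the first <author> and <title> in document
                # order are the ones in the teiHeader's titleStmt.
                "author": first_text(root, "author"),
                "title": first_text(root, "title"),
                "size": os.path.getsize(path),
            })
    # Sort by author, then title, as one example of an index ordering.
    return sorted(records, key=lambda r: (r["author"], r["title"]))
```

Calling `build_index` on the directory holding the texts returns a list of records that could be written out as a browsable author or title list, or loaded into whatever database ends up holding the authority list. Re-running the crawl regenerates the index from the files themselves, so nothing has to point at a TEI file by hand.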