To database or not to database (2 db || ~2 db), that is the question.

Put another way, I am having a difficult time deciding to what degree I should use a database application to manage a collection of electronic texts. Allow me to explain.

I host a collection of electronic texts called the Alex Catalogue. It needs a facelift in terms of both aesthetics and functionality. The collection consists of "great" public domain texts from American and English literature as well as Western philosophy. The idea behind the Catalogue is, if you were to read and understand all of these 500 or so items, then you would have a pretty good understanding of Western culture. Here are links to what I have so far:

* http://infomotions.com/alex/
* http://infomotions.com/alex2/

Functionality-wise, a future implementation of the Catalogue will:

* be accessible via authors, titles, a set of controlled vocabulary terms, as well as free-text searching
* searches will return not only author, titles, and links, but also paragraph-level detail much like a concordance
* search results will be sortable by author, title, date, rank, popularity, size, etc.
* author names (the authority list) will be supplemented with rudimentary biographical information
* controlled vocabulary terms will include things like subjects, literary form, genre, etc.
* each document will ultimately be saved as a TEI/XML file, enabling me to transform the file(s) into a myriad of different forms such as HTML, "smart" HTML, plain text, PDF, PalmPilot, Rocket eBook, OEB, Newton Paperback, MARC, MARCXML, MODS, METS, etc.
* provide a Search Inside The Book feature a la Amazon
* provide a Did You Mean feature a la Google
* allow harvesting via OAI
* allow syndication of hand-selected and randomly-selected items through RSS
* provide a MyAlex feature for customization/personalization
* each item will be associated with one image to give the items graphic appeal
* the entire corpus with much of its functionality will be distributable on a CD but require no program to use -- just the CD and the data
* items will be printable in such a way that they can be bound in a pretty manner

To what degree do I use a database to implement these features?

Maintaining an authority list and sets of controlled vocabulary terms almost necessitates a database application. Fine. No problem. I can accept that. But do I create a database of the Catalogue's metadata and then point to the TEI files? Ick! That is too fragile, and IMHO not very elegant.

Alternatively, I could store the entire TEI files in a database. It is not as if the database cannot handle the file size, but then the question is, "How do I do data-entry against the database?" Many of these texts are a few hundred K in size, and consequently not very amenable to CGI forms.

Yet another approach would be to create my TEI files, use the filesystem as the database, and regularly crawl the filesystem to create indexes of various types. I suppose I could do this using XSL technology.

What do you think? What parts of a full-text catalog would you implement as a database application, and what parts would you not?

--
Eric Lease Morgan
University Libraries of Notre Dame
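
P.S. To make the "filesystem as the database" option a bit more concrete, here is a rough sketch of the crawl step. It is written in Python rather than XSL purely for illustration, and it rests on assumptions of mine: that every text is a well-formed .xml file somewhere under one directory, and that the first author and title elements in each file (normally the ones in the TEI header) are the ones worth indexing. The names crawl, first_text, and local are made up for this example.

```python
import os
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any XML namespace so namespaced and un-namespaced TEI both match."""
    return tag.rsplit("}", 1)[-1]


def first_text(root, name):
    """Return the text of the first element with this local name, or ''."""
    for elem in root.iter():  # document order, so the header comes first
        if local(elem.tag) == name:
            return "".join(elem.itertext()).strip()
    return ""


def crawl(top):
    """Walk the directory tree under top and build author and title indexes."""
    by_author, by_title = {}, {}
    for folder, _dirs, files in os.walk(top):
        for name in sorted(files):
            if not name.endswith(".xml"):
                continue  # assumption: only .xml files are TEI texts
            path = os.path.join(folder, name)
            root = ET.parse(path).getroot()
            author = first_text(root, "author") or "Unknown"
            title = first_text(root, "title") or name
            by_author.setdefault(author, []).append(path)
            by_title[title] = path  # assumption: titles are unique
    return by_author, by_title
```

Run from cron, something like this could rewrite the browsable author and title lists each night. It says nothing about the authority list or the controlled vocabularies, which is where I suspect a real database still earns its keep.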