Princeton Digital Collections site
Princeton University Library is pleased to announce the creation of a
new site for our growing local digital collections:
http://diglib1.princeton.edu
We invite our peers to provide commentary on our prototype web site,
which is currently in beta. Please send comments to Marvin Bielawski
<[log in to unmask]>. Over the coming weeks we will be working to
improve usability. We expect a full launch soon afterward, and hope
that peer commentary will help improve the end product.
The site will eventually contain content created in previous digital
projects here at Princeton, as well as new content produced on an
ongoing basis in our Digital Imaging Studio.
Our new Digital Collections website leverages XML and XML-related
technologies to make these collections available over the Web. To
start, descriptive metadata, in a variety of formats (VRA, MODS, TEI,
and EAD), is encapsulated in a METS wrapper and ingested by an import
program. This allows images, mostly produced in our Digital Imaging
Studio, to be associated with item and collection level metadata.
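To illustrate the wrapping idea, here is a minimal sketch in Java (the
language of our import tooling) that assembles a skeletal METS record
with the JDK's DOM API. The element names and namespace are standard
METS; the section contents and the ID value are placeholders, and real
ingest records are considerably richer.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class MetsSketch {
    static final String METS_NS = "http://www.loc.gov/METS/";

    // Build a skeletal METS wrapper: a descriptive-metadata section
    // (which would hold the VRA/MODS/TEI/EAD record) plus a file
    // section pointing at the images.
    public static String buildWrapper() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element mets = doc.createElementNS(METS_NS, "mets");
            doc.appendChild(mets);

            Element dmdSec = doc.createElementNS(METS_NS, "dmdSec");
            dmdSec.setAttribute("ID", "dmd1"); // placeholder ID
            mets.appendChild(dmdSec);

            Element fileSec = doc.createElementNS(METS_NS, "fileSec");
            mets.appendChild(fileSec);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildWrapper());
    }
}
```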
The METS records are stored in X-Hive, a native XML database, and
searched through X-Hive's extension to Lucene, a Java-based search
engine. X-Hive's extension to Lucene most notably adds support for
XQuery, a SQL-like query language for XML.
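X-Hive's own API is proprietary, so as a stand-in the following sketch
uses the JDK's XPath support to show the kind of record filtering
involved; XPath is the path language at the core of XQuery, though a
real XQuery against X-Hive would add FLWOR expressions and full-text
operators. The sample records and titles are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class QuerySketch {
    // Toy stand-in for the METS records held in the database.
    static final String RECORDS =
        "<records>"
      + "<item><title>Map of Princeton</title></item>"
      + "<item><title>Aerial photograph</title></item>"
      + "</records>";

    // Count the items whose title contains the keyword.
    public static int search(String keyword) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(RECORDS.getBytes("UTF-8")));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                "//item[contains(title, '" + keyword + "')]",
                doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(search("Princeton")); // prints 1
    }
}
```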
The website uses Tomcat as its Java servlet engine. The architecture
is, essentially, three-tiered: the lowest level is the database and
the Java classes and servlets that provide connections and pooling;
the middle level is the XQuery servlet and the XQueries themselves,
which provide most of the site's functionality; and the upper level is
the XSLT servlet and stylesheets that format the XML returned from the
XQueries into HTML.
The architecture also contains a conversion program, written in Java,
that batch-converts TIFF images, the archival format produced in the
digital studio, to JPEG2000 images, the format used by the website's
Aware JP2 image server. Once the images are in JP2 format, an image
navigation Java Bean allows the XQuery scripts to perform image
manipulation, create thumbnails, and generate JPEG images viewable in
a standard Web browser. These images are represented by URLs in the
returned XML; the patron's browser then downloads the images directly
from the Aware JP2 image server.
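The thumbnail step can be sketched with the JDK's own imaging classes;
this is only an illustration of the scaling involved, not the Aware
server's codec (which handles the JP2 decoding itself), and the sizes
in the demo are arbitrary.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class ThumbnailSketch {
    // Scale an image so its longest side is at most maxSide, preserving
    // the aspect ratio -- the kind of derivative a thumbnail view needs.
    public static BufferedImage thumbnail(BufferedImage src, int maxSide) {
        double scale = (double) maxSide
                / Math.max(src.getWidth(), src.getHeight());
        int w = Math.max(1, (int) Math.round(src.getWidth() * scale));
        int h = Math.max(1, (int) Math.round(src.getHeight() * scale));
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return dst;
    }

    public static void main(String[] args) {
        // A blank 4000x3000 master standing in for an archival scan.
        BufferedImage master =
            new BufferedImage(4000, 3000, BufferedImage.TYPE_INT_RGB);
        BufferedImage thumb = thumbnail(master, 200);
        System.out.println(thumb.getWidth() + "x" + thumb.getHeight()); // 200x150
    }
}
```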
Most other parts of the website's functionality are provided by the
XQuery scripts. The images and text that appear on the front page, for
instance, are pulled from the database via XQuery. Searching and the
filtering of search results are performed by an XQuery script, as are
page turning, thumbnail generation, and individual image navigation.
All XQuery scripts return XML that is then transformed, through XSLT
stylesheets, into HTML. A minimal amount of JavaScript adds some
functionality to the page-turning and search features.
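That transformation step can be sketched with the JDK's built-in TrAX
API, which is the same mechanism an XSLT servlet typically wraps. The
toy result XML and one-template stylesheet below are invented for the
demonstration; our production stylesheets are of course larger.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSketch {
    // Apply an XSLT stylesheet to an XML string and return the result.
    public static String transform(String xml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Toy result XML of the kind an XQuery might return, and a minimal
    // stylesheet turning it into an HTML heading.
    public static String demo() {
        String xml = "<item><title>Map of Princeton</title></item>";
        String xslt =
            "<xsl:stylesheet version='1.0' "
          + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='html'/>"
          + "<xsl:template match='/item'>"
          + "<h1><xsl:value-of select='title'/></h1>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        return transform(xml, xslt);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```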
The initial site has just a few representative samples of content from
recent projects, but many more images will be added as soon as they are
wrapped in the appropriate metadata structures.
The site was created by the collaborative efforts of the staff of our
Digital Library Operations Group.
Princeton will also be examining ways of integrating features and
concepts
from the Fedora package into our native XML database approach within the
coming months to create a uniform, extensible architecture.
Marvin Bielawski <[log in to unmask]>