Dear Code4Lib community,
I'm moving on from my summer project, a program that converts EAD files so
they can be indexed by a standard VuFind/Solr installation, and I'd like to
make it available in case anyone can benefit from it.
A brief summary:
This summer, I worked on a couple of projects for DAIAD, the Digital Access
and Information Architecture Department (Hesburgh Library, Notre Dame). One
of DAIAD’s ongoing projects is the Catholic Research Resources Alliance
(more informally, the “Catholic Portal”): a VuFind index, running on a Solr
server, of rare or unique resources held by a number of institutions and
related in some way to the academic field of Catholic studies (especially
American Catholic studies). VuFind currently works only with MARC records,
though it can be extended to work with other kinds of records. My task for
the CRRA was to write a Java program that converts EAD files to the VuFind
schema and then sends the converted records to a Solr server that uses the
VuFind schema, as a trial run before sending them to the CRRA server.
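For readers new to Solr, the "send the converted records to a Solr server"
step can be sketched roughly as below. This is a minimal illustration, not
CRRASolrIndexer's code: it assumes Solr's standard XML update message posted
to a core's /update handler, represents each record as a simple map of field
names to values, and uses a hypothetical class name, SolrPoster. The example
URL (http://localhost:8080/solr/biblio/update) is also an assumption; since
the installation described here only accepted requests originating on the
server itself, a URL pointing at localhost would be the natural choice when
running there.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds Solr XML update messages and POSTs them.
public class SolrPoster {

    // Escapes XML special characters so field values cannot break the message.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds <add><doc><field name="...">...</field>...</doc>...</add>.
    static String toAddXml(List<Map<String, String>> docs) {
        StringBuilder sb = new StringBuilder("<add>");
        for (Map<String, String> doc : docs) {
            sb.append("<doc>");
            for (Map.Entry<String, String> e : doc.entrySet()) {
                sb.append("<field name=\"").append(escapeXml(e.getKey())).append("\">")
                  .append(escapeXml(e.getValue())).append("</field>");
            }
            sb.append("</doc>");
        }
        return sb.append("</add>").toString();
    }

    // POSTs an XML message to the given update URL and returns the HTTP status.
    static int post(String updateUrl, String xml) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "ead-0001");
        doc.put("title", "Papers & Correspondence");
        String xml = toAddXml(List.of(doc));
        System.out.println(xml);
        // Only contact a server when an update URL is passed on the command line.
        if (args.length > 0) {
            System.out.println(post(args[0], xml));
            // Added documents become searchable only after a commit.
            System.out.println(post(args[0], "<commit/>"));
        }
    }
}
```

Sending a separate <commit/> after the <add> message is what makes the new
documents visible to searches; batching many documents per commit is
usually much faster than committing after each one.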
For tools, I used Eclipse on Windows XP. Because the Solr/VuFind
installation I wanted to use was on a server rather than my hard drive, and
because it accepted only commands originating on that server, I had to
deploy my code to the server to test it. I used an Ant build script to copy
my Eclipse-based code to the Linux server, where it could be run from the
Linux command line. I also used properties files, which the build script
partially configured for me, to hold environment-specific information such
as the address of each Solr installation and the path to the EAD files on
the current machine.
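Reading such a properties file in Java might look something like the sketch
below. The key names solr.update.url and ead.dir and the class name
IndexerConfig are hypothetical; the real project's property names may
differ. Failing fast on a missing key makes a misconfigured environment
obvious before any EAD file is parsed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical holder for the environment-specific settings.
public class IndexerConfig {
    final String solrUpdateUrl;
    final Path eadDirectory;

    IndexerConfig(Properties p) {
        // Hypothetical key names; adjust to match your own .properties files.
        this.solrUpdateUrl = require(p, "solr.update.url");
        this.eadDirectory = Paths.get(require(p, "ead.dir"));
    }

    // getProperty returns null for a missing key, so fail with a clear message.
    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing property: " + key);
        }
        return value.trim();
    }

    // Loads settings from a .properties file on disk.
    static IndexerConfig load(Path file) throws IOException {
        Properties p = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            p.load(in);
        }
        return new IndexerConfig(p);
    }

    public static void main(String[] args) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(
                "solr.update.url = http://localhost:8080/solr/biblio/update\n"
                + "ead.dir = /data/ead\n"));
        IndexerConfig cfg = new IndexerConfig(p);
        System.out.println(cfg.solrUpdateUrl);
        System.out.println(cfg.eadDirectory);
    }
}
```

Keeping one properties file per machine (development PC, test server,
production server) means the same compiled code can run unchanged in each
environment.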
The result is rough, but it works. If everything is set up properly (e.g.,
the .dtd file can be found from every EAD file and the .properties files
contain the right paths), CRRASolrIndexer parses a series of EAD files, maps
their data into objects representing VuFind records, and sends the records
to a Solr server, where they can be searched through the VuFind interface.
(VuFind’s display of EAD records is also very rough at this point, but
improvements are in the works.)
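The parse-and-map step, together with one way of meeting the DTD
requirement, might be sketched as follows. Rather than making every EAD
file's DOCTYPE path valid, this version installs a SAX EntityResolver that
redirects any .dtd lookup to a single local directory; that is an
alternative technique, not necessarily what CRRASolrIndexer does. The XPath
expressions assume DTD-based EAD 2002 without a namespace, and the VuFind
field names (id, title, author, format), the format value "Archival
Material", and the class name EadMapper are illustrative assumptions; check
them against your VuFind schema.xml.

```java
import java.io.File;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical EAD-to-VuFind mapper (illustrative subset of fields).
public class EadMapper {

    // Parses EAD text; any external DTD is loaded from dtdDir (matched by
    // file name) instead of the path given in the DOCTYPE declaration.
    static Document parse(String eadXml, File dtdDir) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            if (systemId != null && systemId.endsWith(".dtd")) {
                String name = systemId.substring(systemId.lastIndexOf('/') + 1);
                return new InputSource(new File(dtdDir, name).toURI().toString());
            }
            return null; // null means "use the parser's default resolution"
        });
        return builder.parse(new InputSource(new StringReader(eadXml)));
    }

    // Copies a few EAD 2002 elements into VuFind-style Solr fields.
    // Field names and the "format" value are assumptions; check schema.xml.
    static Map<String, String> toVuFindFields(Document ead) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", xp.evaluate("normalize-space(/ead/eadheader/eadid)", ead));
        fields.put("title", xp.evaluate("normalize-space(/ead/archdesc/did/unittitle)", ead));
        fields.put("author", xp.evaluate("normalize-space(/ead/archdesc/did/origination)", ead));
        fields.put("format", "Archival Material");
        // Drop fields whose element was missing rather than indexing empty strings.
        fields.values().removeIf(String::isEmpty);
        return fields;
    }

    public static void main(String[] args) throws Exception {
        String sample = "<ead><eadheader><eadid>crra-0001</eadid></eadheader>"
                + "<archdesc level=\"collection\"><did>"
                + "<unittitle>Smith Family Papers</unittitle>"
                + "<origination>Smith, John</origination>"
                + "</did></archdesc></ead>";
        System.out.println(toVuFindFields(parse(sample, new File("."))));
    }
}
```

Because the resolver matches on file name only, every EAD file can keep
whatever DOCTYPE path it was created with, as long as a copy of the named
DTD sits in the one configured directory.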
I will no longer be maintaining this code, but it’s available, with more
documentation, at http://code.google.com/p/crrasolrindexer/ for anyone who
wants to use it.
Take care,
Stephen Little
University of Notre Dame
LinkedIn profile: http://www.linkedin.com/in/stephenmalittle