This is a beautiful response, and the payoff, at the end, is perfect... Try
it, you probably won't like it. In practice, even with big hardware,
relational databases get mired down with MARC and MODS once the collection
size becomes significant.
Cary
On Friday, April 17, 2015, Mark V. Sullivan <
[log in to unmask]> wrote:
> Stephen,
> As the lead developer on the SobekCM open-source digital repository
> project and formerly a developer for the University of Florida Libraries, I
> have looked at this quite a bit and learned a bit over time.
>
> I began development working on tracking systems to manage a fairly
> large-scale digitization shop at UF before I was even working on the public
> repository side. When I arrived (around 1999) metadata was double keyed
> several times for each item during the tracking and metadata creation
> process. It seemed obvious to me that we needed a tracking system and one
> that would hold metadata for each item. This was fairly easy to do when
> our metadata was very homogeneous and based on simple Dublin Core. This
> worked well and the system could easily spit out ready METS (and MXF)
> packages.
>
> Over time, I began to experiment with MODS and increasingly started using
> specialized metadata schemas for different types of objects, such as
> herbarium or oral history materials. I envisioned a tracking system that
> would hold all of this metadata relationally and provide different tabs
> based on the material type. So, oral history items would have an extra tab
> exposing the oral history metadata and herbarium would have a similar
> special tab. While development of this moved ahead, the entire system
> seemed unwieldy. Adding a new schema was a bit laborious, as was even
> adding a new field to use.
>
> After several years of this, we began the SobekCM digital repository
> software development. After that experience I swore off trying to store
> very complex structured data in the database in the same type of format.
> (This may also have had to do with an IMLS project I worked on that proved
> the futility of this approach.) I generally eschew triple-stores for the
> basis of libraries in favor of relational databases on the premise that we
> DO actually understand the basic relationships of digital resources to
> collection and the sub-relations there. We keep the data within METS files
> with one or more descriptive metadata sections and essentially the database
> only points to that METS file. For searching, we use a flattened table
> structure with one row per item, much like Solr/Lucene, alongside
> Solr/Lucene itself.
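
The layout described above (a database row that only points at the METS file, plus a flattened one-row-per-item search table) might look roughly like the sketch below. This is a minimal illustration, not SobekCM's actual schema: the table names, column names, file path, and helper functions are all hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- The relational side records only what we really understand:
        -- which collection an item belongs to and where its METS file lives.
        CREATE TABLE item (
            item_id    INTEGER PRIMARY KEY,
            collection TEXT NOT NULL,
            mets_path  TEXT NOT NULL
        );
        -- Flattened search table: exactly one row per item, with
        -- multi-valued fields joined into a single text column.
        CREATE TABLE item_search (
            item_id INTEGER PRIMARY KEY REFERENCES item(item_id),
            title   TEXT,
            creator TEXT,
            subject TEXT
        );
        """
    )
    return conn


def add_item(conn, collection, mets_path, title, creators, subjects):
    """Register an item and its flattened search row; return the item id."""
    cur = conn.execute(
        "INSERT INTO item (collection, mets_path) VALUES (?, ?)",
        (collection, mets_path),
    )
    item_id = cur.lastrowid
    conn.execute(
        "INSERT INTO item_search (item_id, title, creator, subject) "
        "VALUES (?, ?, ?, ?)",
        (item_id, title, " | ".join(creators), " | ".join(subjects)),
    )
    return item_id


def search(conn, term):
    """Return (item_id, mets_path) for rows whose flat fields contain term."""
    like = f"%{term}%"
    return conn.execute(
        "SELECT i.item_id, i.mets_path "
        "FROM item_search AS s JOIN item AS i ON s.item_id = i.item_id "
        "WHERE s.title LIKE ? OR s.creator LIKE ? OR s.subject LIKE ? "
        "ORDER BY i.item_id",
        (like, like, like),
    ).fetchall()


if __name__ == "__main__":
    db = make_db()
    add_item(db, "oral-history", "/data/oh/0001/0001.mets.xml",
             "Interview with a citrus grower", ["Doe, Jane"],
             ["Citrus industry"])
    print(search(db, "citrus"))
```

The point of the design is that a search hit only yields a path; the deeply structured description stays in the METS file and is never decomposed into tables.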
>
> My advice is to steer clear of trying to take beautifully (and deeply)
> structured metadata from MODS, Darwin Core, VRACore (and who knows what
> else) and trying to create tables and relations for them.
>
> I think you can point some database tools at the schema and have it
> generate the tables for you. Just doing that will probably dissuade you.
> ;)
>
> Mark V. Sullivan
> CIO & Application Architect
> Sobek Digital Hosting and Consulting, LLC
> [log in to unmask]
> 352-682-9692 (mobile)
>
>
> ________________________________________
> From: Code for Libraries <[log in to unmask]> on
> behalf of Stephen Schor <[log in to unmask]>
> Sent: Friday, April 17, 2015 1:27 PM
> To: [log in to unmask]
> Subject: [CODE4LIB] Modeling a repository's objects in a relational
> database
>
> Hullo.
>
> I'm interested to hear about people's approaches for modeling
> repository objects in a normalized, spec-agnostic, _relational_ way
> while maintaining the ability to cast objects as various specs (MODS,
> Dublin Core).
>
> People often resort to storing an object as one specification (the text of
> the MODS, for example) and then converting it to other specs using XSLT
> or their favorite language, using established mappings / conversions. (
> http://www.loc.gov/standards/mods/mods-conversions.html)
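
As a rough illustration of the "store one spec, convert on demand" pattern described above, here is a minimal sketch that casts a stored MODS record to simple Dublin Core pairs. It uses Python's standard-library ElementTree in place of XSLT, and the four-row `CROSSWALK` table is a deliberately tiny, hypothetical subset, not the Library of Congress mapping (which, for example, also considers roles when mapping names).

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical, heavily simplified crosswalk:
# (path relative to the <mods> root, Dublin Core element name).
CROSSWALK = [
    ("mods:titleInfo/mods:title", "title"),
    ("mods:name/mods:namePart", "creator"),
    ("mods:subject/mods:topic", "subject"),
    ("mods:originInfo/mods:dateIssued", "date"),
]


def mods_to_dc(mods_xml):
    """Return a list of (dc_element, value) pairs from a MODS record string."""
    root = ET.fromstring(mods_xml)
    pairs = []
    for path, dc_name in CROSSWALK:
        for element in root.findall(path, MODS_NS):
            text = (element.text or "").strip()
            if text:  # skip empty elements
                pairs.append((dc_name, text))
    return pairs


SAMPLE_MODS = """
<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Interview with a citrus grower</title></titleInfo>
  <name><namePart>Doe, Jane</namePart></name>
  <subject><topic>Citrus industry</topic></subject>
</mods>
"""

if __name__ == "__main__":
    for name, value in mods_to_dc(SAMPLE_MODS):
        print(f"dc:{name} = {value}")
```

The stored MODS stays the single source of truth here; any other spec is a view computed from it, so fixing a record means fixing one document.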
>
> Baking a MODS representation into a database text field can introduce
> problems with queryability and remediation that I _feel_ would be hedged
> by factoring out information from the XML document, and modeling it
> in a relational DB.
>
> This is an idea that's been knocking around in my head for a while.
> I'd like to hear if people have gone down this road...and I'm especially
> eager to hear both success and horror stories about what kind of results
> they got.
>
> Stephen
>
--
Cary Gordon
The Cherry Hill Company
http://chillco.com