LISTSERV 16.5 - CODE4LIB Archives

Firstly - thanks for the thoughtful replies, links, and anecdotes.

We end up storing a lot of MODS as text in a database.
We map it out as other formats...but our app deals in *a lot* of MODS.

A lot of time and line-count is dedicated to turning XML into an object/data structure
that can be sent to and from web forms in a way our web app likes, and because that
object/data structure is atypical we forgo the benefits (like first-class validation)
of our framework.
Not to mention querying gets hinky in XML, and dealing with remediation within a
hierarchy means updating what amounts to a denormalized cache.
(http://martinfowler.com/bliki/TwoHardThings.html)
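
A minimal sketch of that XML-to-object step, assuming Python's standard xml.etree.ElementTree and a made-up sample record. The function name mods_to_dict and the three fields it pulls out are hypothetical choices for illustration, not code from any real app:

```python
import xml.etree.ElementTree as ET

# The MODS v3 namespace; the "mods" prefix is only a local alias for XPath lookups.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Made-up sample record, for illustration only.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Oral history interview</title></titleInfo>
  <name><namePart>Doe, Jane</namePart></name>
  <subject><topic>Agriculture</topic></subject>
  <subject><topic>Florida</topic></subject>
</mods>"""


def mods_to_dict(xml_text):
    """Flatten a few MODS elements into a plain dict a web form could consume."""
    root = ET.fromstring(xml_text)

    def texts(path):
        # Collect the non-empty text of every element matching the path.
        return [
            el.text.strip()
            for el in root.findall(path, MODS_NS)
            if el.text and el.text.strip()
        ]

    return {
        "titles": texts("mods:titleInfo/mods:title"),
        "names": texts("mods:name/mods:namePart"),
        "topics": texts("mods:subject/mods:topic"),
    }


record = mods_to_dict(SAMPLE_MODS)
print(record)
```

Even this toy version hard-codes one path per field, which hints at why the real thing eats so many lines once attributes, nesting, and round-tripping back to XML come into play.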

It's hard to dissuade myself from the idea that we're simply hanging adjectives
on nouns (our objects) and that different specs map these adjectives to different
words and format them differently.
*I think other projects store attributes in a traditional relational way and concoct*
*different specs based on DB records. (Maybe Archivists' Toolkit? ArchivesSpace?)*
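
A rough sketch of the "adjectives on nouns" idea, assuming an in-memory SQLite database and Python's standard library. The descriptors table, the spec-neutral term names, and both mapping dicts are hypothetical, and the mappings cover only three terms rather than either full spec:

```python
import sqlite3
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
MODS = "http://www.loc.gov/mods/v3"
ET.register_namespace("dc", DC)
ET.register_namespace("mods", MODS)

# Hypothetical mappings from spec-neutral terms to each spec's vocabulary.
DC_MAP = {"title": "title", "creator": "creator", "topic": "subject"}
MODS_MAP = {
    "title": ("titleInfo", "title"),
    "creator": ("name", "namePart"),
    "topic": ("subject", "topic"),
}

# One row per (object, term, value): the "adjectives" hung on each "noun".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE descriptors (object_id INTEGER, term TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO descriptors VALUES (?, ?, ?)",
    [
        (1, "title", "Oral history interview"),
        (1, "creator", "Doe, Jane"),
        (1, "topic", "Agriculture"),
    ],
)


def descriptors_for(object_id):
    query = "SELECT term, value FROM descriptors WHERE object_id = ? ORDER BY rowid"
    return conn.execute(query, (object_id,)).fetchall()


def as_dublin_core(object_id):
    """Cast the rows as flat Dublin Core elements."""
    root = ET.Element("record")
    for term, value in descriptors_for(object_id):
        if term in DC_MAP:
            ET.SubElement(root, f"{{{DC}}}{DC_MAP[term]}").text = value
    return ET.tostring(root, encoding="unicode")


def as_mods(object_id):
    """Cast the same rows as MODS, which nests each value in a wrapper element."""
    root = ET.Element(f"{{{MODS}}}mods")
    for term, value in descriptors_for(object_id):
        if term in MODS_MAP:
            wrapper, leaf = MODS_MAP[term]
            parent = ET.SubElement(root, f"{{{MODS}}}{wrapper}")
            ET.SubElement(parent, f"{{{MODS}}}{leaf}").text = value
    return ET.tostring(root, encoding="unicode")


dc_xml = as_dublin_core(1)
mods_xml = as_mods(1)
print(dc_xml)
print(mods_xml)
```

The same three rows come out under different words and different nesting per spec. What the sketch leaves out is everything that makes real MODS deep (attributes, repeated nested groups, ordering), which is where a flat term/value table starts to strain.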

Uff - anyway - maybe I'll get a chance to describe a collection's objects
in a spec-agnostic way.
I can already imagine peppering the schema with spec-specific columns and
it being a slippery slope from there. But hey, dream big - right?

I may reply to this thread with my success story one day.
I'm also really eager to share if it goes totally wrong.
Those stories are usually more entertaining.


Stephen


On Fri, Apr 17, 2015 at 6:56 PM, Cary Gordon <[log in to unmask]> wrote:

> This is a beautiful response, and the payoff, at the end, is perfect... Try
> it, you probably won't like it. In practice, even with big hardware,
> relational databases get mired down with MARC and MODS once the collection
> size becomes significant.
>
> Cary
>
> On Friday, April 17, 2015, Mark V. Sullivan <
> [log in to unmask]> wrote:
>
> > Stephen,
> > As the lead developer on the SobekCM open-source digital repository
> > project and formerly a developer for the University of Florida Libraries, I
> > have looked at this quite a bit and learned a bit over time.
> >
> > I began development working on tracking systems to manage a fairly
> > large-scale digitization shop at UF before I was even working on the public
> > repository side.  When I arrived (around 1999) metadata was double keyed
> > several times for each item during the tracking and metadata creation
> > process.  It seemed obvious to me that we needed a tracking system, and one
> > that would hold metadata for each item.  This was fairly easy to do when
> > our metadata was very homogeneous and based on simple Dublin Core.  This
> > worked well and the system could easily spit out ready METS (and MXF)
> > packages.
> >
> > Over time, I began to experiment with MODS and increasingly started using
> > specialized metadata schemas for different types of objects, such as
> > herbarium or oral history materials.  I envisioned a tracking system that
> > would hold all of this metadata relationally and provide different tabs
> > based on the material type.  So, oral history items would have an extra tab
> > exposing the oral history metadata and herbarium would have a similar
> > special tab.  While development of this moved ahead, the entire system
> > seemed unwieldy.  Adding a new schema was a bit laborious, as was even
> > adding a new field.
> >
> > After several years of this, we began the SobekCM digital repository
> > software development.  After that experience I swore off trying to store
> > very complex structured data in the database in the same type of format.
> > (This may also have had to do with an IMLS project I worked on that proved
> > the futility of this approach.)  I generally eschew triple-stores for the
> > basis of libraries in favor of relational databases on the premise that we
> > DO actually understand the basic relationships of digital resources to
> > collections and the sub-relations there.  We keep the data within METS files
> > with one or more descriptive metadata sections and essentially the database
> > only points to that METS file.  For searching, we use a flattened table
> > structure with one row per item, much like Solr/Lucene, alongside
> > Solr/Lucene itself.
> >
> > My advice is to steer clear of trying to take beautifully (and deeply)
> > structured metadata from MODS, Darwin Core, VRA Core (and who knows what
> > else) and trying to create tables and relations for them.
> >
> > I think you can point some database tools at the schema and have them
> > generate the tables for you.  Just doing that will probably dissuade you.
> > ;)
> >
> > Mark V. Sullivan
> > CIO & Application Architect
> > Sobek Digital Hosting and Consulting, LLC
> > [log in to unmask]
> > 352-682-9692 (mobile)
> >
> >
> > ________________________________________
> > From: Code for Libraries <[log in to unmask]> on
> > behalf of Stephen Schor <[log in to unmask]>
> > Sent: Friday, April 17, 2015 1:27 PM
> > To: [log in to unmask]
> > Subject: [CODE4LIB] Modeling a repository's objects in a relational
> > database
> >
> > Hullo.
> >
> > I'm interested to hear about people's approaches for modeling
> > repository objects in a normalized, spec-agnostic, _relational_ way
> > while maintaining the ability to cast objects as various specs (MODS,
> > Dublin Core).
> >
> > People often resort to storing an object as one specification (the text of
> > the MODS, for example)
> > and then converting it to other specs using XSLT or their favorite language,
> > using established mappings / conversions.
> > (http://www.loc.gov/standards/mods/mods-conversions.html)
> >
> > Baking a MODS representation into a database text field can introduce
> > problems with queryability and remediation that I _feel_ would be mitigated
> > by factoring out information from the XML document, and modeling it
> > in a relational DB.
> >
> > This is an idea that's been knocking around in my head for a while.
> > I'd like to hear if people have gone down this road...and I'm especially
> > eager to hear both success and horror stories about what kind of results
> > they got.
> >
> > Stephen
> >
>
>
> --
> Cary Gordon
> The Cherry Hill Company
> http://chillco.com
>