LISTSERV 16.5 - CODE4LIB Archives

You might consider a NoSQL database, either in-memory (Redis, etc.) or disk-based (MongoDB, etc.), depending on your needs. There are also dedicated triple-store databases such as SparkleDB.

Cary

> On Apr 17, 2015, at 5:01 PM, Stephen Schor <[log in to unmask]> wrote:
> 
> Firstly - thanks for the thoughtful replies, links, and anecdotes.
> 
> We end up storing a lot of MODS as text in a database.
> We map it out as other formats...but our app deals in *a lot* of MODS.
> 
> A lot of time and line count is dedicated to turning XML into an
> object/data structure that can be sent to and from web forms in a way
> our web app likes, and because that object/data structure is atypical
> we forgo the benefits (like first-class validation) of our framework.
> Not to mention that querying gets hinky in XML, and dealing with
> remediation within a hierarchy means updating what amounts to a
> denormalized cache.
> (http://martinfowler.com/bliki/TwoHardThings.html)
> 
> It's hard to dissuade myself from the idea that we're simply hanging
> adjectives on nouns (our objects), and that different specs map these
> adjectives to different words and format them differently.
> *I think other projects store attributes in a traditional relational way
> and concoct different specs based on DB records. (Maybe Archivists'
> Toolkit? ArchivesSpace?)*
> 
> Uff - anyway - maybe I'll get a chance to describe a collection's objects
> in a spec-agnostic way. I can already imagine peppering the schema with
> spec-specific columns and it being a slippery slope from there.
> But hey, dream big - right?
> 
> I may reply to this thread with my success story one day.
> I'm also really eager to share if it goes totally wrong.
> Those stories are usually more entertaining.
> 
> 
> Stephen
> 
> 
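The "store one spec, cast it to others" idea in Stephen's message can be made concrete with a minimal Python sketch. This is an illustration under stated assumptions, not anyone's production code: the four-field crosswalk in FIELD_PATHS, the function names, the `record` wrapper element, and the sample record are all hypothetical, and a real MODS-to-Dublin-Core mapping (see the LOC conversions page linked further down the thread) covers far more than this.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical, deliberately tiny crosswalk: flat field name -> MODS path.
# The field names double as simple Dublin Core element names.
FIELD_PATHS = {
    "title": "mods:titleInfo/mods:title",
    "creator": "mods:name/mods:namePart",
    "date": "mods:originInfo/mods:dateIssued",
    "subject": "mods:subject/mods:topic",
}

# Made-up sample record, for illustration only.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Sample Title</title></titleInfo>
  <name><namePart>Doe, Jane</namePart></name>
  <originInfo><dateIssued>1999</dateIssued></originInfo>
  <subject><topic>Herbaria</topic></subject>
  <subject><topic>Oral history</topic></subject>
</mods>"""


def mods_to_fields(mods_xml):
    """Flatten a MODS record into {field name: [values]} for forms or indexing."""
    root = ET.fromstring(mods_xml)
    fields = {}
    for name, path in FIELD_PATHS.items():
        values = [
            el.text.strip()
            for el in root.findall(path, MODS_NS)
            if el.text and el.text.strip()
        ]
        if values:
            fields[name] = values
    return fields


def fields_to_dc(fields):
    """Serialize the flat fields as simple Dublin Core elements."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        for value in values:
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    flat = mods_to_fields(SAMPLE_MODS)
    print(flat)
    print(fields_to_dc(flat))
```

The flat dictionary is the kind of structure a web framework can validate and bind to forms. Note that the sketch only goes one way: everything the crosswalk does not name is dropped, which is one reason the MODS text usually stays the record of truth.
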
> On Fri, Apr 17, 2015 at 6:56 PM, Cary Gordon <[log in to unmask]> wrote:
> 
>> This is a beautiful response, and the payoff, at the end, is perfect... Try
>> it, you probably won't like it. In practice, even with big hardware,
>> relational databases get mired down with MARC and MODS once the collection
>> size becomes significant.
>> 
>> Cary
>> 
>> On Friday, April 17, 2015, Mark V. Sullivan <
>> [log in to unmask]> wrote:
>> 
>>> Stephen,
>>> As the lead developer on the SobekCM open-source digital repository
>>> project and formerly a developer for the University of Florida
>>> Libraries, I have looked at this quite a bit and learned a bit over time.
>>> 
>>> I began development working on tracking systems to manage a fairly
>>> large-scale digitization shop at UF before I was even working on the
>>> public repository side. When I arrived (around 1999) metadata was
>>> double-keyed several times for each item during the tracking and
>>> metadata creation process. It seemed obvious to me that we needed a
>>> tracking system, and one that would hold metadata for each item. This
>>> was fairly easy to do when our metadata was very homogeneous and based
>>> on simple Dublin Core. This worked well and the system could easily
>>> spit out ready METS (and MXF) packages.
>>> 
>>> Over time, I began to experiment with MODS and increasingly started using
>>> specialized metadata schemas for different types of objects, such as
>>> herbarium or oral history materials. I envisioned a tracking system that
>>> would hold all of this metadata relationally and provide different tabs
>>> based on the material type. So, oral history items would have an extra
>>> tab exposing the oral history metadata and herbarium items would have a
>>> similar special tab. While development of this moved ahead, the entire
>>> system seemed unwieldy. Adding a new schema was a bit laborious, as was
>>> even adding a new field.
>>> 
>>> After several years of this, we began the SobekCM digital repository
>>> software development. After that experience I swore off trying to store
>>> very complex structured data in the database in the same type of format.
>>> (This may also have had to do with an IMLS project I worked on that
>>> proved the futility of this approach.) I generally eschew triple-stores
>>> as the basis of libraries in favor of relational databases, on the
>>> premise that we DO actually understand the basic relationships of
>>> digital resources to collection and the sub-relations there. We keep
>>> the data within METS files with one or more descriptive metadata
>>> sections, and essentially the database only points to that METS file.
>>> For searching, we use a flattened table structure with one row per
>>> item, much like Solr/Lucene, as well as Solr/Lucene itself.
>>> 
>>> My advice is to steer clear of trying to take beautifully (and deeply)
>>> structured metadata from MODS, Darwin Core, VRA Core (and who knows what
>>> else) and create tables and relations for them.
>>> 
>>> I think you can point some database tools at the schema and have it
>>> generate the tables for you.  Just doing that will probably dissuade you.
>>> ;)
>>> 
>>> Mark V. Sullivan
>>> CIO & Application Architect
>>> Sobek Digital Hosting and Consulting, LLC
>>> [log in to unmask]
>>> 352-682-9692 (mobile)
>>> 
>>> 
>>> ________________________________________
>>> From: Code for Libraries <[log in to unmask]> on
>>> behalf of Stephen Schor <[log in to unmask]>
>>> Sent: Friday, April 17, 2015 1:27 PM
>>> To: [log in to unmask]
>>> Subject: [CODE4LIB] Modeling a repository's objects in a relational
>>> database
>>> 
>>> Hullo.
>>> 
>>> I'm interested to hear about people's approaches for modeling
>>> repository objects in a normalized, spec-agnostic, _relational_ way
>>> while maintaining the ability to cast objects as various specs (MODS,
>>> Dublin Core).
>>> 
>>> People often resort to storing an object as one specification (the text
>>> of the MODS, for example), and then convert it to other specs using XSLT
>>> or their favorite language, using established mappings / conversions.
>>> (http://www.loc.gov/standards/mods/mods-conversions.html)
>>> 
>>> Baking a MODS representation into a database text field can introduce
>>> problems with queryability and remediation that I _feel_ would be hedged
>>> by factoring out information from the XML document, and modeling it
>>> in a relational DB.
>>> 
>>> This is an idea that's been knocking around in my head for a while.
>>> I'd like to hear if people have gone down this road...and I'm especially
>>> eager to hear both success and horror stories about what kind of results
>>> they got.
>>> 
>>> Stephen
>>> 
>> 
>> 
>> --
>> Cary Gordon
>> The Cherry Hill Company
>> http://chillco.com
>>
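
For anyone who wants to picture the layout Mark describes above (METS files hold the metadata, the database only points at them, and searching runs against a flattened table with one row per item), here is a rough sketch using Python's built-in sqlite3 module. It is not SobekCM's actual schema: the table names, column names, item IDs, and file paths are all invented for illustration, and a plain LIKE query stands in for a real search engine such as Solr/Lucene.

```python
import sqlite3

# Hypothetical schema: the item table only points at the METS file on disk;
# item_search is a flattened, one-row-per-item table used purely for lookup.
SCHEMA = """
CREATE TABLE item (
    item_id   TEXT PRIMARY KEY,
    mets_path TEXT NOT NULL
);
CREATE TABLE item_search (
    item_id  TEXT PRIMARY KEY REFERENCES item(item_id),
    title    TEXT,
    creator  TEXT,
    subjects TEXT
);
"""


def open_catalog():
    """Create an in-memory database with the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def index_item(conn, item_id, mets_path, fields):
    """Record the METS pointer and (re)build the item's flattened search row.

    fields maps a field name to a list of values, e.g. values pulled from
    the descriptive metadata section of the METS file.
    """
    # Multi-valued fields are joined into a single text column.
    flat = {name: " | ".join(values) for name, values in fields.items()}
    with conn:  # commit both inserts together
        conn.execute(
            "INSERT OR REPLACE INTO item (item_id, mets_path) VALUES (?, ?)",
            (item_id, mets_path),
        )
        conn.execute(
            "INSERT OR REPLACE INTO item_search "
            "(item_id, title, creator, subjects) VALUES (?, ?, ?, ?)",
            (item_id, flat.get("title"), flat.get("creator"), flat.get("subject")),
        )


def search(conn, term):
    """Return (item_id, mets_path) pairs whose flattened row contains term."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT i.item_id, i.mets_path "
        "FROM item_search AS s JOIN item AS i ON i.item_id = s.item_id "
        "WHERE s.title LIKE ? OR s.creator LIKE ? OR s.subjects LIKE ? "
        "ORDER BY i.item_id",
        (pattern, pattern, pattern),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_catalog()
    index_item(conn, "UF0001", "/data/UF0001/UF0001.mets.xml",
               {"title": ["Herbarium specimen sheet"], "subject": ["Botany"]})
    print(search(conn, "herbar"))
```

The point of the split is that a schema change (a new field for oral histories, say) touches only the METS files and the code that flattens them. The search row can be thrown away and rebuilt from the METS file at any time, so it never becomes a second copy of the metadata that needs remediation.
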