Well, we didn't end up doing it (although we still could).
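For the curious, here's a minimal sketch of what that could have looked like, assuming Python with pymarc and rdflib; the filename, URI pattern, and marcSource predicate are invented for illustration:

    from pymarc import MARCReader
    from rdflib import Graph, Literal, Namespace, URIRef

    # hypothetical namespace, just for this illustration
    LOCAL = Namespace("http://example.org/terms/")

    g = Graph()
    with open("records.mrc", "rb") as fh:  # hypothetical batch of MARC records
        for record in MARCReader(fh):
            # assume the 001 control field carries a usable identifier
            ids = record.get_fields("001")
            rec_id = ids[0].value() if ids else "unknown"
            subject = URIRef("http://example.org/record/" + rec_id)

            # ... the normal (lossy) field-by-field mapping would go here ...

            # the one extra triple: a text dump of the whole record, so the
            # original survives even if the mapping drops something
            g.add((subject, LOCAL.marcSource, Literal(str(record))))

    print(g.serialize(format="turtle"))

The extra triple costs roughly the size of the record itself, which, per Karen's ~1K figure below, is noise next to the masters.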
When I look across the storage load our asset management system oversees, metadata space pales in comparison to the original data files themselves. Even access derivatives like display JPEGs are tiny compared to their TIFF masters, and WAV masters are bigger still.
I agree that we shouldn't just assume disk is free, but given the orders-of-magnitude gap between metadata and originals, I'd err on the side of keeping all the metadata.
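To put rough numbers on it (illustrative only): at the ~1K average MARC record size Karen mentions below, 300,000 records come to about 300 MB of metadata, while a single uncompressed archival TIFF can easily run 50-100 MB. The entire metadata store is on the order of a handful of masters.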
Do you really feel that the cost of managing storage is going up? I do find that the bulk of the ongoing cost of digital asset management is in the people who manage the assets, but over time I'm seeing the management cost per asset drop: it takes about the same number of people to run ten racks of storage as it does to run two, and all of those racks are getting denser as storage media costs go down (Lord willin' and the creek don't flood. Again). I expect at some point the cost of storing assets in the cloud, rather than in local racks, will hit a sweet spot, and we'll move to that. We'll still need good management of the assets, but the policies it takes to track 300k assets will probably scale to millions, especially if the metadata is stored in an accessible, linkable way.
D
-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Mark Jordan
Sent: Tuesday, December 06, 2011 10:51 AM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Models of MARC in RDF
Well said, Will.
Mark
----- Original Message -----
> This is a *very* tangential rant, but it makes me mental when I hear
> people say that "disk space" is no longer an issue. While it's true
> that the costs of disk drives continue to drop, my experience is that
> the cost of managing storage and backups is rising almost
> exponentially as libraries continue to amass enormous quantities of
> digital data and metadata. Again, I recognize that text files are a
> small portion of our library storage these days, but to casually
> suggest that doubling any amount of stored data is a negligible
> concern strikes me as the first step down a dangerous path.
> Sorry for the interruption to an interesting thread.
>
> Will
>
>
>
> On 12/6/11 10:44 AM, "Karen Coyle" <[log in to unmask]> wrote:
>
> >Quoting "Fleming, Declan" <[log in to unmask]>:
> >
> >>Hi - I'll note that the mapping decisions were made by our metadata
> >>services (then Cataloging) group, not by the tech folks making it
> >>all work, though we were all involved in the discussions. One idea
> >>that came up was to do a (perhaps lossy) translation, but also stuff
> >>one triple with a text dump of the whole MARC record, just in case
> >>we later needed to grab some other element out of it. We didn't do
> >>that, but I still like the idea. Ok, it was my idea. ;)
> >
> >I like that idea! Now that "disk space" is no longer an issue, it
> >makes good sense to keep around the "original state" of any data that
> >you transform, just in case you change your mind. I hadn't thought
> >about incorporating the entire MARC record string in the
> >transformation, but as I recall the average size of a MARC record is
> >somewhere around 1K, which really isn't all that much by today's
> >standards.