Bill,

At Michigan, all large high-res files (indeed, all source files) are
retained long-term for preservation purposes.  Where possible, we build
the system around the source files rather than derivatives, in order to
simplify maintenance and ensure the integrity of the source.  This
strategy involves replication across multiple systems (and, of course,
storage) wherever possible.

We store the high-res files in a variety of places, including DLT,
CD-ROM (gold, with ISO 9660 file naming conventions), and, again where
possible, disk (RAID).  I should add that we are able to keep files on
disk in most cases, including for our Preservation-oriented page image
conversion activities, though not yet for continuous tone images.  In the
case of the Preservation activities, we convert and mount several million
pages a year, and Moore's Law has been our friend in this regard for
several years.  We buy disk as we need it (or, actually, just before we
need it), and have been able to keep the costs of disk purchases *and*
maintenance down while keeping performance very high.  We have not
invested in hierarchical storage systems.
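
For readers unfamiliar with the ISO 9660 naming conventions mentioned
above: Level 1 of the standard limits file names to an 8.3 form built
from uppercase letters, digits, and underscore, with a mandatory dot and
an optional ";1"-style version suffix.  The Python sketch below is only an
illustration under those assumptions; the function name is hypothetical,
it checks Level 1 file names only (not directory names or the longer
names allowed at Level 2), and it is not a description of any particular
tool.

```python
import re

# ISO 9660 Level 1 file identifier (as assumed here): a base name of 1-8
# d-characters, a mandatory dot, an extension of 0-3 d-characters, and an
# optional ";<digits>" version suffix. d-characters are A-Z, 0-9 and "_".
LEVEL1_FILE = re.compile(r"[A-Z0-9_]{1,8}\.[A-Z0-9_]{0,3}(?:;[0-9]+)?")


def is_iso9660_level1_name(filename: str) -> bool:
    """Return True if filename fits the Level 1 pattern above."""
    return LEVEL1_FILE.fullmatch(filename) is not None


if __name__ == "__main__":
    for name in ["00000001.TIF", "00000001.TIF;1", "page_0001.tif", "IMG_01.TIFF"]:
        print(name, is_iso9660_level1_name(name))
```

Lowercase letters, base names longer than eight characters, and
four-letter extensions such as ".TIFF" all fail the check, which is why
files destined for Level 1 discs usually need renaming first.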

In the case of our continuous tone imaging operations, the relatively
large number (thousands per year) of images we create, at over 120 MB
each, has kept us from taking the same approach, but we are hopeful about
trends in imaging, particularly JPEG2000.  What we imagine might be
possible here is storing a losslessly compressed continuous tone image as
part of the access system and, because the compression is lossless,
combining the access and master versions.  We're only in the exploratory
phases of this idea, however.
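
The reasoning behind combining access and master versions can be checked
mechanically: if a compressed copy decompresses to a byte-for-byte match
of the master, one file can serve both roles.  The sketch below uses
Python's standard zlib module purely as a stand-in for a lossless image
codec such as lossless JPEG2000 (which would need a third-party library),
and compares SHA-256 digests of the original and restored data.  The
function names and the synthetic test data are assumptions for
illustration only.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def lossless_roundtrip_ok(master: bytes, level: int = 9) -> bool:
    """Compress and decompress master with zlib (a stand-in for a lossless
    image codec) and report whether the restored bytes match exactly."""
    compressed = zlib.compress(master, level)
    restored = zlib.decompress(compressed)
    return sha256_hex(restored) == sha256_hex(master)


if __name__ == "__main__":
    # Synthetic "master": 1 MiB of repeating byte values, not a real image.
    master = bytes(range(256)) * 4096
    compressed = zlib.compress(master, 9)
    print(f"master: {len(master)} bytes, compressed: {len(compressed)} bytes")
    print("bit-identical after round trip:", lossless_roundtrip_ok(master))
```

A lossy codec such as the wavelet compression used for online
derivatives would fail this comparison, which is why such files cannot
stand in for the master.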

We do not, in general, compress masters, though we do create derivatives
of our continuous tone images to put them online.  We use MrSID wavelet
compression for the continuous tone images in the online system, but
store the uncompressed masters on gold CD-ROM using ISO 9660 file naming
conventions.
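
To put the continuous tone numbers in perspective, here is a rough
back-of-the-envelope sketch.  The yearly count of 3,000 images is a
hypothetical stand-in for "thousands per year," 120 MB is the per-image
size mentioned above, and 650 MB is assumed as the capacity of a standard
CD-R.  Decimal units are used (1 GB = 1,000 MB), and the sketch assumes
no image is split across discs.

```python
import math


def cd_archive_estimate(images_per_year: int, mb_per_image: float,
                        cd_capacity_mb: float = 650.0) -> tuple[float, int, int]:
    """Estimate yearly master storage (decimal GB), whole images per CD,
    and CDs needed per year, assuming images are never split across discs."""
    images_per_disc = math.floor(cd_capacity_mb / mb_per_image)
    if images_per_disc < 1:
        raise ValueError("a single image does not fit on one disc")
    total_gb = images_per_year * mb_per_image / 1000
    discs = math.ceil(images_per_year / images_per_disc)
    return total_gb, images_per_disc, discs


# Hypothetical volume: 3,000 uncompressed masters a year at 120 MB each.
total_gb, per_disc, discs = cd_archive_estimate(3000, 120)
print(f"{total_gb:.0f} GB/year, {per_disc} images per CD, {discs} CDs/year")
```

Under these assumptions the sketch prints "360 GB/year, 5 images per CD,
600 CDs/year", which gives a sense of the scale involved.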

Finally, let me add that we're still developing a taxonomy of
locally-stored digital files and the responsibilities associated with
each.  Some material we serve to the campus is not uniquely held by us
and may in fact be transitory.  Other material is unique, and even then
we may have more or less responsibility for its long-term maintenance.
This taxonomy will figure into our future storage decisions.

We have several terabytes of RAID online and currently buy only hardware
RAID 5 subsystems for our production server; in particular, we have just
installed two extremely cost-effective ($7.50/GB) SCSI-IDE hardware RAIDs
with which we are very happy.  However, given our strategy of keeping
source materials on spinning disk (as opposed to nearline tape storage or
something similar), we have grown to the point where simply adding
direct-attached storage like this and managing multiple filesystems no
longer scales.  We're currently planning to migrate to storage appliances,
beginning this year, to solve this problem.  Because these systems are
either NAS- or SAN-based, this strategy has the added benefit of making it
very simple to bring up as many web servers as we want in a single
location to handle increased load and/or protect against web server
outages, which is another critical goal.
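
For the RAID 5 subsystems mentioned above, usable capacity is the raw
capacity minus one disk's worth of distributed parity.  The sketch below
computes usable space and an approximate purchase cost.  The eight-disk,
120 GB configuration is hypothetical, and the sketch assumes the $7.50/GB
figure applies to usable rather than raw capacity, which the figure above
does not specify.

```python
def raid5_usable_gb(disks: int, disk_gb: float) -> float:
    """Usable RAID 5 capacity: parity equal to one disk is spread across
    the set, so (n - 1) disks' worth of space holds data."""
    if disks < 3:
        raise ValueError("RAID 5 requires at least three disks")
    return (disks - 1) * disk_gb


def raid5_cost(disks: int, disk_gb: float, dollars_per_gb: float = 7.50) -> float:
    """Approximate cost, assuming the per-GB price is quoted on usable space."""
    return raid5_usable_gb(disks, disk_gb) * dollars_per_gb


# Hypothetical array: eight 120 GB IDE disks behind one hardware controller.
usable = raid5_usable_gb(8, 120)
print(f"usable: {usable:.0f} GB, approx. cost: ${raid5_cost(8, 120):,.2f}")
```

This prints "usable: 840 GB, approx. cost: $6,300.00"; if the quoted
price were per raw GB instead, the cost per usable GB would be higher by
a factor of n / (n - 1).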


---------------------------------------------------------------------------
John Price Wilkin                             Phone:  734.764.8016
Interim Associate Director                    Fax:    734.763.5080
Digital Library Services, University Library  email:  [log in to unmask]
818 Hatcher South                             http://www.lib.umich.edu/dls/
University of Michigan
Ann Arbor, MI 48109-1205

On Fri, 10 Jan 2003, Bill Britten wrote:

> Colleagues,
>
> Here at Tennessee we seem to be expanding the amount of digital data at an exponential rate. A TB of storage lasted about a year. We are buying another 1.3 TB, knowing it may only get us through another 6 months. And there is the corresponding need for our backup system to keep up with all of this storage. This email is a reality-check for us, if you could let us know the state of digital storage at your institution.
>
> For digital projects, are master files (i.e. large high-res files)
> retained long-term for preservation purposes?
>
> If so, are these files stored on disk, or written off to CD, DVD, tape?
>
> Are you compressing these files ... using what?
>
> What level of storage do you currently maintain, and what are the immediate plans for purchase?
>
> thanks so much,
>
> Bill Britten
> Professor and Head, Library Systems
> 647 Hodges Library, UTK, Knoxville, TN 37996-1000
> voice: 865-974-1082 fax: 865-974-0626
> [log in to unmask]
>
>