LISTSERV 16.5 - CODE4LIB Archives

Sure, but these are especially small photos -- are these born digital?

Lossless scans of pretty small photos are frequently well over 100MB, and
it takes hardly anything to get a 1GB scan. It costs a fortune to treat a
thesis (i.e. what's really being preserved is readable text rather than the
artifact itself) as if it were a historical document where texture of the
paper and the like is actually relevant.
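
To put rough numbers on those sizes, here is a minimal sketch of the arithmetic for an uncompressed scan. The print dimensions, resolutions, and bit depths are illustrative assumptions rather than figures from any particular scanner or workflow, and the function name is made up for this example.

```python
def uncompressed_scan_bytes(width_in, height_in, dpi, bits_per_channel=8, channels=3):
    """Raw pixel payload of a scan in bytes, ignoring headers and metadata."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_pixel = channels * bits_per_channel // 8
    return width_px * height_px * bytes_per_pixel


# Hypothetical cases: (width in inches, height in inches, dpi, bits per channel)
cases = [
    (4, 6, 1200, 8),   # small print, 24-bit RGB
    (4, 6, 2400, 16),  # same print, 48-bit RGB at double the resolution
    (5, 7, 2400, 16),  # slightly larger print, same settings
]

for width_in, height_in, dpi, bits in cases:
    size = uncompressed_scan_bytes(width_in, height_in, dpi, bits)
    print(f"{width_in}x{height_in} in @ {dpi} dpi, {bits}-bit RGB: {size / 1e6:.0f} MB")
```

Under these assumptions a 4x6 print already passes 100MB at 1200 dpi, and the 5x7 case at 2400 dpi and 16 bits per channel exceeds 1GB before any compression.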

The problem we encounter is that people want to scan a tiny photo and blow
it up to poster or wall size. But the source material and the equipment are
typically far more limiting factors than the format unless you're doing
something that's totally nuts. You want maximum flexibility for
unanticipated future needs, but value judgments need to be made up front or
you wind up painting yourself into a corner.

kyle


On Sun, Apr 28, 2013 at 12:40 PM, Simon Spero wrote:

> On Sun, Apr 28, 2013 at 2:43 AM, Kyle Banerjee wrote:
>
> > Every choice people make is about loss. Equipment, optics, lighting,
> > you name it. But for some reason, the instant we're talking about bits of
> > data on a disk, people plan as though capacity were unlimited when most
> > archives are severely underresourced.
> >
>
>  Strictly speaking, it is not correct to say that every choice is about
> loss (or cost), and for once I'm saying this in a case where the difference
> is actually significant. [Someone help edsu to the fainting couch.]
>
> If a particular set of choices is below the Production Possibility
> Frontier<
> http://en.wikipedia.org/wiki/Production%E2%80%93possibility_frontier>,
> then those choices are strictly inferior to those that are on the Frontier.
>  Why is this relevant?  Because, for the situation where lossless image
> storage has a very high value, TIFF is not the most space-efficient way of
> storing the data.
>
> A month or so ago I did a few measurements, using a (not necessarily
> representative) color photograph (TIFF extracted from a Canon EOS-10D raw).
>
>
> For lossless conversion, I used uncompressed TIFF, compressed TIFF, PNG,
> and JP2 (100% quality).   Measurements using the ImageMagick "compare"
> utility  confirmed zero signal loss:
>
> -rw-r--r--@ 1 ses  staff    18M Mar 19 14:52 CRW_4237_tiff_8_uncompressed.tif
> -rw-r--r--@ 1 ses  staff   9.4M Mar 19 14:53 CRW_4237_tiff_8_compressed.tif
> -rw-r--r--  1 ses  staff   8.2M Mar 19 14:29 CRW_4237-0.png
> -rw-r--r--@ 1 ses  staff   6.1M Mar 19 14:03 CRW_4237_quality_100-0.jp2
>
> For lossy compression, using RMSE as the metric, we can see that JPEG at
> 90% quality shows measurable signal degradation, with a compression
> ratio of 4.7:1 relative to the lossless JP2 file (vs. 14:1 relative to
> uncompressed TIFF, and 7.2:1 relative to compressed TIFF).
>
> $compare  ... CRW_4237_jpg_90.jpg =        459.806 (0.00701619) [1.3M] (4.7:1)
>
> JP2 at quality 75 showed slightly less signal loss by RMSE, with a
> compression ratio of 5.5:1 relative to the lossless JP2 file.
>
> $compare  ... CRW_4237_quality_75.jp2 =    457.959 (0.006988)   [1.1M] (5.5:1)
>
> Note that the image type was a color photograph; other image types may get
>  better lossless compression using PNG or TIFF.  Also, some people have
> expressed concern over the use of JP2 for archival purposes due to a
> relatively small number of open-source libraries.  On the other hand, JP2
> has some potentially useful properties for distributed replicated
> preservation (layers with fine levels of detail could be split off and
>  stored on fewer replicas).
>
> Simon
>
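
For anyone who wants to check the figures quoted above, here is a small sketch that recomputes the compression ratios from the reported file sizes and relates the two numbers in each compare result. The sizes are the rounded values from the listing, so the ratios are approximate. The quantum range of 65535 is an assumption (a 16-bit ImageMagick build); it is consistent with the quoted values but is not stated in the message. The dictionary keys and function names are invented for this example.

```python
# File sizes in MB, as rounded in the quoted listing and compare output
SIZES_MB = {
    "tiff_uncompressed": 18.0,
    "tiff_compressed": 9.4,
    "png": 8.2,
    "jp2_lossless": 6.1,
    "jpg_90": 1.3,
    "jp2_75": 1.1,
}

# Assumption: ImageMagick built with 16-bit quantum depth
QUANTUM_RANGE = 65535


def ratio(reference, target):
    """Compression ratio of `target` relative to `reference`."""
    return SIZES_MB[reference] / SIZES_MB[target]


def normalized_rmse(absolute_rmse):
    """Scale an absolute RMSE in quantum units to the 0..1 range."""
    return absolute_rmse / QUANTUM_RANGE


for lossy in ("jpg_90", "jp2_75"):
    for reference in ("tiff_uncompressed", "tiff_compressed", "jp2_lossless"):
        print(f"{lossy} vs {reference}: {ratio(reference, lossy):.1f}:1")

for name, absolute in (("jpg_90", 459.806), ("jp2_75", 457.959)):
    print(f"{name}: RMSE {absolute} -> normalized {normalized_rmse(absolute):.6f}")
```

Because the sizes are rounded to two significant figures, the recomputed ratios only match the quoted ones to about one decimal place.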