I'll second Bob's recommendation on that paper.
I've found the following paper to be an interesting read on metadata
quality and on ways we could approach measuring it with automation:

"Automatic Evaluation of Metadata Quality in Digital Repositories" by
Xavier Ochoa and Erik Duval
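As a rough illustration of the kind of automated metric discussed in that line of work, here is a minimal sketch of a weighted completeness score for a metadata record. The field names and weights are my own assumptions for illustration, not values taken from the paper:

```python
# Sketch: weighted completeness score for a metadata record.
# Field names and weights are illustrative assumptions, not values
# from the Ochoa & Duval paper.

FIELD_WEIGHTS = {
    "title": 1.0,
    "creator": 0.9,
    "date": 0.7,
    "subject": 0.6,
    "description": 0.5,
}

def completeness(record):
    """Return a 0..1 score: weighted fraction of non-empty fields."""
    total = sum(FIELD_WEIGHTS.values())
    present = sum(w for f, w in FIELD_WEIGHTS.items()
                  if record.get(f, "").strip())
    return present / total

record = {"title": "Some title", "creator": "A. Author", "date": ""}
score = completeness(record)  # title + creator present, date empty
```

Weighting lets you say that a missing title hurts more than a missing description, but the weights themselves are a judgment call.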
From: Code for Libraries <[log in to unmask]> on behalf of Robert Sandusky <[log in to unmask]>
Sent: Wednesday, May 6, 2015 4:42 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] How to measure quality of a record
I recommend this article as an entry point into a research program on
information quality:

Stvilia, B., Gasser, L., Twidale, M. B. and Smith, L. C. (2007), A
framework for information quality assessment. J. Am. Soc. Inf. Sci., 58:
1720–1733. doi:10.1002/asi.20652
One cannot manage information quality (IQ) without first being able to
measure it meaningfully and establishing a causal connection between the
source of IQ change, the IQ problem types, the types of activities
affected, and their implications. In this article we propose a general
IQ assessment framework. In contrast to context-specific IQ assessment
models, which usually focus on a few variables determined by local
needs, our framework consists of comprehensive typologies of IQ
problems, related activities, and a taxonomy of IQ dimensions organized
in a systematic way based on sound theories and practices. The framework
can be used as a knowledge resource and as a guide for developing IQ
measurement models for many different settings. The framework was
validated and refined by developing specific IQ measurement models for
two large-scale collections of two large classes of information objects:
Simple Dublin Core records and online encyclopedia articles.
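As a concrete, entirely illustrative example of applying one IQ dimension to Simple Dublin Core records, the sketch below computes element-level completeness over the fifteen DC elements. The oai_dc-style XML layout and namespace handling are my assumptions about a typical harvested record, not the paper's actual measurement model:

```python
# Sketch: completeness over the 15 Simple Dublin Core elements.
# Assumes an oai_dc-style XML fragment; illustrates one IQ dimension
# only, not the measurement model from the cited paper.
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]

def dc_completeness(xml_text):
    """Fraction of the 15 DC elements with at least one non-empty value."""
    root = ET.fromstring(xml_text)
    filled = 0
    for name in DC_ELEMENTS:
        values = [e.text for e in root.iter(f"{{{DC_NS}}}{name}")
                  if e.text and e.text.strip()]
        if values:
            filled += 1
    return filled / len(DC_ELEMENTS)

sample = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example item</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:date>2015-05-06</dc:date>
</record>"""
score = dc_completeness(sample)  # 3 of 15 elements filled -> 0.2
```

A full model along the lines of the framework would combine several such dimensions (accuracy, consistency, currency, and so on), each with its own measure.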
On 5/6/2015 4:32 PM, Diane Hillmann wrote:
> You might try this blog post by Thomas Bruce, who was my co-author on an
> earlier article (referred to in the post):
> On Wed, May 6, 2015 at 5:24 PM, Kyle Banerjee <[log in to unmask]> wrote:
>>> On May 6, 2015, at 7:08 AM, James Morley <[log in to unmask]> wrote:
>>> I think a key thing is to determine to what extent any definition of
>>> 'completeness' is actually a representation of 'quality'. As Peter says,
>>> making sure not just that metadata is present but then checking it conforms
>>> with rules is a big step towards this.
>> Basing quality measures too heavily on the presence of certain data points
>> or on the volume of data is fraught with peril. In past experiments, I
>> found it useful to look for structure and syntax patterns that indicate
>> good or bad quality, and to consider record sources as well. Also keep in
>> mind that any scoring system is to some extent arbitrary, so you don't
>> want to read more into what it generates than is appropriate.
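A minimal sketch of the kind of structure/syntax checks Kyle describes might look like the following. The specific patterns, the date convention, and the sample record are all my own assumptions, and, as he notes, any resulting score is arbitrary and should be read as a rough signal rather than a grade:

```python
# Sketch: flag structure/syntax patterns that often indicate bad quality.
# Patterns and checks are illustrative assumptions, not a standard.
import re

SUSPECT_PATTERNS = [
    (re.compile(r"\?{2,}"), "runs of question marks (encoding damage?)"),
    (re.compile(r"\b(unknown|n/?a|tbd)\b", re.I), "placeholder value"),
    (re.compile(r"<[^>]+>"), "stray markup in a text field"),
]

DATE_OK = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")  # YYYY[-MM[-DD]]

def quality_flags(record):
    """Return a list of (field, issue) pairs for suspicious values."""
    flags = []
    for field, value in record.items():
        for pattern, issue in SUSPECT_PATTERNS:
            if pattern.search(value):
                flags.append((field, issue))
        if field == "date" and value and not DATE_OK.match(value):
            flags.append((field, "date not in YYYY[-MM[-DD]] form"))
    return flags

record = {"title": "History of ???", "date": "circa 1900", "creator": "n/a"}
flags = quality_flags(record)  # flags title, date, and creator
```

Checks like these complement completeness counts: a record can have every field populated and still be full of placeholder or mangled values.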