LISTSERV 16.5 - CODE4LIB Archives

Expanding on Diane's "enough for what?", I see a lot of gradations
possible in this task.

Minimally, I can establish entity records for three persons, each of
whom is assigned a unique ID:
  John Smith, ID=123
  John Smith, ID=456
  John Smith, ID=789
I decide to do this because I have three resources, each created by a
John Smith, and judgment leads me to believe that they're all
different people. If my identities are each linked by ID to their own
resource, then one could argue that the above metadata is enough; but
that takes for granted a set of additional relationships maintained on
resource records which provide the actual differentiating metadata.
(In its beginnings, the LC Name Authority File worked a lot like this,
depending on access to the LC bib file for proper identification. One
might need to look up an LC bib record to determine, say, what field
"Smith, John" worked in, since the authority found "Study of the
impact of ..., 1972" to be enough.)
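
To make that minimal case concrete, here is a rough sketch in Python.
It is only an illustration: the class names, field names, and resource
titles are all invented, and it does not reflect how any real
authority file is stored. The point is that the identity records
carry nothing but a name and an ID, so anything that tells the three
Smiths apart has to be looked up on the resource side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """A bare identity record: nothing but a name and a unique ID."""
    id: int
    name: str


@dataclass(frozen=True)
class Resource:
    """A resource record; the differentiating context lives here."""
    title: str
    subject: str
    creator_id: int  # link to Identity.id


# The three identity records listed above.
identities = {i: Identity(i, "John Smith") for i in (123, 456, 789)}

# Hypothetical resources, invented purely for illustration.
resources = [
    Resource("Hypothetical study of soil chemistry", "Agriculture", 123),
    Resource("Hypothetical history of a parish", "History", 456),
    Resource("Hypothetical guide to welding", "Engineering", 789),
]


def context_for(identity_id, resources):
    """Collect what the resource records say about one identity."""
    return [(r.title, r.subject) for r in resources if r.creator_id == identity_id]
```

Calling context_for(123, resources) is the equivalent of going to the
bib file to find out which John Smith this is; the identity record
alone cannot answer that.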

But suppose the identity records are meant to carry more of their own
weight. Copying the titles associated with the differentiated entities
into their identity records as relationship information goes some way
in this direction; but we're still depending on an implicit judgment
by the entity record creator that these are distinct persons. A
computer would have a hard time making this sort of determination just
from words in titles. So maybe we bring some more metadata from our
resource records into the entity records--date and place of
publication, publisher, co-creator names, etc. These kinds of metadata
might give us a bit more confidence in doing automated evaluation of
whether two entities are different, and whether either should be
associated with a new resource by "John Smith."
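
As a sketch of what that automated evaluation might look like, the
Python below scores how much borrowed metadata an entity record shares
with a new resource. Everything in it is an assumption made for
illustration: the field names, the equal weighting of fields, the use
of Jaccard overlap, and the 0.3 threshold. A real matcher would need
normalized values and tuned weights.

```python
FIELDS = ("publishers", "places", "co_creators")  # hypothetical field names


def jaccard(a, b):
    """Share of values two sets have in common (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def overlap_score(entity, resource):
    """Average the per-field overlap between an entity and a resource."""
    total = sum(jaccard(entity.get(f, ()), resource.get(f, ())) for f in FIELDS)
    return total / len(FIELDS)


def best_candidate(entities, resource, threshold=0.3):
    """Return the ID of the best-matching entity, or None if none is close."""
    score, best_id = max((overlap_score(e, resource), e["id"]) for e in entities)
    return best_id if score >= threshold else None


# Invented sample data.
smith_123 = {"id": 123, "publishers": {"Press A"}, "places": {"Minneapolis"},
             "co_creators": {"Jane Doe"}}
smith_456 = {"id": 456, "publishers": {"Press B"}, "places": {"London"},
             "co_creators": set()}
new_resource = {"publishers": {"Press A"}, "places": {"Minneapolis"},
                "co_creators": {"Jane Doe", "Ann Lee"}}
unrelated = {"publishers": {"Press C"}, "places": {"Tokyo"},
             "co_creators": set()}
```

With this data, best_candidate([smith_123, smith_456], new_resource)
picks 123, while the unrelated resource matches neither and returns
None, which would be the cue to consider a new John Smith.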

But if we really want confident differentiation, we need metadata that
varies distinctly between individuals--birth date, for instance, or
full names (John Arthur Smith vs. John Baxter Smith), city of birth,
etc. We've had these sorts of things in LCNAF headings for just this
reason. Now under the influence of RDA, MARC and future entity
description models are moving toward clearly marking these kinds of
potentially distinguishing facts in the metadata about the entity.
But note that while such facts may be enough for distinguishing two
persons, they may be useless for determining whether one of them
created a particular resource. For that, facts about each person's
work and works may be much more useful. Those facts too are getting
more attention in RDA and data models oriented toward faceted
information about entities.
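
The difference between the two kinds of facts can be sketched the same
way. The Python below compares only the distinguishing facts on two
entity records and gives a three-way answer. The field names and the
exact-match comparison are assumptions for illustration; real data
would need normalization of dates and place names first.

```python
# Hypothetical names for the clearly marked distinguishing facts.
DISTINGUISHING = ("full_name", "birth_date", "birth_place")


def compare(a, b):
    """Compare two entity records on their distinguishing facts.

    Returns "different" if any fact recorded on both records conflicts,
    "possibly same" if at least one shared fact agrees and none conflict,
    and "undetermined" if the records have no recorded fact in common.
    """
    agreements = 0
    for field in DISTINGUISHING:
        va, vb = a.get(field), b.get(field)
        if va is None or vb is None:
            continue  # a missing fact proves nothing either way
        if va != vb:
            return "different"
        agreements += 1
    return "possibly same" if agreements else "undetermined"
```

So compare({"full_name": "John Arthur Smith"}, {"full_name": "John
Baxter Smith"}) comes back "different", but nothing in this function
says which of the two wrote a given resource. That question still
needs the work-related facts.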

So, while it's "enough" to have a trusted colleague's assertion that
John Smith 123 and John Smith 456 are different persons, access to
additional metadata about the two Smiths could bring such
determinations within the reach of machine logic, and could support
decisions about how each one should be related to other entities and
to resources. OCLC's current work on VIAF, which determines which
authorities can be clustered and which shouldn't be, and past work on
aggregating information about creators from bib data, potentially to
guide name heading assignment for additional bib records, both
demonstrate the extent to which the metadata currently in national
authority files supports these two metadata-driven tasks.

Having said all that, in the absence of clearly differentiating facts
and within a single authority system, I'd still put my faith in the
entity description creator's judgment that this John Smith and that
John Smith are different, as expressed in the creation of two entity
descriptions with distinct IDs. In a well managed system, the unique
ID is itself a crucial assertion on which all the other metadata
depends.

Stephen


On Fri, Feb 10, 2012 at 4:23 PM, Diane Hillmann wrote:
> Patrick:
>
> I can only ask: enough for what?  If you haven't a solid idea of what you
> want the metadata to do, it's hard to evaluate either quantity or quality.
>
> Metadata is not static--if it's not regularly evaluated, improved and added
> to, it tends to lose its value and usefulness over time.
>
> Diane
>
> On Fri, Feb 10, 2012 at 4:27 PM, Ethan Gruber wrote:
>
>> An interface is only as useful as the metadata allows it to be, and the
>> metadata is only as useful as the interface built to take advantage of it.
>>
>> Ethan
>>
>> On Fri, Feb 10, 2012 at 4:10 PM, David Faler wrote:
>>
>> > I think the answer is make sure you are able to add new elements to the
>> > store later, and keep around your source data and plan to be able to
>> > reprocess it.  Something like what XC is doing.  That way, you get to be
>> > agile at the beginning and just deal with what you *know* is absolutely
>> > needed, and add more when you can make a business case for it.
>> > Especially if you are looking to deal with MARC or ONIX data.
>> >
>> > On Fri, Feb 10, 2012 at 3:57 PM, Patrick Berry wrote:
>> >
>> > > So, one question I forgot to toss out at the Ask Anything session is:
>> > >
>> > > When do you know you have enough metadata?
>> > >
>> > > "You'll know it when you have it," isn't the response I'm looking for.
>> > > So, I'm sure you're wondering what the context for this question is,
>> > > and honestly there is none.  This is geared towards contentDM or DSpace
>> > > or Omeka or Millennium.  I've seen groups not plan enough for collecting
>> > > data and I've seen groups that have been planning so long they forgot
>> > > what they were supposed to be collecting in the first place.
>> > >
>> > > So, I'll just throw that vague question out there and see who wants to
>> > > take a swing.
>> > >
>> > > Thanks,
>> > > Pat/@pberry
>> > >
>> >
>>



-- 
Stephen Hearn, Metadata Strategist
Technical Services, University Libraries
University of Minnesota
160 Wilson Library
309 19th Avenue South
Minneapolis, MN 55455
Ph: 612-625-2328
Fx: 612-625-3428