LISTSERV 16.5 - CODE4LIB Archives

A cautionary note.  Linked data works best when the entities identified by
URIs are unambiguous. That's not always the case with VIAF and ISNI.  They
aggregate data from other ID registries algorithmically and with limited
review.  High-performing algorithms still make mistakes, as do the more
manually built descriptions they depend on.

I did a search in VIAF on "morgan, eric" (which retrieves names matching
and associated with "eric morgan") and found matched records ISNI
0000000116490460 / VIAF 56843669. Both conflate three people:

  * the physicist Patricia Lewis, born 1957 per the LCNAF record where
    she's "Lewis, P. M.";
  * the lecturer on management Patricia Lewis, born 1963 per the LCNAF
    record where she's "Lewis, Patricia, 1963-";
  * a "Lewis, Patricia, 1954-" who appears to be a French Canadian author
    writing in French on emotions from a self-help perspective.

The French Canadian author has apparently been confused with the
management lecturer, who also writes on emotions, though from a management
perspective.

I also noted ISNI 0000000384457106 / VIAF 275988911, both associated with
the management lecturer's LCNAF authority, so she effectively has two ISNI
and VIAF IDs.

And the first title cited in the LCNAF authority for physicist "Lewis, P.
M." is a document on the effects of shift work produced in Washington State
for the US Nuclear Regulatory Commission in 1985. That doesn't match with
the physicist Lewis's biography in Wikipedia.  A bit more poking around in
OCLC suggests this work belongs instead to Paul Michael Lewis, who has
contemporaneous works for the NRC on work schedules in the Pacific
Northwest. He's not properly established at all in LCNAF; nor is the French
Canadian author.

This kind of sifting of data is what takes time in authority work, and
what--when it's done well--makes authority data valuable for the semantic
web. The point of the above is NOT that algorithmic aggregation is a bad
idea--just that it leaves a lot of necessary work still to do. Cases like
the above are too easy to find at present in VIAF and ISNI. ISNI has been
very responsive for me when I've reported problems (and I will work to
resolve the problems noted above), but much remains to be done. Rather than
leaping ahead to more algorithmic matching of VIAF and ISNI IDs to other
identity records, I'd like to see developers work on programs which could
mine ISNI and VIAF to detect discrepancies in the aggregated ID sources for
further review.  That could reduce the proliferation of errors already in
the data records and make all of these data resources ultimately more
valuable for semantic web use.
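
As a rough illustration of the kind of discrepancy-mining program I have in
mind, here is a minimal Python sketch. It assumes the cluster data has
already been fetched and flattened into plain dictionaries; the field names
(source, heading, birth_year) and both function names are hypothetical, and
the sample data is hand-built from the case described above rather than
pulled from VIAF or ISNI. It flags clusters whose member records disagree
on birth year, and source records that turn up in more than one cluster.

```python
from collections import defaultdict

# Hypothetical sample shaped like the case above. Headings stand in for
# source record IDs; these are not real control numbers. The source of the
# 1954 heading is not stated in the message, so it is left "unspecified".
clusters = {
    "VIAF 56843669": [
        {"source": "LCNAF", "heading": "Lewis, P. M.", "birth_year": 1957},
        {"source": "LCNAF", "heading": "Lewis, Patricia, 1963-", "birth_year": 1963},
        {"source": "unspecified", "heading": "Lewis, Patricia, 1954-", "birth_year": 1954},
    ],
    "VIAF 275988911": [
        {"source": "LCNAF", "heading": "Lewis, Patricia, 1963-", "birth_year": 1963},
    ],
}


def conflicting_birth_years(clusters):
    """Return clusters whose records carry more than one distinct birth year."""
    flagged = {}
    for cluster_id, records in clusters.items():
        years = sorted(
            {r["birth_year"] for r in records if r.get("birth_year") is not None}
        )
        if len(years) > 1:
            flagged[cluster_id] = years
    return flagged


def split_identities(clusters):
    """Return source records that appear in more than one cluster."""
    seen = defaultdict(set)
    for cluster_id, records in clusters.items():
        for r in records:
            seen[(r["source"], r["heading"])].add(cluster_id)
    return {key: sorted(ids) for key, ids in seen.items() if len(ids) > 1}


if __name__ == "__main__":
    print("Possible conflations:", conflicting_birth_years(clusters))
    print("Possible duplicates:", split_identities(clusters))
```

Neither check proves an error. Each only produces a list of candidates for
the human review described above.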

Stephen





On Fri, Apr 15, 2016 at 10:57 AM, Kyle Banerjee wrote:

> On Fri, Apr 15, 2016 at 2:16 AM, Eric Lease Morgan wrote:
>
> > ...
> > My questions are:
> >
> >   * What remote authority databases are available programmatically? I
> > already know of one from the Library of Congress, VIAF, and probably
> > WorldCat Identities. Does ISNI support some sort of API, and if so, where
> > is some documentation?
> >
>
> Depends on what you have in mind. For databases similar to your example, I
> believe ORCID has an API. GNIS, ULAN, CONA, and TGN might be interesting to
> you, but there are tons more, particularly if you add subject authorities
> (e.g. AAT, MeSH). The Getty stuff is all available as LoD.
>
> >   * I believe the Library of Congress, VIAF, and probably WorldCat
> > Identities all support linked data. Does ISNI, and if so, then how is it
> > implemented and can you point me to documentation?
> >
> >   * When it comes to updating the local (MARC) authority records, how do
> > you suggest the updates happen? More specifically, what types of values do
> > you suggest I insert into what specific (MARC) fields/subfields? Some
> > people advocate $0 of 1xx, 6xx, and 7xx fields. Other people suggest 024
> > subfields 2 and a. Inquiring minds would like to know.
> >
>
> Implementation would be specific to your system and those you wish to
> interact with. The MARC record is used to represent/transmit data, but it
> doesn't actually exist in the sense that systems use it internally as is.
>
> Having said that, I think the logical place to put control numbers from
> different schemes is in 024, because that field lets you identify the
> source, so it doesn't matter if control numbers overlap.
>
> kyle
>
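
To make the 024 option concrete, here is a small Python sketch that writes
such fields in a simple mnemonic text form. The function name format_024 is
hypothetical. The first indicator 7 (source specified in $2) and the source
codes "isni" and "viaf" reflect my reading of the MARC 21 authority format
and the LC source code list, and should be checked against the current
documentation before use. The identifiers are the ones cited above for the
management lecturer.

```python
def format_024(identifier, source_code):
    """Return a 024 field as mnemonic text: $a holds the ID, $2 its source.

    Assumes first indicator 7 (source given in $2) and a blank second
    indicator, shown here as '#'.
    """
    if not identifier or not source_code:
        raise ValueError("both an identifier and a source code are required")
    return f"024 7# $a {identifier} $2 {source_code}"


# Identifiers quoted earlier in this message; source codes are assumptions.
fields = [
    format_024("0000000384457106", "isni"),
    format_024("275988911", "viaf"),
]

if __name__ == "__main__":
    for field in fields:
        print(field)
```

Because each number travels with its own $2, an ISNI and a VIAF number that
happened to share the same digits would still be distinguishable.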



-- 
Stephen Hearn, Metadata Strategist
Data Management & Access, University Libraries
University of Minnesota
160 Wilson Library
309 19th Avenue South
Minneapolis, MN 55455
Ph: 612-625-2328
Fx: 612-625-3428
ORCID:  0000-0002-3590-1242