LISTSERV 16.5 - CODE4LIB Archives

Author disambiguation is a tough one -- I don't think you'll find any
unique identifier and ORCID is not a viable method at this time. Email is
not a good identifier because authors change affiliations, are sometimes
known by more than one email at a single institution, and because this info
is not always available depending on where you get your data from.

Using some combination of names, email, affiliation, coauthors, journal,
topic, etc., you can probably improve accuracy, but it's going to be messy.
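
As a rough illustration of what "some combination" could look like, here is
a minimal sketch that scores how likely two author records are to be the
same person. Everything in it is an assumption made for the example: the
AuthorRecord fields, the weights, and the rule that surnames and first
initials must agree before anything else counts. Real weights would have to
be tuned against data you already trust.

```python
from dataclasses import dataclass

# Hypothetical weights, picked only for illustration; tune against labeled data.
WEIGHTS = {"email": 0.4, "affiliation": 0.2, "coauthors": 0.3, "journal": 0.1}


@dataclass(frozen=True)
class AuthorRecord:
    # Hypothetical record layout: one author mention on one article.
    surname: str
    given: str
    email: str = ""
    affiliation: str = ""
    coauthors: frozenset = frozenset()
    journal: str = ""


def names_compatible(a, b):
    """True if surnames match and first initials agree (case-insensitive)."""
    if a.surname.lower() != b.surname.lower():
        return False
    return a.given[:1].lower() == b.given[:1].lower()


def jaccard(x, y):
    """Set overlap |x & y| / |x | y|, or 0.0 when both sets are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def match_score(a, b):
    """Score in [0, 1]; 0.0 when the names are incompatible."""
    if not names_compatible(a, b):
        return 0.0
    score = 0.0
    # A missing value contributes nothing rather than counting against a match.
    if a.email and a.email.lower() == b.email.lower():
        score += WEIGHTS["email"]
    if a.affiliation and a.affiliation.lower() == b.affiliation.lower():
        score += WEIGHTS["affiliation"]
    score += WEIGHTS["coauthors"] * jaccard(a.coauthors, b.coauthors)
    if a.journal and a.journal.lower() == b.journal.lower():
        score += WEIGHTS["journal"]
    return score
```

Pairs scoring above some threshold would be treated as the same author. The
threshold, like the weights, is something to pick by checking against a set
of records whose true authorship is already known.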

What is the use case you are trying to address? For example, the best
method may be very different if you're trying to disambiguate authors from
a single institution than if you're trying to solve a generic problem over
a huge corpus of data.

ISSN is also not super clean as a unique identifier even if it is very
useful -- a single title can have multiple ISSNs for different versions or
title changes that people might not perceive as different journals.
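
One way to cope with this, sketched below under some assumptions, is to
normalize every ISSN (checking the standard mod-11 check digit) and then map
ISSNs that belong to the same title onto one key. The TITLE_KEY table is a
hand-made placeholder containing a single pair (the print and online ISSNs
of Nature); in practice it would come from whatever title-linking data you
have access to, and the choice of the print ISSN as the key is arbitrary.

```python
import re

# Accepts "NNNN-NNNC" or "NNNNNNNC", where C is a digit or X.
ISSN_RE = re.compile(r"^(\d{4})-?(\d{3}[\dXx])$")


def normalize_issn(raw):
    """Return 'NNNN-NNNC' if raw has a valid ISSN check digit, else None."""
    m = ISSN_RE.match(raw.strip())
    if not m:
        return None
    chars = (m.group(1) + m.group(2)).upper()
    # The first seven digits are weighted 8 down to 2, then summed.
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    if chars[7] != expected:
        return None
    return chars[:4] + "-" + chars[4:]


# Placeholder mapping (an assumption for this example): online ISSN -> print ISSN.
TITLE_KEY = {"1476-4687": "0028-0836"}


def journal_key(raw):
    """Map an ISSN to one key per title; unknown ISSNs map to themselves."""
    issn = normalize_issn(raw)
    if issn is None:
        return None
    return TITLE_KEY.get(issn, issn)
```

With this, journal_key("1476-4687") and journal_key("00280836") give the same
key, so articles from the print and online versions land in the same bucket.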

kyle


On Tue, Jul 9, 2013 at 8:32 AM, Paul Albert wrote:

> I am exploring methods for author disambiguation, and I would like to have
> access to one or more well-disambiguated data sets containing:
> – a unique author identifier (email address, institutional identifier)
> – a unique article identifier (PMID, DOI, etc.)
> – a unique journal identifier (ISSN)
>
> Definition for "well-disambiguated" – for a given set of authors, you know
> the identity of their journal articles to a precision and recall of greater
> than 90-95%.
>
> Any ideas?
>
> thanks,
> Paul
>
>
> Paul Albert
> Project Manager, VIVO
> Weill Cornell Medical Library
> 646.962.2551
>