Hi Paul,

I guess this rather depends on your purposes. From the way you've asked the
question, it sounds like you are looking for a control set of data against
which to compare your own author disambiguation efforts (rather than simply
a good source of disambiguated data, presumably for feeding into a VIVO
instance)?

In case you haven't looked at it already, you might find the Profiles RNS
Disambiguation Engine useful:

Note that this only covers Medline/PubMed data. It could also serve as a
source of disambiguated data in its own right, not just as a control set for
your own implementations.
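
If you do use something as a control set, the comparison itself comes down to
the precision and recall you mention, computed over (author, article) pairs.
A minimal sketch in Python, assuming both your output and the control set can
be reduced to sets of such pairs; the function name and all identifiers below
are hypothetical and only there for illustration:

```python
def precision_recall(predicted, gold):
    """Compare two sets of (author_id, article_id) pairs.

    predicted: pairs produced by your own disambiguation.
    gold:      pairs from the control (well-disambiguated) set.
    """
    true_positives = len(predicted & gold)
    # Precision: share of predicted pairs that the control set confirms.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of control-set pairs that were actually found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
gold = {
    ("author-1", "pmid-100"),
    ("author-1", "pmid-101"),
    ("author-2", "pmid-200"),
    ("author-2", "pmid-201"),
}
predicted = {
    ("author-1", "pmid-100"),
    ("author-1", "pmid-101"),
    ("author-2", "pmid-200"),
    ("author-1", "pmid-201"),  # article credited to the wrong author
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

This only scores authors that appear in both sets, so you would probably want
to restrict both sets to the authors the control data covers before comparing.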

Additionally, if you are not averse to using other software to provide
disambiguated data, rather than implementing your own solution, then you
might want to look at research information management software (e.g.
Symplectic Elements). These specialise in acquiring data from a number of
data sources (including PubMed), and helping you create a clean,
disambiguated set of publication data - and typically provide APIs that
allow you to interact and/or extract that information for re-use in other


On 9 July 2013 16:32, Paul Albert wrote:

> I am exploring methods for author disambiguation, and I would like to have
> access to one or more well-disambiguated data sets containing:
>  - a unique author identifier (email address, institutional identifier)
>  - a unique article identifier (PMID, DOI, etc.)
>  - a unique journal identifier (ISSN)
> Definition of "well-disambiguated": for a given set of authors, you know
> the identity of their journal articles to a precision and recall of greater
> than 90-95%.
> Any ideas?
> thanks,
> Paul
> Paul Albert
> Project Manager, VIVO
> Weill Cornell Medical Library
> 646.962.2551