Hi Paul,

I guess this rather depends on your purposes. From the way you've asked the question, it sounds like you are looking for a control set of data to compare your own author disambiguation efforts against, rather than simply a good source of disambiguated data (presumably for feeding into a VIVO instance)?

In case you haven't looked at it already, you might find the Profiles RNS Disambiguation Engine useful:

http://profiles.catalyst.harvard.edu/docs/ProfilesRNS_DisambiguationEngine.pdf

It only covers Medline/PubMed data, though. It could also work as a source of disambiguated data for you, not just as a control set for your own implementations.

Additionally, if you are not averse to using other software to provide disambiguated data, rather than implementing your own solution, then you might want to look at research information management software (e.g. Symplectic Elements). These systems specialise in acquiring data from a number of sources (including PubMed) and helping you create a clean, disambiguated set of publication data. They typically provide APIs that allow you to interact with and/or extract that information for re-use in other systems.

Regards,
G

On 9 July 2013 16:32, Paul Albert wrote:
> I am exploring methods for author disambiguation, and I would like to have
> access to one or more well-disambiguated data sets containing:
> - a unique author identifier (email address, institutional identifier)
> - a unique article identifier (PMID, DOI, etc.)
> - a unique journal identifier (ISSN)
>
> Definition of "well-disambiguated": for a given set of authors, you know
> the identity of their journal articles to a precision and recall of greater
> than 90-95%.
>
> Any ideas?
>
> thanks,
> Paul
>
>
> Paul Albert
> Project Manager, VIVO
> Weill Cornell Medical Library
> 646.962.2551