So just out of curiosity I ran 2500 author names from the DOAJ through this library (I used a version someone has kindly wrapped as a web service). The names were just some I had handy, so this was no real attempt to challenge the software.

In general it seemed to do pretty well, but it isn’t perfect. In particular, two-part given names or two-part family names where the parts are separated by a space end up with part of the name in the ‘middle name’. This may not matter much when it affects the given name, because you’ll get the same output string if you format as {family name}, {first name} {middle name}. However, where the surname is split by a space (as it is for my kids) you end up with a problem, e.g.:

Jane Bloggs Doe, where the surname is ‘Bloggs Doe’, would be converted to ‘Doe, Jane Bloggs’ instead of ‘Bloggs Doe, Jane’.
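
To make the effect concrete, here is a toy sketch in Python. It is not the library's actual code: split_name, catalogue_order and the surname_words parameter are names I've made up purely for illustration. It assumes the simplest possible rule (first word is the given name, last word is the family name, everything in between is the middle name), which is enough to reproduce the behaviour described above.

```python
def split_name(full_name, surname_words=1):
    """Hypothetical splitter, for illustration only.

    The first word is the given name, the last `surname_words` words
    are the family name, and anything in between is the middle name.
    """
    parts = full_name.split()
    if len(parts) < 2:
        # A single word cannot be split, so treat it as a given name.
        return {"first": full_name.strip(), "middle": "", "family": ""}
    # Always leave at least one word for the given name.
    surname_words = max(1, min(surname_words, len(parts) - 1))
    cut = len(parts) - surname_words
    return {
        "first": parts[0],
        "middle": " ".join(parts[1:cut]),
        "family": " ".join(parts[cut:]),
    }


def catalogue_order(name):
    """Format as '{family name}, {first name} {middle name}'."""
    given = " ".join(p for p in (name["first"], name["middle"]) if p)
    return f"{name['family']}, {given}" if name["family"] else given


# A two-part given name comes out fine even though 'Ann' is
# classified as a middle name:
print(catalogue_order(split_name("Mary Ann Smith")))
# A two-part surname does not, unless we already know it has two words:
print(catalogue_order(split_name("Jane Bloggs Doe")))
print(catalogue_order(split_name("Jane Bloggs Doe", surname_words=2)))
```

The last call shows the catch: to get the right answer you need to know in advance that the surname has two words, and that information isn't in the string itself.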

I tend to use OpenRefine for this kind of work, and it lets you do lookups against web services such as the one I used, so this is a pretty useful addition to my toolset. Thanks for asking the question!


Owen Stephens
Owen Stephens Consulting
Telephone: 0121 288 6936

> On 13 Jan 2017, at 11:34, Timothy Hill wrote:
> Please excuse the naive way this question is formulated: I'm sure the
> Information & Library Science community has formal terms for what I'm
> attempting to do, but unfortunately I don't know what they are.
> The problem I'm trying to solve is that I have a bunch of author names (for
> example, 'Charles Dickens') that I need to reformat into standard catalogue
> order ('Dickens, Charles'). Obviously the example given is trivial, but of
> course this can get quite complex depending on the addition of titles and
> honorifics.
> Is anyone aware of a software library to perform this kind of conversion?
> The programming language used is not terribly important, though Java or
> Python would be preferable.
> In an ideal world the library would deal with the different conventions used
> in different languages and by different institutions - but anything would
> be better than the current split-on-comma approach I'm using right now.
> Thanks,
> Timothy Hill