On August 22 ya'aQov wrote:
> Lars, so the wrong spelling of that Swedish author is based on your
> browsing it, not on an automated procedure, or reference to an online
> thesaurus. Given your Swedish resources, is there any quality control
> mechanism you can suggest?

This author's name was just one that I stumbled upon.
My problem is that I stumble on misspelled titles
and authors' names far too often, in roughly 5% of
all cases. I guess this is because the errors are now
exposed, when Google and Hathi Trust pull all the data
together. Previously these errors were isolated in
various library catalogs that had no users who speak
the language (in this case: Swedish). In each library
catalog these Swedish entries make up a tiny minority.
But now with the aggregation in Hathi Trust and
WorldCat, we can start to see patterns and fix the errors.

For Swedish books, the national catalog at libris.kb.se
is usually correct. It's also available as open data
under CC0 (Creative Commons Zero)
http://www.kb.se/libris/teknisk-information/Oppen-data/Open-Data/
and the Libris authority file is part of VIAF.

You'd need one good reference per language.

But I think you can do a good analysis even without
knowing much about a language. If you find that records
for books in Czech often contain č (c-hacek), but almost
never ĉ (c-circumflex), you can look at the records that
contain the unusual letter and see if they are errors
or perhaps intentional uses of Esperanto words.
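
To make the idea concrete, here is a rough sketch in Python.
It is only an illustration, not a tested tool: the function
name, the 5% threshold and the sample titles are all made up,
and it assumes you already have the titles for one language
as a plain list of strings. It counts how many titles contain
each non-ASCII letter and returns the titles containing
letters that are rare compared to the most common one.

```python
import unicodedata
from collections import Counter


def rare_letter_records(titles, max_ratio=0.05):
    """Map each unusually rare non-ASCII letter to the titles using it.

    A letter is "rare" if it occurs in at most max_ratio times as
    many titles as the most common non-ASCII letter. The threshold
    is an arbitrary assumption and needs tuning per language.
    """
    letters_per_title = []
    for title in titles:
        # Normalize so that combining accents become single letters,
        # and lowercase so that Č and č are counted together.
        text = unicodedata.normalize("NFC", title).lower()
        letters_per_title.append(
            {c for c in text if c.isalpha() and not c.isascii()}
        )

    # Number of titles in which each letter occurs at least once.
    doc_freq = Counter(
        ch for letters in letters_per_title for ch in letters
    )
    if not doc_freq:
        return {}

    top = max(doc_freq.values())
    return {
        ch: [t for t, letters in zip(titles, letters_per_title)
             if ch in letters]
        for ch, n in doc_freq.items()
        if n / top <= max_ratio
    }


# Made-up sample: 40 titles with č and one with ĉ.
sample = ["Česká kniha č. %d" % i for i in range(40)] + ["Ĉeská kniha"]
suspects = rare_letter_records(sample)
```

The result is only a list of candidates. Letters that are
legitimately rare in the language will show up too, so
somebody still has to look at the records.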


-- 
   Lars Aronsson ([log in to unmask])
   Project Runeberg - free Nordic literature - http://runeberg.org/