Outside the library sector, the most common approach to language tagging
and matching isn't ISO-639-2 or ISO-639-3 but rather BCP-47.

Quite a number of ISO-639-2 language tags represent what ISO-639-3 refers
to as macrolanguages. For instance, 'kar' in ISO-639-2 resolves to 20
language codes in ISO-639-3.

But ISO-639-3 by itself isn't sufficient to fully identify a written
language.

E.g. you could have sr-Cyrl for Serbian in the Cyrillic script, sr-Latn to
represent Serbian written in the Latin orthography, and sr-Latn-alalc97 for
romanised Cyrillic Serbian based on the ALA-LC Cyrillic romanisation table
published in 1997.
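
To make the structure of those tags concrete, here is a rough Python sketch that splits a tag into its language, script, region and variant subtags. The function name parse_tag and the regular expression are my own illustration, not part of any standard library, and the pattern is deliberately simplified: it ignores extlang, extension, private-use and grandfathered tags, and it does not check subtags against the IANA registry.

```python
import re

# Simplified pattern (an assumption for illustration, not the full BCP-47
# grammar): language, optional script, optional region, zero or more variants.
_TAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)$",
    re.IGNORECASE,
)


def parse_tag(tag):
    """Split a simple language tag into its subtags (hypothetical helper)."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"not a tag this simplified pattern understands: {tag!r}")
    script = match.group("script")
    region = match.group("region")
    # Tags are case-insensitive; normalise to the conventional casing.
    return {
        "language": match.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
        "variants": [v.lower() for v in match.group("variants").split("-") if v],
    }


for example in ("sr-Cyrl", "Sr-Latn", "sr-Latn-alalc97"):
    print(example, parse_tag(example))
```

For real work a maintained library that validates against the subtag registry would be a safer choice than a hand-rolled pattern like this.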

It's worth noting the only ALA-LC romanisation tables that can be specified
in BCP-47 are the 1997 editions.

Ultimately it depends on what a library is working on. If you are
cataloguing, then all you have is ISO-639-2/B.

If you are working on a digitisation or linked data project, it is much
better to use BCP-47 correctly, which would align your resources more
accurately with the broader information ecosystem in which they would
exist.
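
On the matching side, one practical benefit is prefix-style matching of tags. The sketch below follows the basic filtering idea from RFC 4647, where a language range matches a tag if the two are equal or the tag starts with the range followed by a hyphen. The function name basic_filter and the sample tag list are assumptions for illustration only.

```python
def basic_filter(language_range, tags):
    """Return the tags matched by a language range (basic filtering sketch)."""
    wanted = language_range.lower()
    if wanted == "*":
        # The wildcard range matches every tag.
        return list(tags)
    return [
        tag
        for tag in tags
        # Comparison is case-insensitive; a match is an exact hit or a
        # prefix that ends on a subtag boundary.
        if tag.lower() == wanted or tag.lower().startswith(wanted + "-")
    ]


resources = ["sr", "sr-Cyrl", "sr-Latn", "sr-Latn-alalc97"]
print(basic_filter("sr-Latn", resources))
print(basic_filter("sr", resources))
```

A request for sr-Latn picks up both the plain Latin-script records and the ALA-LC romanised ones, while a request for sr picks up everything, which a flat three-letter code cannot express.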

Andrew
On 2 Jun 2016 9:15 am, "Craig Franklin" wrote:

> We've never had any problems sticking to ISO639-2 codes (in cases where
> there isn't a shorter ISO639-1 code available).  I'm interested in what sort of
> regional languages you might be dealing with where there are significant
> gaps in that standard?
>
> You might also look at ISO 639-3, which is quite comprehensive but also
> introduces a fair chunk of complexity:
>
> http://www-01.sil.org/iso639-3/download.asp
>
> Cheers,
> Craig Franklin
>
> On 2 June 2016 at 08:59, Greg Lindahl wrote:
>
> > Some of the Internet Archive's library partners are asking us about
> > language metadata for regional languages that don't have standard
> > codes.  Is there a standard way of dealing with this situation?
> >
> > Overall we use MARC codes https://www.loc.gov/marc/languages/ which
> > were last updated in 2007. LOC also maintains ISO639-2
> > https://www.loc.gov/standards/iso639-2/php/code_list.php last updated
> > in 2014.
> >
> > The languages in question are regional languages which are currently
> > lumped together in both standards. With the recent rise in interest
> > and funding for regional languages, it's no surprise that some
> > catalogers want to split these languages out into separate codes.
> >
> > Thanks!
> >
> > -- greg
> >
>