Registration is now open for the next Open Preservation Foundation (OPF) webinar. 

OCR improvements through machine learning methods and the impact on the long-term preservation of digitized content

Thursday 11 November at 15:00 CET | 14:00 GMT


Roxana Maurer and Ralph Marschall, Bibliothèque nationale du Luxembourg


The National Library of Luxembourg (Bibliothèque nationale du Luxembourg) has been digitizing its national heritage collections since the early 2000s. After a few years of image-only digitization projects, the library switched to METS/ALTO output with multiple manifestations, and over the years it has built up considerable expertise in creating digitized content enriched with both Optical Character Recognition (OCR) and Optical Layout Recognition (OLR). In 2020 the eLmA (eLuxemburgensia meets AI) project was born, with the aim of correcting the full text (ALTO files) of more than 6,000,000 articles on the site. The OCR quality of these articles varies for one or more reasons: the language in which the text is written (German and French, and to a lesser extent Luxembourgish and English), the typography used (Gothic or Latin characters), or the quality of the digitization. This presentation will take a more in-depth look at the eLmA project and its impact on the digital preservation of METS/ALTO content.


Register to reserve your place

To keep up to date with the latest OPF webinars and news, sign up to our mailing list.

Best wishes,
Charlotte Armstrong | Project Officer | Open Preservation Foundation | Twitter: @openpreserve

Please note: My OPF working days are Thursday and Friday. Please bear with me if I don't reply to your email right away. 
