Hi Matt,

I'm going to have someone on my end contact you directly, but for the
other code4lib-ers out there who are interested: it's a simple question
with a complicated answer. It depends on the corpus you have to work with.
You also have to prep your files carefully. The leading solution is a
program called Sakhr, but you have to spend time training it. Tesseract and
ABBYY work, too, but their accuracy depends on a variety of factors.

Best wishes,
Carol

On Fri, May 4, 2018 at 5:56 PM, Matt Sherman <[log in to unmask]> wrote:

> Hi all,
>
> I was hoping someone could point me to some programs that might be
> helpful. I am helping a scholar plan a large scale digitization of his
> collection of Arabic books so he can work abroad and need to find out the
> best way to scan and OCR them. While I know generally how to look into the
> scanning of the books, though if anyone knows some good services that
> aren't too expensive let me know, the bigger question is how well we can
> OCR them. Does anyone have advice of how to run OCR on non-Roman character
> texts? Particularly in this case in Arabic. Any insights would be helpful
> as we put this plan together so can develop this project and its budget
> appropriately. Thanks for any information you folks can provide.
>
> Matt Sherman
>

-- 
Carol Kassel
Senior Manager, Digital Library Infrastructure
NYU Digital Library Technology Services
[log in to unmask]
(212) 992-9246
dlib.nyu.edu