Hi Sara,

What are you migrating to? It's generally easiest to start with what your
new system expects and then work out how to produce that. There's a good
chance that the most straightforward approach will be to perform OCR as
part of the ingest/migration process.

kyle

On Mon, Mar 25, 2019 at 10:12 AM Sara Amato wrote:

> For those of you who have migrated out of CONTENTdm to another system, were
> you able to migrate the OCR text coordinates?
>
> We have a locally hosted version of CONTENTdm, and are in the process of
> planning for migration to another system. I'm trying to decipher how word
> coordinates are determined for text highlighting. It looks like each
> image has a corresponding 'words.txt' file in the {collection}/supp/####/
> directory, with a format like:
>
> ounce 46475:27007:4532:647
> ounces 46311:31097:5515:647 45819:10539:5488:584
> pastry 39239:7137:5625:1085
> pennyweight 18868:29678:10267:855
>
> I'm guessing that the #:#:#:# values are coordinates, but I'm having a hard
> time making them match up to the actual image (attached). Does anyone happen
> to know what these numbers are and how to use them to determine word
> coordinates? I've asked both OCLC and ABBYY, and they both say to ask the
> other party :(
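
If it helps while comparing the numbers against the image, here is a minimal sketch that reads the words.txt lines quoted above into a lookup table. It assumes only what the sample shows: a word followed by one or more groups of four colon-separated integers, one group per occurrence. The function name `parse_words` is hypothetical, and the sketch deliberately does not name the four fields, since their meaning, units, and origin are still unconfirmed.

```python
from typing import Dict, List, Tuple

# Four integers per occurrence; what each one means is NOT confirmed.
Group = Tuple[int, int, int, int]


def parse_words(text: str) -> Dict[str, List[Group]]:
    """Parse lines shaped like 'word n:n:n:n [n:n:n:n ...]' (assumed format)."""
    index: Dict[str, List[Group]] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and lines with no number groups
        word, groups = parts[0], parts[1:]
        for group in groups:
            fields = group.split(":")
            # Keep only groups that are exactly four non-negative integers.
            if len(fields) != 4 or not all(f.isdigit() for f in fields):
                continue
            a, b, c, d = (int(f) for f in fields)
            index.setdefault(word, []).append((a, b, c, d))
    return index


# Sample lines copied from the message above.
sample = """\
ounce 46475:27007:4532:647
ounces 46311:31097:5515:647 45819:10539:5488:584
pastry 39239:7137:5625:1085
pennyweight 18868:29678:10267:855
"""

index = parse_words(sample)
for word, groups in index.items():
    print(word, groups)
```

With the values in plain tuples, it should be easier to test a guess (for example, dividing by a candidate scale factor and comparing against the image's pixel dimensions) without re-reading the file each time.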