For those of you who have migrated out of CONTENTdm to another system, were
you able to migrate the OCR text coordinates?
We have a locally hosted version of CONTENTdm and are in the process of
planning a migration to another system. I'm trying to decipher how word
coordinates are determined for text highlighting. It looks like each
image has a corresponding 'words.txt' file in the {collection}/supp/####/
directory, with a format like:
ounce 46475:27007:4532:647
ounces 46311:31097:5515:647 45819:10539:5488:584
pastry 39239:7137:5625:1085
pennyweight 18868:29678:10267:855
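To get the data into a usable shape before worrying about what the numbers mean, here is a minimal parsing sketch. It assumes each line is a word followed by one or more whitespace-separated groups of four colon-separated integers, and (as a guess based on "ounces" having two groups) that each group is one occurrence of the word on the page. The function name parse_words_txt is hypothetical, and the code makes no claim about what the four integers represent.

```python
from typing import Dict, List, Tuple

# Four integers per group; their meaning is unknown, so keep them raw.
Group = Tuple[int, int, int, int]


def parse_words_txt(text: str) -> Dict[str, List[Group]]:
    """Parse words.txt lines like 'ounces 46311:31097:5515:647 45819:10539:5488:584'.

    Assumption: first token is the word, every following token is one
    colon-separated group of four integers (guessed to be one occurrence).
    """
    index: Dict[str, List[Group]] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word, groups = parts[0], parts[1:]
        entries = index.setdefault(word, [])
        for group in groups:
            fields = group.split(":")
            if len(fields) != 4:
                raise ValueError(f"unexpected group {group!r} for word {word!r}")
            a, b, c, d = (int(f) for f in fields)
            entries.append((a, b, c, d))
    return index
```

Running it on the four sample lines gives a dict with "ounces" mapped to two tuples and every other word mapped to one, which makes it easy to line up entries against words you can find on the page.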
I'm guessing that each #:#:#:# group is a set of coordinates, but I'm having
a hard time matching them to the actual image (attached). Does anyone
happen to know what these numbers are and how to use them to determine word
coordinates? I've asked both OCLC and ABBYY, and each says to ask the
other party :(
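One way to test the "these are coordinates" guess against the image: pick a few words, note their pixel positions by hand in an image viewer, and check whether one of the four fields maps linearly onto x or y. Below is a minimal least-squares sketch. The helper fit_axis is hypothetical, and which field corresponds to which axis (if any) is exactly the hypothesis being tested, not a known fact about the format.

```python
from typing import List, Tuple


def fit_axis(values: List[float], pixels: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit of pixels ~= scale * values + offset.

    values: one field (e.g. the first number) from words.txt for several words.
    pixels: the matching positions measured by hand in the image.
    Returns (scale, offset, max_abs_residual); a small residual suggests the
    field really does map linearly onto that axis.
    """
    n = len(values)
    if n != len(pixels):
        raise ValueError("values and pixels must have the same length")
    if n < 2:
        raise ValueError("need at least two reference points")
    mean_v = sum(values) / n
    mean_p = sum(pixels) / n
    var_v = sum((v - mean_v) ** 2 for v in values)
    if var_v == 0:
        raise ValueError("reference values must not all be equal")
    cov = sum((v - mean_v) * (p - mean_p) for v, p in zip(values, pixels))
    scale = cov / var_v
    offset = mean_p - scale * mean_v
    residual = max(abs(scale * v + offset - p) for v, p in zip(values, pixels))
    return scale, offset, residual
```

Try each of the four fields against both measured x and measured y, using three or more words so the residual means something. If no field fits with a small residual, the simple linear-mapping hypothesis is probably wrong for that field.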