Hi Sara,
What are you migrating to? It's generally easiest to start with what your
new system expects and then figure out how to produce that.
There's a good chance that the most straightforward approach will be to
perform OCR as part of the ingest/migration process.
kyle
On Mon, Mar 25, 2019 at 10:12 AM Sara Amato wrote:
> For those of you who have migrated out of CONTENTdm to another system, were
> you able to migrate the OCR text coordinates?
>
> We have a locally hosted version of CONTENTdm, and are in the process of
> planning for migration to another system. I'm trying to decipher how word
> coordinates are determined for text highlighting. It looks like each
> image has a corresponding 'words.txt' file in the {collection}/supp/####/
> directory, with a format like:
>
> ounce 46475:27007:4532:647
> ounces 46311:31097:5515:647 45819:10539:5488:584
> pastry 39239:7137:5625:1085
> pennyweight 18868:29678:10267:855
>
> I'm guessing that the #:#:#:# are coordinates, but I'm having a hard time
> making that match up to the actual image (attached). Does anyone happen to
> know what these numbers are and how to use them to determine word
> coordinates? I've asked both OCLC and ABBYY, and they both say to ask the
> other party :(
>
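
For anyone who wants to experiment with the sample lines quoted above, here is a
minimal Python sketch that reads them into a lookup table. It assumes only what
the sample shows: a word, then one or more space-separated groups of four
colon-separated integers. What the four numbers mean (offsets, sizes, units) is
not known from this thread, so the code keeps them as plain tuples and does not
label them. The function names are made up for illustration and are not part of
CONTENTdm.

```python
from typing import Dict, List, Tuple

# Four integers per occurrence of a word. Their meaning is NOT confirmed
# anywhere in this thread, so they are deliberately left unlabeled.
Group = Tuple[int, int, int, int]


def parse_words_line(line: str) -> Tuple[str, List[Group]]:
    """Split one line such as 'ounce 46475:27007:4532:647' into (word, groups)."""
    parts = line.split()
    if len(parts) < 2:
        raise ValueError(f"expected a word and at least one group: {line!r}")
    word, raw_groups = parts[0], parts[1:]
    groups: List[Group] = []
    for raw in raw_groups:
        fields = raw.split(":")
        if len(fields) != 4:
            raise ValueError(f"expected 4 colon-separated numbers: {raw!r}")
        a, b, c, d = (int(f) for f in fields)
        groups.append((a, b, c, d))
    return word, groups


def parse_words_text(text: str) -> Dict[str, List[Group]]:
    """Parse the full contents of a words.txt-style file into a dict."""
    index: Dict[str, List[Group]] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        word, groups = parse_words_line(line)
        index.setdefault(word, []).extend(groups)
    return index


if __name__ == "__main__":
    sample = (
        "ounce 46475:27007:4532:647\n"
        "ounces 46311:31097:5515:647 45819:10539:5488:584\n"
        "pastry 39239:7137:5625:1085\n"
        "pennyweight 18868:29678:10267:855\n"
    )
    for word, groups in parse_words_text(sample).items():
        print(word, groups)
```

Having the numbers in this form makes it easier to test guesses against the
image, for example by comparing the ranges of each position with the pixel
dimensions of the scan. A word that appears more than once on a page, like
"ounces" in the sample, simply ends up with more than one tuple.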