On 27.01.15 17:40, Stefano Bargioni wrote:
> Hi, I'd like to generate an ALTO XML file [*] starting from a PDF file, like an ebook.
> Is there a tool to do this in a Unix/Linux machine?
I have more or less the opposite problem: I'd like to combine a bitmap
image and an ALTO file into a PDF document with searchable text. I think
I have found the necessary Python packages to build the desired PDF (as
per http://stackoverflow.com/questions/1180115/add-text-to-existing-pdf-using-python),
and parsing the ALTO XML to get the text elements and their positions
on the page is certainly feasible. However, I'd rather skip the mandatory
debugging step in the development process and use well-tested tools if I
can find them :-)
Does anyone have pointers they'd like to share?
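For the "parsing the ALTO XML" half, here is a minimal sketch using only the Python standard library. It assumes the ALTO file uses MeasurementUnit "pixel" and that the scan resolution (the hypothetical `dpi` parameter) is known, because ALTO places its origin at the top-left of the page while PDF places it at the bottom-left in points (1/72 inch). It extracts each <String> element's CONTENT and box, then converts that box to PDF coordinates. Actually drawing the words as invisible text over the image (for example with a PDF library's invisible text render mode) is not shown here.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class Word(NamedTuple):
    text: str
    x: float       # left edge, in PDF points
    y: float       # bottom edge, in PDF points (origin at bottom-left)
    width: float   # in PDF points
    height: float  # in PDF points


def local_name(tag: str) -> str:
    # Drop the "{namespace}" prefix so ALTO v2, v3, ... are all handled alike.
    return tag.rsplit("}", 1)[-1]


def alto_words(alto_xml: str, page_height_px: float, dpi: float = 300.0) -> List[Word]:
    """Return every <String> as a Word with its box converted to PDF points.

    Assumption: coordinates are in pixels (MeasurementUnit "pixel") and the
    image was scanned at `dpi`. Other units (mm10, inch1200) are not handled.
    """
    scale = 72.0 / dpi  # pixels -> points
    root = ET.fromstring(alto_xml)
    words = []
    for el in root.iter():
        # Skip anything that is not an element with a string tag named String.
        if not isinstance(el.tag, str) or local_name(el.tag) != "String":
            continue
        hpos = float(el.get("HPOS", 0))
        vpos = float(el.get("VPOS", 0))
        w = float(el.get("WIDTH", 0))
        h = float(el.get("HEIGHT", 0))
        words.append(Word(
            text=el.get("CONTENT", ""),
            x=hpos * scale,
            # Flip the vertical axis: ALTO VPOS is the top edge, measured downwards.
            y=(page_height_px - vpos - h) * scale,
            width=w * scale,
            height=h * scale,
        ))
    return words
```

Each resulting Word gives the position and size at which a PDF library could place the corresponding invisible text on top of the page image.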
Best regards,
Alain Borel
EPFL Bibliothèque
Rolex Learning Center
Station 20
CH-1015 LAUSANNE (SWITZERLAND)
Phone: +41 (0)21 693.98.01
Fax: +41 (0)21 693.51.00