On 27.01.15 17:40, Stefano Bargioni wrote:
> Hi, I'd like to generate an ALTO XML file [*] starting from a PDF file, like an ebook.
> Is there a tool to do this in a Unix/Linux machine?

I have more or less the opposite problem: I'd like to combine a bitmap 
image and an ALTO file into a PDF document with searchable text. I think 
I have found the necessary Python packages to build the desired PDF (as 
per 
http://stackoverflow.com/questions/1180115/add-text-to-existing-pdf-using-python 
), and parsing the ALTO XML to get the text elements and their positions 
on the page is certainly feasible. However, I'd rather skip the mandatory 
debugging step in the development process and use well-tested tools if I 
can find them :-)
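
To make the parsing half concrete, here is a rough sketch using only the Python standard library. It assumes the usual ALTO layout, where each word is a String element carrying CONTENT, HPOS, VPOS, WIDTH and HEIGHT attributes, and it assumes the coordinates are in pixels at a known scan resolution (ALTO files can declare other measurement units, which this sketch does not handle). The names alto_words, to_pdf_point and SAMPLE are made up for illustration, and the sample document is hand-written, not real OCR output. Writing the invisible text layer into the PDF is left to whichever PDF package is chosen and is not shown.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix so any ALTO schema version matches."""
    return tag.rsplit("}", 1)[-1]


def alto_words(xml_text):
    """Return one dict per ALTO String element: its text and bounding box."""
    root = ET.fromstring(xml_text)
    words = []
    for elem in root.iter():
        if local_name(elem.tag) != "String":
            continue
        words.append({
            "text": elem.get("CONTENT", ""),
            "x": float(elem.get("HPOS", 0)),
            "y": float(elem.get("VPOS", 0)),
            "width": float(elem.get("WIDTH", 0)),
            "height": float(elem.get("HEIGHT", 0)),
        })
    return words


def to_pdf_point(word, page_height, dpi):
    """Map a word box to PDF points (1/72 inch).

    Assumption: ALTO coordinates are pixels with the origin at the top left,
    while PDF user space has its origin at the bottom left. The bottom edge
    of the box is used as a rough stand-in for the text baseline.
    """
    scale = 72.0 / dpi
    x = word["x"] * scale
    y = (page_height - (word["y"] + word["height"])) * scale
    return x, y


# Hand-written sample, only for illustration.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page WIDTH="2480" HEIGHT="3508"><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Hello" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="50"/>
    <SP WIDTH="20"/>
    <String CONTENT="world" HPOS="420" VPOS="200" WIDTH="310" HEIGHT="50"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    for word in alto_words(SAMPLE):
        print(word["text"], to_pdf_point(word, page_height=3508, dpi=300))
```

Each (text, x, y) triple could then be handed to the PDF library's text-drawing call, with the bitmap placed on the same page underneath.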

Any pointers someone would like to share?

Best regards,
Alain Borel
EPFL Bibliothèque
Rolex Learning Center
Station 20
CH-1015  LAUSANNE (SUISSE)
Phone:	+41 (0)21 693.98.01
Fax:	+41 (0)21 693.51.00