On 12 May 2022 20:44:22 GMT+01:00, "Hammer, Erich F" wrote:
> .DOCX files are just a collection of zipped XML and image files. You can see this by changing the extension of a copy of the file to .zip and then exploring it. It should be possible to parse out the data from the XML file(s) and build a structure from it.
Yes, the key one is word/document.xml, but it is very noisy, and it seems
to be semantic only if the author used styles rather than direct
formatting such as bold, italics and so on.
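
For what it's worth, here is a rough Python sketch of that idea, using only the standard library. It opens the .docx as a zip archive, reads word/document.xml, and lists each paragraph with its paragraph style ID, if it has one. The function name paragraphs_with_styles is my own invention, and the code assumes a plain WordprocessingML document; it ignores headers, footnotes and character-level formatting, and paragraphs nested inside text boxes may be reported twice.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def paragraphs_with_styles(docx):
    """Return a list of (style_id, text) pairs, one per paragraph.

    `docx` may be a file path or a binary file-like object.
    style_id is None when the paragraph carries no explicit paragraph
    style, which is typical when the author used bold/italics directly.
    (Hypothetical helper name, for illustration only.)
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    result = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # The paragraph style, if any, lives at w:pPr/w:pStyle/@w:val.
        style = p.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else None
        # Concatenate every text run in the paragraph.
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        result.append((style_id, text))
    return result
```

Calling paragraphs_with_styles("copy-of-report.docx") (a made-up file name) would then show at a glance how much structure the author actually encoded: a document built with styles yields IDs on its headings, while one formatted by hand yields mostly None.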