>
> Since you mentioned SimpleXML, Kyle, I assume you're using PHP?
>
Actually, I'm using Perl. For reasons unrelated to XML parsing, it is the
preferred (but not mandatory) language.
Based on a few tests and manual inspection, it looks like the ticket for me
is going to be a two-stage process: the first stage converts the file to
well-formed XML, and the second cuts through it with SAX.
Originally I was trying to avoid SAX, but so far the process has been
prettier than expected. The XML has not: it contains a number of issues,
including outright malformed XML, invalid characters, and hand-coded HTML
inside some elements (i.e., markup embedded raw instead of being escaped as
character data). Gotta love library data. But screwed-up stuff is
employment security. If things actually worked, I'd be redundant...
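For anyone curious what the first stage might look like, here is a minimal sketch. It is not my actual code. It assumes the file has already been decoded into a Perl character string, and `clean_xml` is a hypothetical helper name. It only handles two of the problems above: it strips characters that XML 1.0 does not allow at all, and it escapes bare ampersands that don't start an entity or character reference.

```perl
use strict;
use warnings;

# Hypothetical stage-one cleanup helper (illustrative sketch only).
# Assumes $text is an already-decoded Perl character string.
sub clean_xml {
    my ($text) = @_;

    # Remove anything outside the XML 1.0 Char production:
    # #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    $text =~ s/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]//g;

    # Escape "&" unless it begins a named, decimal, or hex reference
    # terminated by ";" (e.g. &amp;  &#169;  &#xA9;).
    $text =~ s/&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;

    return $text;
}

print clean_xml("AT&T \x{1}rocks &amp; rolls"), "\n";
```

The cleaned string could then be handed to a SAX parser (for example, one obtained from XML::SAX::ParserFactory). Raw HTML inside elements is a separate problem: a stray `<` or an unclosed `<br>` needs its own escaping or repair pass, and this sketch does not attempt it.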
kyle