On Mon, 2004-02-09 at 11:41, Walter Lewis wrote:
> One of the issues that I bumped into was that what passes for HTML in
> some email programs is [insert expletive of choice here]. Putting it in
> an XML data store was going to cause tons of validation errors.

Some success might be found with TagSoup:

  http://home.ccil.org/~cowan/XML/tagsoup/

It delivers SAX events from less-than-well-formed HTML. It doesn't
correct validation or style problems, though; it just provides a
consistent, well-formed interface to sloppy HTML.

An alternative approach: JTidy does a good job of fixing many validation
problems, but it may fail, depending on how bad the HTML is:

  http://jtidy.sourceforge.net/

TagSoup doesn't fail... "Just Keep[s] On Truckin'"

--
Kevin S. Clarke
Lane Medical Library, Stanford University
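
A rough illustration of the event-driven idea described above. TagSoup itself is a Java library; this sketch swaps in Python's standard-library html.parser as a stand-in, since it also reports callbacks for sloppy HTML instead of rejecting it. Unlike TagSoup, it does not balance unclosed tags, so the event stream is not guaranteed to be well-formed. The names EventLogger, parse_sloppy, and SAMPLE are hypothetical, as is the sample markup.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Record parser callbacks as (kind, value) tuples, SAX-style."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only runs so the event list stays readable.
        text = data.strip()
        if text:
            self.events.append(("text", text))


def parse_sloppy(html):
    """Feed possibly malformed HTML and return the recorded events."""
    parser = EventLogger()
    parser.feed(html)
    parser.close()
    return parser.events


# Hypothetical mail-client HTML: unquoted attribute, unclosed <font> and <b>.
SAMPLE = "<p><font color=red>Hello <b>world</p>"
events = parse_sloppy(SAMPLE)

for kind, value in events:
    print(kind, value)
```

The parser reports what it sees and keeps going; the missing </b> and </font> simply never appear as end events. A tool like TagSoup goes a step further and supplies the missing end events so that downstream XML code sees a balanced stream.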