On Mon, 2004-02-09 at 11:41, Walter Lewis wrote:

>     One of the issues that I bumped into was that what passes for HTML in
> some email programs is [insert expletive of choice here].  Putting it in
> an XML data store was going to cause tons of validation errors.

Some success might be found with TagSoup:
http://home.ccil.org/~cowan/XML/tagsoup/

It delivers SAX events from less-than-well-formed HTML.  It doesn't
correct validation or style problems, though...  it just provides a
consistent, well-formed interface to sloppy HTML.
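To make the idea concrete without pulling in the TagSoup jar, here is a
rough sketch that uses Python's standard html.parser as a stand-in.  This
is not TagSoup or its API; TagBalancer and to_well_formed are made-up
names for illustration.  It takes parse events from sloppy HTML, drops
stray end tags, closes whatever was left open, and emits a well-formed
fragment:

```python
from html import escape
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have content; written as self-closing in the output.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class TagBalancer(HTMLParser):
    """Toy illustration of the TagSoup idea: sloppy HTML in, well-formed XML out.

    Hypothetical helper, not TagSoup: it has no knowledge of HTML content
    models and simply closes tags in stack order.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the well-formed output
        self.stack = []  # currently open elements, innermost last

    def _attrs(self, attrs):
        seen = set()
        parts = []
        for name, value in attrs:
            if name in seen:
                continue  # XML forbids duplicate attributes: keep the first
            seen.add(name)
            # Valueless attributes (e.g. <input checked>) reuse their name.
            val = value if value is not None else name
            parts.append(f' {name}="{escape(val)}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            self.out.append(f"<{tag}{self._attrs(attrs)}/>")
        else:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Explicit <tag/> syntax: always self-closing.
        self.out.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: drop it and keep on truckin'
        # Close elements left open inside this one, then the element itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush any text the parser is still buffering
        while self.stack:  # close anything still open at end of input
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def to_well_formed(html_text):
    parser = TagBalancer()
    parser.feed(html_text)
    return parser.result()


if __name__ == "__main__":
    sloppy = "<p>Hello <b>world<p>Bad &amp; worse</i><br>"
    fragment = to_well_formed(sloppy)
    # A strict XML parser accepts it once wrapped in a single root element.
    ET.fromstring("<root>" + fragment + "</root>")
    print(fragment)
```

The output is a fragment, so it still needs a single root element around
it before an XML store will take it.  As I understand it, real TagSoup
goes further and uses a schema of HTML elements to decide where tags are
implicitly closed, rather than just unwinding a stack like this.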

As an alternative, JTidy does a good job of fixing many validation
problems, but it may fail, depending on how bad the HTML is.

http://jtidy.sourceforge.net/

TagSoup doesn't fail... "Just Keep[s] On Truckin'"

--
Kevin S. Clarke <[log in to unmask]>
Lane Medical Library, Stanford University