The approach described by Peter is also how I have been thinking about this. If the content is only available in HTML, it's hard to beat Tidy for doing a passable job of getting content into XHTML. From there, stylesheets can leverage whatever structure is available, such as it is, subject to the problems that Peter flagged.

One other building block that might be useful in this context is Composite Capability/Preference Profiles (CC/PP); see <http://www.webstandards.org/learn/askw3c/feb2004.html>. One section of this document states:

"XHTML is powerful because it is XML, or so we've been taught. And the power of XML is often demonstrated through the use of XSLT, the transformation language for XML. Combining the possibility to transform XHTML content through XSLT with the flexibility and accuracy provided by CC/PP makes it possible to transform hypertext content on-the-fly beyond what style sheets already allow. You can show tabular content in a linear fashion for agents that can't handle tables, transform a long XHTML document with many sections in an SVG slideshow and so on, with very few limitations."

Maybe CC/PP could be used to profile the layout needed to better expose content to harvesters and other applications.

art
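
P.S. To make the "tabular content in a linear fashion" idea concrete, here is a rough sketch of what such a transformation might look like once Tidy has produced well-formed XHTML. It is only an illustration under several assumptions: it uses Python's standard ElementTree module in place of XSLT, it does no CC/PP negotiation at all, it treats the first row of each table as the header row, and it does not handle nested tables. The function names and the sample document are made up for the example.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace; ElementTree spells it in braces.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"
CELL_TAGS = (XHTML_NS + "th", XHTML_NS + "td")


def cell_text(cell):
    """Return the text of a table cell with whitespace collapsed."""
    return " ".join("".join(cell.itertext()).split())


def linearize_table(table):
    """Turn one table into 'Header: value' lines (assumes row 1 is the header)."""
    rows = list(table.iter(XHTML_NS + "tr"))
    if not rows:
        return []
    headers = [cell_text(c) for c in rows[0] if c.tag in CELL_TAGS]
    lines = []
    for row in rows[1:]:
        values = [cell_text(c) for c in row if c.tag in CELL_TAGS]
        for header, value in zip(headers, values):
            lines.append(f"{header}: {value}")
    return lines


def linearize_document(xhtml_text):
    """Linearize every table found in a well-formed XHTML document."""
    root = ET.fromstring(xhtml_text)
    lines = []
    for table in root.iter(XHTML_NS + "table"):
        lines.extend(linearize_table(table))
    return lines


# Hypothetical sample input, standing in for Tidy's XHTML output.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table>
      <tr><th>Title</th><th>Year</th></tr>
      <tr><td>Report A</td><td>2003</td></tr>
      <tr><td>Report B</td><td>2004</td></tr>
    </table>
  </body>
</html>"""

if __name__ == "__main__":
    for line in linearize_document(SAMPLE):
        print(line)
```

In a real setup the decision to linearize would presumably come from the requesting agent's profile, and the transformation itself would more likely be an XSLT stylesheet; the sketch only shows the shape of the output a harvester might receive.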