Anthologize's ePub output is decidedly plain.  It does generate TOCs, but
it probably won't fix all of your bad HTML.

It also operates pretty differently from your script in that it doesn't
take the formatted HTML as the starting point.  In the output process
Anthologize dumps the Anthologize project structure you create in its
editor from the WordPress db into a TEI(ish) document -- a container for
the metadata and post content -- which is then used as the basis for
generating output (ePub, PDF, HTML).  Anthologize also has a means for
creating and registering new output formats that work from the TEI, so
you can add formats, along with user-specified options, to meet your
particular needs.
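
To make the shape of that pipeline concrete, here is a rough sketch in
Python (Anthologize itself is a WordPress plugin, so this is not its
code).  The element names, the OUTPUT_FORMATS registry, and the
register_format/generate helpers are all made up for illustration and are
not Anthologize's actual schema or API.  The only point is that every
output format reads the same intermediate document, so adding a format
means registering one more function.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: format name -> function rendering the intermediate doc.
OUTPUT_FORMATS: Dict[str, Callable[[ET.Element], str]] = {}


def register_format(name: str):
    """Decorator that registers a renderer under a format name."""
    def decorator(func: Callable[[ET.Element], str]):
        OUTPUT_FORMATS[name] = func
        return func
    return decorator


def build_intermediate(title: str,
                       parts: List[Tuple[str, List[Tuple[str, str]]]]) -> ET.Element:
    """Build a TEI-like container from a project -> parts -> posts structure.

    Element names here are illustrative only, not Anthologize's real schema.
    """
    root = ET.Element("TEI")
    header = ET.SubElement(root, "teiHeader")
    ET.SubElement(header, "title").text = title
    body = ET.SubElement(ET.SubElement(root, "text"), "body")
    for part_title, posts in parts:
        part = ET.SubElement(body, "div", {"type": "part"})
        ET.SubElement(part, "head").text = part_title
        for post_title, content in posts:
            item = ET.SubElement(part, "div", {"type": "item"})
            ET.SubElement(item, "head").text = post_title
            ET.SubElement(item, "p").text = content
    return root


@register_format("html")
def render_html(doc: ET.Element) -> str:
    """Render the intermediate document as a flat HTML fragment."""
    out = [f"<h1>{escape(doc.findtext('teiHeader/title'))}</h1>"]
    for part in doc.findall("text/body/div"):
        out.append(f"<h2>{escape(part.findtext('head'))}</h2>")
        for item in part.findall("div"):
            out.append(f"<h3>{escape(item.findtext('head'))}</h3>")
            out.append(f"<p>{escape(item.findtext('p'))}</p>")
    return "\n".join(out)


@register_format("toc")
def render_toc(doc: ET.Element) -> str:
    """Render a plain-text outline from the same intermediate document."""
    lines = [doc.findtext("teiHeader/title")]
    for part in doc.findall("text/body/div"):
        lines.append("  " + part.findtext("head"))
        for item in part.findall("div"):
            lines.append("    " + item.findtext("head"))
    return "\n".join(lines)


def generate(doc: ET.Element, fmt: str) -> str:
    """Look up a registered format and run it against the intermediate doc."""
    return OUTPUT_FORMATS[fmt](doc)
```

Calling generate(doc, "html") and generate(doc, "toc") on the same doc
gives two outputs from one export, which is the reason for having the
intermediate step at all.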

-- Scott

On 1/4/11 11:30 AM, "Louis St-Amour" <[log in to unmask]> wrote:

>Given my journal2epub script's experience with the Code4Lib journal
>site, does Anthologize have an option to produce TOC items from post
>headings, modifying the HTML to add IDs where necessary? Does it map
>links to posts with their offline copies, preserving references? Does
>it try to add the largest image it can, or does it include only
>embedded, potentially smaller ones? (In iBooks, unlike Adobe-based
>readers, you can double-tap to zoom in on an automatically resized
>large image.) Are metadata and stylesheets specified manually? And
>finally, does it clean up the HTML to produce strict XHTML 1.1 as
>required? In the journal's case, I had to process HTML three times
>with manual checks to delete invalid attributes before things would
>mostly validate. (Turns out validation is the hardest thing about
>automatically producing EPUB files.) As to my script's use in
>producing official EPUB files, sure, that's why I made it. But if you
>look closely, it makes assumptions about the HTML structure of the
>pages, so it might need modifications if the design or templates
>Sent from my iPhone
>On 2011-01-04, at 12:01 PM, "Hanrath, Scott" <[log in to unmask]> wrote:
>> Anthologize lets you be as picky as you like about the content you use
>> with it.  Essentially you create multiple Anthologize 'projects', then
>> add whatever subset of content you need (native local WordPress content
>> or content imported via a feed) to the project.  The Anthologize content is
>> added as copies, preserving the originals and allowing for editing
>> specific to your output needs.
>> Eric's right that it *is* manual and a bit tedious, but it's (hopefully)
>> getting less so. You do need to create a 'part' structure within your
>> project to organize your content.  But when adding content you can
>> filter by Tag/Category/Date Range/Post Type.  And with the last release you can
>> add more than one post at a time.
>> The Anthologize dev team would certainly be interested in the code4lib
>> journal committee's take on the tool and ways it could be improved.
>> (Support for automated project creation and output generation would be an
>> interesting feature to see on the roadmap).
>> -- Scott
>> On 1/4/11 10:45 AM, "Eric Lease Morgan" <[log in to unmask]> wrote:
>>> On Jan 4, 2011, at 11:40 AM, Jonathan Rochkind wrote:
>>>> ...Is there any easy way to get it to, for instance, make an anthology
>>>> of all the posts with a certain WordPress tag or category instead?...
>>> Based on my (poor) recollection of playing with Anthologize,
>>> the process is a bit manual. Initialize epub. Drag postings to it.
>>> Annotate/tweak titles. Click 'Go'. Get epub file. The process is not
>>> laborious, just a bit tedious. I would definitely recommend the
>>> Committee experiment with Anthologize.
>>> --
>>> Eric Morgan