On Mon, Apr 7, 2008 at 8:30 AM, Esha Datta <[log in to unmask]> wrote:
> NYU is looking at e-publishing in general and how it ties in with
>  preservation requirements. Have any of you done any work with PDF/A
>  and generating access files from that format? We have a number of
>  books that will be converted to the PDF format. We're looking at
>  PDF/A for ingestion into our preservation repository (a DSpace
>  instance) and generating access files from it. How easy/difficult
>  was it to generate a workflow for working with PDFs, generating
>  PDF/As, and access files from PDF/As?

One more vote for OJS, here -- we're running the Journal of Insect
Science[1] with it, very successfully.

With respect to generating HTML and PDFs, my understanding (it's a
little fuzzy) is that we have manuscripts converted to XML by a
third party, and then use a combination of XSLT and Prince to generate
professional-quality documents. Prince isn't cheap, but man, is it
good at what it does. If you were starting from scratch, you might be
able to build a wrapper around Gecko or WebKit to do the rendering...
but that'd take time.
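
For what it's worth, here's a rough sketch of how an XML -> XSLT -> Prince
chain could be scripted. This is not our actual setup: it assumes
xsltproc and the prince command-line tool are both on the PATH, and
the function names, file names, and stylesheet are made up for
illustration.

```python
import subprocess
from pathlib import Path


def build_commands(article_xml, stylesheet, out_dir):
    """Return the two command lines needed for one article.

    Step 1 runs the XSLT stylesheet over the article XML to produce HTML.
    Step 2 hands that HTML to Prince to typeset a PDF.
    All paths here are hypothetical examples.
    """
    article_xml = Path(article_xml)
    out_dir = Path(out_dir)
    html = out_dir / (article_xml.stem + ".html")
    pdf = out_dir / (article_xml.stem + ".pdf")

    # xsltproc -o OUTPUT STYLESHEET INPUT
    xslt_cmd = ["xsltproc", "-o", str(html), str(stylesheet), str(article_xml)]
    # prince INPUT -o OUTPUT
    prince_cmd = ["prince", str(html), "-o", str(pdf)]
    return [xslt_cmd, prince_cmd]


def convert(article_xml, stylesheet, out_dir, run=subprocess.run):
    """Create out_dir if needed, then run both steps in order.

    Stops at the first failing step, because check=True makes
    subprocess.run raise on a non-zero exit code.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(article_xml, stylesheet, out_dir):
        run(cmd, check=True)
```

Calling convert("article42.xml", "to_html.xsl", "out") would leave
article42.html and article42.pdf in out/, provided both tools are
installed. Keeping the HTML around as a by-product means the same XML
source feeds both the web and print versions.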

IIRC, the whole process is pretty cheap (dunno if I can disclose our
XML processing rate -- suffice to say, it's cheaper than undergrads),
and takes roughly 3-4 hours per article. I think there's fairly good
potential for economies of scale, were we to add more titles.

A great person to talk to is Andrew Gough <[log in to unmask]> --
he developed most of the workflow and procedures we use at Madison
(I've copied him here, in case this message contains gross