The recently released EEBO texts are available as TEI; I suggest you ask on
the TEI list.
If you want a really vanilla, HTML-like conversion, TEI Boilerplate is
probably a good place to start.
Cheers
Stuart
On Saturday, June 6, 2015, Eric Lease Morgan <[log in to unmask]> wrote:
> On Jun 5, 2015, at 8:20 AM, Ethan Gruber <[log in to unmask]> wrote:
>
> >> Does anybody here have experience reading the SGML/XML files
> >> representing the content of EEBO?
> >
> > Are these in TEI? Back when I worked for the University of Virginia
> > Library, I did a lot of clean up work and migration of Chadwyck-Healey
> > stuff into TEI-P4 compliant XML (thousands of files), but unfortunately
> > all of the Perl scripts to migrate old garbage SGML into XML are probably
> > gone.
> >
> > How many of these things are really worth keeping, i.e., were not
> > digitized by any other organization that has freely published them online?
>
>
> The data I have comes in two flavors: 1) some flavor of SGML, and 2) some
> flavor of XML which is TEI-like, but not TEI. All of the files are worth
> keeping because I get the basic bibliographic information (id, author,
> title, date, keywords/subjects), as well as transcribed text. (No images.)
> Given such data, I think I can provide interesting, cool, and “kewl”
> services. Given the id number, I may then be able to link to the scanned
> image. Wish me luck. —ELM
>
--
...let us be heard from red core to black sky