Are these in TEI? Back when I worked for the University of Virginia
Library, I did a lot of cleanup work and migration of Chadwyck-Healey
stuff into TEI P4-compliant XML (thousands of files), but unfortunately all
of the Perl scripts to migrate old garbage SGML into XML are probably gone.
How many of these texts are really worth keeping, i.e., have not already
been digitized and freely published online by some other organization?
On Fri, Jun 5, 2015 at 8:10 AM, Eric Lease Morgan wrote:
> Does anybody here have experience reading the SGML/XML files representing
> the content of EEBO?
>
> I’ve gotten my hands on approximately 24 GB of SGML/XML files representing
> the content of EEBO (Early English Books Online). This data does not
> include page images. Instead it includes metadata of various ilks as well
> as the transcribed full text. I desire to reverse engineer the SGML/XML in
> order to: 1) provide an alternative search/browse interface to the
> collection, and 2) support various types of text mining services.
>
> While I am making progress against the data, it would be nice to learn of
> other people’s experience so I do not re-invent the wheel (too many
> times). ‘Got ideas?
>
> —
> Eric Lease Morgan
> University of Notre Dame
>