LISTSERV 16.5 - CODE4LIB Archives

On Jun 5, 2015, at 8:20 AM, Ethan Gruber <[log in to unmask]> wrote:

>> Does anybody here have experience reading the SGML/XML files representing
>> the content of EEBO?
> 
> Are these in TEI? Back when I worked for the University of Virginia
> Library, I did a lot of clean up work and migration of Chadwyck-Healey
> stuff into TEI-P4 compliant XML (thousands of files), but unfortunately all
> of the Perl scripts to migrate old garbage SGML into XML are probably gone.
> 
> How many of these things are really worth keeping, i.e., were not digitized
> by any other organization that has freely published them online?


The data I have comes in two flavors: 1) some variety of SGML, and 2) XML that is TEI-like but not TEI. All of the files are worth keeping because each one gives me the basic bibliographic information (id, author, title, date, keywords/subjects) as well as the transcribed text. (No images.) Given such data, I think I can provide interesting, cool, and “kewl” services. Given the id number, I may then be able to link to the scanned image. Wish me luck. --ELM
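
For what it's worth, here is a minimal sketch of how the fields listed above (id, author, title, date, keywords, transcribed text) might be pulled out of one of the XML files. The element names (HEADER, IDNO, AUTHOR, TITLE, DATE, KEYWORDS/TERM, TEXT), the sample record, and the function name extract_record are all assumptions made up for illustration; the real files almost certainly use different markup, and the SGML flavor would need to be converted to well-formed XML first.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like record. The element names here are assumptions,
# not the actual EEBO markup.
SAMPLE = """<ETS>
  <HEADER>
    <IDNO>A00001</IDNO>
    <AUTHOR>Doe, Jane</AUTHOR>
    <TITLE>A sample title</TITLE>
    <DATE>1600</DATE>
    <KEYWORDS><TERM>Sermons</TERM><TERM>England</TERM></KEYWORDS>
  </HEADER>
  <TEXT>
    <P>First paragraph.</P>
    <P>Second   paragraph.</P>
  </TEXT>
</ETS>"""


def normalized_text(node):
    """Return all text beneath a node with runs of whitespace collapsed."""
    return " ".join("".join(node.itertext()).split())


def extract_record(xml_string):
    """Extract basic bibliographic fields and the transcription from one record."""
    root = ET.fromstring(xml_string)

    def first(tag):
        # First matching descendant, or None if the element is absent.
        node = root.find(f".//{tag}")
        return normalized_text(node) if node is not None else None

    text_node = root.find(".//TEXT")
    return {
        "id": first("IDNO"),
        "author": first("AUTHOR"),
        "title": first("TITLE"),
        "date": first("DATE"),
        "keywords": [normalized_text(t) for t in root.iterfind(".//KEYWORDS/TERM")],
        "text": normalized_text(text_node) if text_node is not None else "",
    }


if __name__ == "__main__":
    record = extract_record(SAMPLE)
    for field, value in record.items():
        print(f"{field}: {value}")
```

The id field is the one that would be used to build a link to the scanned image, once the pattern for those links is known.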