I've done RDF/DC-to-MARC conversion for the Gutenberg Project. It requires a lot
of clean-up, especially of subject heading strings: LCSH strings may well
appear in a DC element but need to be parsed into MARC subfields. It's
tedious, and human intervention was required in the case of the Gutenberg Project.
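To illustrate why the subject strings need parsing, here is a minimal Python sketch that splits a " -- " delimited LCSH string into MARC 650 subfields. It is not the method used for the Gutenberg records. The function names, the short list of form subdivisions, and the date rule are all hypothetical heuristics, and the output is a plain "$code value" display line rather than a real MARC record. Note that it cannot tell a geographic subdivision ($z) from a topical one ($x) without authority data, which is exactly where a human has to step in.

```python
import re

# Hypothetical, deliberately small list of LCSH form subdivisions ($v).
FORM_SUBDIVISIONS = {"Fiction", "Juvenile fiction", "Poetry", "Drama",
                     "Biography", "Correspondence", "Periodicals"}

# Heuristic: a subdivision with a four-digit year or "century" is
# treated as chronological ($y).
DATE_RE = re.compile(r"\b\d{4}\b|\bcentur(?:y|ies)\b", re.IGNORECASE)


def lcsh_to_650(heading: str) -> list[tuple[str, str]]:
    """Split a '--'-delimited LCSH string into (subfield code, value) pairs."""
    parts = [p.strip() for p in heading.split("--") if p.strip()]
    if not parts:
        return []
    subfields = [("a", parts[0])]  # first element is the main heading
    for part in parts[1:]:
        if part in FORM_SUBDIVISIONS:
            code = "v"
        elif DATE_RE.search(part):
            code = "y"
        else:
            # Geographic ($z) vs. topical ($x) needs an authority lookup;
            # this sketch simply defaults to $x.
            code = "x"
        subfields.append((code, part))
    return subfields


def format_650(subfields: list[tuple[str, str]]) -> str:
    """Render as a display line: tag, indicators (blank, 0 = LCSH), subfields."""
    return "650 _0 " + " ".join(f"${code} {value}" for code, value in subfields)


print(format_650(lcsh_to_650(
    "United States -- History -- Civil War, 1861-1865 -- Fiction")))
```

Run as-is, this prints `650 _0 $a United States $x History $y Civil War, 1861-1865 $v Fiction`. A personal-name heading such as "Shakespeare, William, 1564-1616" would still come out as a 650 here, when it belongs in a 600, which is another case needing review.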
I'm close to finishing the editing of about 4,000 records harvested in late
December 2014, about 16 months after an initial harvest of about 40,000.
The RDF/DC had changed somewhat, but there seemed to be significantly fewer
subject headings. I decided to examine virtually every item and to find
better records at the Library of Congress or, more frequently, the Internet
Archive [archive.org/details/texts].
I fully agree about how important this is, but I don't think I'll do it
again, since it consumes all my free time. Maybe if others could volunteer
to do the editing, I could continue harvesting. Only a download of the
complete collection is possible, but I use XSL to select records based on
date added.
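The date-added selection is done with XSL; as a rough Python stand-in (not the stylesheet itself), the sketch below filters ebook descriptions by release date. It assumes the RDF/XML uses the `pgterms:ebook` element with a `dcterms:issued` date in YYYY-MM-DD form, and that this date corresponds to "date added"; both the namespace IRIs and that mapping are assumptions to check against the actual catalog files. The function name `ebooks_added_since` is hypothetical.

```python
import datetime as dt
import xml.etree.ElementTree as ET

# Namespaces assumed to match the Gutenberg RDF/XML catalog files.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]


def ebooks_added_since(rdf_xml: str, cutoff: dt.date) -> list[str]:
    """Return rdf:about values of ebooks whose dcterms:issued is on/after cutoff."""
    root = ET.fromstring(rdf_xml)
    selected = []
    for ebook in root.iter("{%s}ebook" % NS["pgterms"]):
        issued = ebook.findtext("dcterms:issued", namespaces=NS)
        # Skip records with no date; compare parsed dates, not strings.
        if issued and dt.date.fromisoformat(issued.strip()) >= cutoff:
            selected.append(ebook.get(RDF_ABOUT))
    return selected
```

Comparing parsed dates rather than raw strings keeps the filter correct even if stray whitespace appears around the date text.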
The collections you mention are worthy of being included in library
systems. Metadata quality is a limiting factor.
On Mon, Aug 18, 2014 at 5:04 PM, Stuart Yeates wrote:
> There are a stack of great free ebook repositories available on the web,
> things like https://unglue.it/ http://www.gutenberg.org/
> https://en.wikibooks.org/wiki/Main_Page http://www.gutenberg.net.au/
> https://www.smashwords.com/books/category/1/newest/0/free/any etc, etc
> What there doesn't appear to be, is high-quality AACR2 / RDA records
> available for these. There are things like
> https://ebooks.adelaide.edu.au/meta/pg/ which are elaborate Dublin Core to
> MARC converters, but these
> lack standardisation of names, authority control (people, entities, places,
> etc), interlinking, etc.
> It seems to me that quality metadata would greatly increase the value /
> findability / use of these projects and thus their visibility and available
> Are there any projects working in this space already? Are there suitable
> tools available?
Metadata and Bibliographic Services for Libraries