I have managed to create MARC records from EPUB files fairly easily:
a) unzip my_epub.epub content.opf
b) xsltproc DC2MARC21slim.xsl content.opf >my_epub.marcxml
And you get a fairly good MARC21slim bibliographic record. Thanks to the
Library of Congress for your wonderful XSLTs.
Some notes:
** Some EPUB files contain my_epub.opf rather than content.opf,
but this is a minor detail. One could use
for i in *.epub; do
  unzip "$i" '*.opf' -d "$(basename "$i" .epub)"
done
and get one directory per EPUB, each containing its *.opf file.
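Another way to find the OPF, whatever it is called, is to read
META-INF/container.xml, which every EPUB has to contain and which points
at the package file through the full-path attribute of its rootfile
element. A minimal sketch, assuming double-quoted attributes and that the
first rootfile is the one you want; the function name
opf_path_from_container is my own invention, and this is text matching,
not a real XML parser:

```shell
# Hypothetical helper: read a container.xml on stdin and print the
# full-path attribute of the first <rootfile> element.
# Assumes double-quoted attributes; this is not a real XML parser.
opf_path_from_container() {
  tr '\n' ' ' |
    grep -o '<rootfile[^>]*>' |
    sed -n 's/.*full-path="\([^"]*\)".*/\1/p' |
    head -n 1
}

# Possible usage (my_epub.epub is a placeholder file name):
#   opf=$(unzip -p my_epub.epub META-INF/container.xml | opf_path_from_container)
#   unzip -p my_epub.epub "$opf" | xsltproc DC2MARC21slim.xsl - > my_epub.marcxml
```

This avoids guessing the file name, and it also covers the common case
where the OPF sits in a subdirectory such as OEBPS/.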
** Some OPF data is not as clean as I would like. You can get multiple
authors or subjects combined in one field, separated by , or /
respectively. This could be tackled by some additional XSLT
processing.
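Before reaching for XSLT, the splitting itself can be tried out in the
shell. A minimal sketch, assuming one field value per call and a
single-character separator; split_field is a hypothetical name and the
sample values are made up. Note that splitting on , will also break
names written as "Last, First", so the result needs checking:

```shell
# Hypothetical helper: split one field value (stdin) on a single-character
# separator ($1), trim surrounding blanks, and drop empty pieces.
split_field() {
  tr "$1" '\n' |
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}

# Example with made-up values:
#   printf '%s\n' 'Jane Doe, John Smith' | split_field ','
#   printf '%s\n' 'Fiction / History' | split_field '/'
```

Each resulting line could then become its own creator or subject entry
before the record is converted.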
** For those unfortunate people who are stuck with UNIMARC, well,
there is hope for you. Some XSLT to process MARC21 into UNIMARC is
But I have some questions about the CIP block.
In my opinion, this block is useful for printed books, in order to shelve
them correctly and to get an LCCN.
But who decides the LCC or Dewey classification code? Should it be
at the publisher's initiative? Is there a way to get that information
And since it appears to me that the CIP block could be generated from
the MARCXML information, should that block be stored as such, or should
the pieces of information be collected as one edits the e-book? Should
we leave it up to the publisher to create the CIP block, just as for a
printed book, when it cannot go through the process of partnering with
the Library of Congress? Or is there some regular and quick way to get
that information?
Any hints, librarians?