It looks like the dataset is available in XML format. Perhaps you could
import it into an XML database (eXist, at exist-db.org, comes to mind) and
then generate a report via its query capabilities.
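For that matter, if the records are MARCXML, a first cut at (a) doesn't strictly need a database: the Python standard library can tally the personal-name headings and emit a wikimedia-syntax list. A rough sketch, with the assumption that names live in 100$a/700$a and that a simple bullet list is the wanted output:

```python
import xml.etree.ElementTree as ET
from collections import Counter

MARCXML_NS = "http://www.loc.gov/MARC21/slim"


def count_person_headings(xml_text):
    """Tally 100$a/700$a (personal name) headings in a MARCXML string."""
    counts = Counter()
    root = ET.fromstring(xml_text)
    for record in root.iter(f"{{{MARCXML_NS}}}record"):
        for field in record.iter(f"{{{MARCXML_NS}}}datafield"):
            # 100 = main entry, 700 = added entry (personal names)
            if field.get("tag") in ("100", "700"):
                for sub in field:
                    if sub.get("code") == "a" and sub.text:
                        counts[sub.text.strip()] += 1
    return counts


def as_wikimedia_list(counts):
    """Render the tally as a wikimedia bullet list, most-referenced first."""
    return "\n".join(f"* [[{name}]] ({n} refs)" for name, n in counts.most_common())


# Tiny illustrative sample (real INNZ records will be far richer):
sample = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="100"><subfield code="a">Mansfield, Katherine</subfield></datafield>
  </record>
  <record>
    <datafield tag="100"><subfield code="a">Mansfield, Katherine</subfield></datafield>
  </record>
  <record>
    <datafield tag="100"><subfield code="a">Baxter, James K.</subfield></datafield>
  </record>
</collection>"""

print(as_wikimedia_list(count_person_headings(sample)))
```

For ~800,000 records you'd want to swap ET.fromstring for ET.iterparse so the
whole file never sits in memory at once, but the tallying logic is the same.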
Miles Fidelman
Jonathan Rochkind wrote:
> If you are, can become, or know, a programmer, that would be relatively straightforward in any programming language using the open source MARC processing library for that language (ruby-marc, pymarc, MARC::Record for Perl, whatever).
>
> Although you might find more trouble than you expect around authorities, with them being less standardized in your corpus than you might like.
> ________________________________________
> From: Code for Libraries [[log in to unmask]] on behalf of Stuart Yeates [[log in to unmask]]
> Sent: Sunday, November 02, 2014 5:48 PM
> To: [log in to unmask]
> Subject: [CODE4LIB] MARC reporting engine
>
> I have ~800,000 MARC records from an indexing service (http://natlib.govt.nz/about-us/open-data/innz-metadata CC-BY). I am trying to generate:
>
> (a) a list of person authorities (and sundry metadata), sorted by how many times they're referenced, in wikimedia syntax
>
> (b) a view of a person authority, with all the records by which they're referenced, processed into a wikipedia stub biography
>
> I have established that this is too much data to process in XSLT or multi-line regexps in vi. What other MARC engines are there out there?
>
> The two options I'm aware of are learning multi-line processing in sed or learning enough Koha to write reports in whatever its reporting engine is.
>
> Any advice?
>
> cheers
> stuart
> --
> I have a new phone number: 04 463 5692
--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra