Do any of these have built-in indexing? 800k records aren't going to fit in memory, and if building my own MARC indexer is 'relatively straightforward' then you're a better coder than I am.
cheers
stuart
--
I have a new phone number: 04 463 5692
________________________________________
From: Code for Libraries <[log in to unmask]> on behalf of Jonathan Rochkind <[log in to unmask]>
Sent: Monday, 3 November 2014 1:24 p.m.
To: [log in to unmask]
Subject: Re: [CODE4LIB] MARC reporting engine
If you are, can become, or know a programmer, that would be relatively straightforward in any programming language using the open source MARC processing library for that language (ruby-marc, pymarc, Perl's MARC::Record, whatever).
Although you might find more trouble than you expect around authorities, with them being less standardized in your corpus than you might like.
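
A minimal sketch of that approach in Python, streaming one record at a time so the whole file never has to sit in memory. To keep it self-contained it parses binary MARC 21 (ISO 2709) with the standard library only; in practice a library such as pymarc would do the parsing. The function names are made up for illustration, and it assumes UTF-8 data with personal names in subfield $a of the 100, 600 and 700 fields. No normalisation of name variants is attempted.

```python
from collections import Counter

SUBFIELD = b"\x1f"                    # ISO 2709 subfield delimiter
PERSON_TAGS = {"100", "600", "700"}   # assumption: fields that carry personal names


def iter_records(stream):
    """Yield one raw record at a time from a binary file-like object."""
    while True:
        head = stream.read(5)         # leader bytes 0-4: record length as ASCII digits
        if len(head) < 5:
            return
        yield head + stream.read(int(head) - 5)


def iter_fields(record):
    """Yield (tag, field_bytes) pairs by walking the record's directory."""
    base = int(record[12:17])         # leader bytes 12-16: base address of the data
    directory = record[24:base - 1]   # drop the terminator that ends the directory
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]   # 3-byte tag, 4-byte length, 5-byte offset
        tag = entry[:3].decode("ascii")
        size, start = int(entry[3:7]), int(entry[7:12])
        # size includes the trailing field terminator, so leave it off
        yield tag, record[base + start:base + start + size - 1]


def person_names(record):
    """Yield every $a value found in the personal-name fields of one record."""
    for tag, data in iter_fields(record):
        if tag in PERSON_TAGS:
            for chunk in data.split(SUBFIELD)[1:]:   # element 0 holds the indicators
                if chunk[:1] == b"a":
                    yield chunk[1:].decode("utf-8", "replace").strip()


def count_people(stream):
    """Count how often each personal name is referenced across all records."""
    counts = Counter()
    for record in iter_records(stream):
        counts.update(person_names(record))
    return counts
```

Given a file opened in binary mode (say a hypothetical `records.mrc`), `count_people(fh).most_common(50)` would list the fifty most-referenced names. Only the counter grows with the data, not the records themselves.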
________________________________________
From: Code for Libraries [[log in to unmask]] on behalf of Stuart Yeates [[log in to unmask]]
Sent: Sunday, November 02, 2014 5:48 PM
To: [log in to unmask]
Subject: [CODE4LIB] MARC reporting engine
I have ~800,000 MARC records from an indexing service (http://natlib.govt.nz/about-us/open-data/innz-metadata CC-BY). I am trying to generate:
(a) a list of person authorities (and sundry metadata), sorted by how many times they're referenced, in MediaWiki syntax
(b) a view of a person authority, with all the records by which they're referenced, processed into a Wikipedia stub biography
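
For illustration, output (a) might look something like the sketch below. It assumes the reference counts have already been gathered into a dict mapping name to count; the function name and column headings are hypothetical, and the table markup is ordinary MediaWiki wikitext.

```python
def to_wikitable(counts, limit=None):
    """Render {name: reference_count} as a sortable wikitext table, most-referenced first."""
    # Sort by descending count, then by name so ties come out in a stable order.
    rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    if limit is not None:
        rows = rows[:limit]
    lines = ['{| class="wikitable sortable"', "! Person !! References"]
    for name, n in rows:
        lines.append("|-")
        lines.append(f"| {name} || {n}")
    lines.append("|}")
    return "\n".join(lines)
```

Names are emitted as plain text rather than wiki links, since a heading like "Surname, Forename" is unlikely to match an article title without further processing.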
I have established that this is too much data to process in XSLT or multi-line regexps in vi. What other MARC engines are there out there?
The two options I'm aware of are learning multi-line processing in sed or learning enough Koha to write reports in whatever their reporting engine is.
Any advice?
cheers
stuart
--
I have a new phone number: 04 463 5692