Thank you to all who responded with software suggestions. https://github.com/ubleipzig/marctools is looking like the most promising candidate so far. The more I read through the recommendations, the more it dawned on me that I don't want to have to configure yet another Java toolchain (yes, I know, that may be personal bias).
Thank you to all who responded about the challenges of authority control in such collections. I'm aware of these issues. The current project is about marshalling resources so that editors can make informed decisions, rather than automating the creation of articles. Because there is human judgement involved in the last step, I can afford to take a few authority control 'risks'.
I have a new phone number: 04 463 5692
From: Code for Libraries <[log in to unmask]> on behalf of raffaele messuti <[log in to unmask]>
Sent: Monday, 3 November 2014 11:39 p.m.
To: [log in to unmask]
Subject: Re: [CODE4LIB] MARC reporting engine
Stuart Yeates wrote:
> Do any of these have built-in indexing? 800k records isn't going to fit in memory and if building my own MARC indexer is 'relatively straightforward' then you're a better coder than I am.
you could try marcdb from marctools
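For anyone wondering what a minimal on-disk index might look like, here is a rough Python sketch using only the standard library. It is not how marcdb works, just an illustration under some assumptions: the input is a binary MARC 21 (ISO 2709) file, each record carries its identifier in the 001 field, and the names `build_index`, `fetch` and `control_number` are made up for this example. It stores only the byte offset and length of each record in SQLite, so the 800k records never need to be in memory at once.

```python
import sqlite3

FIELD_TERMINATOR = b"\x1e"


def control_number(record: bytes) -> str:
    """Return the 001 field of one raw MARC record, or '' if absent."""
    base = int(record[12:17].decode("ascii"))  # leader/12-16: base address of data
    directory = record[24:base - 1]            # drop the directory's field terminator
    for i in range(0, len(directory), 12):     # each directory entry is 12 bytes
        entry = directory[i:i + 12]
        if entry[:3] == b"001":
            length = int(entry[3:7].decode("ascii"))
            start = int(entry[7:12].decode("ascii"))
            field = record[base + start:base + start + length]
            return field.rstrip(FIELD_TERMINATOR).decode("utf-8", "replace")
    return ""


def build_index(marc_path: str, db_path: str) -> int:
    """Scan the MARC file once, storing (id, offset, length) per record."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records "
        "(id TEXT, offset INTEGER, length INTEGER)"
    )
    count = 0
    offset = 0
    with open(marc_path, "rb") as f:
        while True:
            head = f.read(5)                   # leader/00-04: record length
            if len(head) < 5:
                break
            length = int(head.decode("ascii"))
            record = head + f.read(length - 5)
            conn.execute(
                "INSERT INTO records VALUES (?, ?, ?)",
                (control_number(record), offset, length),
            )
            offset += length
            count += 1
    conn.execute("CREATE INDEX IF NOT EXISTS idx_records_id ON records (id)")
    conn.commit()
    conn.close()
    return count


def fetch(marc_path: str, db_path: str, record_id: str):
    """Return the raw bytes of the record with this 001 value, or None."""
    conn = sqlite3.connect(db_path)
    row = conn.execute(
        "SELECT offset, length FROM records WHERE id = ?", (record_id,)
    ).fetchone()
    conn.close()
    if row is None:
        return None
    with open(marc_path, "rb") as f:
        f.seek(row[0])
        return f.read(row[1])
```

Once the index is built, `fetch` seeks straight to a single record, and the raw bytes can be handed to whatever MARC parser you prefer. Indexing other fields (names, subjects) would need extra columns or tables along the same lines.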