You don't have to build your own indexer. You could use the pymarc
parser to load the records into a document database like Mongo, then
pull reports from there. Which approach fits best really depends on
what the service is delivering.
This would be much less insanity-inducing than regexes in vi.
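
A minimal sketch of that approach, under some loud assumptions: it swaps
SQLite (in the Python standard library) in for Mongo so nothing needs to be
installed besides pymarc; it assumes the person headings live in subfield a
of the 100, 600 and 700 fields, which may not match this corpus; and the
function and table names are made up for illustration. Records are streamed
one at a time, so the 800k records never need to fit in memory.

```python
import sqlite3
from collections.abc import Iterable, Iterator

# Assumption: personal-name fields of interest (main entry, subject, added entry).
PERSON_TAGS = ("100", "600", "700")


def person_refs(path: str) -> Iterator[tuple[str, str]]:
    """Stream (record_id, person_heading) pairs from a binary MARC file."""
    from pymarc import MARCReader  # third-party: pip install pymarc

    with open(path, "rb") as fh:
        for record in MARCReader(fh):
            if record is None:  # recent pymarc yields None for unparseable records
                continue
            ids = record.get_fields("001")
            rec_id = ids[0].data if ids else ""
            for field in record.get_fields(*PERSON_TAGS):
                names = field.get_subfields("a")
                if names:
                    # Light normalisation only; real authority data may need more.
                    yield rec_id, names[0].rstrip(" ,")


def build_index(db: sqlite3.Connection, refs: Iterable[tuple[str, str]]) -> None:
    """Load the pairs into a table and index it by person heading."""
    db.execute("CREATE TABLE IF NOT EXISTS person_ref (record_id TEXT, person TEXT)")
    db.executemany("INSERT INTO person_ref VALUES (?, ?)", refs)
    db.execute("CREATE INDEX IF NOT EXISTS idx_person ON person_ref (person)")
    db.commit()


def report(db: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return person headings sorted by how often they are referenced."""
    return db.execute(
        "SELECT person, COUNT(*) AS n FROM person_ref "
        "GROUP BY person ORDER BY n DESC, person"
    ).fetchall()
```

To use it, open a database with sqlite3.connect("innz.db"), pass it to
build_index() together with person_refs("innz.mrc") (both file names are
hypothetical), and then loop over report(). The per-person view is the same
table queried with WHERE person = ?.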
I do agree with Jonathan. If authorities were easy, everyone would be doing
them.
Cary
On Sunday, November 2, 2014, Stuart Yeates <[log in to unmask]> wrote:
> Do any of these have built-in indexing? 800k records isn't going to fit in
> memory and if building my own MARC indexer is 'relatively straightforward'
> then you're a better coder than I am.
>
> cheers
> stuart
>
> --
> I have a new phone number: 04 463 5692
>
> ________________________________________
> From: Code for Libraries <[log in to unmask]> on behalf of Jonathan
> Rochkind <[log in to unmask]>
> Sent: Monday, 3 November 2014 1:24 p.m.
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] MARC reporting engine
>
> If you are, can become, or know, a programmer, that would be relatively
> straightforward in any programming language using the open source MARC
> processing library for that language. (ruby marc, pymarc, perl marc,
> whatever).
>
> Although you might find more trouble than you expect around authorities,
> with them being less standardized in your corpus than you might like.
> ________________________________________
> From: Code for Libraries [[log in to unmask]] on behalf of Stuart
> Yeates [[log in to unmask]]
> Sent: Sunday, November 02, 2014 5:48 PM
> To: [log in to unmask]
> Subject: [CODE4LIB] MARC reporting engine
>
> I have ~800,000 MARC records from an indexing service (
> http://natlib.govt.nz/about-us/open-data/innz-metadata CC-BY). I am
> trying to generate:
>
> (a) a list of person authorities (and sundry metadata), sorted by how many
> times they're referenced, in wikimedia syntax
>
> (b) a view of a person authority, with all the records by which they're
> referenced, processed into a wikipedia stub biography
>
> I have established that this is too much data to process in XSLT or
> multi-line regexps in vi. What other MARC engines are there out there?
>
> The two options I'm aware of are learning multi-line processing in sed or
> learning enough koha to write reports in whatever their reporting engine is.
>
> Any advice?
>
> cheers
> stuart
> --
> I have a new phone number: 04 463 5692
>
--
Cary Gordon
The Cherry Hill Company
http://chillco.com