Hi Andy,
We are still working on the license terms before opening the project, but if you can give me your GitHub account name, I will send you a link.
Wariness is understandable -- this works on very dirty data.
Once you are added to the project you can look at our test cases in:
<https://github.com/cdlib/hparser/blob/master/src/test/resources/test_holdings.xml>
to see samples of the type of statements that it parses.
We haven't had a lot of time to put into this project, so our hope is to find collaborators who can help us move forward on improving it and adding some capabilities. We are currently using it in production to find the earliest and latest years held.
Joe
________________________________
From: Code for Libraries <[log in to unmask]> on behalf of Andy Kohler <[log in to unmask]>
Sent: Tuesday, October 10, 2017 10:26:28 AM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Interest in serial summary holdings parsing?
"Interest" might be less appropriate than "wariness"... but sure. My
institution (UCLA) is one which contributes holdings to these projects. I
haven't done any 866 parsing in a long time, but have written (perl) code
to fix 85x/86x sequencing problems in the LHRs we generate for OCLC, so I
know my way around our holdings records.
Andy Kohler / UCLA
On Mon, Oct 9, 2017 at 8:25 AM, Joe Ferrie <[log in to unmask]> wrote:
> We have some (Java) code at California Digital Library that parses serial
> summary holding statements derived from MARC holdings records, and are
> thinking of making it its own open source project. We would be interested
> in knowing if anyone would be interested in using the code or working on
> it, and whether you already have code that does similar parsing.
>
>
> Our main use cases for this are determining which institutions in our
> consortium hold a specific year of a serial title for purposes of
> consortial resource sharing, and also which institution holds the deepest
> backlog of a title in the WEST shared print program.
>