LISTSERV 16.5 - CODE4LIB Archives

When I've tried to do this, it's been much harder than your story, I'm 
afraid.

My library data is very inconsistent in the way it expresses its 
holdings. Even _without_ "missing" items, the holdings are expressed in 
human-readable narrative form which is very difficult to parse reliably.

Theoretically, the holdings are expressed according to, I forget the 
name of the Z. standard, but some standard for expressing human readable 
holdings with certain punctuation and such. Even if they really WERE all 
exactly according to this standard, this standard is not very easy to 
parse consistently and reliably. But in fact, since when these tags are 
entered nothing validates them to this standard -- and at different 
times in history the cataloging staff entering them in various libraries 
had various ideas about how strictly they should follow this local 
"policy" -- our holdings are not even reliably according to that standard.

But if you think it's easy, please, give it a try and get back to us. :) 
Maybe your library's data is cleaner than mine.

I think it's kind of a crime that our ILS (and many other ILSs) doesn't 
provide a way for holdings to be efficiently entered (or guessed from 
prediction patterns etc) AND converted to an internal structured format 
that actually contains the semantic info we want. Offering catalogers 
the option to manually enter an MFHD is not a solution.

Jonathan

Kyle Banerjee wrote:
>>> The trick here is that traditional library metadata practices make it
>>> _very hard_ to tell if a _specific volume/issue_ is held by a given
>>> library. And those are the most common use cases for OpenURL.
>>>
>> Yep. That's true even for individual libraries with link resolvers. OCLC is
>> not going to be able to solve that particular issue until the local
>> libraries do.
>>
>>     
>
> This might not be as bad as people think. The normal argument is that
> holdings are in free text and there's no way staff will ever have enough
> time to record volume level holdings. However, significant chunks of the
> problem can be addressed using relatively simple methods.
>
> For example, if you can identify complete runs, you know that a library has
> all holdings and can start automating things.
>
> With this in mind, the first step is to identify incomplete holdings. The
> mere presence of lingo like "missing," "lost," "incomplete," "scattered,"
> "wanting," etc. is a dead giveaway.  So are bracketed fields that contain
> enumeration or temporal data (though you'll get false hits using this method
> when catalogers supply enumeration). Commas in any field that contains
> enumeration or temporal data also indicate incomplete holdings.
>
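
A minimal sketch of the first-pass checks described above, assuming each
holdings statement is available as a plain string. The function name
looks_incomplete is hypothetical, and "contains a digit" is only a rough
stand-in for "contains enumeration or temporal data".

```python
import re

# Words that, per the message above, signal gaps in a holdings statement.
GAP_WORDS = re.compile(r"\b(missing|lost|incomplete|scattered|wanting)\b",
                       re.IGNORECASE)
# Bracketed text containing at least one digit, e.g. "[1987]" or "[v.3]".
BRACKETED = re.compile(r"\[[^\]]*\d[^\]]*\]")


def looks_incomplete(statement):
    """Return the list of reasons a free-text statement looks incomplete."""
    reasons = []
    if GAP_WORDS.search(statement):
        reasons.append("gap word")
    if BRACKETED.search(statement):
        reasons.append("bracketed enumeration or date")
    # Assumption: any digit means the field carries enumeration or dates.
    if "," in statement and re.search(r"\d", statement):
        reasons.append("comma in enumeration or date")
    return reasons
```

An empty list means none of the three signals fired. It does not prove the
run is complete, and the bracket check will give false hits where
catalogers supplied enumeration.
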
> I suspect that the mere presence of a note is a great indicator that
> holdings are incomplete since what kind of yutz writes a note saying "all
> the holdings are here just like you'd expect?" Having said that, I need to
> crawl through a lot more data before being comfortable with that statement.
>
> Regexp matches can be used to search for closed date ranges in open serials
> or close dates within 866 that don't correspond to close dates within fixed
> fields.
>
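
The date-range check might be sketched as follows. This assumes the 866
text and the record's fixed-field end date have already been extracted as
strings, and that "9999" marks a serial that is still being published. The
function names and the year-only patterns are illustrative and would miss
forms such as "1980/81-".

```python
import re

# A closed range of four-digit years, e.g. "1980-1999".
CLOSED_RANGE = re.compile(r"\b(\d{4})\s*-\s*(\d{4})\b")
# A year with a trailing hyphen and no end year, e.g. "1980-".
OPEN_RANGE = re.compile(r"\b\d{4}\s*-(?!\s*\d)")


def last_closed_year(statement):
    """Return the latest end year of any closed range, or None."""
    ends = [int(m.group(2)) for m in CLOSED_RANGE.finditer(statement)]
    return max(ends) if ends else None


def mismatched_close(statement, fixed_end_year):
    """True if holdings stop at a year other than the record's end date."""
    end = last_closed_year(statement)
    # No closed range, or an open range present: holdings still continue.
    if end is None or OPEN_RANGE.search(statement):
        return False
    return str(end) != fixed_end_year
```

For example, mismatched_close("v.1-v.20 (1980-1999)", "9999") flags a
closed run on a title the fixed fields say is still open.
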
> That's the first pass. The second pass would be to search for the most
> common patterns that occur within incomplete holdings. Wash, rinse, repeat.
> After a while, you'll get to all the cornball schemes that don't lend
> themselves to automation, but hopefully that group of materials is
> getting to a more manageable size where throwing labor at the metadata makes
> some sense. Possibly guessing if a volume is available based on timeframe is
> a good way to go.
>
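
One way to find the "most common patterns" for the second pass is to
collapse every statement to its shape and count the shapes. This is only a
sketch under that assumption; skeleton and common_patterns are hypothetical
names, and replacing digit runs with "N" is just one possible
normalization.

```python
import re
from collections import Counter


def skeleton(statement):
    """Collapse a statement to its shape: each run of digits becomes 'N'."""
    shape = re.sub(r"\d+", "N", statement.strip().lower())
    # Squeeze runs of whitespace so spacing differences do not split shapes.
    return re.sub(r"\s+", " ", shape)


def common_patterns(statements, top=10):
    """Return the most frequent (shape, count) pairs."""
    return Counter(skeleton(s) for s in statements).most_common(top)
```

The shapes at the top of the list are the ones worth writing a parser for.
The long tail is the set of cornball schemes left for manual cleanup.
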
> Worst case scenario if the program can't handle it is you deflect the
> request to the next institution, and that already happens all the time for a
> variety of reasons.
>
> While my comments are mostly concerned with journal holdings, similar logic
> can be used with monographic series as well.
>
> kyle
>
>