LISTSERV 16.5 - CODE4LIB Archives

I would agree that communication problems are rampant. In this narrower
conversation, though, I wonder whether translation could mean finding
ways to frame cataloging concepts in computer science terms.

For example, periodically there will be a post on Autocat about some website
or technological product that has discovered the problem of controlling
names and is trying to implement some sort of authority control. The
reaction tends to be along the lines of "look at them reinventing the wheel"
or "why didn't they ask us?" This is probably not an entirely accurate
assessment, but library science has built up a lot of experience in dealing
with these problems that can be informative. There is a definite overlap in
problem space and it would be good to get people to think of those
connections. I'm not quite sure how to do this, but it seems possible.

As someone else pointed out, the library world solutions tend to reflect the
technology of the age when they were implemented so that context is often
useful. For example, traditional library cataloging uses a so-called
"undifferentiated heading" for names: a single record/ID (for example
http://id.loc.gov/authorities/names/n79080965.html) that covers every
instance of a name string. It is used when it can't be determined whether
instances of a given name string on different books [etc.] represent the
same person, or when no information is known that would distinguish the
names in a way that works in the alphabetical system by which names are
filed in library catalogs. So in a card
catalog, you could subdivide people with the same name mainly by arranging
them by their birth and/or death dates or by their middle names like this:

Smith, Jane
Smith, Jane, 1770-1845
Smith, Jane, b. 1805 [note that computers don't easily sort these dates the
way a human filer was supposed to]
Smith, Jane, 1912-
Smith, Jane A.
Smith, Jane (Jane Alice)

You can see the advantages of limiting the way you qualify names to a few
options to try to improve the predictability for users of a card catalog
(although it doesn't completely succeed). You can also see why however many
indistinguishable "Smith, Jane"'s you had in a card catalog, a practical
solution was to smush them all together and interfile the titles of their
works. This was carried over to online authority records and is still the
approach used in current cataloging rules, although RDA gives you more
options to distinguish names.
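
To illustrate the filing problem noted beside the "b. 1805" heading above,
here is a minimal Python sketch. The regular expression and the filing_key
function are hypothetical, made up for illustration rather than taken from
any filing rule or library system, and they only cope with the date
qualifiers shown in the list.

```python
import re

# Hypothetical pattern: a comma, an optional "b." or "d.", then a 4-digit year.
DATE_QUALIFIER = re.compile(r",\s*(?:b\.\s*|d\.\s*)?(\d{4})")

def filing_key(heading):
    """Sort key: the name without its date qualifier, then the first year."""
    m = DATE_QUALIFIER.search(heading)
    if m is None:
        return (heading, 0)  # an undated heading files before dated ones
    return (heading[:m.start()], int(m.group(1)))

headings = [
    "Smith, Jane, 1912-",
    "Smith, Jane, b. 1805",
    "Smith, Jane",
    "Smith, Jane, 1770-1845",
]

# Plain character-by-character sorting puts "b. 1805" last,
# because the letter "b" sorts after every digit.
naive = sorted(headings)

# Sorting on the extracted year gives the order a human filer would use.
filed = sorted(headings, key=filing_key)
```

Even this small case needs a rule the computer has to be told about, and it
says nothing yet about middle initials or fuller forms of name.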

This contrasts with the approach taken by IMDb where all the instances of a
given name are considered to represent separate persons until proven
otherwise. This works because they manage their entities by identifiers and
also because their method of distinguishing names for display is arbitrary
(roman numerals, such as "John Smith (XVIII)"). Roman numerals won't scale,
but there are other approaches for generating display forms of names that
could work with the principle of separate until proven same.
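
As a rough sketch of what "separate until proven same" could look like when
entities are managed by identifier, consider the Python below. The
PersonRegistry class and its methods are invented for this illustration and
do not describe how IMDb or any library system actually works. The display
form uses plain numbers, which is just as arbitrary as roman numerals.

```python
from itertools import count

class PersonRegistry:
    """Every occurrence of a name gets its own ID until two are merged."""

    def __init__(self):
        self._next_id = count(1)
        self.names = {}        # person ID -> name string
        self.merged_into = {}  # dropped ID -> surviving ID

    def add(self, name):
        """Register a new occurrence of a name as a new person."""
        pid = next(self._next_id)
        self.names[pid] = name
        return pid

    def resolve(self, pid):
        """Follow merges to the ID that currently represents this person."""
        while pid in self.merged_into:
            pid = self.merged_into[pid]
        return pid

    def merge(self, keep, drop):
        """Record that two IDs have been shown to be the same person."""
        keep, drop = self.resolve(keep), self.resolve(drop)
        if keep != drop:
            self.merged_into[drop] = keep

    def display(self, pid):
        """Name for display, numbered only if other persons share it."""
        pid = self.resolve(pid)
        name = self.names[pid]
        same = sorted(p for p, n in self.names.items()
                      if n == name and p not in self.merged_into)
        if len(same) == 1:
            return name
        return f"{name} ({same.index(pid) + 1})"

registry = PersonRegistry()
a = registry.add("John Smith")
b = registry.add("John Smith")
before = (registry.display(a), registry.display(b))
registry.merge(a, b)  # evidence turned up that they are the same person
after = registry.display(b)
```

Note that the display numbers in this sketch can change after a merge. A
real system would probably want them to stay stable, which is one reason the
identifier, not the display string, has to be what everything links to.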

Other random things that might be useful to demystify: uniform titles, main
entry, specificity of subject headings, ISBD punctuation, those subject
headings "created for validation purposes," chief/prescribed source. 

Going the other way, I often encounter catalogers who don't have a good
sense of what is possible or easy to do with computers. For example, it was
suggested today on the OLAC list that wouldn't it be better if catalogers
could just go back to using abbreviations (ill., p.) instead of spelling
things out like RDA mandates (illustration(s), page(s)), which is indeed a
lot more letters and a lot more possibilities for typos. Then the public
display could just be programmatically set to show the spelled out version.

If you start to think about what it would actually take for a computer to do
this, especially over a set of data in the wild, it starts to look not so
simple.

1. You need a complete, current list of fields and subfields to ignore
(transcribed areas that are supposed to reflect verbatim what's in the
source, headings). You really don't want to change Johnson, P. into Johnson,
Pages.

2. You have to avoid quoted text in notes, which is also supposed to be
verbatim, but you do have to fix the text outside the quotes. If someone
drops a quote mark, good luck

     500 "Written by P. Smith"--p. 3.

3. For some text outside of quotes in notes, it might be hard to tell when
something is or isn't an abbreviation

    520  James P. Anderson read a 10,000 p. horror novel and became mentally
ill.  [all right, it's a silly example, but it makes the point]
 
    500  1990 S/V100 P.  [some types of odd identifiers like this are put in
general notes with no quote marks]

4. You'd have to have some logic to tell the computer how to choose between
page or pages for p. 

  300 $a ix, 155, 127,  x p.
  300 $a 300, [1] p.  [most people wouldn't do this, but it's technically
allowed]
  300  $a A-Z p.
  300  $a p. 713-797
  300  $a xxiv, 179 + p.

[all except the second are straight out of AACR2]
That's quite a few examples to account for.
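
To give a feel for the logic point 4 asks for, here is a minimal Python
sketch that handles only the five 300 $a examples above. The expand_extent
function and its rules are my own guesses for illustration, not anything
from AACR2 or RDA, and it expects just the subfield text without the tag or
the $a.

```python
import re

def expand_extent(extent):
    """Spell out 'p.' in an extent statement as 'page' or 'pages'."""
    text = extent.strip()

    # Leading form such as "p. 713-797": a range or list means plural.
    m = re.fullmatch(r"p\.\s*(.+)", text)
    if m:
        rest = m.group(1)
        word = "pages" if re.search(r"[-,]", rest) else "page"
        return f"{word} {rest}"

    # Trailing form such as "300, [1] p.": singular only when the whole
    # statement is a single "1", "[1]", "i" or "[i]".
    m = re.fullmatch(r"(.+?)\s*p\.", text)
    if m:
        sequences = m.group(1)
        singular = re.fullmatch(r"\[?(1|i)\]?", sequences) is not None
        return f"{sequences} {'page' if singular else 'pages'}"

    return extent  # anything unrecognized is left alone

examples = [
    "ix, 155, 127,  x p.",
    "300, [1] p.",
    "A-Z p.",
    "p. 713-797",
    "xxiv, 179 + p.",
]
expanded = [expand_extent(e) for e in examples]
```

That is two patterns and two special cases for five tidy examples, before
any real data has had a chance to break it.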

I have no idea how a computer would know whether ill. ought to map to
illustration or illustrations in most cases since the distinction was not
recorded. Perhaps illustration(s) would work.

That doesn't even start to address mistakes in data, allowing for older
rules (AACR1's illus.), non-English language records or local practices that
go against the rules. All this is not to say that there isn't a real need
here. There ought to be a way to minimize the amount of typing that
catalogers have to do while still providing full, unambiguous
displays for users.

So what I wish is that there were some way to get more catalogers to see
that despite Watson, there are serious limitations to what computers can
practically do and that we would be better off if we worked with computers'
strengths instead of trying to make them do things that are hard for them to
do so we can reproduce the form of the card catalog (as opposed to the
function).

Kelley

On Fri, Nov 18, 2011 at 8:26 AM, Bohyun Kim <[log in to unmask]> wrote:
> As a side note to this, the communication issue is not unique between
> catalogers and coders. It is a common discussion topic (librarians vs. IT;
> emerging technology librarians vs. library coders; even web designers vs.
> web developers).  I hear about this a lot in library conferences. But of
> course, discussion there is mostly from the librarians' point of view. Since
> code4lib is unique in that many library coders get together, it would be
> good to hear the thoughts on this from the coders' point of view as well.
>
> ~Bohyun
>
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf 
> Of Kelley McGrath
> Sent: Thursday, November 17, 2011 7:19 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] Cataloging4Coders @ C4L12 - We need your 
> brains
>
> I am not by any stretch of the imagination a coder, but I think it would
> be helpful to have some discussion of common cataloger-coder communication
> issues. So many cataloger-coder discussions online seem to consist of people
> talking past each other (although I do think there is a much larger and less
> vocal common ground in the middle). In addition, I have sometimes seen my
> cataloger and coder/IT colleagues struggle to communicate with each other
> and find myself trying to translate. Are there ways to make that translation
> process easier or cultivate more translators? What do coders wish that
> catalogers knew about how computers interact with metadata?
>
> I would also be interested in ideas on how to shift the conversation more
> towards underlying functionality. A central failing of computerized catalogs
> IMO is that they tend to replicate the literal form and actions of cards and
> the card catalog rather than trying to find a way to express the underlying
> functionality of the card catalog in a computer environment. This is also
> sometimes badly done because the programmers don't understand the point of
> what they're replicating (although to be fair, what they're trying to work
> with is often not in a form optimized for a computer environment). Uniform
> titles in many catalogs are a good example of this.
>
> Kelley
>
> PS Some of the other emails mention wanting help with understanding where
> real data differs from what's in specifications or differs over time or for
> other reasons. Speaking as a reasonably competent cataloger, I would say
> that, although some things can be anticipated in advance, I find this to
> inevitably be an iterative process.
>
> PPS I'm looking forward to attending.
>
> On Thu, Nov 10, 2011 at 11:14 AM, Becky Yoose <[log in to unmask]> wrote:
>> Hey folks,
>>
>> There's been increasing discussion and interest about cataloging 
>> around this community (and others like it) for quite a while. I found 
>> some co-conspirators and we are planning to propose a pre-conference 
>> on cataloging/library metadata creation geared towards the huddled 
>> code4lib masses (otherwise known as coders) who are yearning for 
>> knowledge of this Darkest of Library Arts.
>>
>> We need your help before we post our proposal. We realize that there's
>> a wide range of cataloging knowledge and experience in the community, 
>> and we want to make sure that those interested get the most out of 
>> the pre-conference. If this pre-conference has piqued your interest,
>> can you help us in letting us know:
>>
>> - What experience do you have with cataloging/library metadata creation?
>> - What do you want us to cover? Do you have any questions that you 
>> want covered?
>>
>> This information will help us greatly in how we structure the 
>> pre-conference both in content and schedule. For now, we're planning 
>> a half-day pre-conference, but if there's enough interest between 
>> beginners and more experienced folks, we will consider offering two 
>> half-day preconferences in order to focus on specific participant needs.
>>
>> Feel free to ask questions as well - I'll try to answer them as best 
>> as possible given what our group has brainstormed so far.
>>
>> Thanks for reading,
>> Becky
>> Official cat[aloger] herder
>>
>>
>> ---------------------------------------
>> Becky Yoose
>> Systems Librarian
>> Grinnell College Libraries
>> [log in to unmask]
>>
>