CODE4LIB@LISTS.CLIR.ORG


CODE4LIB  November 2011


Subject: Re: Cataloging4Coders @ C4L12 - We need your brains
From: Kelley McGrath <[log in to unmask]>
Reply-To: Code for Libraries <[log in to unmask]>
Date: Fri, 18 Nov 2011 16:33:21 -0800


I would agree that communication problems are rampant. In this narrower
conversation, though, I wonder if in terms of translation maybe there are
ways to frame cataloging concepts in computer science terms.

For example, periodically there will be a post on Autocat about some website
or technological product that has discovered the problem of controlling
names and is trying to implement some sort of authority control. The
reaction tends to be along the lines of "look at them reinventing the wheel"
or "why didn't they ask us?" This is probably not an entirely accurate
assessment, but library science has built up a lot of experience in dealing
with these problems that can be informative. There is a definite overlap in
problem space and it would be good to get people to think of those
connections. I'm not quite sure how to do this, but it seems possible.

As someone else pointed out, the library world solutions tend to reflect the
technology of the age when they were implemented so that context is often
useful. For example, traditional library cataloging uses a so-called
"undifferentiated heading" for names: a single record/ID (for example
http://id.loc.gov/authorities/names/n79080965.html) shared by several
instances of a given name string on different books [etc.]. It is used when
it can't be determined whether those instances represent the same person, or
when no information is known that would distinguish the names in a way that
works in the alphabetical system by which names are filed in library
catalogs. So in a card catalog, you could subdivide people with the same name
mainly by arranging them by their birth and/or death dates or by their middle
names like this:

Smith, Jane
Smith, Jane, 1770-1845
Smith, Jane, b. 1805 [note that computers don't easily sort these dates the
way a human filer was supposed to]
Smith, Jane, 1912-
Smith, Jane A.
Smith, Jane (Jane Alice)

You can see the advantages of limiting the way you qualify names to a few
options to try to improve the predictability for users of a card catalog
(although it doesn't completely succeed). You can also see why however many
indistinguishable "Smith, Jane"'s you had in a card catalog, a practical
solution was to smush them all together and interfile the titles of their
works. This was carried over to online authority records and is still the
approach used in current cataloging rules, although RDA gives you more
options to distinguish names.
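
A minimal sketch of the filing problem noted above, assuming Python and a hypothetical filing_key helper that is not taken from any cataloging standard. A plain string sort puts the parenthesized and initialed forms ahead of the dated ones and files "b. 1805" after "1912-", while a key that strips punctuation and pulls out the first four-digit year reproduces the card-catalog order listed above. The parsing rules are assumptions chosen only to fit these six headings.

```python
import re

# The six example headings from the list above, deliberately shuffled.
HEADINGS = [
    "Smith, Jane (Jane Alice)",
    "Smith, Jane, 1912-",
    "Smith, Jane A.",
    "Smith, Jane, b. 1805",
    "Smith, Jane",
    "Smith, Jane, 1770-1845",
]


def filing_key(heading):
    """Hypothetical filing key: (surname, forename without punctuation, first year)."""
    parts = heading.split(", ")
    surname = parts[0].lower()
    # Ignore parentheses and periods so "Jane A." files before "Jane (Jane Alice)".
    forename = re.sub(r"[().]", "", parts[1]).lower()
    qualifier = parts[2] if len(parts) > 2 else ""
    # Use the first four-digit year in the qualifier; undated headings file first.
    match = re.search(r"\d{4}", qualifier)
    year = int(match.group()) if match else 0
    return (surname, forename, year)


naive = sorted(HEADINGS)                  # raw character-code order
filed = sorted(HEADINGS, key=filing_key)  # approximates the human filer

for label, result in (("naive", naive), ("filed", filed)):
    print(label)
    for heading in result:
        print("   ", heading)
```

Even this small key depends on guessing what the qualifier means, which is part of why limiting the allowed qualifiers made sense for a card catalog.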

This contrasts with the approach taken by IMDb where all the instances of a
given name are considered to represent separate persons until proven
otherwise. This works because they manage their entities by identifiers and
also because their method of distinguishing names for display is arbitrary
(roman numerals, such as "John Smith (XVIII)"). Roman numerals won't scale,
but there are other approaches for generating display forms of names that
could work with the principle of separate until proven same.
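
A rough sketch of "separate until proven same," assuming Python. The PersonRegistry class, its method names, and the "(#id)" display suffix are all hypothetical illustrations; nothing here describes how IMDb or any library system is actually implemented. Every occurrence of a name gets its own identifier, identity lives in the identifier rather than the string, and two records are joined only by an explicit merge.

```python
class PersonRegistry:
    """Hypothetical registry: each name occurrence is a distinct person until merged."""

    def __init__(self):
        self._next_id = 1
        self.names = {}        # live person id -> name string
        self.merged_into = {}  # retired id -> id it was merged into

    def add(self, name):
        """Register a new occurrence of a name as its own person."""
        pid = self._next_id
        self._next_id += 1
        self.names[pid] = name
        return pid

    def resolve(self, pid):
        """Follow merge links to the surviving id."""
        while pid in self.merged_into:
            pid = self.merged_into[pid]
        return pid

    def merge(self, keep, retire):
        """Record that two ids have been shown to be the same person."""
        keep, retire = self.resolve(keep), self.resolve(retire)
        if keep != retire:
            self.merged_into[retire] = keep
            del self.names[retire]

    def display(self, pid):
        """Show the bare name if unique, else add an arbitrary id-based qualifier."""
        pid = self.resolve(pid)
        name = self.names[pid]
        clashes = [p for p, n in self.names.items() if n == name]
        return name if len(clashes) == 1 else f"{name} (#{pid})"


registry = PersonRegistry()
first = registry.add("John Smith")
second = registry.add("John Smith")
print(registry.display(first), "/", registry.display(second))
registry.merge(first, second)  # evidence turned up that they are the same person
print(registry.display(second))
```

The display qualifier is as arbitrary as a roman numeral; the point is only that it is generated from the identifier and can change without touching identity.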

Other random things that might be useful to demystify: uniform titles, main
entry, specificity of subject headings, ISBD punctuation, those subject
headings "created for validation purposes," chief/prescribed source. 

Going the other way, I often encounter catalogers who don't have a good
sense of what is possible or easy to do with computers. For example, it was
suggested today on the OLAC list that wouldn't it be better if catalogers
could just go back to using abbreviations (ill., p.) instead of spelling
things out like RDA mandates (illustration(s), page(s)), which is indeed a
lot more letters and a lot more possibilities for typos. Then the public
display could just be programmatically set to show the spelled out version.

If you start to think about what it would actually take for a computer to do
this, especially over a set of data in the wild, it starts to look not so
simple.

1. You need a complete, current list of fields and subfields to ignore
(transcribed areas that are supposed to reflect verbatim what's in the
source, headings--you really don't want to change Johnson, P. into Johnson,
Pages).

2. You have to avoid quoted text in notes, which is also supposed to be
verbatim, but you do have to fix the text outside the quotes. If someone
drops a quote mark, good luck.

     500 "Written by P. Smith"--p. 3.

3. For some text outside of quotes in notes, it might be hard to tell when
something is or isn't an abbreviation

    520  James P. Anderson read a 10,000 p. horror novel and became mentally
ill.  [all right, it's a silly example, but it makes the point]
 
    500  1990 S/V100 P.  [some types of odd identifiers like this are put in
general notes with no quote marks]

4. You'd have to have some logic to tell the computer how to choose between
page or pages for p. 

  300 $a ix, 155, 127,  x p.
  300 $a 300, [1] p.  [most people wouldn't do this, but it's technically
allowed]
  300  $a A-Z p.
  300  $a p. 713-797
  300  $a xxiv, 179 + p.

[all except the second are straight out of AACR2]
That's quite a few examples to account for.
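
To make point 4 concrete, here is a minimal sketch in Python of the kind of logic needed just for the 300 $a examples above. The expand_p function and its two rules (a leading "p." is plural when a range or list follows; a trailing "p." is singular only after a lone 1 or [1]) are assumptions made up for illustration, not rules taken from AACR2 or RDA, and they cover only the shapes shown here.

```python
import re


def expand_p(extent):
    """Expand 'p.' in a 300 $a extent string using two rough, hypothetical rules."""
    text = extent.strip()

    # Leading form, e.g. "p. 713-797": plural if a range or list follows.
    lead = re.match(r"^p\.\s+(.*)$", text)
    if lead:
        rest = lead.group(1)
        word = "pages" if re.search(r"[-,]", rest) else "page"
        return f"{word} {rest}"

    # Trailing form, e.g. "300, [1] p.": singular only for a lone 1 or [1].
    trail = re.match(r"^(.*\S)\s+p\.$", text)
    if trail:
        body = trail.group(1)
        word = "page" if re.fullmatch(r"\[?1\]?", body) else "pages"
        return f"{body} {word}"

    # Anything that fits neither shape is left untouched.
    return extent


EXAMPLES = [
    "ix, 155, 127,  x p.",
    "300, [1] p.",
    "A-Z p.",
    "p. 713-797",
    "xxiv, 179 + p.",
]

for example in EXAMPLES:
    print(f"{example!r} -> {expand_p(example)!r}")
```

Two regular expressions handle five clean examples; each new shape found in real data would need another rule or another exception.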

I have no idea how a computer would know whether ill. ought to map to
illustration or illustrations in most cases since the distinction was not
recorded. Perhaps illustration(s) would work.
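
As an illustration of points 1 through 3, the sketch below (Python) applies the obvious global substitution, using the hedge "page(s)" and "illustration(s)", to strings like the ones above. The names naive_expand, expand_field, and SAFE_TAGS are hypothetical, the "155 p. : ill." string is an invented typical physical description, and the one-tag allow-list is an assumption standing in for the complete field and subfield list that would really be needed.

```python
import re

EXPANSIONS = {"p": "page(s)", "ill": "illustration(s)"}

# Match "p." or "ill." as whole words, in either case.
ABBREV = re.compile(r"\b(p|ill)\.", re.IGNORECASE)


def naive_expand(text):
    """Replace every 'p.' and 'ill.' regardless of context."""
    return ABBREV.sub(lambda m: EXPANSIONS[m.group(1).lower()], text)


# Hypothetical allow-list: only touch fields known to hold abbreviations.
SAFE_TAGS = {"300"}


def expand_field(tag, text):
    """Expand abbreviations only in allow-listed fields; leave the rest verbatim."""
    return naive_expand(text) if tag in SAFE_TAGS else text


SAMPLES = [
    ("300", "155 p. : ill."),
    ("100", "Johnson, P."),
    ("500", '"Written by P. Smith"--p. 3.'),
    ("520", "James P. Anderson read a 10,000 p. horror novel "
            "and became mentally ill."),
]

for tag, text in SAMPLES:
    print(tag, "naive:     ", naive_expand(text))
    print(tag, "allow-list:", expand_field(tag, text))
```

The naive pass turns the heading into "Johnson, page(s)" and the summary into "became mentally illustration(s)"; the allow-list avoids that only by refusing to touch notes at all, so the "p. 3" outside the quotes in the 500 note stays abbreviated.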

That doesn't even start to address mistakes in data, allowing for older
rules (AACR1's illus.), non-English language records or local practices that
go against the rules. All this is not to say that there isn't a real need
here. There ought to be a way to both minimize the amount of typing that
catalogers have to do while at the same time provide full, unambiguous
displays for users.

So what I wish is that there were some way to get more catalogers to see
that despite Watson, there are serious limitations to what computers can
practically do and that we would be better off if we worked with computers'
strengths instead of trying to make them do things that are hard for them to
do so we can reproduce the form of the card catalog (as opposed to the
function).

Kelley

On Fri, Nov 18, 2011 at 8:26 AM, Bohyun Kim <[log in to unmask]> wrote:
> As a side note to this, the communication issue is not unique to
> catalogers and coders. It is a common discussion topic (librarians vs. IT;
> emerging technology librarians vs. library coders; even web designers vs.
> web developers).  I hear about this a lot in library conferences. But of
> course, discussion there is mostly from the librarians' point of view. Since
> code4lib is unique in that many library coders get together, it would be
> good to hear the thoughts on this from the coders' point of view as well.
>
> ~Bohyun
>
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf 
> Of Kelley McGrath
> Sent: Thursday, November 17, 2011 7:19 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] Cataloging4Coders @ C4L12 - We need your 
> brains
>
> I am not by any stretch of the imagination a coder, but I think it would
> be helpful to have some discussion of common cataloger-coder communication
> issues. So many cataloger-coder discussions online seem to consist of people
> talking past each other (although I do think there is a much larger and less
> vocal common ground in the middle). In addition, I have sometimes seen my
> cataloger and coder/IT colleagues struggle to communicate with each other
> and find myself trying to translate. Are there ways to make that translation
> process easier or cultivate more translators? What do coders wish that
> catalogers knew about how computers interact with metadata?
>
> I would also be interested in ideas on how to shift the conversation more
> towards underlying functionality. A central failing of computerized catalogs
> IMO is that they tend to replicate the literal form and actions of cards and
> the card catalog rather than trying to find a way to express the underlying
> functionality of the card catalog in a computer environment. This is also
> sometimes badly done because the programmers don't understand the point of
> what they're replicating (although to be fair, what they're trying to work
> with is often not in a form optimized for a computer environment). Uniform
> titles in many catalogs are a good example of this.
>
> Kelley
>
> PS Some of the other emails mention wanting help with understanding where
> real data differs from what's in specifications or differs over time or for
> other reasons. Speaking as a reasonably competent cataloger, I would say
> that, although some things can be anticipated in advance, I find this to
> inevitably be an iterative process.
>
> PPS I'm looking forward to attending.
>
> On Thu, Nov 10, 2011 at 11:14 AM, Becky Yoose <[log in to unmask]> wrote:
>> Hey folks,
>>
>> There's been increasing discussion and interest about cataloging 
>> around this community (and others like it) for quite a while. I found 
>> some co-conspirators and we are planning to propose a pre-conference 
>> on cataloging/library metadata creation geared towards the huddled 
>> code4lib masses (otherwise known as coders) who are yearning for 
>> knowledge of this Darkest of Library Arts.
>>
>> We need your help before we post our proposal. We realize that there's 
>> a wide range of cataloging knowledge and experience in the community, 
>> and we want to make sure that those interested get the most out of 
>> the pre-conference. If this pre-conference has piqued your interest, 
>> can you help us in letting us know:
>>
>> - What experience do you have with cataloging/library metadata creation?
>> - What do you want us to cover? Do you have any questions that you 
>> want covered?
>>
>> This information will help us greatly in how we structure the 
>> pre-conference both in content and schedule. For now, we're planning 
>> a half-day pre-conference, but if there's enough interest between 
>> beginners and more experienced folks, we will consider offering two 
>> half-day preconferences in order to focus on specific participant needs.
>>
>> Feel free to ask questions as well - I'll try to answer them as best 
>> as possible given what our group has brainstormed so far.
>>
>> Thanks for reading,
>> Becky
>> Official cat[aloger] herder
>>
>>
>> ---------------------------------------
>> Becky Yoose
>> Systems Librarian
>> Grinnell College Libraries
>> [log in to unmask]
>>
>
