I would agree that communication problems are rampant. In this narrower conversation, though, I wonder whether, in terms of translation, there are ways to frame cataloging concepts in computer science terms.

For example, periodically there will be a post on Autocat about some website or technological product that has discovered the problem of controlling names and is trying to implement some sort of authority control. The reaction tends to be along the lines of "look at them reinventing the wheel" or "why didn't they ask us?" This is probably not an entirely accurate assessment, but library science has built up a lot of experience in dealing with these problems that can be informative. There is a definite overlap in problem space, and it would be good to get people to think of those connections. I'm not quite sure how to do this, but it seems possible.

As someone else pointed out, the library world's solutions tend to reflect the technology of the age when they were implemented, so that context is often useful. For example, traditional library cataloging uses a so-called "undifferentiated heading" for names, where there is a single record/ID (for example http://id.loc.gov/authorities/names/n79080965.html). It is used when it can't be determined whether multiple instances of a given name string on different books [etc.] represent the same person, or when nothing is known that would let you distinguish the names in a way that works in the alphabetical system by which names are filed in library catalogs.

So in a card catalog, you could subdivide people with the same name mainly by arranging them by their birth and/or death dates or by their middle names, like this:

Smith, Jane
Smith, Jane, 1770-1845
Smith, Jane, b. 1805 [note that computers don't easily sort these dates the way a human filer was supposed to]
Smith, Jane, 1912-
Smith, Jane A.
Smith, Jane (Jane Alice)

You can see the advantages of limiting the way you qualify names to a few options to try to improve predictability for users of a card catalog (although it doesn't completely succeed). You can also see why, however many indistinguishable "Smith, Jane"s you had in a card catalog, a practical solution was to smush them all together and interfile the titles of their works. This was carried over to online authority records and is still the approach used in current cataloging rules, although RDA gives you more options to distinguish names.

This contrasts with the approach taken by IMDb, where all the instances of a given name are considered to represent separate persons until proven otherwise. This works because they manage their entities by identifiers and also because their method of distinguishing names for display is arbitrary (roman numerals, such as "John Smith (XVIII)"). Roman numerals won't scale, but there are other approaches for generating display forms of names that could work with the principle of separate until proven same.

Other random things that might be useful to demystify: uniform titles, main entry, specificity of subject headings, ISBD punctuation, those subject headings "created for validation purposes," chief/prescribed source.

Going the other way, I often encounter catalogers who don't have a good sense of what is possible or easy to do with computers. For example, it was suggested today on the OLAC list that it would be better if catalogers could just go back to using abbreviations (ill., p.) instead of spelling things out as RDA mandates (illustration(s), page(s)), which is indeed a lot more letters and a lot more possibilities for typos. Then the public display could just be programmatically set to show the spelled-out version.

If you start to think about what it would actually take for a computer to do this, especially over a set of data in the wild, it starts to look not so simple.

1. You need a complete, current list of fields and subfields to ignore (transcribed areas that are supposed to reflect verbatim what's in the source, and headings: you really don't want to change Johnson, P. into Johnson, Pages).

2. You have to avoid quoted text in notes, which is also supposed to be verbatim, but you do have to fix the text outside the quotes. If someone drops a quote mark, good luck.

500 "Written by P. Smith"--p. 3.

3. For some text outside of quotes in notes, it might be hard to tell whether something is or isn't an abbreviation.

520 James P. Anderson read a 10,000 p. horror novel and became mentally ill. [all right, it's a silly example, but it makes the point]
500 1990 S/V100 P. [some types of odd identifiers like this are put in general notes with no quote marks]

4. You'd have to have some logic to tell the computer how to choose between page and pages for p.

300 $a ix, 155, 127, x p.
300 $a 300, [1] p. [most people wouldn't do this, but it's technically allowed]
300 $a A-Z p.
300 $a p. 713-797
300 $a xxiv, 179 + p.
[all except the second are straight out of AACR2]

That's quite a few examples to account for. I have no idea how a computer would know whether ill. ought to map to illustration or illustrations in most cases, since the distinction was not recorded. Perhaps illustration(s) would work.

That doesn't even start to address mistakes in the data, or allowing for older rules (AACR1's illus.), non-English-language records, or local practices that go against the rules.

All this is not to say that there isn't a real need here. There ought to be a way to minimize the amount of typing that catalogers have to do while at the same time providing full, unambiguous displays for users.
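
To make the abbreviation points above a bit more concrete, here is a minimal Python sketch of the most cautious version of the idea: expand p. and ill. only in an explicit allow-list of fields, and leave everything else (headings, transcribed areas, notes) untouched. Everything in it is an assumption made for illustration: the names EXPANDABLE_TAGS, ABBREVIATIONS, and expand_field are hypothetical, a field is reduced to a bare (tag, text) pair, and no real MARC library or ILS API is being described. It also sidesteps the page/pages question by always emitting the "(s)" form.

```python
import re

# Hypothetical allow-list: only the physical description field is touched.
# Headings (1XX/7XX), transcribed areas, and notes (5XX) are never rewritten.
EXPANDABLE_TAGS = {"300"}

# Assumed mapping; the "(s)" form avoids guessing singular vs. plural,
# since that distinction was not recorded in the data.
ABBREVIATIONS = {
    "p.": "page(s)",
    "ill.": "illustration(s)",
}

# Match an abbreviation only when it stands alone as a token: not preceded
# by a word character or a period, and not followed by a word character.
# The match is case-sensitive, so an initial such as "P." is left alone.
_TOKEN = re.compile(r"(?<![\w.])(p\.|ill\.)(?!\w)")


def expand_field(tag: str, text: str) -> str:
    """Return text with abbreviations expanded, but only for allowed tags."""
    if tag not in EXPANDABLE_TAGS:
        return text
    return _TOKEN.sub(lambda m: ABBREVIATIONS[m.group(1)], text)


if __name__ == "__main__":
    samples = [
        ("300", "ix, 155, 127, x p."),
        ("300", "p. 713-797"),
        ("100", "Johnson, P."),
        ("500", '"Written by P. Smith"--p. 3.'),
    ]
    for tag, text in samples:
        print(tag, expand_field(tag, text))
```

Even this narrow version does nothing about illus., non-English records, typos, or abbreviations in notes outside of quotes. Those are exactly the cases where a simple substitution rule stops being enough.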

So what I wish is that there were some way to get more catalogers to see that, despite Watson, there are serious limitations to what computers can practically do, and that we would be better off if we worked with computers' strengths instead of trying to make them do things that are hard for them just so we can reproduce the form of the card catalog (as opposed to the function).

Kelley

On Fri, Nov 18, 2011 at 8:26 AM, Bohyun Kim <[log in to unmask]> wrote:
> As a side note to this, the communication issue is not unique between catalogers and coders. It is a common discussion topic (librarians vs. IT; emerging technology librarians vs. library coders; even web designers vs. web developers). I hear about this a lot in library conferences. But of course, discussion there is mostly from the librarians' point of view. Since code4lib is unique in that many library coders get together, it would be good to hear the thoughts on this from the coders' point of view as well.
>
> ~Bohyun
>
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Kelley McGrath
> Sent: Thursday, November 17, 2011 7:19 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] Cataloging4Coders @ C4L12 - We need your brains
>
> I am not by any stretch of the imagination a coder, but I think it would be helpful to have some discussion of common cataloger-coder communication issues. So many cataloger-coder discussions online seem to consist of people talking past each other (although I do think there is a much larger and less vocal common ground in the middle). In addition, I have sometimes seen my cataloger and coder/IT colleagues struggle to communicate with each other and find myself trying to translate. Are there ways to make that translation process easier or cultivate more translators? What do coders wish that catalogers knew about how computers interact with metadata?
>
> I would also be interested in ideas on how to shift the conversation more towards underlying functionality. A central failing of computerized catalogs IMO is that they tend to replicate the literal form and actions of cards and the card catalog rather than trying to find a way to express the underlying functionality of the card catalog in a computer environment. This is also sometimes badly done because the programmers don't understand the point of what they're replicating (although to be fair, what they're trying to work with is often not in a form optimized for a computer environment). Uniform titles in many catalogs are a good example of this.
>
> Kelley
>
> PS Some of the other emails mention wanting help with understanding where real data differs from what's in specifications or differs over time or for other reasons. Speaking as a reasonably competent cataloger, I would say that, although some things can be anticipated in advance, I find this to inevitably be an iterative process.
>
> PPS I'm looking forward to attending.
>
> On Thu, Nov 10, 2011 at 11:14 AM, Becky Yoose <[log in to unmask]> wrote:
>> Hey folks,
>>
>> There's been increasing discussion and interest about cataloging around this community (and others like it) for quite a while. I found some co-conspirators and we are planning to propose a pre-conference on cataloging/library metadata creation geared towards the huddled code4lib masses (otherwise known as coders) who are yearning for knowledge of this Darkest of Library Arts.
>>
>> We need your help before we post our proposal. We realize that there's a wide range of cataloging knowledge and experience in the community, and we want to make sure that those interested get the most out of the pre-conference. If this pre-conference has piqued your interest, can you help us by letting us know:
>>
>> - What experience do you have with cataloging/library metadata creation?
>> - What do you want us to cover? Do you have any questions that you want covered?
>>
>> This information will help us greatly in how we structure the pre-conference both in content and schedule. For now, we're planning a half-day pre-conference, but if there's enough interest between beginners and more experienced folks, we will consider offering two half-day preconferences in order to focus on specific participant needs.
>>
>> Feel free to ask questions as well - I'll try to answer them as best as possible given what our group has brainstormed so far.
>>
>> Thanks for reading,
>> Becky
>> Official cat[aloger] herder
>>
>>
>> ---------------------------------------
>> Becky Yoose
>> Systems Librarian
>> Grinnell College Libraries
>> [log in to unmask]
>>