> So, back to my statement, let me re-state it as:
> 
> "dcterms is so terribly lossy that it would be a shame to reduce MARC21 bib data to it."

Right, sorry: I did understand your original point to mean exactly this, but you and Eric both make a fine point about the endless confusion between MARC-as-data-format and MARC-as-semantic-model.

I still stand by my point, though, in asking *why* it's a shame to reduce it (and introduce loss).  Let me try to clarify below.

> Some of those elements may seem to be overkill ("Alternative Chronological Designation of First Issue or Part of Sequence"), but the fact is that someone somewhere in cataloger land has found a use for it.

Yes, even to me, a librarian but not a cataloguer, many (most?) of these elements seem like overkill.  I have no doubt there is an edge case somewhere that calls for this fine level of descriptive detail, but I wonder:

a) what proportion of records have this level of description
b) what kind of (or how much) user access justifies the effort in creating and preserving it

> My general rule has always been to retain the most detailed level of granularity that you can, because in indexing, display, etc. you can easily mush elements together, but once you've put them together it's devilish to get them back apart again.

Absolutely, and from that perspective I understand treating loss as something to be avoided at all costs.  But I also wonder about the diminishing returns that our devotion to this rule introduces into cataloguing practice, and thus into our descriptive standards.

So, to clarify my intent further: I'm looking to come up with something that is based on the 80/20 (or perhaps 90/10) rule — that is, losing the top 10-20% of detail in exchange for a vastly simpler (and thus easier to work with) data model.  Isn't that what MODS and DCTERMS do, roughly?

Obviously this will not work as a canonical record that preserves all of the human effort that has gone into cataloguing, but if it makes that 80-90% of the metadata (still a huge amount) easily available, that seems to me like a huge step forward in terms of increasing access.

> The non-library computer types don't appreciate the value of human-aided systematic description.

I think I appreciate the value of human-based description pretty well.  My concern is that the "mind prison" that we attribute to MARC and its intricacies may actually be a symptom of catering to a million edge cases of "someone somewhere in cataloger land" rather than focusing on working for the bulk of use cases.

But I realize this now sounds like an NGC4lib thread, and for that I apologize.  :-)  So, to keep it pragmatic: it sounds to me like people think that doing something as "basic" as getting millions of records out of binary MARC format and into something as lossy and unrefined as DCTERMS, just to expose them, isn't a worthwhile effort?
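
To be concrete about what I mean by "lossy", here is a minimal sketch of the kind of crosswalk I have in mind, in Python, assuming the pymarc library.  The file name and the handful of field-to-term mappings are purely illustrative (a real mapping would follow something like the Library of Congress MARC-to-DC crosswalk); the point is just how little code stands between binary MARC and a flat DCTERMS-ish record.

    # Minimal, deliberately lossy MARC -> DCTERMS sketch.
    # Assumes the pymarc library; the tag/subfield choices and the
    # 'records.mrc' path are illustrative, not a complete crosswalk.
    from pymarc import MARCReader

    # Tiny mapping: MARC tag -> (subfield codes to keep, DCTERMS property)
    FIELD_MAP = {
        '245': ('ab', 'dcterms:title'),
        '100': ('a',  'dcterms:creator'),
        '260': ('c',  'dcterms:date'),
        '650': ('a',  'dcterms:subject'),
        '520': ('a',  'dcterms:abstract'),
    }

    def marc_to_dcterms(record):
        """Flatten one MARC record into a dict of DCTERMS value lists."""
        dc = {}
        for tag, (subfields, term) in FIELD_MAP.items():
            for field in record.get_fields(tag):
                value = ' '.join(field.get_subfields(*subfields)).strip()
                if value:
                    dc.setdefault(term, []).append(value)
        return dc

    with open('records.mrc', 'rb') as fh:
        for record in MARCReader(fh):
            if record is not None:   # pymarc yields None for unparseable records
                print(marc_to_dcterms(record))

Everything not named in FIELD_MAP is simply dropped, which is exactly the loss being debated; the question is whether what survives is still worth exposing.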

MJ

NB: When Karen Coyle, Eric Morgan, and Roy Tennant all reply to your thread within half an hour of each other, you know you've hit the big time.  Time to retire young, I think.