I think there's a fundamental difference between MODS and DCTERMS that 
makes this nearly impossible. I've sometimes described this as the 
difference between "metadata as record format" (MARC, oai_dc, MODS, etc.) 
and "metadata as vocabulary" (DCTERMS, DCAM, and RDF vocabularies in general).

These aren't incompatible, but the Semantic Web and DC communities 
haven't quite figured out how to define a record format from an 
open-ended collection of vocabulary elements. DC Application Profiles, 
Description Set Profiles, and the W3C's Named Graphs are, IMO, steps in 
this direction.

"Converting" MODS to DCTERMS doesn't make much sense in this context 
though. MARC/MODS are record formats, so their metadata comes in the 
form of complete sentences. The trouble is that the grammar for those 
sentences is either: free text people-grammar -or- derived from many 
different sources (code-lists, ISBD, MARC, AACR#, and now RDA). 
Translating this by taking the words and mapping them to a set of terms 
without thinking about the grammar gives results like running text 
through babblefish. It might be readable, to an extent, but it's 
certainly not round-trip-able and it may or may not make sense.
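
To make the lossiness concrete, here's a rough Python sketch. The 
MODS-like input is hypothetical and simplified (not real MODS XML), 
and the mapping is deliberately naive, but it shows how the structure 
that carries the "grammar" (roles, date qualifiers, title parts) 
disappears once the words are flattened into DCTERMS properties.

# A deliberately naive mapping from a simplified, MODS-like record to
# flat DCTERMS properties. Field names here are illustrative, not a
# real MODS serialization.

mods_like_record = {
    "name": {"namePart": "Melville, Herman", "role": "author"},
    "originInfo": {"dateIssued": "1851", "qualifier": "approximate"},
    "titleInfo": {"title": "Moby Dick", "subTitle": "or, The Whale"},
}

def to_dcterms(record):
    """Flatten structured elements into single DCTERMS values."""
    return {
        # the role distinction (author vs. editor vs. illustrator) is lost
        "dcterms:contributor": record["name"]["namePart"],
        # the "approximate" qualifier on the date is lost
        "dcterms:date": record["originInfo"]["dateIssued"],
        # title and subtitle are mushed into one string
        "dcterms:title": "%s: %s" % (
            record["titleInfo"]["title"],
            record["titleInfo"]["subTitle"],
        ),
    }

print(to_dcterms(mods_like_record))
# {'dcterms:contributor': 'Melville, Herman',
#  'dcterms:date': '1851',
#  'dcterms:title': 'Moby Dick: or, The Whale'}
# Nothing in this output lets you reconstruct the original structure,
# which is why the conversion isn't round-trip-able.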

Just building a record format from the full catalog of DCTERMS doesn't 
make much sense either. DCTERMS is just a list of words that DC thinks 
might be useful in resource descriptions. That's part of the reason DC 
never made an oai_dcq: without an application's context, it would be of 
little value. Plus, DCTERMS grows from time to time, and even the 
dcterms namespace isn't the full set of DCAM-compatible properties. For 
example, DC endorses the MARC Relator Terms defined as RDF properties: 
http://dublincore.org/usage/documents/relators/
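
As a rough illustration of mixing the dcterms namespace with properties 
defined elsewhere, here's a small Python sketch using rdflib. The 
resource and person URIs are made up, and the relator namespace URI is 
an assumption on my part; check the endorsed documentation above for 
the actual property URIs.

from rdflib import Graph, Literal, Namespace, URIRef

# Namespace URIs below are illustrative assumptions; consult the
# endorsed relator documentation for the real property URIs.
DCTERMS = Namespace("http://purl.org/dc/terms/")
REL = Namespace("http://id.loc.gov/vocabulary/relators/")

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("rel", REL)

book = URIRef("http://example.org/book/moby-dick")

# Core description uses properties from the dcterms namespace...
g.add((book, DCTERMS.title, Literal("Moby Dick")))
g.add((book, DCTERMS.issued, Literal("1851")))

# ...while a finer-grained role ("ill" = illustrator) comes from the
# MARC Relator properties, which live outside the dcterms namespace
# but can sit in the same description.
g.add((book, REL.ill, URIRef("http://example.org/person/rockwell-kent")))

print(g.serialize(format="turtle"))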

More importantly, the folks who are looking for a set of classes and 
properties for RDF-ized library metadata are turning to many places 
beyond the DCTERMS namespace: FOAF, Bibliontology, SKOS, and many 
others, including growing their own.
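
A quick sketch of what that mixing looks like in practice, again with 
made-up example.org URIs: the creator is a FOAF Person, the subject is 
a SKOS Concept, and both hang off a resource described with dcterms.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

DCTERMS = Namespace("http://purl.org/dc/terms/")
FOAF = Namespace("http://xmlns.com/foaf/0.1/")
SKOS = Namespace("http://www.w3.org/2004/02/skos/core#")

g = Graph()
for prefix, ns in [("dcterms", DCTERMS), ("foaf", FOAF), ("skos", SKOS)]:
    g.bind(prefix, ns)

book = URIRef("http://example.org/book/moby-dick")
author = URIRef("http://example.org/person/melville")
topic = URIRef("http://example.org/concept/whaling")

# Three vocabularies, one description.
g.add((book, DCTERMS.creator, author))
g.add((author, RDF.type, FOAF.Person))
g.add((author, FOAF.name, Literal("Herman Melville")))

g.add((book, DCTERMS.subject, topic))
g.add((topic, RDF.type, SKOS.Concept))
g.add((topic, SKOS.prefLabel, Literal("Whaling")))

print(g.serialize(format="turtle"))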

RDF's grammar comes from the RDF Data Model, and DC's comes from DCAM as 
well as directly from RDF. The process that Karen Coyle describes is 
really the only way forward in making a good faith effort to "put" MARC 
(the bibliographic data) onto the Semantic Web.

Best,
-Corey

MJ Suhonos wrote:
>> So, back to my statement, let me re-state it as:
>>
>> "dcterms is so terribly lossy that it would be a shame to reduce MARC21 bib data to it."
> 
> Right - sorry, I think I did understand your original point as meaning this, but both you and Eric reiterate a fine point about the endless confusion between MARC-as-data-format and MARC-as-semantic-model.
> 
> I still stand by my point, though, in asking *why* it's a shame to reduce it (and introduce loss).  Let me try to clarify below.
> 
>> Some of those elements may seem to be overkill ("Alternative Chronological Designation of First Issue or Part of Sequence"), but the fact is that someone somewhere in cataloger land has found a use for it.
> 
> Yes, even to me as a librarian but not a cataloguer, many (most?) of these elements seem like overkill.  I have no doubt there is an edge-case for having this fine level of descriptive detail, but I wonder:
> 
> a) what proportion of records have this level of description
> b) what kind of (or how much) user access justifies the effort in creating and preserving it
> 
>> My general rule has always been to retain the most detailed level of granularity that you can, because in indexing, display, etc. you can easily mush elements together, but once you've put them together it's devilish to get them back apart again.
> 
> Absolutely, and from that perspective I understand considering loss as something to be avoided at all costs.  But I also wonder what degree of diminishing returns we see in cataloguing practices that are reflected in our devotion to this rule, and thus our descriptive standards.
> 
> So, to clarify my intent further: I'm looking to come up with something that is based on the 80/20 (or perhaps 90/10) rule - that is, losing the top 10-20% of detail in exchange for a vastly simpler (and thus easier to work with) data model.  Isn't that what MODS and DCTERMS do, roughly?
> 
> Obviously this will not work as a canonical record that preserves all of the human effort that has gone into cataloguing, but if it makes that 80-90% of metadata (still a huge number) easily available, that seems like a huge step forward in terms of increasing access to me.
> 
>> The non-library computer types don't appreciate the value of human-aided systematic description.
> 
> I think I appreciate the value of human-based description pretty well.  My concern is that the "mind prison" that we attribute to MARC and its intricacies may actually be a symptom of catering to a million edge cases of "someone somewhere in cataloger land" rather than focusing on working for the bulk of use cases.
> 
> But I realize this now sounds like an NGC4lib thread, and for that I apologize.  :-)  So, to keep it pragmatic: it sounds to me that people think doing something as "basic" as getting millions of records out of binary MARC format into something as lossy and unrefined as DCTERMS to expose them isn't a worthwhile effort?
> 
> MJ
> 
> NB: When Karen Coyle, Eric Morgan, and Roy Tennant all reply to your thread within half an hour of each other, you know you've hit the big time.  Time to retire young I think.

-- 
Corey A Harper
Metadata Services Librarian
New York University Libraries
20 Cooper Square, 3rd Floor
New York, NY 10003-7112
212.998.2479
[log in to unmask]