LISTSERV 16.5 - CODE4LIB Archives

On Wed, Aug 12, 2009 at 10:48 AM, Karen Coyle<[log in to unmask]> wrote:
> Ross Singer wrote:
>>
>> 3) What, specifically, is missing from DCTerms that would make a MODS
>> ontology needed?  What, specifically, is missing from Bibliontology or
>> MusicOntology or FOAF or SKOS, etc. that justifies a new and, in many
>> places, overlapping vocabulary?  Would time be better spent trying to
>> improve the existing vocabularies?
>>
>
> MARC: 182 fields, 1711 subfields, 2401 fixed field values
> DC: 59 properties

I see where you're going with this, but I'm not sure it's a fair
critique.  It's sort of on par with saying that a Dodge Grand Caravan
is a more sophisticated vehicle than a Mini Cooper because it has more
horsepower, 3 times as many cup holders and vastly more cubic footage
in the interior.  A Caravan /may/ be a more sophisticated vehicle, but
I'm not sure a quick run over the specs can necessarily reveal that.

One of the problems here is that it doesn't begin to address the DCAM
-- these are 59 properties that can be reused among 22 classes, giving
them different semantic meaning.
>
> Look at the sample records in MARCXML and DC at
> http://www.loc.gov/standards/marcxml and you will see how lossy it is.

Now I think you know you're being a little misleading here.  For one
thing, it's using DC Elements and it's not doing /anything/ vaguely
RDF-related.  Unfortunately, I think it's examples like this that have
led libraries to write DC off as next to worthless (and
understandably!).

Dublin Core is toothless and practically worthless in XML form.  It is
considerably more powerful when used in RDF, however, because DC and
RDF play to each other's strengths: in RDF, you generally don't use a
schema in isolation.

> Now,
> you could argue that no one needs all of the detail in MARC, and I'm sure it
> could be reduced down to something more rational, plus there is redundancy
> in it, but for pity's sake, DC doesn't have a way to indicate the EDITION of
> a work.

This is true.  But this is also why I'm asking what is missing in
DCTerms that would be available in MODS -- the "win" of RDF is that
you aren't constrained by the limits of a particular schema.  If a
particular vocabulary gets you a fair ways towards representing your
resource, but something is missing, it's perfectly reasonable (and
expected) to plug in other vocabularies to fill in the gaps.
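
To make the edition case concrete, here is a minimal sketch in plain
Python (no RDF library) of a DCTerms description that borrows the
edition property from Bibliontology.  The book URI and the literal
values are made up for illustration, and the serializer is a
hypothetical helper that only handles plain string literals.

```python
# Sketch: describe one book with DCTerms and fill the "edition" gap
# with a property from another vocabulary (Bibliontology).
DCTERMS = "http://purl.org/dc/terms/"
BIBO = "http://purl.org/ontology/bibo/"

book = "http://example.org/book/1"  # hypothetical resource URI

# (subject, predicate, literal) tuples; all values are invented.
triples = [
    (book, DCTERMS + "title", "An Example Title"),
    (book, DCTERMS + "creator", "Example, Author"),
    (book, BIBO + "edition", "2nd ed."),  # borrowed from bibo, not DC
]

def to_ntriples(triples):
    """Serialize (subject, predicate, literal) tuples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        # Escape backslashes and double quotes inside the literal.
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)

print(to_ntriples(triples))
```

Nothing in the DCTerms statements has to change to accommodate the
extra property; the two vocabularies simply sit side by side on the
same resource.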

For example, SKOS doesn't need to add coordinate properties to
properly define locations.  Rather than suboptimally retrofit a
vocabulary designed for modeling thesauri, you pull in a vocabulary
that is optimized for defining geographic place (say, wgs_84): one
that is explicitly intended to model the resource at hand (and,
preferably, only that).
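
As a rough sketch of that split (again plain Python, with a made-up
place URI, label and coordinates), the label comes from SKOS and the
coordinates come from the W3C WGS84 vocabulary.  The helper name is
hypothetical; it just reports which namespaces a description draws on.

```python
# Sketch: one resource, label from SKOS, coordinates from WGS84.
SKOS = "http://www.w3.org/2004/02/skos/core#"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

place = "http://example.org/place/1"  # hypothetical resource URI

# predicate -> literal value for `place`; values are invented.
description = {
    SKOS + "prefLabel": "Example Place",
    GEO + "lat": "36.16",
    GEO + "long": "-86.78",
}

def vocabularies_used(description):
    """Return the set of namespaces the predicates are drawn from."""
    namespaces = set()
    for predicate in description:
        # The namespace ends at the last '#' or '/', whichever is later.
        cut = max(predicate.rfind("#"), predicate.rfind("/")) + 1
        namespaces.add(predicate[:cut])
    return namespaces

print(sorted(vocabularies_used(description)))
```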

I think it's somewhat analogous to the notion of domain-specific
languages:  there's an abstraction between the resource and the most
efficient way to access it.

> FOAF has both *surname* and *family name* and says: "These are not
> current stable or consistent..." No sh*t. And try to clearly code a name
> like "Pope John Paul II" in FOAF. Oh, and death dates. No death dates in
> FOAF because you wouldn't have DEAD FRIENDS. But authors die.
>

FOAF isn't the only vocabulary available to model people and I'm
hardly saying it's "the answer" here.  I mean, MARC is complicated in
this regard, too.  "Rodrigo Jimenez Hernandez Garcia"  "Liu Ming
Chung".  Names are hard.  I think pretty much any schema is going to
have to have rules and conventions to compensate for the variability
of how different cultures prescribe identity.

Maybe vCard would be better (maybe not).  The Bio vocabulary might be
a better option for defining biographical "events" (birth, death,
etc.).  It lacks some of the attributes that libraries use
(flourishing dates, for example) and shares RDF's general weakness at
expressing inexact dates.
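
A sketch of the death-date case, assuming I have the Bio 0.1 term
names right (bio:event, bio:Death, bio:date; treat those as an
assumption and check the vocabulary before relying on them).  The
person URI is hypothetical, and the lookup function is just an
illustration of following the event link, not a real query engine.

```python
# Sketch: name from FOAF, death event from the Bio vocabulary.
FOAF = "http://xmlns.com/foaf/0.1/"
BIO = "http://purl.org/vocab/bio/0.1/"  # term names assumed, see above
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

person = "<http://example.org/person/jp2>"  # hypothetical URI
death = "_:death1"  # blank node standing for the death event

triples = [
    (person, f"<{FOAF}name>", '"Pope John Paul II"'),
    (person, f"<{BIO}event>", death),
    (death, f"<{RDF_TYPE}>", f"<{BIO}Death>"),
    (death, f"<{BIO}date>", '"2005-04-02"'),
]

def death_dates(triples, who):
    """Follow bio:event links from `who` to Death events; return their dates."""
    events = {o for s, p, o in triples
              if s == who and p == f"<{BIO}event>"}
    deaths = {s for s, p, o in triples
              if s in events and p == f"<{RDF_TYPE}>" and o == f"<{BIO}Death>"}
    return [o.strip('"') for s, p, o in triples
            if s in deaths and p == f"<{BIO}date>"]

print(death_dates(triples, person))
```

FOAF never has to learn about death dates here; it contributes the
name and the Bio vocabulary contributes the event.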

I think a common misperception in library circles is that RDF falls
short because there is no single vocabulary that does everything we
need.  Rather, I think that this is one of RDF's strengths: no
vocabulary can successfully model the universe, so, instead, each
focuses on the specifics.  The library world takes the opposite
approach, which tends to cause things to get shoehorned in to meet the
shape of the model rather than be expressed in a way more naturally
suited to the resource.

-Ross.