LISTSERV 16.5 - CODE4LIB Archives

I don't recommend using different properties that have the same basic
semantic meaning for those different contexts (dc:subject vs.
dcterms:subject). In a linked data environment, I don't recommend using
Dublin Core Elements at all, but only dcterms. It is possible to harvest
subject terms regardless of whether they are literals or URIs, but the
harvester might have to take some additional action to generate a
human-readable result from an LCSH URI. There are two options:

1. The harvester goes out and fetches the machine readable data for
http://id.loc.gov/authorities/subjects/sh85002782 to get the label
2. You import the RDF for LCSH into your system so that an OPTIONAL line
can be inserted into SPARQL (assuming you are using SPARQL) to get the
skos:prefLabel for the URI directly from your own system.
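
To make option 2 concrete, here is a rough sketch of what the query and the
harvester-side handling might look like. It assumes the subjects are
published with dcterms:subject, that the LCSH RDF has been loaded into the
same store, and that results come back in the standard SPARQL JSON bindings
shape. The names SKOS_PREF_LABEL_QUERY and display_subject are made up for
illustration and do not come from any library.

```python
# Illustrative sketch only: the query and helper names are hypothetical.
# Assumes LCSH RDF (with skos:prefLabel) is loaded in the same store.
SKOS_PREF_LABEL_QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX skos:    <http://www.w3.org/2004/02/skos/core#>

SELECT ?object ?subject ?label WHERE {
  ?object dcterms:subject ?subject .
  OPTIONAL { ?subject skos:prefLabel ?label }
}
"""


def display_subject(row):
    """Return a human-readable subject for one result binding.

    `row` is assumed to follow the SPARQL JSON results format, e.g.
    {"subject": {"type": "uri", "value": "..."},
     "label": {"type": "literal", "value": "..."}}
    where "label" is absent if the OPTIONAL pattern did not match.
    """
    subject = row["subject"]
    label = row.get("label")
    # A URI subject with a known label: show the label.
    if subject["type"] == "uri" and label is not None:
        return label["value"]
    # A literal subject, or a URI with no label in the store:
    # fall back to the raw value.
    return subject["value"]
```

With this approach a literal such as "aircraft" passes through unchanged,
and a URI without a label in your store degrades to the URI itself rather
than failing.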

I'd suggest discussing these options with developers who may harvest your
data, or at least giving developers a way to send you feedback, so that
you can deliver a web service that makes harvesting as efficient as
possible.

I hope this is useful. I think there are many possible solutions. But, in
sum, don't use dc:subject and dcterms:subject simultaneously.

Ethan

On Mon, May 9, 2016 at 1:58 PM, English, Eben wrote:

> Hello all,
>
> A little context: the MODS and RDF Descriptive Metadata Subgroup
> (https://wiki.duraspace.org/display/hydra/MODS+and+RDF+Descriptive+Metadata+Subgroup)
> is a group of cultural institutions working together to model MODS XML
> as RDF.
>
> Our project diverges from previous efforts in this domain in that we're
> trying to come up with a model that takes more advantage of widely-used
> vocabularies and namespaces, avoiding blank nodes at all costs.
>
> As we work through the list of MODS elements, we've been stumbling on a
> few thorny issues, and with our goal of making our data as shareable as
> possible, we agreed that it would be helpful to try and get the input of
> folks who have more experience in harvesting and parsing RDF from the
> proliferation of data providers existing in the real world (see
> https://datahub.io/dataset for a great list).
>
> Specifically, when consuming RDF from a new data source, how big of a
> problem are the following issues:
>
>
> #1. Triples where the object may be a string literal or a URI
>
> For example, the predicate 'dc:subject' from the Dublin Core Elements
> vocabulary has no defined range, which means it can be used with both
> literal and non-literal values
> (http://wiki.dublincore.org/index.php/User_Guide/Publishing_Metadata#dc:subject).
>
> So one could have both in a data store:
>
> ex:myObject1  dc:subject  "aircraft" .
> ex:myObject2  dc:subject  <http://id.loc.gov/authorities/subjects/sh85002782> .
>
>
> ... versus ...
>
>
> #2. Using multiple predicates with similar/overlapping definitions,
> depending on the value of the object
>
> For example, when expressing the subject of a work, using different
> predicates depending on whether there is an existing URI for a topic or
> not:
>
> ex:myObject1  dc:subject  "aircraft" .
> ex:myObject2  dcterms:subject  <http://id.loc.gov/authorities/subjects/sh85002782> .
>
>
> We're wondering which approach is less problematic from a Linked
> Data-harvesting standpoint. Issue #1 requires that the parser be
> prepared to handle different types of values from the same predicate,
> but issue #2 involves parsing an additional namespace and predicate, etc.
>
> Any thoughts, suggestions, or comments would be greatly appreciated.
>
> Thanks,
> Eben
>
> --
> Eben English | Boston Public Library
> Web Services Developer
> 617-859-2238
>