LISTSERV 16.5 - CODE4LIB Archives

Hello all,

A little context: the MODS and RDF Descriptive Metadata Subgroup 
(https://wiki.duraspace.org/display/hydra/MODS+and+RDF+Descriptive+Metadata+Subgroup) 
is a group of cultural institutions working together to model MODS XML 
as RDF.

Our project diverges from previous efforts in this domain in that we're 
trying to come up with a model that takes greater advantage of widely used 
vocabularies and namespaces, avoiding blank nodes at all costs.

As we work through the list of MODS elements, we've been stumbling on a 
few thorny issues. Since our goal is to make our data as shareable as 
possible, we agreed that it would be helpful to get input from folks who 
have more experience harvesting and parsing RDF from the many data 
providers out in the real world (see https://datahub.io/dataset for a 
great list).

Specifically, when consuming RDF from a new data source, how big a 
problem is each of the following issues?


#1. Triples where the object may be a string literal or a URI

For example, the predicate 'dc:subject' from the Dublin Core Elements 
vocabulary has no defined range, which means it can be used with both 
literal and non-literal values 
(http://wiki.dublincore.org/index.php/User_Guide/Publishing_Metadata#dc:subject).

So one could have both in a data store:

ex:myObject1  dc:subject  "aircraft" .
ex:myObject2  dc:subject 
<http://id.loc.gov/authorities/subjects/sh85002782> .
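
On the consuming side, handling this might look roughly like the sketch 
below. It is only an illustration in plain Python: the URIRef and Literal 
classes are hypothetical stand-ins for the term types an RDF library would 
provide, subjects_of is a made-up helper, and expanding ex: to 
http://example.org/ is an assumption.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the term types an RDF library would supply.
@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


DC_SUBJECT = URIRef("http://purl.org/dc/elements/1.1/subject")

# (subject, predicate, object) triples mirroring the two statements above.
# The ex: prefix is assumed to expand to http://example.org/.
triples = [
    (URIRef("http://example.org/myObject1"), DC_SUBJECT, Literal("aircraft")),
    (URIRef("http://example.org/myObject2"), DC_SUBJECT,
     URIRef("http://id.loc.gov/authorities/subjects/sh85002782")),
]


def subjects_of(triples):
    """Split dc:subject values into plain labels and URIs."""
    labels, uris = [], []
    for _s, p, o in triples:
        if p != DC_SUBJECT:
            continue
        # One predicate, two possible object types: branch on the type.
        if isinstance(o, Literal):
            labels.append(o.value)
        else:
            uris.append(o.value)
    return labels, uris


labels, uris = subjects_of(triples)
```

The point is that every consumer of dc:subject needs a type check like the 
one in subjects_of before it can do anything with the value.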


... versus ...


#2. Using multiple predicates with similar/overlapping definitions, 
depending on the value of the object

For example, when expressing the subject of a work, using different 
predicates depending on whether there is an existing URI for a topic or not:

ex:myObject1  dc:subject  "aircraft" .
ex:myObject2  dcterms:subject 
<http://id.loc.gov/authorities/subjects/sh85002782> .
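
A consumer of data shaped like this might look roughly like the sketch 
below. Again this is only an illustration in plain Python with triples as 
tuples of strings: OBJECT_KIND and subjects_by_kind are made-up names, 
expanding ex: to http://example.org/ is an assumption, and the 
predicate-to-type mapping reflects the convention described here rather 
than anything the vocabularies themselves enforce.

```python
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"

# (subject, predicate, object) triples as plain strings.
# The ex: prefix is assumed to expand to http://example.org/.
triples = [
    ("http://example.org/myObject1", DC_SUBJECT, "aircraft"),
    ("http://example.org/myObject2", DCTERMS_SUBJECT,
     "http://id.loc.gov/authorities/subjects/sh85002782"),
]

# Under this convention the predicate tells the consumer what to expect,
# but the consumer has to know about both predicates in advance.
OBJECT_KIND = {DC_SUBJECT: "literal", DCTERMS_SUBJECT: "uri"}


def subjects_by_kind(triples):
    """Group subject values by the object kind implied by the predicate."""
    found = {"literal": [], "uri": []}
    for _s, p, o in triples:
        kind = OBJECT_KIND.get(p)
        if kind is not None:
            found[kind].append(o)
    return found


found = subjects_by_kind(triples)
```

Here no type check on the object is needed, but a harvester that only 
looks for one of the two predicates would silently miss half the subjects.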


We're wondering which approach is less problematic from a Linked 
Data-harvesting standpoint. Issue #1 requires that the consumer be 
prepared to handle both literal and URI values for the same predicate, 
while issue #2 requires it to recognize an additional namespace and 
predicate for what is essentially the same relationship.

Any thoughts, suggestions, or comments would be greatly appreciated.

Thanks,
Eben

-- 
Eben English | Boston Public Library
Web Services Developer
617-859-2238