Hi Rob,

You wrote:

>> A format should be described with a schema (XML Schema, OWL etc.) or at 
>> least a standard. Mostly this schema already has a namespace or similar 
>> identifier that can be used for the whole format.
> 
> This is unfortunately not the case.

It is mostly the case - but people like to misinterpret schemas and 
tailor them to their needs.

>> For instance MODS Version 3 (currently 3.0, 3.1, 3.2, 3.4) has the XML 
>> Namespace http://www.loc.gov/mods/v3 so this is the best identifier to 
>> identify MODS. 
> 
> And this is a perfect example of why this is not the case.
> 
> The same mods schema (let alone namespace) defines TWO formats, mods and
> modsCollection.

That's your interpretation. According to the schema, the MODS format 
*is* either a single mods-element or a modsCollection-element. That's 
exactly what you can refer to with the namespace identifier 
http://www.loc.gov/mods/v3.
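
To illustrate the point, here is a minimal Python sketch (standard 
library only). The helper name root_identity and the two tiny sample 
documents are made up for this example; they are not complete, valid 
MODS records. Both roots report the same namespace and differ only in 
their local name:

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"


def root_identity(xml_text):
    """Return (namespace, local name) of a document's root element.

    Hypothetical helper for illustration; ElementTree stores qualified
    names as "{namespace}local".
    """
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        namespace, local = root.tag[1:].split("}", 1)
    else:
        namespace, local = "", root.tag
    return namespace, local


# Made-up sample documents, reduced to the bare minimum.
single = (
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<titleInfo><title>A</title></titleInfo></mods>"
)
collection = (
    '<modsCollection xmlns="http://www.loc.gov/mods/v3">'
    "<mods/></modsCollection>"
)

for doc in (single, collection):
    namespace, local = root_identity(doc)
    # Same namespace for both; only the root element name differs.
    print(namespace == MODS_NS, local)
```

The namespace alone cannot tell the two roots apart, which is 
consistent with reading it as an identifier for the whole format.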

If you need to identify only the specific element 'mods' of the format, 
then you need another identifier. So far there is no standard way to 
create an identifier for a specific element in an XML format, see
http://www.w3.org/TR/webarch/#xml-fragids

But if the MODS specification defines that you can refer to any element 
with a URI fragment identifier, then the right identifier would be 
http://www.loc.gov/mods/v3#mods
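
A small Python sketch of that convention, under the assumption stated 
above that the specification really defines such fragment identifiers 
(I do not claim that it does). The function name element_identifier is 
hypothetical; urldefrag is from the standard library:

```python
from urllib.parse import urldefrag


def element_identifier(namespace, local_name):
    """Build a fragment-style identifier for one element.

    Assumption: the format's specification defines that elements may be
    addressed as <namespace>#<element name>.
    """
    return f"{namespace}#{local_name}"


ident = element_identifier("http://www.loc.gov/mods/v3", "mods")

# Splitting the identifier again yields the format identifier
# (the namespace) and the element name (the fragment).
format_id, element = urldefrag(ident)
print(ident)
print(format_id, element)
```

The part before '#' still identifies the format as a whole; only the 
fragment narrows it down to a single element.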

You wrote:

 > I totally agree that it's an awful design choice. However it's a
 > demonstration that XML namespaces _do not identify format_.  And
 > hence, we need another identifier which is not the namespace of
 > the top level element.

The namespace http://www.loc.gov/mods/v3 of the top level element 'mods' 
does not identify the top level element but the MODS *format* (in any of 
the versions 3.0-3.4) itself. This format *includes* the top level 
element 'mods'.

> Also consider the following more hypothetical, but perfectly feasible
> situations:
> 
> * One namespace is used to define two _totally_ separate sets of
> elements.  There's no reason why this can't be done.

Ok, let A and B be two formats with two totally separate sets of 
elements (and rules for how to use them). If you put them into one 
namespace, then you get a new format C that is the union of A and B.
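
As a deliberately simplified sketch in Python: if a format is modelled 
only by its set of element names (ignoring the rules how to use them), 
the shared namespace simply holds the union. The element names below 
are invented for the example:

```python
# Invented element names for two unrelated formats A and B.
format_a = {"title", "author"}
format_b = {"latitude", "longitude"}

# The two formats share no elements ...
assert format_a.isdisjoint(format_b)

# ... so one namespace holding both defines their union, format C.
format_c = format_a | format_b

print(sorted(format_c))
print(format_a <= format_c and format_b <= format_c)
```

Everything that belongs to A or to B also belongs to C, so an 
identifier for the namespace identifies C, not A or B alone.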

> * One namespace defines so many elements that it's meaningless to call
> it a format at all.  Even though the top level tag might be the same,
> the contents are so varied that you're unable to realistically process
> it.

Sad but true: The word "format" in the context of library applications 
does not make sense anyway in most cases. Technically a format is just a 
set of possible instances, defined as a formal language or with any 
other type of specification. The problem of library formats is that many 
people refer to them without providing a proper specification.

Coming back to the mods example: If the SRU Schema registry lists 
"info:srw/schema/1/mods-v3.3" as the identifier for "MODS Schema Version 
3.3" with a pointer to the XML Schema 
"http://www.loc.gov/standards/mods/v3/mods-3-3.xsd" then *any* XML 
document that validates against this schema must be considered to be a 
MODS 3.3 document - either with 'mods' or with 'modsCollection' as root 
element.

Greetings
Jakob

-- 
Jakob Voß <[log in to unmask]>, skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de