Hi Rob,
You wrote:
>> A format should be described with a schema (XML Schema, OWL etc.) or at
>> least a standard. Mostly this schema already has a namespace or similar
>> identifier that can be used for the whole format.
>
> This is unfortunately not the case.
It is mostly the case - but people like to misinterpret schemas and
tailor them to their needs.
>> For instance MODS Version 3 (currently 3.0, 3.1, 3.2, 3.4) has the XML
>> Namespace http://www.loc.gov/mods/v3 so this is the best identifier to
>> identify MODS.
>
> And this is a perfect example of why this is not the case.
>
> The same mods schema (let alone namespace) defines TWO formats, mods and
> modsCollection.
That's your interpretation. According to the schema, the MODS format
*is* either a single mods-element or a modsCollection-element. That's
exactly what you can refer to with the namespace identifier
http://www.loc.gov/mods/v3.
If you need to identify only the specific element 'mods' of the format,
then you need another identifier. So far there is no standard way to
create an identifier for a specific element in an XML format, see
http://www.w3.org/TR/webarch/#xml-fragids
But if the MODS specification defines that you can refer to any element
with a URI fragment identifier, then the right identifier would be
http://www.loc.gov/mods/v3#mods
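To make the distinction concrete, here is a minimal Python sketch. It only
inspects the root element; it is not schema validation. The function names
are made up for illustration, and the "namespace#element" form is the
hypothetical convention described above, not something the MODS
specification is known to define.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
# The two root elements mentioned in this thread.
MODS_ROOTS = {"mods", "modsCollection"}


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag


def root_identity(xml_text):
    """Return (namespace, local name) of the document's root element."""
    return split_tag(ET.fromstring(xml_text).tag)


def looks_like_mods_v3(xml_text):
    """True if the root is in the MODS v3 namespace with a known root name."""
    ns, local = root_identity(xml_text)
    return ns == MODS_NS and local in MODS_ROOTS


def element_identifier(xml_text):
    """Hypothetical per-element identifier: namespace plus '#' plus name."""
    ns, local = root_identity(xml_text)
    return f"{ns}#{local}"


single = '<mods xmlns="http://www.loc.gov/mods/v3"><titleInfo/></mods>'
collection = (
    '<modsCollection xmlns="http://www.loc.gov/mods/v3">'
    "<mods/></modsCollection>"
)
```

Both documents are recognised by the same namespace, while
element_identifier() tells them apart only if such a fragment convention is
agreed upon.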
You wrote:
> I totally agree that it's an awful design choice. However it's a
> demonstration that XML namespaces _do not identify format_. And
> hence, we need another identifier which is not the namespace of
> the top level element.
The namespace http://www.loc.gov/mods/v3 of the top level element 'mods'
does not identify the top level element but the MODS *format* (in any of
the versions 3.0-3.4) itself. This format *includes* the top level
element 'mods'.
> Also consider the following more hypothetical, but perfectly feasible
> situations:
>
> * One namespace is used to define two _totally_ separate sets of
> elements. There's no reason why this can't be done.
Ok, let A and B be two formats with two totally separate sets of elements
(and rules for how to use them). If you put them into one namespace, then
you get a new format C that is the union of A and B.
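As a rough illustration, a format can be modelled as nothing more than a set
of qualified element names. This deliberately ignores the usage rules, and
the namespace and element names below are invented for the example.

```python
# Hypothetical shared namespace, not a real schema.
NS = "http://example.org/shared"

# Two formats with totally separate sets of elements.
format_a = {(NS, "book"), (NS, "chapter")}
format_b = {(NS, "invoice"), (NS, "lineItem")}

# Putting both into one namespace yields a format C: the union of A and B.
format_c = format_a | format_b


def belongs_to(fmt, element):
    """True if the qualified element name is part of the given format."""
    return element in fmt
```

Every element of A or B is also an element of C, so the namespace still
identifies a format, just a larger one.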
> * One namespace defines so many elements that it's meaningless to call
> it a format at all. Even though the top level tag might be the same,
> the contents are so varied that you're unable to realistically process
> it.
Sad but true: The word "format" in the context of library applications
does not make sense anyway in most cases. Technically a format is just a
set of possible instances, defined as a formal language or with any
other type of specification. The problem of library formats is that many
people refer to them without providing a proper specification.
Coming back to the MODS example: If the SRU Schema registry lists
"info:srw/schema/1/mods-v3.3" as the identifier for "MODS Schema Version
3.3" with a pointer to the XML Schema
"http://www.loc.gov/standards/mods/v3/mods-3-3.xsd" then *any* XML
document that validates against this schema must be considered to be a
MODS 3.3 document - either with 'mods' or with 'modsCollection' as root
element.
Greetings
Jakob
--
Jakob Voß <[log in to unmask]>, skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de