LISTSERV 16.5 - CODE4LIB Archives

Crosswalk is exactly the wrong answer for this. Two very small, overlapping communities made up mostly of library developers can surely agree on using the same identifiers, and then we make things easier for US. We don't need to solve the entire universe of problems. Solve the simple problem in front of you in the simplest way that could possibly work and still leave room for future expansion and improvement. From that, we learn how to solve the big problems, when we're ready. Overreach and try to solve the huge problem including every possible use case, many of which don't apply to you but SOMEDAY MIGHT... and you end up with the kind of over-abstracted, over-engineered, too-complicated-to-actually-catch-on solutions that... we in the library community normally end up with.
________________________________________
From: Code for Libraries [[log in to unmask]] On Behalf Of Peter Noerr [[log in to unmask]]
Sent: Thursday, April 30, 2009 6:37 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

Some further observations. So far this threadling has mentioned only trying to unify two different sets of identifiers. However, there is a much larger number of them out there (and even larger numbers of schemas and other "standard-things-that-everyone-should-use-so-we-all-know-what-we-are-talking-about"), and the problem exists for any of these things (identifiers, etc.) where there is more than one of them. So really, unifying two sets of identifiers, while very useful, is not actually going to solve much.

Is there any broader methodology we could adopt which potentially allows multiple unifications or (my favourite) cross-walks? (Complete unification requires that everybody agrees and sticks to it, and human history is sort of not on that track...) And who (people and organizations) would undertake this?

Ross' point that a lightweight approach is necessary for any sort of adoption is well taken, but this is a problem (one that plagues all we do in federated search) which cannot be solved just by adding another registry. Somebody/organisation has to look at the identifiers or whatever and decide that two of them are identical or, worse, only partially overlap, and hence scope has to be defined. In a syntax that all understand, of course. Already in this thread we have the sub/super case question from Karen (in a post on the openurl (or Z39.88 <sigh> - identifiers!) listserv). And the various identifiers for MARC (below) could easily be for MARC-XML, MARC21-ISO2709, MARCUK-ISO2709. Now explain in words of one (computer understandable) syllable what the differences are.
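
To make the "identical or only partially overlap" decision concrete, here is a minimal sketch in Python of a cross-walk that records a relation and a scope note for each pair of identifiers. Everything in it is an assumption for illustration: the relation names, the Crosswalk class and its methods are hypothetical and belong to no registry or standard. Only the four info URIs are taken from Ross' message below, and treating the MARC pair as exact simply follows his description of them as the same thing.

```python
# Hypothetical cross-walk between format identifiers from different registries.
# The relation names below are assumptions for this sketch, not a standard.
EXACT = "exact"        # both identifiers name the same format
OVERLAPS = "overlaps"  # the two formats only partially coincide
NARROWER = "narrower"  # the source covers less than the target
BROADER = "broader"    # the source covers more than the target

# Relation seen from the other side of the link.
INVERSE = {EXACT: EXACT, OVERLAPS: OVERLAPS,
           NARROWER: BROADER, BROADER: NARROWER}


class Crosswalk:
    """Stores human-made assertions about how two identifiers relate."""

    def __init__(self):
        # identifier -> {other identifier: (relation, scope note)}
        self._links = {}

    def assert_link(self, source, relation, target, scope=""):
        """Record the assertion in both directions."""
        self._links.setdefault(source, {})[target] = (relation, scope)
        self._links.setdefault(target, {})[source] = (INVERSE[relation], scope)

    def related(self, identifier):
        """Return every recorded link for an identifier."""
        return dict(self._links.get(identifier, {}))

    def equivalents(self, identifier):
        """Return only the identifiers asserted to be exactly the same."""
        links = self._links.get(identifier, {})
        return sorted(t for t, (rel, _) in links.items() if rel == EXACT)


cw = Crosswalk()
cw.assert_link("info:srw/schema/1/marcxml-v1.1", EXACT,
               "info:ofi/fmt:xml:xsd:MARC21")
cw.assert_link("info:srw/schema/1/onix-v2.0", OVERLAPS,
               "info:ofi/fmt:xml:xsd:onix",
               scope="SRU entry is ONIX 2.0; OpenURL entry claims ONIX 2.1")

print(cw.equivalents("info:ofi/fmt:xml:xsd:MARC21"))
print(cw.related("info:srw/schema/1/onix-v2.0"))
```

The code is the easy part. Somebody still has to make each assertion and write each scope note, which is exactly the work a registry alone does not do.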

I'm not trying to make problems. There are problems and this is only a small subset of them, and they confound us every day. I would love to adopt standard definitions for these things, but which Standard? Because anyone can produce any identifier they like, we have decided that the unification of them has to be kept internal where we at least have control of the unifications, even if they change pretty frequently.

Peter


Dr Peter Noerr
CTO, MuseGlobal, Inc.

+1 415 896 6873 (office)
+1 415 793 6547 (mobile)
www.museglobal.com


> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Ross Singer
> Sent: Thursday, April 30, 2009 12:00
> To: [log in to unmask]
> Subject: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them
> All
>
> Hello everybody.  I apologize for the crossposting, but this is an
> area that could (potentially) affect every one of these groups.  I
> realize that not everybody will be able to respond to all lists,
> but...
>
> First of all, some back story (Code4Lib subscribers can probably skip
> ahead):
>
> Jangle [1] requires URIs to explicitly declare the format of the data
> it is transporting (binary marc, marcxml, vcard, DLF
> simpleAvailability, MODS, EAD, etc.).  In the past, it has used its
> own URI structure for this (http://jangle.org/vocab/formats#...) but
> this has always been with the intention of moving out of the
> jangle.org into a more "generic" space so it could be used by other
> initiatives.
>
> This same concept came up in UnAPI [2] (I think this thread:
> http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-March/thread.html#682
> discusses it a bit - there is a reference there that it maybe had come
> up before) although it was rejected ultimately in favor of an (optional)
> approach more in line with how OAI-PMH disambiguates metadata formats.
>  That being said, this page used to try to set a sort of convention
> around the UnAPI formats:
> http://unapi.stikipad.com/unapi/show/existing+formats
> But it's now just a squatter page.
>
> Jakob Voss pointed out that SRU has a schema registry and that it
> would make sense to coordinate with this rather than mint new URIs for
> things that have already been defined there:
> http://www.loc.gov/standards/sru/resources/schemas.html
>
> This, of course, made a lot of sense.  It also made me realize that
> OpenURL *also* has a registry of metadata formats:
> http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats
>
> The problem here is that OpenURL and SRW are using different info URIs
> to describe the same things:
>
> info:srw/schema/1/marcxml-v1.1
>
> info:ofi/fmt:xml:xsd:MARC21
>
> or
>
> info:srw/schema/1/onix-v2.0
>
> info:ofi/fmt:xml:xsd:onix
>
> The latter technically isn't the same thing since the OpenURL one
> claims it's an identifier for ONIX 2.1, but if I wasn't sending this
> email now, eventually SRU would have registered
> info:srw/schema/1/onix-v2.1
>
> There are several other examples, as well (MODS, ISO20775, etc.) and
> it's not a stretch to envision more in the future.
>
> So there are a couple of questions here.
>
> First, and most importantly, how do we reconcile these different
> identifiers for the same thing?  Can we come up with some agreement on
> which ones we should really use?
>
> Secondly, and this gets to the reason why any of this was brought up
> in the first place, how can we coordinate these identifiers more
> effectively and efficiently to reuse among various specs and
> protocols, but not:
> 1) be tied to a particular community
> 2) require some laborious and lengthy submission and review process to
> just say "hey, here's my FOAF available via UnAPI"
> 3) be so lax that it throws all hope of authority out the window
> ?
>
> I would expect the various communities to still maintain their own
> registries of "approved" data formats (well, OpenURL and SRU, anyway
> -- it's not as appropriate to UnAPI or Jangle).
>
> Does something like this interest any of you?  Is there value in such
> an initiative?
>
> Thanks,
> -Ross.