These have to be named graphs, or at least collections of triples which 
can be processed through workflows as a single unit.
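As a rough sketch of what "a single unit" could look like in code (the graph name, prefixes, and triples here are all invented for illustration, not a proposal):

```python
# A minimal model of named graphs: each graph name maps to a set of
# (subject, predicate, object) triples, so an entire graph can be
# passed through a workflow step as one unit.
dataset = {
    "http://example.org/graphs/nz-bib/v3": {
        ("ex:work1", "dc:title", "A Book"),
        ("ex:work1", "dc:creator", "ex:person7"),
    },
}

def process(graph_name, triples):
    """Stand-in for a workflow step that consumes one whole graph."""
    return len(triples)

for name, triples in dataset.items():
    print(name, process(name, triples))
```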

In terms of LD, a version needs to be defined in terms of:

(a) synchronisation with the non-bibliographic real world (i.e. Dataset 
Z version X was released at time Y)

(b) correction/augmentation of other datasets (i.e. Dataset F version G 
contains triples augmenting Dataset H versions A, B, C and D)

(c) mapping between datasets (i.e. Dataset I contains triples mapping 
between Dataset J version K and Dataset L version M (and vice versa))

Note that a 'Dataset' here could be a bibliographic dataset (records of 
works, etc), a classification dataset (a version of the Dewey Decimal 
Scheme, a version of the Māori Subject Headings, a version of the Dublin 
Core scheme, etc), a dataset of real-world entities to do authority 
control against (a DBpedia dump, an organisational structure in an 
institution, etc), or some arbitrary mapping between some arbitrary 
combination of these.

Most of these are going to be managed and generated using current 
systems with processes that involve periodic dumps (or drops) of data 
(the DBpedia drops of Wikipedia data are a good model here). Git makes 
little sense for this kind of data.

GitHub is most likely to be useful for smaller, niche collaborative 
collections (probably no more than a million triples) mapping between 
the larger collections, and for scripts for integrating the collections 
into a sane whole.

cheers
stuart

On 28/08/12 08:36, Karen Coyle wrote:
> Ed, Corey -
>
> I also assumed that Ed wasn't suggesting that we literally use github as
> our platform, but I do want to remind folks how far we are from having
> "people friendly" versioning software -- at least, none that I have seen
> has felt "intuitive." The features of git are great, and people have
> built interfaces to it, but as Galen's question brings forth, the very
> *idea* of versioning doesn't exist in library data processing, even
> though having central-system based versions of MARC records (with a
> single time line) is at least conceptually simple.
>
> Therefore it seems to me that first we have to define what a version
> would be, both in terms of data but also in terms of the mind set and
> work flow of the cataloging process. How will people *understand*
> versions in the context of their work? What do they need in order to
> evaluate different versions? And that leads to my second question: what
> is a version in LD space? Triples are just triples - you can add them or
> delete them but I don't know of a way that you can version them, since
> each has an independent T-space existence. So, are we talking about
> named graphs?
>
> I think this should be a high priority activity around the "new
> bibliographic framework" planning because, as we have seen with MARC,
> the idea of versioning needs to be part of the very design or it won't
> happen.
>
> kc
>
> On 8/27/12 11:20 AM, Ed Summers wrote:
>> On Mon, Aug 27, 2012 at 1:33 PM, Corey A Harper <[log in to unmask]>
>> wrote:
>>> I think there's a useful distinction here. Ed can correct me if I'm
>>> wrong, but I suspect he was not actually suggesting that Git itself be
>>> the user-interface to a github-for-data type service, but rather that
>>> such a service can be built *on top* of an infrastructure component
>>> like GitHub.
>> Yes, I wasn't saying that we could just plonk our data into Github,
>> and pat ourselves on the back for a good days work :-) I guess I was
>> stating the obvious: technologies like Git have made once hard
>> problems like decentralized version control much, much easier...and
>> there might be some giants shoulders to stand on.
>>
>> //Ed
>


-- 
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/