These have to be named graphs, or at least collections of triples which
can be processed through workflows as a single unit.
In LD terms, a version needs to be defined in terms of:
(a) synchronisation with the non-bibliographic real world (i.e. Dataset
Z version X was released at time Y)
(b) correction/augmentation of other datasets (i.e. Dataset F version G
contains triples augmenting Dataset H versions A, B, C and D)
(c) mapping between datasets (i.e. Dataset I contains triples mapping
between Dataset J version K and Dataset L version M (and vice versa))
Note that a 'Dataset' here could be a bibliographic dataset (records of
works, etc), a classification dataset (a version of the Dewey Decimal
Scheme, a version of the Māori Subject Headings, a version of Dublin
Core Scheme, etc), a dataset of real-world entities to do authority
control against (a dbpedia dump, an organisational structure in an
institution, etc), or some arbitrary mapping between some arbitrary
combination of these.
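The three version relations above can be sketched in code. This is a minimal illustration using plain Python structures in place of a triple store; all dataset names, IRIs, and field names are hypothetical, not drawn from any real vocabulary:

```python
# Named graphs as units: each graph is a set of (s, p, o) triples,
# keyed by the graph's IRI so a workflow can handle it as one unit.
from datetime import date

graphs = {
    "http://example.org/datasetZ/v1": {
        ("ex:work1", "dc:title", "Example Work"),
    },
    "http://example.org/datasetF/vG": {
        ("ex:work1", "ex:note", "corrected title"),
    },
}

# Version metadata *about* the graphs themselves, covering (a)-(c):
versions = {
    "http://example.org/datasetZ/v1": {
        # (a) synchronisation with the real world: a release date
        "released": date(2012, 8, 1),
    },
    "http://example.org/datasetF/vG": {
        # (b) correction/augmentation of other dataset versions
        "augments": ["http://example.org/datasetH/vA",
                     "http://example.org/datasetH/vB"],
    },
    "http://example.org/datasetI/v1": {
        # (c) mapping between two dataset versions (in both directions)
        "maps_between": ("http://example.org/datasetJ/vK",
                         "http://example.org/datasetL/vM"),
    },
}

def graphs_affected_by(dataset_version):
    """Which named graphs augment or map the given dataset version?"""
    hits = []
    for iri, meta in versions.items():
        if dataset_version in meta.get("augments", []):
            hits.append(iri)
        if dataset_version in meta.get("maps_between", ()):
            hits.append(iri)
    return hits
```

The point of the sketch is that the version relations attach to whole graphs, not to individual triples, which is exactly what makes named graphs (rather than bare triples) the unit that can be versioned.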
Most of these are going to be managed and generated using current
systems with processes that involve periodic dumps (or drops) of data
(the dbpedia drops of wikipedia data are a good model here). git makes
little sense for this kind of data.
github is most likely to be useful for smaller niche collaborative
collections (probably no more than a million triples) mapping between
the larger collections, and scripts for integrating the collections into
a sane whole.
cheers
stuart
On 28/08/12 08:36, Karen Coyle wrote:
> Ed, Corey -
>
> I also assumed that Ed wasn't suggesting that we literally use github as
> our platform, but I do want to remind folks how far we are from having
> "people friendly" versioning software -- at least, none that I have seen
> has felt "intuitive." The features of git are great, and people have
> built interfaces to it, but as Galen's question brings forth, the very
> *idea* of versioning doesn't exist in library data processing, even
> though having central-system based versions of MARC records (with a
> single time line) is at least conceptually simple.
>
> Therefore it seems to me that first we have to define what a version
> would be, both in terms of data but also in terms of the mind set and
> work flow of the cataloging process. How will people *understand*
> versions in the context of their work? What do they need in order to
> evaluate different versions? And that leads to my second question: what
> is a version in LD space? Triples are just triples - you can add them or
> delete them but I don't know of a way that you can version them, since
> each has an independent T-space existence. So, are we talking about
> named graphs?
>
> I think this should be a high priority activity around the "new
> bibliographic framework" planning because, as we have seen with MARC,
> the idea of versioning needs to be part of the very design or it won't
> happen.
>
> kc
>
> On 8/27/12 11:20 AM, Ed Summers wrote:
>> On Mon, Aug 27, 2012 at 1:33 PM, Corey A Harper <[log in to unmask]>
>> wrote:
>>> I think there's a useful distinction here. Ed can correct me if I'm
>>> wrong, but I suspect he was not actually suggesting that Git itself be
>>> the user-interface to a github-for-data type service, but rather that
>>> such a service can be built *on top* of an infrastructure component
>>> like GitHub.
>> Yes, I wasn't saying that we could just plonk our data into Github,
>> and pat ourselves on the back for a good days work :-) I guess I was
>> stating the obvious: technologies like Git have made once hard
>> problems like decentralized version control much, much easier...and
>> there might be some giants shoulders to stand on.
>>
>> //Ed
>
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/