LISTSERV 16.5 - CODE4LIB Archives

On Thu, 17 Jan 2008, Jakob Voss wrote:

> Hi Joe,
>
> You wrote:
>
>> On Wed, 16 Jan 2008, Jakob Voss wrote:
>>> Someone just has to define what 'holding' is and what information it must
>>> carry, so we can define a simple holding interchange format that is not
>>> as fuzzy and overblown as most other library
>>> standards. As a sideline we implement another part of FRBR (a mapping
>>> from frbr:manifestation to frbr:item)
>>
>> I've been fighting with the issue of what do you return in response to a
>> query (in the context of federated search systems ... but for scientific
>> data, not bibliographic) for almost 4 years now.
>>
>> Although I think FRBR helps to frame the problem, the real issue is that
>> there are many reasons why someone might ask the question, and without
>> knowing what they're trying to solve, we don't know what sort of a record
>> we should be returning.
>
> A holding webservice is not meant to be asked by a human being with fuzzy
> information needs in mind. Instead it is just one service to tell you
> where an already identified manifestation can be found. If you still
> don't know which exact manifestation (for instance you don't mind which
> edition of a book), then the holding service needs to be queried for
> each possible manifestation.

I agree -- and I'm only looking at the API side of things.  In building
the Virtual Solar Observatory <http://virtualsolar.org/>, we ran into the
problem that we didn't clearly define what constituted a 'record' in
response to a query.  And the scientists still can't agree, as it affects
what type of questions can be easily answered ... more granular allows
more specific questions, but less granular makes it easy for the scientist
to filter down the result set to determine their needs.

Those people writing user interfaces to make use of the API need to know
what granularity is being returned by the API, and if necessary,
de-duplicate to make it less granular and more in line with what the user
expects.

So, for instance, to answer the following types of questions, we need
different granularity:

        What stories do you have that I might be interested in?
                (only need 'work')
        What stories do you have that I can understand?
                (language is significant -- need 'expression')
        What stories do you have that are accessible to me?
                (may need characteristics of the packaging, need
                'manifestation')
        What stories do you have that are currently available to me?
                (need attributes of specific physical items)

Technically, we may only need those levels for answering the question, and
then return details at a coarser granularity (e.g., as I said 'stories',
work may be sufficient)
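
As a rough sketch of that de-duplication step (not anything the VSO API
actually does; the Record fields, the collapse() helper and the sample
values are all made up for illustration), a client could collapse
item-level records to whatever level the question needs:

```python
from dataclasses import dataclass

# FRBR group 1 levels, coarsest to finest.
LEVELS = ("work", "expression", "manifestation", "item")


@dataclass(frozen=True)
class Record:
    """Hypothetical item-level record carrying all four levels."""
    work: str
    expression: str
    manifestation: str
    item: str


def collapse(records, level):
    """De-duplicate records down to `level`, keeping first-seen order."""
    depth = LEVELS.index(level) + 1
    keys = (tuple(getattr(rec, name) for name in LEVELS[:depth])
            for rec in records)
    return list(dict.fromkeys(keys))  # dict keys are unique and ordered


records = [
    Record("Story A", "English", "paperback", "copy-1"),
    Record("Story A", "English", "paperback", "copy-2"),
    Record("Story A", "English", "hardcover", "copy-3"),
    Record("Story A", "German", "paperback", "copy-4"),
    Record("Story B", "English", "paperback", "copy-5"),
]

for level in LEVELS:
    print(level, len(collapse(records, level)))
```

Asking for 'work' here leaves two entries and 'expression' three, so the
same result set can answer both kinds of question.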

We start needing the other levels of detail when a person is trying to
make decisions as they drill down in granularity.

        I've identified that I'd like to read <Work>, what media and/or
        translation is it available in?
                (need a list of expressions, or possibly manifestations)
        I've identified that I'm interested in <expression>, what are my
        options for physical packaging?
                (need a list of manifestations)
        I've identified that I'm interested in <Manifestation>, where can
        I get it from?
                (need a list of items)

I've been trying to keep the terms rather generic, so they fit the use
cases that I'm dealing with, but as an example, say, for someone looking to
get a specific movie:

        Do you have the movie w/ English subtitles or dubbed over
        so I can understand it?
        Is it available on VHS, so I can actually watch it?
        Where do I have to go to get it?

In my specific case, the questions are:
        Is the data in units that are meaningful to me?
                (some are raw sensor recordings, which require calibration
                software that not everyone would have, and even once
                calibrated, the data may not be comparable to other
                instruments;  sometimes lossy compression is acceptable,
                other times, it isn't, depending on what the data is being
                used for)
        Is the data in a format that my tools can make use of?
                (must have the necessary metadata, some tools can't deal
                with 4 dimensional data and need individual data cubes,
                not all tools can read FITS / CDF / HDF / NetCDF /etc.)
        How long will it take me to get the data?
                (if it's available locally, get it locally before trying
                to get it from some other mirror in Europe or Asia)
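
For that last question, a minimal sketch of "try the local copy first"
might look like the following. The dictionary layout, the site names and
the example.org URLs are assumptions for illustration only:

```python
def order_sources(sources, local_site):
    """Put copies held at local_site first; keep the given order otherwise."""
    # sorted() is stable, and False sorts before True.
    return sorted(sources, key=lambda src: src["site"] != local_site)


mirrors = [
    {"site": "europe", "url": "http://mirror-eu.example.org/data"},
    {"site": "local", "url": "http://archive.example.org/data"},
    {"site": "asia", "url": "http://mirror-asia.example.org/data"},
]

ordered = order_sources(mirrors, "local")
print([src["site"] for src in ordered])
```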


>> (and, to make things more complex, I think there's a group 1 entity that's
>> missing in FRBR -- the concept of 'text' in the scope of the specific
>> words that are used but without the formatting, so I can de-duplicate at
>> the translation level, rather than only once pagination and other
>> typesetting have been applied, at the Expression level.  The best
>> correlation I can come up with to the problem in terms of bibliographic
>> records is the question 'Do you have a copy of the King James Bible?')
>
> I don't see the problem here. The King James Bible is a frbr:expression
> of the frbr:work Bible or a frbr:work of its own (I never really caught
> the difference between frbr:work and frbr:expression). If you ask for
> the text of the King James Bible then you ask for a frbr:item of that
> work/expression with specific additional characteristics of containing
> no formatting but only the text. At http://ebible.org/bible/kjv/ you can
> download the King James Bible in different formats - each file is a
> frbr:item of its own.

Actually, that's what I thought, too, until I was talking to people at the
last ASIS&T annual meeting, and a few were insistent that a translation
was a new work, and not just a new expression.  As you said you weren't
sure, I'm guessing there's probably more debate on that specific issue
than I realize, as I'm not directly active in the FRBR discussions.

Now, there is mention that expression "excludes aspects of physical form,
such as typeface and page layout if they are not integral to the
intellectual or artistic realization of the work as such", but we then get
to the issue of what is 'integral'.

One example I was given was that of XML formatted documents vs. a
plain text document.  Their argument was that it wasn't on the excluded
list (typeface and page layout), and so therefore made a new expression.
I'm willing to assume that it's actually a notation of formatting, which
is excluded ... if you're adding markup after the fact to an unformatted
text.  If you remove formatting from a marked up text, you may be removing
information that is necessary to allow the document to be understandable
(or at least, less misunderstood) by a wider audience.

Expression also includes "mode or medium of expression", and so books on
tape are a separate expression (and some might argue separate work), of
the printed form of the work.

If the people I was talking to are just the dissidents in the community,
and most people agree that translations are an expression, then that
largely resolves the issues I've been having with trying to fit my concepts
/ objects to FRBR.



> I think the problem of applying FRBR lies in the lack of authority
> files. There is no easy way to link
>
> http://ebible.org/bible/kjv/kjvtxt.zip (Plain text version)
>
> with the general concept of "The King James Bible" because there is no
> registry of frbr:work/expressions. In some cases LibraryThing does a
> good job of defining works; in others Wikipedia may be a better choice.


We're running into the same issue with data ... I think we're going to
have to track provenance information, and have reformatting software
insert identifiers so we can track individual items to their origin.


> The question 'Do you have a copy of the King James Bible?' can be
> answered very well with FRBR in two steps:

[trimmed]

If people are going to classify translations as new expressions, that'll
work, as that's the exact sort of thing I was hoping for ... I guess I
just need to wait until things finally get implemented, and we can see how
many people subscribe to the 'translation is a new work' belief.


>> ... anyway, the point is -- you have to define 'holding', or you can't be
>> assured that the response to your request is the correct granularity of
>> information to answer the question you're trying to ask.
>
> Ok, then I'd define a holding as an instance of frbr:item with the
> properties "location" (a building, an institution, an URL...),
> "identifier" (call-number, item-number, URL...) and "availability"
> (available, next week, only on campus, free for download...). As shown
> in my ad-hoc example "location" can be nested, but that's not the point.
> Defining holding is not the problem - you just have to look at how
> holdings are *practically* used in libraries (instead of starting a
> theoretical discussion). The problem is more how to get the data out of
> library systems.
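
For concreteness, the definition quoted above could be written down
something like this. It's only a sketch: the three properties come from
the quoted text, but the class name, the field types, the tuple used for
nested locations and the sample values are all assumptions:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Holding:
    """One frbr:item, with the three properties from the quoted definition."""
    location: Tuple[str, ...]  # assumption: nesting as outermost -> innermost
    identifier: str            # call number, item number, URL, ...
    availability: str          # free text here; a real format would constrain it


# Hypothetical index from manifestation identifier to its holdings.
HOLDINGS: Dict[str, List[Holding]] = {
    "manifestation:example-1": [
        Holding(("Example Library", "Main Building"), "QA 76.9", "available"),
        Holding(("http://example.org/",), "http://example.org/item/1",
                "free for download"),
    ],
}


def holdings_for(manifestation_id: str) -> List[Holding]:
    """Return every known holding for an already identified manifestation."""
    return HOLDINGS.get(manifestation_id, [])


for holding in holdings_for("manifestation:example-1"):
    print(" / ".join(holding.location), holding.identifier,
          holding.availability)
```

A request for a manifestation the service knows nothing about simply
returns an empty list, which keeps the response shape the same in both
cases.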

I probably should stop talking to the theoretical and research folks ...
it did seem much easier when I stuck with the 'functional' in FRBR, and
was just looking at what it would take to implement the model for the
archives I manage ... which gets us back to the practical part:

You need to come to a shared understanding of what you're returning in
response to a 'holdings' request, or the response isn't meaningful ...
which you had already stated, and I probably just confused the matter
further, but I was agreeing with you.

-----
Joe Hourcle