On Thu, 17 Jan 2008, Jakob Voss wrote:

> Hi Joe,
>
> You wrote:
>
>> On Wed, 16 Jan 2008, Jakob Voss wrote:
>>> Someone just has to define was 'holding' is and what information it must
>>> carry, so we can define a simple holding interchange format that is not
>>> as fuzzy and overblown as most of the library most other library
>>> standards. As a sideline we implement another part of FRBR (a mapping
>>> from frbr:manifestation to frbr:item)
>>
>> I've been fighting with the issue of what do you return in response to a
>> query (in the context of federated search systems ... but for scientific
>> data, not bibliographic) for almost 4 years now.
>>
>> Although I think FRBR helps to frame the problem, the real issue is that
>> there are many reasons why someone might ask the question, and without
>> knowing what they're trying to solve, we don't know what sort of a record
>> we should be returning.
>
> A holding webservice is not meant to be asked by human beeing with fuzzy
> information needs in mind. Instead it is just one service to tell you
> where an already identifier manifestation can be found. If you still
> don't know which exact manifestation (for instance you don't mind which
> edition of a book), then the holding service needs to be queried for
> each possible manifestation.

I agree -- and I'm only looking at the API side of things.

In building the Virtual Solar Observatory <http://virtualsolar.org/>, we
ran into the problem that we didn't clearly define what constituted a
'record' in response to a query. And the scientists still can't agree,
as it affects what type of questions can be easily answered ... more
granular allows more specific questions, but less granular makes it easy
for the scientist to filter down the result set to determine their needs.

Those people writing user interfaces to make use of the API need to know
what granularity is being returned by the API, and if necessary,
de-duplicate to make it less granular and more in line with what the
user expects.

So, for instance, to answer the following types of questions, we need
different granularity:

    What stories do you have that I might be interested in?
        (only need 'work')
    What stories do you have that I can understand?
        (language is significant -- need 'expression')
    What stories do you have that are accessible to me?
        (may need characteristics of the packaging, need
        'manifestation')
    What stories do you have that are currently available to me?
        (need attributes of specific physical items)

Technically, we may only need those levels for answering the question,
and can then return details at a higher (coarser) level (eg, as I said
'stories', work may be sufficient).

We start needing the other levels of detail when a person is trying to
make decisions as they drill down in granularity.

    I've identified that I'd like to read <Work>, what media and/or
    translation is it available in?
        (need a list of expressions, or possibly manifestations)
    I've identified that I'm interested in <Expression>, what are my
    options for physical packaging?
        (need a list of manifestations)
    I've identified that I'm interested in <Manifestation>, where can
    I get it from?
        (need a list of items)

I've been trying to keep the terms rather generic, so they fit the use
cases that I'm dealing with, but as an example for, say, someone looking
to get a specific movie:

    Do you have the movie w/ English subtitles or dubbed over so I can
    understand it?
    Is it available on VHS, so I can actually watch it?
    Where do I have to go to get it?

In my specific case, the questions are:

    Is the data in units that are meaningful to me?
        (some are raw sensor recordings, which require calibration
        software that not everyone would have, and even once
        calibrated, the data may not be comparable to other
        instruments; sometimes lossy compression is acceptable,
        other times it isn't, depending on what the data is being
        used for)
    Is the data in a format that my tools can make use of?
        (must have the necessary metadata, some tools can't deal
        with 4 dimensional data and need individual data cubes, not
        all tools can read FITS / CDF / HDF / NetCDF / etc.)
    How long will it take me to get the data?
        (if it's available locally, get it locally before trying to
        get it from some other mirror in Europe or Asia)

>> (and, to make things more complex, I think there's a group 1 entity that's
>> missing in FRBR -- the concept of 'text' in the scope of the specific
>> words that are used but without the formatting, so I can de-duplicate at
>> the translation level, rather than only once pagination and other
>> typesetting have been applied, at the Expression level. The best
>> correlation I can come up with to the problem in terms of bibliographic
>> records is the question 'Do you have a copy of the King James Bible?')
>
> I don't see the problem here. The King James Bible is a frbr:expression
> of the frbr:work Bible or a frbr:work of its own (I never really catched
> the difference between frbr:work and frbr:expression). If you ask for
> the text of the King James Bible then you ask for a frbr:item of that
> work/expression with specific additional characteristics of containing
> no formatting but only the text. At http://ebible.org/bible/kjv/ you can
> download the King James Bible in different formats - each file is a
> frbr:item of its own.

Actually, that's what I thought, too, until I was talking to people at
the last ASIS&T annual meeting, and a few were insistent that a
translation was a new work, and not just a new expression.
As you said you weren't sure, I'm guessing there's probably more debate
on that specific issue than I realize, as I'm not directly active in the
FRBR discussions.

Now, there is mention that expression "excludes aspects of physical form,
such as typeface and page layout if they are not integral to the
intellectual or artistic realization of the work as such", but we then
get to the issue of what is 'integral'.

One example I was given was that of XML formatted documents vs. a plain
text document. Their argument was that markup wasn't on the excluded
list (typeface and page layout), and so it therefore made a new
expression. I'm willing to assume that it's actually a notation of
formatting, which is excluded ... if you're adding markup after the fact
to an unformatted text. If you remove formatting from a marked up text,
you may be removing information that is necessary to allow the document
to be understandable (or at least, less misunderstood) by a wider
audience.

Expression also includes "mode or medium of expression", and so books on
tape are a separate expression (and some might argue a separate work)
from the printed form of the work.

If the people I was talking to are just the dissidents in the community,
and most people agree that translations are an expression, then that
largely solves the issues I've been having with trying to fit my
concepts / objects to FRBR.

> I think the problem of applying FRBR lies in the lack of authority
> files. There is no easy way to link
>
> http://ebible.org/bible/kjv/kjvtxt.zip (Plain text version)
>
> with the general concept of "The King James Bible" because there is no
> registry of frbr:work/expressions. In some cases LibraryThing does a
> good job to define works, in other Wikipedia may be a better choice.

We're running into the same issue with data ... I think we're going to
have to track provenance information, and have reformatting software
insert identifiers so we can trace individual items back to their origin.

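To make the granularity point a bit more concrete: if every item did
carry identifiers for its parent manifestation, expression and work, a
client could roll item-level results up to whatever level the question
needs. This is only a rough sketch, not anything the VSO or any library
system actually does; the Record fields, the roll_up function and the
sample data are all made up for illustration, and it treats the King
James Bible as an expression, which is exactly the point under debate.

```python
from dataclasses import dataclass

# Hypothetical FRBR-style group 1 levels, coarsest first.
LEVELS = ("work", "expression", "manifestation", "item")


@dataclass(frozen=True)
class Record:
    """One item-level result; every field is an illustrative identifier."""
    work: str
    expression: str
    manifestation: str
    item: str


def roll_up(records, level):
    """Group item-level records by the distinct entity at `level`.

    The grouping key includes all coarser levels as well, so two
    manifestations with the same label under different works stay apart.
    """
    depth = LEVELS.index(level) + 1
    groups = {}
    for rec in records:
        key = tuple(getattr(rec, name) for name in LEVELS[:depth])
        groups.setdefault(key, []).append(rec)
    return groups


# Made-up sample data: three items of one (assumed) expression.
records = [
    Record("Bible", "King James", "plain text", "ebible.org copy"),
    Record("Bible", "King James", "plain text", "local mirror copy"),
    Record("Bible", "King James", "print", "shelf copy"),
]

by_expression = roll_up(records, "expression")
by_manifestation = roll_up(records, "manifestation")
print(len(by_expression), len(by_manifestation))  # 1 2
```

The same three items answer 'do you have it?' with one row at the
expression level, and 'what packaging can I get?' with two rows at the
manifestation level, which is why the caller has to know which level the
API is returning.
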
> The question 'Do you have a copy of the King James Bible?' can be
> answered very well with FRBR in two steps:

[trimmed]

If people are going to classify translations as new expressions, that
works, as that's the exact sort of thing I was hoping for ... I guess I
just need to wait until things finally get implemented, and we can see
how many people subscribe to the 'translation is a new work' belief.

>> ... anyway, the point is -- you have to define 'holding', or you can't be
>> assured that the response to your request is the correct granularity of
>> information to answer the question you're trying to ask.
>
> Ok, then I'd define a holding an instance of frbr:item with the
> properties "location" (a building, an institution, an URL...),
> "identifier" (call-number, item-number, URL...) and "availability"
> (available, next week, only on campus, free for download...). As shown
> in my ad-hoc example "location" can be nested, but that's not the point.
> Defining holding is not the problem - you just have to look how
> holdings are *practically* used in libraries (instead of starting a
> theoretical discussion). The problem is more how to get the data out of
> library systems.

I probably should stop talking to the theoretical and research folks ...
it did seem much easier when I stuck with the 'functional' in FRBR, and
was just looking at what it would take to implement the model for the
archives I manage ... which gets us back to the practical part:

You need to come to a shared understanding of what you're returning in
response to a 'holdings' request, or the response isn't meaningful ...

which you had already stated, and I probably just confused the matter
further, but was agreeing with you.

-----
Joe Hourcle