LISTSERV 16.5 - CODE4LIB Archives

Thanks, Bill. What you say about "assumptions" is a good part of what is 
motivating me to try to instigate a discussion. As you know, both FRBR 
and RDA were developed by the cataloging community with no input from 
technologists. There are sweeping statements about FRBR being "more 
efficient" than the MARC model, but without, that I can find, any real 
analysis. There was a study done at OCLC on the ratio of Works to 
Manifestations (and that shows in their stats today), but the OCLC 
catalog is not representative of the catalog of a single library.

What I'm hoping to do is to surface some of the assumptions so that we 
can talk about them. I'll make a stab at an analysis, but I'm really 
interested in the conversation that could follow what I have to say.

kc
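
A rough sketch, in Python, of the kind of space estimate under discussion. It uses two averages quoted further down the thread: 30.3 bytes for the 100 field and the 70.1 figure for the 245 stripped of $c. The function name, the 8-byte link size, and the manifestations-per-work ratios are assumptions chosen for illustration, not measured values. The model also treats the title as fully shared at the Work level and ignores Expressions, subject fields, and index overhead, so it is a simplification rather than a real analysis.

```python
# Averages quoted in this thread (bytes per field).
AVG_100 = 30.3         # 100 field (author main entry)
AVG_245_TITLE = 70.1   # 245 stripped of $c

def bytes_per_manifestation(manifestations_per_work, link_bytes=8):
    """Return (flat, linked) average bytes spent on author + title per manifestation.

    flat:   author and title repeated in every record.
    linked: author and title stored once per Work, plus one link per manifestation.
    Both link_bytes and the ratio are hypothetical inputs, not measured values.
    """
    shared = AVG_100 + AVG_245_TITLE
    flat = shared
    linked = shared / manifestations_per_work + link_bytes
    return flat, linked

# Hypothetical ratios, only to show how the comparison moves.
for ratio in (1.0, 1.5, 3.0):
    flat, linked = bytes_per_manifestation(ratio)
    print(f"{ratio:.1f} manifestations/work: flat={flat:.1f} linked={linked:.1f}")
```

Under these assumptions, a ratio of 1.0 (every Work has exactly one Manifestation) makes the linked model cost more, by the size of the link. Savings appear only as the ratio rises, which is why the ratio in a single library's catalog matters.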

On 10/16/13 5:43 PM, Bill Dueber wrote:
> My guess is that traversing the WEM structure for display of a single
> record (e.g., in a librarian's ILS client or what not) will not be a
> problem at all, because the volume is so low.  In terms of the OPAC
> interface itself, well, there are lots and lots of ways to denormalize the
> data (meaning "copy over and inline data whose canonical values are in
> their own tables somewhere") for search and display purposes. Heck, lots of
> us do this on a smaller and less complicated scale already, as we dump data
> into Solr for our public catalogs.
>
> This adds complexity to the system (determining what to denormalize,
> determining when some underlying value has changed and knowing what other
> elements need updating), but it's the sort of complexity that's been
> well-studied and doesn't worry me too much.
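
A minimal sketch, in Python, of the denormalization described above: values are copied from their canonical W/E/M tables into one flat document per Manifestation, and a helper works out which flat documents must be rebuilt when an underlying value changes. The table layouts, field names, sample values, and both helper functions are hypothetical. Plain dicts stand in for the database and the search index, and nothing here is Solr's API.

```python
# Hypothetical normalized tables keyed by id; each value lives in one place.
works = {"w1": {"author": "Author, Example", "title": "Example Title"}}
expressions = {"e1": {"work_id": "w1", "language": "eng"}}
manifestations = {
    "m1": {"expression_id": "e1", "publisher": "Publisher A"},
    "m2": {"expression_id": "e1", "publisher": "Publisher B"},
}

def denormalize(m_id):
    """Follow M -> E -> W once and inline the values into a flat document."""
    m = manifestations[m_id]
    e = expressions[m["expression_id"]]
    w = works[e["work_id"]]
    return {
        "id": m_id,
        "author": w["author"],
        "title": w["title"],
        "language": e["language"],
        "publisher": m["publisher"],
    }

def affected_by_work(w_id):
    """Return ids of Manifestations whose flat documents depend on this Work."""
    e_ids = {e_id for e_id, e in expressions.items() if e["work_id"] == w_id}
    return sorted(
        m_id for m_id, m in manifestations.items()
        if m["expression_id"] in e_ids
    )

# Build the flat index once; searches and displays read it without traversal.
index = {m_id: denormalize(m_id) for m_id in manifestations}

# When a canonical value changes, rebuild only the documents that copied it.
works["w1"]["title"] = "Example Title, Revised"
for m_id in affected_by_work("w1"):
    index[m_id] = denormalize(m_id)
```

The traversal cost is paid at index time rather than on every search and display. The price is the bookkeeping in `affected_by_work`, which is the added complexity mentioned above.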
>
> I'm much, *much* more "nerd" than "librarian," and if there's one thing I
> wish I could get across to people who swing the other way, it's that
> getting the data model right is so very much harder than figuring out how
> to process it. Make sure the individual elements are machine-intelligible,
> and there are hordes of smart people (both within and outside of the
> library world) who will figure out how to efficiently(-enough) store and
> retrieve it. And, for the love of god, have someone around who can at least
> speak authoritatively about what sorts of things fall into the "hard" and
> "easy-peasy" categories in terms of the technology, instead of making
> assumptions.
>
>
>
>
> On Wed, Oct 16, 2013 at 6:23 PM, Karen Coyle <[log in to unmask]> wrote:
>
>> Yes, that's my take as well, but I think it's worth quantifying if
>> possible. There is the usual trade-off between time and space -- and I'd be
>> interested in hearing whether anyone here thinks that there is any concern
>> about traversing the WEM structure for each search and display. Does it
>> matter if every display of author in a Manifestation has to connect M-E-W?
>> Or is that a concern, like space, that is no longer relevant?
>>
>> kc
>>
>>
>>
>> On 10/16/13 12:57 PM, Bill Dueber wrote:
>>
>>> If anyone out there is really making a case for FRBR based on whether or
>>> not it saves a few characters in a database, well, she should give up the
>>> library business and go make money off her time machine. Maybe -- *maybe* --
>>> 15 years ago. But I have to say, I'm sitting on 10m records right now, and
>>> would happily figure out how to deal with double or triple the space
>>> requirements for added utility. Space is always a consideration, but it's
>>> slipped down into about 15th place on my Giant List of Things to Worry
>>> About.
>>>
>>>
>>> On Wed, Oct 16, 2013 at 3:49 PM, Karen Coyle <[log in to unmask]> wrote:
>>>
>>>> On 10/16/13 12:33 PM, Kyle Banerjee wrote:
>>>>> BTW, I don't think 240 is a good substitute as the content is very
>>>>> different than in the regular title. That's where you'll find music,
>>>>> laws, selections, translations and it's totally littered with subfields.
>>>>> The 70.1 figure from the stripped 245 is probably closer to the mark
>>>>
>>>> Yes, you are right, especially for the particular purpose I am looking
>>>> at. Thanks.
>>>>
>>>>> IMO, what you stand to gain in functionality, maintenance, and analysis
>>>>> is much more interesting than potential space gains/losses.
>>>>
>>>> Yes, obviously. But there exists an apology for FRBR that says that it
>>>> will save cataloger time and will be more efficient in a database. I
>>>> think it's worth taking a look at those assumptions. If there is a way to
>>>> measure functionality, maintenance, etc. then we should measure it, for
>>>> sure.
>>>>
>>>> kc
>>>>
>>>>> kyle
>>>>>
>>>>> On Wed, Oct 16, 2013 at 12:00 PM, Karen Coyle <[log in to unmask]> wrote:
>>>>>
>>>>>> Thanks, Roy (and others!)
>>>>>>
>>>>>> It looks like the 245 is including the $c - dang! I should have been
>>>>>> more specific. I'm mainly interested in the title, which is $a $b -- I'm
>>>>>> looking at the gains and losses of bytes should one implement FRBR. As a
>>>>>> hedge, could I ask what've you got for the 240? That may be closer to
>>>>>> reality.
>>>>>>
>>>>>> kc
>>>>>>
>>>>>>
>>>>>> On 10/16/13 10:57 AM, Roy Tennant wrote:
>>>>>>
>>>>>>> I don't even have to fire it up. That's a statistic that we generate
>>>>>>> quarterly (albeit via Hadoop). Here you go:
>>>>>>>
>>>>>>> 100 - 30.3
>>>>>>> 245 - 103.1
>>>>>>> 600 - 41
>>>>>>> 610 - 48.8
>>>>>>> 611 - 61.4
>>>>>>> 630 - 40.8
>>>>>>> 648 - 23.8
>>>>>>> 650 - 35.1
>>>>>>> 651 - 39.6
>>>>>>> 653 - 33.3
>>>>>>> 654 - 38.1
>>>>>>> 655 - 22.5
>>>>>>> 656 - 30.6
>>>>>>> 657 - 27.4
>>>>>>> 658 - 30.7
>>>>>>> 662 - 41.7
>>>>>>>
>>>>>>> Roy
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 16, 2013 at 10:38 AM, Sean Hannan <[log in to unmask]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> That sounds like a request for Roy to fire up the ole OCLC Hadoop.
>>>>>>>>
>>>>>>>> -Sean
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/16/13 1:06 PM, "Karen Coyle" <[log in to unmask]> wrote:
>>>>>>>>
>>>>>>>>> Anybody have data for the average length of specific MARC fields in
>>>>>>>>> some reasonably representative database? I mainly need 100, 245, 6xx.
>>>>>>>>> Thanks,
>>>>>>>>> kc
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Karen Coyle
>>>>>>>>> [log in to unmask] http://kcoyle.net
>>>>>>>>> m: 1-510-435-8234
>>>>>>>>> skype: kcoylenet
>>>>>>>>>

-- 
Karen Coyle
[log in to unmask] http://kcoyle.net
m: 1-510-435-8234
skype: kcoylenet