LISTSERV 16.5 - CODE4LIB Archives

On Thu, May 19, 2011 at 1:33 PM, Bill Dueber <[log in to unmask]> wrote:
> record['856'] is defined to return the *first* 856 in the record, which, if
> you look at the documentation...er...ok. Which is not documented as such in
> MARC::Record (http://rubydoc.info/gems/marc/0.4.2/MARC/Record)
>
> To get them all, you need to do something like
>
>  eight_fifty_sixes = record.fields '856' # returns array of results
>
> Or, to iterate
>
>  record.each_by_tag('856') do |f|
>    puts f['u'] if f['u'] # print out a URL if we've got one
>  end
>

What Bill said.  Also, there's a somewhat complicated calculus that
comes into play here regarding ruby-marc and looking up subfields and
performance.

Modern ruby-marc (of which 0.4.2 is an example) can build a hash of
the fields, which gives much faster access than:

eight_fifty_sixes = record.find_all { |field| field.tag == "856" }

However, it comes at a cost (that is, there's a penalty in building the
field map).  This penalty is offset if you wind up doing a lot of
one-off lookups in a single record.  If you're simply looking for a
single field in every record (or know beforehand which fields you're
looking for), it's *much* faster to do something like:

tags = ['001', '020', '100', '110', '111', '245', '650', '856']
fields = record.find_all { |field| tags.include?(field.tag) }

or whatever.  At some point we benchmarked this (Bill Dueber did
it: https://gist.github.com/591907), and it took somewhere around 6
or so #find_all calls on a record to offset the cost of building the
field map.
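
To make the single-pass idea concrete, here's a rough sketch.  It uses
a stand-in Field struct and made-up data instead of ruby-marc itself,
so treat those names as assumptions; with a real record the same
find_all and group_by calls should apply, since the record is
enumerable over its fields.

```ruby
require 'set'

# Stand-in for a MARC field (hypothetical; not ruby-marc's own class).
Field = Struct.new(:tag, :value)

# Stand-in for a record: an ordered list of fields with a repeated 856.
record = [
  Field.new('001', 'cla-MldNA01'),
  Field.new('245', 'This Subject'),
  Field.new('490', 'Some topic'),
  Field.new('856', 'text/html'),
  Field.new('856', 'http://example.org/')
]

# A Set gives constant-time membership checks for the wanted tags.
wanted = Set.new(%w[001 245 650 856])

# One pass over the record, then bucket the matches by tag.
by_tag = record.find_all { |f| wanted.include?(f.tag) }.group_by(&:tag)

# Repeated fields come back as an array; absent tags fall back to [].
eight_fifty_sixes = by_tag.fetch('856', [])
six_fifties = by_tag.fetch('650', [])

puts eight_fifty_sixes.map(&:value).inspect
puts six_fifties.length
```

The grouping step is optional; it just saves walking the filtered
array again for each tag you care about.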

This is why it's not really documented.   This is the sort of thing
that really needs to go into the ruby-marc wiki.

BTW, the same first-match behavior applies to subfields, too.  If you
do something like record['043']['a'] and there are multiple subfield
"a"s, you'll only get the first one.
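
A rough sketch of the difference between the first-match lookup and
collecting every match.  Subfield and DataField here are simplified
stand-ins written for this example, not ruby-marc's own classes, and
the 043 values are just sample data.

```ruby
# Stand-in for a subfield (hypothetical, for illustration only).
Subfield = Struct.new(:code, :value)

# Stand-in for a data field that is enumerable over its subfields.
class DataField
  include Enumerable

  def initialize(tag, subfields)
    @tag = tag
    @subfields = subfields
  end

  # Iterate over subfields in record order.
  def each(&block)
    @subfields.each(&block)
  end

  # Mimics the first-match lookup: value of the first subfield with
  # the given code, or nil if there is none.
  def [](code)
    sf = find { |s| s.code == code }
    sf && sf.value
  end
end

f043 = DataField.new('043', [
  Subfield.new('a', 'n-us---'),
  Subfield.new('a', 'e-fr---')
])

# Only the first "a" comes back from the bracket lookup.
first_a = f043['a']

# Selecting by code returns every "a" in order.
all_a = f043.select { |s| s.code == 'a' }.map(&:value)

puts first_a
puts all_a.inspect
```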

-Ross.
>
>
> On Thu, May 19, 2011 at 1:16 PM, James Lecard <[log in to unmask]> wrote:
>
>> I'll dig into this one, thanks for this input Jonathan... I'm not so
>> familiar with the library yet, I'll do some more debugging but in fact what
>> is happening is that I get no value from an access such as
>> record['856']['u'], while I get one for record['856']['q'].
>> And the marc you are seeing is copy/pasted from a marc editor gui, it's not
>> the actual marc record, I edited it so that its data is not recognisable
>> (for confidentiality).
>>
>> James
>>
>>
>> 2011/5/19 Jonathan Rochkind <[log in to unmask]>
>>
>> > Now whether it _means_ what you want it to mean is another question, yeah.
>> > As Andreas said, I don't think that particular example _ought_ to have two
>> > 856's.
>> >
>> > But it ought to be perfectly parseable marc.
>> >
>> > If your 'patch' is to make ruby-marc combine those multiple 856's into one
>> > -- that is not right, two separate 856's are two separate 856's, same as any
>> > other marc field. Applying that patch would mess up ruby-marc, not fix it.
>> >
>> > You need to be more specific about what you're doing and what you mean
>> > exactly by 'causing the ruby library to ignore it'.  I wonder if you are
>> > just using a method in ruby-marc which only returns the first field
>> > matching a given tag when there is more than one.
>> >
>> >
>> >
>> >
>> > On 5/19/2011 12:51 PM, Andreas Orphanides wrote:
>> >
>> >> From the MARC documentation [1]:
>> >>
>> >> "Field 856 is repeated when the location data elements vary (the URL in
>> >> subfield $u or subfields $a, $b, $d, when used). It is also repeated when
>> >> more than one access method is used, different portions of the item are
>> >> available electronically, mirror sites are recorded, different
>> >> formats/resolutions with different URLs are indicated, and related items are
>> >> recorded."
>> >>
>> >> So it looks like however the URL is handled, a single 856 field should be
>> >> used to indicate the location [2]. I am not familiar enough with MARC to say
>> >> how it "should" have been done, but it looks like $q and $u would probably
>> >> be sufficient (if they're in the same line).
>> >>
>> >> However, since the field is repeatable, the parser shouldn't be choking on
>> >> it, unless it's choking on it for a sophisticated reason (e.g., "These
>> >> aren't the subfield tags I expect to be seeing"). It also looks like if $u
>> >> is provided, the first indicator should indicate access method (in this case
>> >> "4" for HTTP). Maybe that's what's causing the problem? [3]
>> >>
>> >> Anyway, I think having these two parts of the same URL data on separate
>> >> lines is definitely Not Right, but I am not sure if it adds up to invalid
>> >> MARC.
>> >>
>> >> -dre.
>> >>
>> >> [1] http://www.loc.gov/marc/bibliographic/bd856.html
>> >> [2] I am not a cataloger. Don't hurt me.
>> >> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
>> >>
>> >> On 5/19/2011 12:37 PM, James Lecard wrote:
>> >>
>> >>> I'm using the ruby-marc parser (v.0.4.2) to parse some marc files I get
>> >>> from a partner.
>> >>>
>> >>> The 856 field is split over 2 lines, causing the ruby library to ignore
>> >>> it (I've patched it to overcome this issue) but I want to know if this kind
>> >>> of marc is valid?
>> >>>
>> >>> =LDR  00638nam  2200181uu 4500
>> >>> =001  cla-MldNA01
>> >>> =008  080101s2008\\\\\\\|||||||||||||||||fre||
>> >>> =040  \\$aMy Provider
>> >>> =041  0\$afre
>> >>> =245  10$aThis Subject
>> >>> =260  \\$aParis$bJ. Doe$c2008
>> >>> =490  \\$aSome topic
>> >>> =650  1\$aNarratif, Autre forme
>> >>> =655  \7$abook$2lcsh
>> >>> =752  \\$aA Place on earth
>> >>> =776  \\$dParis: John Doe and Cie, 1973
>> >>> =856  \2$qtext/html
>> >>> =856  \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
>> >>>
>> >>> Thanks,
>> >>>
>> >>> James L.
>> >>>
>> >>
>> >>
>>
>
>
>
> --
> Bill Dueber
> Library Systems Programmer
> University of Michigan Library
>