Thanks a lot, that's exactly what was happening here: I was only getting the
first 856... Everything now rocks, and my "fix" was a mess-up ;-)
I didn't understand the behaviour of the API very well.
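
For anyone finding this thread later, here is a minimal sketch of what tripped
me up. It is a toy model in plain Ruby, not ruby-marc itself: ToyRecord,
ToyField and ToySubfield are hypothetical stand-ins I made up to mimic the
first-match lookup described below, using the two 856 fields from my sample
record.

```ruby
# Toy stand-ins (hypothetical, NOT the ruby-marc classes) that mimic the
# first-match lookup behaviour described in this thread.
ToySubfield = Struct.new(:code, :value)

class ToyField
  attr_reader :tag, :subfields

  def initialize(tag, *pairs)
    @tag = tag
    @subfields = pairs.map { |code, value| ToySubfield.new(code, value) }
  end

  # Like field['u'] in the thread: value of the FIRST matching subfield, or nil.
  def [](code)
    sub = @subfields.find { |s| s.code == code }
    sub && sub.value
  end
end

class ToyRecord
  def initialize(*fields)
    @fields = fields
  end

  # Like record['856']: only the FIRST field with that tag, or nil.
  def [](tag)
    @fields.find { |f| f.tag == tag }
  end

  # Like record.fields('856'): every field with that tag, as an array.
  def fields(tag)
    @fields.select { |f| f.tag == tag }
  end
end

record = ToyRecord.new(
  ToyField.new('245', ['a', 'This Subject']),
  ToyField.new('856', ['q', 'text/html']),
  ToyField.new('856', ['u', 'http://www.this-link-will-not-be-retrieved-by-ruby-marc-library'])
)

first_q  = record['856']['q']   # found: the first 856 carries $q
first_u  = record['856']['u']   # nil: $u lives in the second 856
all_urls = record.fields('856').map { |f| f['u'] }.compact
```

So the second 856 was never ignored by the parser; I was simply only ever
looking at the first one. Collecting over all the 856 fields picks up the URL.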
Best Regards,
James
2011/5/19 Ross Singer
> On Thu, May 19, 2011 at 1:33 PM, Bill Dueber wrote:
> > record['856'] is defined to return the *first* 856 in the record, which,
> > if you look at the documentation...er...ok. Which is not documented as
> > such in MARC::Record (http://rubydoc.info/gems/marc/0.4.2/MARC/Record)
> >
> > To get them all, you need to do something like
> >
> > eight_fifty_sixes = record.fields('856') # returns an array of all 856 fields
> >
> > Or, to iterate
> >
> > record.each_by_tag('856') do |f|
> >   puts f['u'] if f['u'] # print out a URL if we've got one
> > end
> >
>
> What Bill said. Also, there's a somewhat complicated calculus that
> comes into play here regarding ruby-marc and looking up subfields and
> performance.
>
> Modern ruby-marc (of which 0.4.2 is an example) has the capability of
> providing a hash of the fields for much faster access than:
>
> eight_fifty_sixes = record.find_all { |field| field.tag == "856" }
>
> However, it comes at a cost (that is, there's a penalty in building the
> field map). This penalty is offset if you wind up doing a lot of
> one-off lookups in a single record. If you're simply looking for a
> single field in every record (or know, beforehand, what fields you're
> looking for), it's *much* faster to do something like:
>
> tags = ['001', '020', '100', '110', '111', '245', '650', '856']
> fields = record.find_all { |field| tags.include?(field.tag) }
>
> or whatever. At some point we did a benchmark of this (Bill Dueber did
> it: https://gist.github.com/591907) and the threshold was somewhere
> around 6 or so #find_all calls to offset the cost of building the
> field map.
>
> This is why it's not really documented. This is the sort of thing
> that really needs to go into the ruby-marc wiki.
>
> BTW, the behavior exists for subfields, too. If you do something like
> record['043']['a'] and there are multiple subfield "a"s, you'll only
> get the first one.
>
> -Ross.
> >
> >
> > On Thu, May 19, 2011 at 1:16 PM, James Lecard wrote:
> >
> >> I'll dig into this one, thanks for the input Jonathan... I'm not very
> >> familiar with the library yet, so I'll do some more debugging, but in fact
> >> what is happening is that I get no value for an access such as
> >> record['856']['u'], while I do get one for record['856']['q'].
> >> And the MARC you are seeing is copy/pasted from a MARC editor GUI; it's
> >> not the actual MARC record. I edited it so that its data is not
> >> recognisable (for confidentiality).
> >>
> >> James
> >>
> >>
> >> 2011/5/19 Jonathan Rochkind <[log in to unmask]>
> >>
> >> > Now whether it _means_ what you want it to mean is another question,
> >> > yeah. As Andreas said, I don't think that particular example _ought_ to
> >> > have two 856's.
> >> >
> >> > But it ought to be perfectly parseable marc.
> >> >
> >> > If your 'patch' is to make ruby-marc combine those multiple 856's into
> >> > one -- that is not right, two separate 856's are two separate 856's, same
> >> > as any other marc field. Applying that patch would mess up ruby-marc, not
> >> > fix it.
> >> >
> >> > You need to be more specific about what you're doing and what you mean
> >> > exactly by 'causing the ruby library to ignore it'. I wonder if you are
> >> > just using a method in ruby-marc which only returns the first field
> >> > matching a given tag when there is more than one.
> >> >
> >> >
> >> >
> >> >
> >> > On 5/19/2011 12:51 PM, Andreas Orphanides wrote:
> >> >
> >> >> From the MARC documentation [1]:
> >> >>
> >> >> "Field 856 is repeated when the location data elements vary (the URL in
> >> >> subfield $u or subfields $a, $b, $d, when used). It is also repeated
> >> >> when more than one access method is used, different portions of the item
> >> >> are available electronically, mirror sites are recorded, different
> >> >> formats/resolutions with different URLs are indicated, and related items
> >> >> are recorded."
> >> >>
> >> >> So it looks like however the URL is handled, a single 856 field should
> >> >> be used to indicate the location [2]. I am not familiar enough with MARC
> >> >> to say how it "should" have been done, but it looks like $q and $u would
> >> >> probably be sufficient (if they're in the same line).
> >> >>
> >> >> However, since the field is repeatable, the parser shouldn't be choking
> >> >> on it, unless it's choking on it for a sophisticated reason (e.g., "These
> >> >> aren't the subfield tags I expect to be seeing"). It also looks like if
> >> >> $u is provided, the first indicator should indicate access method (in
> >> >> this case "4" for HTTP). Maybe that's what's causing the problem? [3]
> >> >>
> >> >> Anyway, I think having these two parts of the same URL data on separate
> >> >> lines is definitely Not Right, but I am not sure if it adds up to
> >> >> invalid MARC.
> >> >>
> >> >> -dre.
> >> >>
> >> >> [1] http://www.loc.gov/marc/bibliographic/bd856.html
> >> >> [2] I am not a cataloger. Don't hurt me.
> >> >> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
> >> >>
> >> >> On 5/19/2011 12:37 PM, James Lecard wrote:
> >> >>
> >> >>> I'm using the ruby-marc parser (v0.4.2) to parse some MARC files I get
> >> >>> from a partner.
> >> >>>
> >> >>> The 856 field is split over 2 lines, causing the ruby library to ignore
> >> >>> it (I've patched it to overcome this issue), but I want to know if this
> >> >>> kind of MARC is valid?
> >> >>>
> >> >>> =LDR 00638nam 2200181uu 4500
> >> >>> =001 cla-MldNA01
> >> >>> =008 080101s2008\\\\\\\|||||||||||||||||fre||
> >> >>> =040 \\$aMy Provider
> >> >>> =041 0\$afre
> >> >>> =245 10$aThis Subject
> >> >>> =260 \\$aParis$bJ. Doe$c2008
> >> >>> =490 \\$aSome topic
> >> >>> =650 1\$aNarratif, Autre forme
> >> >>> =655 \7$abook$2lcsh
> >> >>> =752 \\$aA Place on earth
> >> >>> =776 \\$dParis: John Doe and Cie, 1973
> >> >>> =856 \2$qtext/html
> >> >>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
> >> >>>
> >> >>> Thanks,
> >> >>>
> >> >>> James L.
> >> >>>
> >> >>
> >> >>
> >>
> >
> >
> >
> > --
> > Bill Dueber
> > Library Systems Programmer
> > University of Michigan Library
> >
>