Thanks a lot, that's exactly what was happening here: I was only getting the
first 856... Everything now rocks, and my "fix" was a mess-up ;-)
I didn't understand the behaviour of the API very well.

Best Regards,

James

2011/5/19 Ross Singer

> On Thu, May 19, 2011 at 1:33 PM, Bill Dueber wrote:
> > record['856'] is defined to return the *first* 856 in the record, which, if
> > you look at the documentation...er...ok. Which is not documented as such in
> > MARC::Record (http://rubydoc.info/gems/marc/0.4.2/MARC/Record)
> >
> > To get them all, you need to do something like
> >
> >   eight_fifty_sixes = record.fields '856' # returns array of results
> >
> > Or, to iterate
> >
> >   record.each_by_tag('856') do |f|
> >     puts f['u'] if f['u'] # print out a URL if we've got one
> >   end
> >
>
> What Bill said. Also, there's a somewhat complicated calculus that
> comes into play here regarding ruby-marc and looking up subfields and
> performance.
>
> Modern ruby-marc (of which 0.4.2 is an example) has the capability of
> providing a hash of the fields for much faster access than:
>
>   eight_fifty_sixes = record.find_all { |field| field.tag == "856" }
>
> However, it comes at a cost (that is, there's a penalty in building the
> field map). This penalty is offset if you wind up doing a lot of
> one-off lookups in a single record. If you're simply looking for a
> single field in every record (or know, beforehand, what fields you're
> looking for), it's *much* faster to do something like:
>
>   tags = ['001', '020', '100', '110', '111', '245', '650', '856']
>   fields = record.find_all { |field| tags.include?(field.tag) }
>
> or whatever. At some point we did a benchmark of this (Bill Dueber did
> it: https://gist.github.com/591907) and the threshold was somewhere
> around 6 or so #find_all calls needed to offset building the
> field map.
>
> This is why it's not really documented.
> This is the sort of thing that really needs to go into the ruby-marc wiki.
>
> BTW, the behavior exists for subfields, too. If you do something like
> record['043']['a'] and there are multiple subfield "a"s, you'll only
> get the first one.
>
> -Ross.
>
> >
> > On Thu, May 19, 2011 at 1:16 PM, James Lecard wrote:
> >
> >> I'll dig into this one, thanks for this input Jonathan... I'm not so
> >> familiar with the library yet. I'll do some more debugging, but in fact
> >> what is happening is that I get no value from an access such as
> >> record['856']['u'], while I do get one for record['856']['q'].
> >> And the MARC you are seeing is copy/pasted from a MARC editor GUI; it's
> >> not the actual MARC record. I edited it so that its data is not
> >> recognisable (for confidentiality).
> >>
> >> James
> >>
> >>
> >> 2011/5/19 Jonathan Rochkind
> >>
> >> > Now whether it _means_ what you want it to mean is another question, yeah.
> >> > As Andreas said, I don't think that particular example _ought_ to have two
> >> > 856's.
> >> >
> >> > But it ought to be perfectly parseable marc.
> >> >
> >> > If your 'patch' is to make ruby-marc combine those multiple 856's into one
> >> > -- that is not right, two separate 856's are two separate 856's, same as any
> >> > other marc field. Applying that patch would mess up ruby-marc, not fix it.
> >> >
> >> > You need to be more specific about what you're doing and what you mean
> >> > exactly by 'causing the ruby library to ignore it'. I wonder if you are
> >> > just using a method in ruby-marc which only returns the first field
> >> > matching a given tag when there is more than one.
> >> >
> >> >
> >> > On 5/19/2011 12:51 PM, Andreas Orphanides wrote:
> >> >
> >> >> From the MARC documentation [1]:
> >> >>
> >> >> "Field 856 is repeated when the location data elements vary (the URL in
> >> >> subfield $u or subfields $a, $b, $d, when used). It is also repeated when
> >> >> more than one access method is used, different portions of the item are
> >> >> available electronically, mirror sites are recorded, different
> >> >> formats/resolutions with different URLs are indicated, and related items are
> >> >> recorded."
> >> >>
> >> >> So it looks like however the URL is handled, a single 856 field should be
> >> >> used to indicate the location [2]. I am not familiar enough with MARC to say
> >> >> how it "should" have been done, but it looks like $q and $u would probably
> >> >> be sufficient (if they're on the same line).
> >> >>
> >> >> However, since the field is repeatable, the parser shouldn't be choking on
> >> >> it, unless it's choking on it for a sophisticated reason (e.g., "These
> >> >> aren't the subfield tags I expect to be seeing"). It also looks like if $u
> >> >> is provided, the first indicator should give the access method (in this case
> >> >> "4" for HTTP). Maybe that's what's causing the problem? [3]
> >> >>
> >> >> Anyway, I think having these two parts of the same URL data on separate
> >> >> lines is definitely Not Right, but I am not sure if it adds up to invalid
> >> >> MARC.
> >> >>
> >> >> -dre.
> >> >>
> >> >> [1] http://www.loc.gov/marc/bibliographic/bd856.html
> >> >> [2] I am not a cataloger. Don't hurt me.
> >> >> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
> >> >>
> >> >> On 5/19/2011 12:37 PM, James Lecard wrote:
> >> >>
> >> >>> I'm using the ruby-marc Ruby parser (v0.4.2) to parse some MARC files I get
> >> >>> from a partner.
> >> >>>
> >> >>> The 856 field is split over 2 lines, causing the ruby library to ignore
> >> >>> it (I've patched it to overcome this issue), but I want to know if this
> >> >>> kind of MARC is valid?
> >> >>>
> >> >>> =LDR 00638nam 2200181uu 4500
> >> >>> =001 cla-MldNA01
> >> >>> =008 080101s2008\\\\\\\|||||||||||||||||fre||
> >> >>> =040 \\$aMy Provider
> >> >>> =041 0\$afre
> >> >>> =245 10$aThis Subject
> >> >>> =260 \\$aParis$bJ. Doe$c2008
> >> >>> =490 \\$aSome topic
> >> >>> =650 1\$aNarratif, Autre forme
> >> >>> =655 \7$abook$2lcsh
> >> >>> =752 \\$aA Place on earth
> >> >>> =776 \\$dParis: John Doe and Cie, 1973
> >> >>> =856 \2$qtext/html
> >> >>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
> >> >>>
> >> >>> Thanks,
> >> >>>
> >> >>> James L.
> >> >>>
> >> >>
> >> >>
> >> >
> >
> >
> >
> > --
> > Bill Dueber
> > Library Systems Programmer
> > University of Michigan Library
> >
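
A minimal sketch of why record['856']['u'] came back empty for the sample record quoted above. It deliberately does not use the marc gem: the Field class, the fields array, and the variable names are hypothetical stand-ins that only model the "first match" versus "all matches" behaviour Bill and Ross describe, plus a rough analogue of the field map Ross mentions. Treat it as an illustration under those assumptions, not as ruby-marc's actual implementation.

```ruby
# Hypothetical stand-in for a MARC data field; this is NOT the ruby-marc API.
class Field
  attr_reader :tag, :subfields

  def initialize(tag, subfields)
    @tag = tag
    @subfields = subfields # array of [code, value] pairs, in record order
  end

  # Returns the value of the FIRST subfield with the given code, or nil.
  def [](code)
    pair = @subfields.find { |c, _value| c == code }
    pair && pair[1]
  end
end

# A cut-down model of the sample record: one 245 and the two 856 fields.
fields = [
  Field.new('245', [['a', 'This Subject']]),
  Field.new('856', [['q', 'text/html']]),
  Field.new('856', [['u', 'http://www.this-link-will-not-be-retrieved-by-ruby-marc-library']])
]

# First-match lookup, analogous to record['856'] in the thread.
first_856 = fields.find { |f| f.tag == '856' }

# All-matches lookup, analogous to record.fields('856') or each_by_tag.
all_856 = fields.select { |f| f.tag == '856' }

# The first 856 only carries subfield q, so asking it for u gives nil.
first_u = first_856['u']
first_q = first_856['q']

# Walking every 856 finds the URL in the second field.
urls = all_856.map { |f| f['u'] }.compact

# Rough analogue of a field map: one pass to build, cheap lookups afterwards.
field_map = fields.group_by(&:tag)
mapped_856_count = field_map.fetch('856', []).length

puts "first 856, subfield u: #{first_u.inspect}"
puts "first 856, subfield q: #{first_q.inspect}"
puts "all 856s, subfield u:  #{urls.inspect}"
puts "856 fields in the map: #{mapped_856_count}"
```

With the real library, the calls named in the thread are record['856'] for the first match and record.fields('856') or record.each_by_tag('856') for all of them.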