I am pretty sure that the marc4j standard reader ignores them; the tolerant
reader definitely does. Otherwise JHU might have about two parseable records,
based on the mangled leaders that J-Rock gets stuck with :-)

An analysis of the ~7M LC bib records from the scriblio.net data files
(~ Dec 2006) indicated that the leader has less than 8 bits of information in
it (Shannon-Weaver definition). This excludes the initial length value, which
is redundant given the end-of-record marker.

The LC V'GER adds a pseudo tag 000 to its HTML view of the MARC leader. The
final characters of the leader are "450".

Also, I object to the phrase "decent MARC tool". Any tool capable of dealing
with MARC as it exists cannot afford the luxury of decency :-)

[ HA: "A clear conscience?"
  BW: "Yes, Sir Humphrey."
  HA: "When did you acquire this taste for luxuries?" ]

Simon

On Fri, Apr 1, 2011 at 5:16 AM, Owen Stephens wrote:

> "I'm sure any decent MARC tool can deal with them, since decent MARC tools
> are certainly going to be forgiving enough to deal with four characters that
> apparently don't even really matter."
>
> You say that, but I'm pretty sure Marc4J throws errors on MARC records where
> these characters are incorrect
>
> Owen
>
> On Fri, Apr 1, 2011 at 3:51 AM, William Denton wrote:
>
> > On 28 March 2011, Ford, Kevin wrote:
> >
> >> I couldn't get Simon's MARC 21 Magic file to work. Among other issues, I
> >> received "line too long" errors. But, since I've been curious about this
> >> for some time, I figured I'd take a whack at it myself. Try this:
> >
> > This is very nice! Thanks. I tried it on a bunch of MARC files I have,
> > and it recognized almost all of them. A few it didn't, so I had a closer
> > look, and they're invalid.
> >
> > For example, the Internet Archive's Binghamton catalogue dump:
> >
> > http://ia600307.us.archive.org/6/items/marc_binghamton_univ/
> >
> > $ file -m marc.magic bgm*mrc
> > bgm_openlib_final_0-5.mrc: data
> > bgm_openlib_final_10-15.mrc: MARC Bibliographic
> > bgm_openlib_final_15-18.mrc: data
> > bgm_openlib_final_5-10.mrc: MARC Bibliographic
> >
> > But why? Aha:
> >
> > $ head -c 25 bgm_openlib_final_*mrc
> > ==> bgm_openlib_final_0-5.mrc <==
> > 01812cas 2200457 45x00
> > ==> bgm_openlib_final_10-15.mrc <==
> > 01008nam 2200289ua 45000
> > ==> bgm_openlib_final_15-18.mrc <==
> > 01614cam 00385 45 0
> > ==> bgm_openlib_final_5-10.mrc <==
> > 00887nam 2200265v 45000
> >
> > As you say, the leader should end with 4500 (as defined at
> > http://www.loc.gov/marc/authority/adleader.html) but two of those files
> > don't. So they're not valid MARC. I'm sure any decent MARC tool can deal
> > with them, since decent MARC tools are certainly going to be forgiving
> > enough to deal with four characters that apparently don't even really
> > matter.
> >
> > So on the one hand they're usable MARC but file wouldn't say so, and on
> > the other that's a good indication that the files have failed a basic
> > validity test. I wonder if there are similar situations for JPEGs or MP3s.
> >
> > I think you should definitely submit this for inclusion in the magic
> > file. It would be very useful for us all!
> >
> > Bill
> >
> > P.S. I'd never used head -c (to show a fixed number of bytes) before.
> > Always nice to find a new useful option to an old command.
> >
> > #--------------------------------------------
> >> # MARC 21 Magic (Second cut)
> >>
> >> # Set at position 0
> >> 0 short >0x0000
> >>
> >> # leader ends with 4500
> >>> 20 string 4500
> >>
> >> # leader starts with 5 digits, followed by codes specific to MARC format
> >>> 0 regex/1 (^[0-9]{5})[acdnp][^bhlnqsu-z] MARC Bibliographic
> >>>> 0 regex/1 (^[0-9]{5})[acdnosx][z] MARC Authority
> >>>> 0 regex/1 (^[0-9]{5})[cdn][uvxy] MARC Holdings
> >>>> 0 regex/1 (^[0-9]{5})[acdn][w] MARC Classification
> >>>> 0 regex/1 (^[0-9]{5})[cdn][q] MARC Community
> >
> > --
> > William Denton, Toronto : miskatonic.org www.frbr.org openfrbr.org
>
> --
> Owen Stephens
> Owen Stephens Consulting
> Web: http://www.ostephens.com
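
For anyone who wants to try the same test without file(1), here is a rough
Python sketch of what the quoted magic rules appear to check. It is not
Kevin's code and not how file(1) evaluates magic internally. The function
name classify_leader and the table LEADER_RULES are made up for
illustration, and I am assuming the five regex lines were all meant to
nest under the "4500" test (the quoting has mangled the ">" levels). The
"0 short >0x0000" test is left out, since five leading ASCII digits
already imply it.

```python
import re

# Patterns copied from the regexes in the quoted magic file.
# Assumption: all five are children of the "leader ends with 4500" test.
LEADER_RULES = [
    (re.compile(rb"[0-9]{5}[acdnp][^bhlnqsu-z]"), "MARC Bibliographic"),
    (re.compile(rb"[0-9]{5}[acdnosx]z"), "MARC Authority"),
    (re.compile(rb"[0-9]{5}[cdn][uvxy]"), "MARC Holdings"),
    (re.compile(rb"[0-9]{5}[acdn]w"), "MARC Classification"),
    (re.compile(rb"[0-9]{5}[cdn]q"), "MARC Community"),
]


def classify_leader(leader: bytes) -> str:
    """Label the first 24 bytes of a file, or return "data" if no rule fits."""
    # Leader positions 20-23 must be exactly "4500".
    if len(leader) < 24 or leader[20:24] != b"4500":
        return "data"
    # Positions 0-4 must be digits; positions 5 and 6 pick the format.
    for pattern, label in LEADER_RULES:
        if pattern.match(leader):
            return label
    return "data"
```

To use it, read the first 24 bytes of a file opened in binary mode and pass
them in, e.g. classify_leader(open(path, "rb").read(24)), where path is
whatever .mrc file you have to hand. Like the magic file, this is strict
about the last four leader bytes, so it will report "data" for the two
Binghamton files that a forgiving parser might still read.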