I am pretty sure that the marc4j standard reader ignores them; the tolerant
reader definitely does. Otherwise JHU might have about two parseable records
based on the mangled leaders that J-Rock gets stuck with :-)
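
To make the strict/tolerant distinction concrete, here is a minimal Python sketch. It is not marc4j's code; the function name check_leader and the strict flag are made up for illustration, and the sample leaders in the usage note are padded by guesswork because the whitespace in the quoted head output below is collapsed.

```python
def check_leader(leader: bytes, strict: bool = False) -> list:
    """Return a list of problems found in a 24-byte MARC leader.

    Hypothetical helper: in strict mode a bad entry map (leader
    positions 20-23) raises ValueError; otherwise it is only reported.
    """
    if len(leader) < 24:
        raise ValueError("leader is shorter than 24 bytes")
    problems = []
    # Positions 00-04 hold the record length as five ASCII digits.
    if not leader[:5].isdigit():
        problems.append("record length (00-04) is not five digits")
    # Positions 20-23 are the entry map, which MARC 21 fixes at "4500".
    entry_map = leader[20:24]
    if entry_map != b"4500":
        msg = "entry map (20-23) is %r, expected '4500'" % entry_map.decode(
            "ascii", "replace")
        if strict:
            raise ValueError(msg)
        problems.append(msg)
    return problems
```

Called as check_leader(b"01812cas  2200457   45x0"), the tolerant mode returns one complaint about the entry map and lets the caller carry on; with strict=True the same leader raises ValueError.
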
An analysis of the ~7M LC bib records from the scriblio.net data files
(~Dec 2006) indicated that the leader carries less than 8 bits of
information (Shannon-Weaver definition). This excludes the initial length
value, which is redundant given the end-of-record marker.
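
For anyone who wants to repeat that kind of measurement, here is a minimal Python sketch of one way to estimate it. It is not the script behind the figure above. It assumes binary MARC with records terminated by 0x1D, and it simply sums the Shannon entropy of each leader position, which is an upper bound on the joint entropy of the leader as a whole. All function names are illustrative.

```python
import math
from collections import Counter

RECORD_TERMINATOR = b"\x1d"  # end-of-record marker in binary MARC


def leaders(data: bytes):
    """Yield the first 24 bytes (the leader) of each record in a MARC blob."""
    for record in data.split(RECORD_TERMINATOR):
        if len(record) >= 24:
            yield record[:24]


def position_entropy(all_leaders, position):
    """Shannon entropy, in bits, of the byte seen at one leader position."""
    counts = Counter(leader[position] for leader in all_leaders)
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def leader_entropy(data: bytes, skip=range(0, 5)):
    """Sum per-position entropy, skipping the record length (00-04)."""
    all_leaders = list(leaders(data))
    return sum(position_entropy(all_leaders, p)
               for p in range(24) if p not in skip)
```

Feeding it the contents of a .mrc file, e.g. leader_entropy(open(path, "rb").read()), gives a figure in bits. Positions that never vary across the file contribute nothing to the total.
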
The LC V'GER adds a pseudo tag 000 to its HTML view of the MARC leader.
The final characters of the leader are "450".
Also, I object to the phrase "decent MARC tool". Any tool capable of
dealing with MARC as it exists cannot afford the luxury of decency :-)
[ HA: "A clear conscience?"
BW: "Yes, Sir Humphrey."
HA: "When did you acquire this taste for luxuries?"]
Simon
On Fri, Apr 1, 2011 at 5:16 AM, Owen Stephens <[log in to unmask]> wrote:
> "I'm sure any decent MARC tool can deal with them, since decent MARC tools
> are certainly going to be forgiving enough to deal with four characters that
> apparently don't even really matter."
>
> You say that, but I'm pretty sure Marc4J throws errors on MARC records where
> these characters are incorrect
>
> Owen
>
> On Fri, Apr 1, 2011 at 3:51 AM, William Denton <[log in to unmask]> wrote:
>
> > On 28 March 2011, Ford, Kevin wrote:
> >
> >> I couldn't get Simon's MARC 21 Magic file to work. Among other issues, I
> >> received "line too long" errors. But, since I've been curious about this
> >> for some time, I figured I'd take a whack at it myself. Try this:
> >>
> >
> > This is very nice! Thanks. I tried it on a bunch of MARC files I have,
> > and it recognized almost all of them. A few it didn't, so I had a closer
> > look, and they're invalid.
> >
> > For example, the Internet Archive's Binghamton catalogue dump:
> >
> > http://ia600307.us.archive.org/6/items/marc_binghamton_univ/
> >
> > $ file -m marc.magic bgm*mrc
> > bgm_openlib_final_0-5.mrc: data
> > bgm_openlib_final_10-15.mrc: MARC Bibliographic
> > bgm_openlib_final_15-18.mrc: data
> > bgm_openlib_final_5-10.mrc: MARC Bibliographic
> >
> > But why? Aha:
> >
> > $ head -c 25 bgm_openlib_final_*mrc
> > ==> bgm_openlib_final_0-5.mrc <==
> > 01812cas 2200457 45x00
> > ==> bgm_openlib_final_10-15.mrc <==
> > 01008nam 2200289ua 45000
> > ==> bgm_openlib_final_15-18.mrc <==
> > 01614cam 00385 45 0
> > ==> bgm_openlib_final_5-10.mrc <==
> > 00887nam 2200265v 45000
> >
> > As you say, the leader should end with 4500 (as defined at
> > http://www.loc.gov/marc/authority/adleader.html) but two of those files
> > don't. So they're not valid MARC. I'm sure any decent MARC tool can deal
> > with them, since decent MARC tools are certainly going to be forgiving
> > enough to deal with four characters that apparently don't even really
> > matter.
> >
> > So on the one hand they're usable MARC but file wouldn't say so, and on the
> > other that's a good indication that the files have failed a basic validity
> > test. I wonder if there are similar situations for JPEGs or MP3s.
> >
> > I think you should definitely submit this for inclusion in the magic file.
> > It would be very useful for us all!
> >
> > Bill
> >
> > P.S. I'd never used head -c (to show a fixed number of bytes) before.
> > Always nice to find a new useful option to an old command.
> >
> >
> >> #--------------------------------------------
> >> # MARC 21 Magic (Second cut)
> >>
> >> # Set at position 0
> >> 0 short >0x0000
> >>
> >> # leader ends with 4500
> >> >20 string 4500
> >>
> >> # leader starts with 5 digits, followed by codes specific to MARC format
> >> >>0 regex/1 (^[0-9]{5})[acdnp][^bhlnqsu-z] MARC Bibliographic
> >> >>0 regex/1 (^[0-9]{5})[acdnosx][z] MARC Authority
> >> >>0 regex/1 (^[0-9]{5})[cdn][uvxy] MARC Holdings
> >> >>0 regex/1 (^[0-9]{5})[acdn][w] MARC Classification
> >> >>0 regex/1 (^[0-9]{5})[cdn][q] MARC Community
> >>
> >
> > --
> > William Denton, Toronto : miskatonic.org www.frbr.org openfrbr.org
> >
>
>
>
> --
> Owen Stephens
> Owen Stephens Consulting
> Web: http://www.ostephens.com
> Email: [log in to unmask]
>