"I'm sure any decent MARC tool can deal with them, since decent MARC tools
are certainly going to be forgiving enough to deal with four characters that
apparently don't even really matter."
You say that, but I'm pretty sure Marc4J throws errors on MARC records where
these characters are incorrect.
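
The magic test only looks at the first record in a file, so before handing a dump to a stricter parser it may be worth checking every leader. Here is a rough sketch in Python, assuming the records sit back to back and that the first five leader bytes give each record's length. The function name `bad_entry_maps` is my own invention, not part of any MARC library:

```python
from typing import Iterator, Tuple

LEADER_LEN = 24  # a MARC 21 leader is 24 bytes long


def bad_entry_maps(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, entry_map) for each record whose leader
    positions 20-23 are not b"4500"."""
    offset = 0
    while offset + LEADER_LEN <= len(data):
        leader = data[offset:offset + LEADER_LEN]
        length_field = leader[0:5]
        if not length_field.isdigit():
            break  # record length is unreadable, so we cannot find the next record
        entry_map = leader[20:24]
        if entry_map != b"4500":
            yield offset, entry_map
        record_length = int(length_field.decode("ascii"))
        if record_length < LEADER_LEN:
            break  # a record cannot be shorter than its own leader
        offset += record_length
```

To use it, read a file in binary mode (`open(path, "rb").read()`) and loop over the result. Each hit gives the byte offset of the offending record and the four bytes found where 4500 was expected.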
Owen
On Fri, Apr 1, 2011 at 3:51 AM, William Denton <[log in to unmask]> wrote:
> On 28 March 2011, Ford, Kevin wrote:
>
>> I couldn't get Simon's MARC 21 Magic file to work. Among other issues, I
>> received "line too long" errors. But, since I've been curious about this
>> for some time, I figured I'd take a whack at it myself. Try this:
>
> This is very nice! Thanks. I tried it on a bunch of MARC files I have,
> and it recognized almost all of them. A few it didn't, so I had a closer
> look, and they're invalid.
>
> For example, the Internet Archive's Binghamton catalogue dump:
>
> http://ia600307.us.archive.org/6/items/marc_binghamton_univ/
>
> $ file -m marc.magic bgm*mrc
> bgm_openlib_final_0-5.mrc: data
> bgm_openlib_final_10-15.mrc: MARC Bibliographic
> bgm_openlib_final_15-18.mrc: data
> bgm_openlib_final_5-10.mrc: MARC Bibliographic
>
> But why? Aha:
>
> $ head -c 25 bgm_openlib_final_*mrc
> ==> bgm_openlib_final_0-5.mrc <==
> 01812cas 2200457 45x00
> ==> bgm_openlib_final_10-15.mrc <==
> 01008nam 2200289ua 45000
> ==> bgm_openlib_final_15-18.mrc <==
> 01614cam 00385 45 0
> ==> bgm_openlib_final_5-10.mrc <==
> 00887nam 2200265v 45000
>
> As you say, the leader should end with 4500 (as defined at
> http://www.loc.gov/marc/authority/adleader.html) but two of those files
> don't. So they're not valid MARC. I'm sure any decent MARC tool can deal
> with them, since decent MARC tools are certainly going to be forgiving
> enough to deal with four characters that apparently don't even really
> matter.
>
> So on the one hand they're usable MARC but file wouldn't say so, and on the
> other that's a good indication that the files have failed a basic validity
> test. I wonder if there are similar situations for JPEGs or MP3s.
>
> I think you should definitely submit this for inclusion in the magic file.
> It would be very useful for us all!
>
> Bill
>
> P.S. I'd never used head -c (to show a fixed number of bytes) before.
> Always nice to find a new useful option to an old command.
>
>
>> #--------------------------------------------
>> # MARC 21 Magic (Second cut)
>>
>> # Set at position 0
>> 0 short >0x0000
>>
>> # leader ends with 4500
>> >20 string 4500
>>
>> # leader starts with 5 digits, followed by codes specific to MARC format
>> >>0 regex/1 (^[0-9]{5})[acdnp][^bhlnqsu-z] MARC Bibliographic
>> >>0 regex/1 (^[0-9]{5})[acdnosx][z] MARC Authority
>> >>0 regex/1 (^[0-9]{5})[cdn][uvxy] MARC Holdings
>> >>0 regex/1 (^[0-9]{5})[acdn][w] MARC Classification
>> >>0 regex/1 (^[0-9]{5})[cdn][q] MARC Community
>
> --
> William Denton, Toronto : miskatonic.org www.frbr.org openfrbr.org
>
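
For anyone who would rather run the same test outside of file(1), the magic rules quoted above can be approximated in a few lines of Python. This is only a sketch of how I read them: I'm assuming the five regex tests run only once the 4500 test has passed, I've left out the initial "short >0x0000" test, and the names `RULES` and `classify` are mine:

```python
import re

# Leader positions 0-4 (record length), 5 (status) and 6 (type),
# mirroring the regex lines in the quoted magic file.
RULES = [
    ("MARC Bibliographic", re.compile(r"^[0-9]{5}[acdnp][^bhlnqsu-z]")),
    ("MARC Authority", re.compile(r"^[0-9]{5}[acdnosx]z")),
    ("MARC Holdings", re.compile(r"^[0-9]{5}[cdn][uvxy]")),
    ("MARC Classification", re.compile(r"^[0-9]{5}[acdn]w")),
    ("MARC Community", re.compile(r"^[0-9]{5}[cdn]q")),
]


def classify(leader: str) -> str:
    """Return a label for a 24-character leader, or "data" if no rule matches."""
    # Same gate as the magic file: positions 20-23 must read 4500.
    if len(leader) < 24 or leader[20:24] != "4500":
        return "data"
    for label, pattern in RULES:
        if pattern.match(leader):
            return label
    return "data"
```

Like the magic file, this rejects a leader with a damaged entry map outright, so a file that a forgiving tool could still read comes back as "data".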
--
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: [log in to unmask]