I couldn't get Simon's MARC 21 magic file to work. Among other issues, I received "line too long" errors. But since I've been curious about this for some time, I figured I'd take a whack at it myself. Try this:
#--------------------------------------------
# MARC 21 Magic (Second cut)
# Set at position 0
0 short >0x0000
# leader ends with 4500
>20 string 4500
# leader starts with 5 digits, followed by codes specific to MARC format
>>0 regex/1 (^[0-9]{5})[acdnp][^bhlnqsu-z] MARC Bibliographic
>>0 regex/1 (^[0-9]{5})[acdnosx][z] MARC Authority
>>0 regex/1 (^[0-9]{5})[cdn][uvxy] MARC Holdings
>>0 regex/1 (^[0-9]{5})[acdn][w] MARC Classification
>>0 regex/1 (^[0-9]{5})[cdn][q] MARC Community
I've also attached it to this email to preserve the tabs.
In any event, I can confirm it works on MARC Bib, MARC Authority, and MARC Classification files I have bumping around my computer. I've not tested it on MARC Holdings and MARC Community.
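For anyone who wants to cross-check the rules outside of file, here is a rough Python sketch of the same leader tests. It assumes a 24-byte leader at the start of the data and simply mirrors the character classes in the magic rules above; the function name classify_marc_leader and the RULES table are made up for illustration and are not part of file or of any MARC library.

```python
import re

# Leader/05 (record status) and leader/06 (record type) classes, copied from
# the magic rules above. The names here are illustrative, not an official API.
RULES = [
    ("MARC Bibliographic", rb"[acdnp][^bhlnqsu-z]"),
    ("MARC Authority", rb"[acdnosx]z"),
    ("MARC Holdings", rb"[cdn][uvxy]"),
    ("MARC Classification", rb"[acdn]w"),
    ("MARC Community", rb"[cdn]q"),
]


def classify_marc_leader(data: bytes):
    """Return a MARC format label for the leading record, or None."""
    leader = data[:24]
    # The leader is 24 bytes long and must end with "4500" (offsets 20-23).
    if len(leader) < 24 or leader[20:24] != b"4500":
        return None
    # The leader starts with a five-digit record length.
    if not re.match(rb"[0-9]{5}", leader):
        return None
    # Offsets 5 and 6 carry the record status and record type codes.
    for label, pattern in RULES:
        if re.match(pattern, leader[5:7]):
            return label
    return None
```

Calling classify_marc_leader(open("some_file.mrc", "rb").read(24)) should give the same label as the magic rules for the first record in a file, assuming the file starts directly with a leader.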
Do let us/me know if it works for you (and the community generally). I can see about submitting it for formal inclusion in the magic file.
Warmly,
Kevin
--
Library of Congress
Network Development and MARC Standards Office
________________________________________
From: Code for Libraries [[log in to unmask]] On Behalf Of Simon Spero [[log in to unmask]]
Sent: Thursday, March 24, 2011 12:28
To: [log in to unmask]
Subject: Re: [CODE4LIB] MARC magic for file
Some of the problems in your first cut are:
1. Offsets for regex are given in terms of lines. MARC files don't have
newlines in them, unless you're Millennium, in which case they can be
inserted every 200,000 bytes to keep things interesting.
2. Byte matches match byte values, so "20 byte 4" is looking for the
binary value, not the ascii digit.
3. Sometimes you need to prime the buffer before you can do a regexp match.
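To illustrate point 2, a small Python sketch of the difference between a binary byte value and an ASCII digit. The sample leader is made up for the example.

```python
# A made-up 24-byte MARC leader, used only for illustration.
leader = b"00714cam a2200205 a 4500"

# Offset 20 holds the ASCII character "4", whose byte value is 0x34 (52).
byte_at_20 = leader[20]

matches_binary_4 = byte_at_20 == 4        # the comparison "20 byte 4" makes
matches_ascii_4 = byte_at_20 == ord("4")  # the comparison a string test on "4" makes

print(byte_at_20, matches_binary_4, matches_ascii_4)  # prints: 52 False True
```

This is why the string tests on "4500" in the rules match where the byte tests do not.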
Is this good enough?
# MARC 21 Magic (First cut)
# indicator count must be "2"
10 string 2
# leader must end in "4500"
>20 string 4500
# leader must start with five digits, a record status, and a record type
>0 regex ^([0-9]{5})[acdnp][acdefgijkmoprt][abcims] MARC Bibliographic
>0 regex ^([0-9]{5})[acdnp][z] MARC Authority
Simon
On Wed, Mar 23, 2011 at 8:09 PM, William Denton <[log in to unmask]> wrote:
> Has anyone figured out the magic necessary for file to recognize MARC
> files?
>
> If you don't know it, file is a Unix command that tells you what kind of
> file a file is. For example:
>
> $ file 101015_001.mp3
> 101015_001.mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS,
> layer III, v1, 192 kbps, 44.1 kHz, Stereo
>
> $ file P1000026.jpg
> P1000026.jpg: JPEG image data, EXIF standard, comment: "AppleMark"
>
> It's a really useful command. I assume it's on OSX, but I don't know. You
> can get it for Windows with Cygwin.
>
> The problem is, file doesn't grok MARC:
>
> $ file catalog.01.mrc
> catalog.01.mrc: data
>
> I took a stab at getting the magic defined, but it didn't work. I'll
> include what I used below. You can put it into a magic.txt file, and then
> use
>
> file -m magic.txt some_file.mrc
>
> to test it. It'll tell you the file is MARC Bibliographic ... but it also
> thinks that PDFs, JPEGs, and text files are MARC. That's no good.
>
> It'd be great if the MARC magic got into the central magic database so
> everyone would be able to recognize various MARC file types.
>
> Bill
>
>
> # --- clip'n'test
> # MARC 21 for Bibliographic Data
> # http://www.loc.gov/marc/bibliographic/bdleader.html
> #
> # This doesn't work properly
>
> 0 string x
>
>> 5 regex [acdnp]
>> 6 regex [acdefgijkmoprt]
>> 7 regex [abcims]
>> 8 regex [\ a]
>> 9 regex [\ a]
>> 10 byte x
>> 11 byte x
>> 12 string x
>> 17 regex [\ 12345678uz]
>> 18 regex [\ aciu]
>> 19 regex [\ abc] MARC Bibliographic
>>
> #>20 byte 4
> #>21 byte 5
> #>22 byte 0
> #>23 byte 0 MARC Bibliographic
>
> # --- end clip'n'test
>
> --
> William Denton, Toronto : miskatonic.org www.frbr.org openfrbr.org
>