I'm not a big user of MARCXML, but I can think of a few reasons off the top of my head:
- Existing libraries for reading, manipulating and searching XML-based documents are very mature.
- Documents can be checked for well-formedness and validated against a pre-defined schema using these existing tools (a validator for binary MARC would need to be custom-coded).
- MARCXML can easily be incorporated into XML-based meta-metadata schemas, like METS.
- It can be parsed and manipulated in a web service context without sending a binary blob over the wire.
- XML is self-describing; binary MARC is not.
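To illustrate the first point, here is a minimal sketch of pulling a subfield out of a MARCXML record using only Python's standard library. The sample record and the `subfields` helper are made up for illustration; the only thing taken from MARCXML itself is the `http://www.loc.gov/MARC21/slim` namespace and the `leader` / `controlfield` / `datafield` / `subfield` element names.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") records.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record, invented for this example.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">A. N. Author.</subfield>
  </datafield>
</record>"""


def subfields(record, tag, code):
    """Return the text of every subfield `code` in every datafield `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [sf.text or "" for sf in record.findall(path, MARC_NS)]


record = ET.fromstring(SAMPLE)  # raises ParseError if not well-formed
print(subfields(record, "245", "a"))
```

Note that nothing here is MARC-specific code: the parsing, the well-formedness check and the path lookup all come from a generic XML library.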
There's nothing stopping you from reading the MARCXML into a binary blob and working on it from there. But when documents are shared among institutions around the globe that use a wide variety of tools and techniques, XML seems to be the lowest common denominator.
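As a rough sketch of what "reading the MARCXML into a binary blob" could look like, the snippet below serializes a MARCXML record into the ISO 2709 layout that binary MARC uses (24-byte leader, 12-byte directory entries, then the field data). The function name `marcxml_to_iso2709` and the sample record are hypothetical, and the code makes simplifying assumptions: UTF-8 data, no field longer than 9999 bytes, and no attempt to fix up leader values other than the record length and base address. For real work, an established MARC library would be the safer choice.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}
FT, RT, SF = b"\x1e", b"\x1d", b"\x1f"  # field/record terminators, subfield delimiter

# Hypothetical sample record, invented for this example.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
  </datafield>
</record>"""


def marcxml_to_iso2709(record):
    """Serialize one MARCXML <record> element to ISO 2709 bytes (simplified)."""
    leader = (record.findtext("marc:leader", default="", namespaces=NS) or "").ljust(24)

    # Collect (tag, encoded field body) pairs; every field ends with FT.
    fields = []
    for cf in record.findall("marc:controlfield", NS):
        fields.append((cf.get("tag"), (cf.text or "").encode("utf-8") + FT))
    for df in record.findall("marc:datafield", NS):
        body = (df.get("ind1", " ") + df.get("ind2", " ")).encode("utf-8")
        for sf in df.findall("marc:subfield", NS):
            body += SF + sf.get("code").encode("utf-8") + (sf.text or "").encode("utf-8")
        fields.append((df.get("tag"), body + FT))

    # Directory entry: 3-char tag, 4-digit length, 5-digit starting offset.
    directory, data = b"", b""
    for tag, body in fields:
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body

    base = 24 + len(directory) + 1   # leader + directory + its terminator
    total = base + len(data) + 1     # plus the record terminator
    # Leader positions 0-4 hold the record length, 12-16 the base address.
    new_leader = f"{total:05d}{leader[5:12]}{base:05d}{leader[17:24]}".encode("ascii")
    return new_leader + directory + FT + data + RT


blob = marcxml_to_iso2709(ET.fromstring(SAMPLE))
print(len(blob), blob[:24])
```

Going the other direction means hand-decoding those offsets and delimiters, which is the custom code the XML route lets you skip.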
On 2010-10-25, at 2:38 PM, Nate Vack wrote:
> Hi all,
> I've just spent the last couple of weeks delving into and decoding a
> binary file format. This, in turn, got me thinking about MARCXML.
> In a nutshell, it looks like it's supposed to contain the exact same
> data as a normal MARC record, except in XML form. As in, it should be
> What's the advantage to this? I can see using a human-readable format
> for poorly-documented file formats -- they're relatively easy to read
> and understand. But MARC is well, well-documented, with more than one
> free implementation in cursory searching. And once you know a binary
> file's format, it's no harder to parse than XML, and the data's
> smaller and processing faster.
> So... why the XML?