I guess what I meant is that in MARCXML you have a <datafield> element containing <subfield> elements, each with fairly clear attributes. While that's not my idea of fun Sunday-afternoon reading, it requires less specialized tools to parse (hello TextMate!) and is a bit easier than counting integer offsets. One quick XPath query and you can have all the 245 fields, regardless of their length or position in the record.
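
For what it's worth, here is a rough sketch of that kind of XPath query, using Python's standard-library ElementTree. The sample record and the title_fields helper are made up for illustration; the only things taken from MARCXML itself are the <datafield>/<subfield> elements, the tag and code attributes, and the Library of Congress "slim" namespace.

```python
import xml.etree.ElementTree as ET

# MARCXML namespace (the "slim" schema published by the Library of Congress).
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Made-up sample record, for illustration only.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Doe, Jane.</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">An example title :</subfield>
      <subfield code="b">a subtitle /</subfield>
      <subfield code="c">Jane Doe.</subfield>
    </datafield>
  </record>
</collection>"""


def title_fields(marcxml):
    """Return one {subfield code: text} dict per 245 datafield (hypothetical helper)."""
    root = ET.fromstring(marcxml)
    # One XPath expression finds every 245 field, wherever it sits in the record.
    fields = root.findall(".//marc:datafield[@tag='245']", NS)
    return [
        {sf.get("code"): sf.text for sf in field.findall("marc:subfield", NS)}
        for field in fields
    ]


if __name__ == "__main__":
    for field in title_fields(SAMPLE):
        print(field)
```

The same expression works no matter how long the other fields are or where the 245 falls, which is really all I was getting at.
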
On 2010-10-25, at 3:26 PM, Nate Vack wrote:
> On Mon, Oct 25, 2010 at 2:09 PM, Tim Spalding wrote:
>> - XML is self-describing, binary is not.
>>
>> Not to quibble, but that's only in a theoretical sense here. Something
>> like Amazon XML is truly self-describing. MARCXML is self-obfuscating.
>> At least MARC records kinda imitate catalog cards.
>
> Yeah -- this is kinda the source of my confusion. In the case of the
> files I'm reading, it's not that it's hard to find out where the
> nMeasurement field lives (it's six short ints starting at offset 64),
> but what the field means, and whether or not I care about it.
>
> Switching to an XML format doesn't help with that at all.
>
> WRT character encoding issues and validation: if MARC and MARCXML are
> round-trippable, a solution in one environment is equivalent to a
> solution in the other.
>
> And I think we've all seen plenty of unvalidated, badly-formed XML,
> and plenty with Character Encoding Problems™ ;-)
>
> Thanks for the input!
> -Nate