I guess what I meant is that in MARCXML, you have a <datafield> element with subsequent <subfield> elements each with fairly clear attributes, which, while not my idea of fun Sunday-afternoon reading, requires less specialized tools to parse (hello TextMate!) and is a bit easier than trying to count INT positions. One quick XPath query and you can have all 245 fields, regardless of their length or position in the record.

On 2010-10-25, at 3:26 PM, Nate Vack wrote:

> On Mon, Oct 25, 2010 at 2:09 PM, Tim Spalding wrote:
>> - XML is self-describing, binary is not.
>>
>> Not to quibble, but that's only in a theoretical sense here. Something
>> like Amazon XML is truly self-describing. MARCXML is self-obfuscating.
>> At least MARC records kinda imitate catalog cards.
>
> Yeah -- this is kinda the source of my confusion. In the case of the
> files I'm reading, it's not that it's hard to find out where the
> nMeasurement field lives (it's six short ints starting at offset 64),
> but what the field means, and whether or not I care about it.
>
> Switching to an XML format doesn't help with that at all.
>
> WRT character encoding issues and validation: if MARC and MARCXML are
> round-trippable, a solution in one environment is equivalent to a
> solution in the other.
>
> And I think we've all seen plenty of unvalidated, badly-formed XML,
> and plenty with Character Encoding Problems™ ;-)
>
> Thanks for the input!
> -Nate
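
P.S. For what it's worth, the "one quick XPath query" for the 245 fields might look roughly like the sketch below. It assumes the standard MARCXML namespace (http://www.loc.gov/MARC21/slim) and uses only Python's built-in ElementTree, which supports a limited XPath subset. The two sample records and the title_fields helper are made up for illustration, not taken from any real catalog or library API.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace; the "marc" prefix is just a local alias.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Made-up sample records, for illustration only.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Melville, Herman.</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Moby Dick :</subfield>
      <subfield code="b">or, The whale.</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">Walden.</subfield>
    </datafield>
  </record>
</collection>"""


def title_fields(xml_text):
    """Return every 245 field as a list of (subfield code, text) pairs."""
    root = ET.fromstring(xml_text)
    # One XPath-style query: any datafield with tag 245, wherever it sits.
    fields = root.findall(".//marc:datafield[@tag='245']", NS)
    return [
        [(sf.get("code"), sf.text) for sf in field.findall("marc:subfield", NS)]
        for field in fields
    ]


for field in title_fields(SAMPLE):
    print(field)
```

The query never has to know how long a field is or where it starts in the record, which is the contrast with counting offsets. It still doesn't tell you what a 245 $b means, of course, which I take to be Nate's point.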