On Fri, Jun 29, 2012 at 9:51 AM, Sullivan, Mark V wrote:

> I received a question regarding a software library I have created and
> released as open source. The record length in the leader (positions 0-4)
> was not being calculated correctly when writing as MarcXML. However, this
> raises a more philosophical and larger question. What is the point of the
> first five digits of the leader, outside of an ISO2709 / MARC21 encoded
> record? Should I calculate the record length AS IF it would be encoded in
> ISO2709? This would be computationally non-trivial and would likely double
> the time necessary for my software to write a MarcXML file. Should I just
> make the first five digits of the leader '00000', since it means nothing in
> the context of a MarcXML file?

All of the essential data in a MARC record is converted and expressed in XML. MARC structural elements, such as the length of field and starting position of field data in directory entries are not needed in the XML record. *Leader data positions not needed in the XML environment are retained as place holders or carried as blanks*.

http://www.loc.gov/standards/marcxml/marcxml-design.html

In the XSD, the record length and the base address of the data must each match the regex "[\d ]{5}". When generating a MARCXML record, if there is no length to be preserved, or if you don't feel like preserving it, you should use a sequence of blanks for the length and for the offset.

The purpose of the length and offset fields in the MARCXML leader is to take up 10 bytes of space, since otherwise the format would be far too compact.

[Fun fact: for a sample of ~7 million LC bibliographic records, the 19 bytes of the MARC leader excluding the length field had an information content of about 7 bits.]

Simon
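
The blanking suggested above can be sketched in a few lines of Python. This is only an illustration, not code from any particular library: the function name `blank_leader_lengths` and the sample leader are made up, and the sketch assumes a 24-character leader with the record length in positions 0-4 and the base address of data in positions 12-16.

```python
import re

LEADER_LENGTH = 24  # assumption: a standard 24-character MARC 21 leader

# The pattern quoted above for each of the two 5-character slots.
FIVE_DIGITS_OR_BLANKS = re.compile(r"[\d ]{5}")


def blank_leader_lengths(leader: str) -> str:
    """Return the leader with positions 0-4 and 12-16 replaced by blanks.

    Hypothetical helper: all other leader positions are copied unchanged.
    """
    if len(leader) != LEADER_LENGTH:
        raise ValueError(
            f"expected a {LEADER_LENGTH}-character leader, got {len(leader)}"
        )
    blanks = " " * 5
    # Positions 5-11 and 17-23 carry the data worth keeping.
    return blanks + leader[5:12] + blanks + leader[17:]


if __name__ == "__main__":
    sample = "01142cam a2200301 a 4500"  # made-up example leader
    print(repr(blank_leader_lengths(sample)))
```

Note that '00000' would also satisfy "[\d ]{5}", so either choice should validate against that part of the pattern; blanks simply follow the wording of the design note.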