I agree with Ere that XML isn't the real issue when it comes to
understanding why embedding HTML in MARC is nevertheless something to
be avoided if at all possible. :)
Ere Maijala wrote:
> Jon Gorman wrote:
>>> You could substitute XML with e.g. Base64 encoding if it makes thinking
>>> about this stuff easier. For instance email clients often send binary files
>>> in Base64, but it doesn't mean the file is ruined, as the receiving email
>>> client can decode it back to the original binary.
>> A bit of an ironic statement, considering a regular, constant
>> complaint on several library-related mailing lists I'm on is that
>> emails are coming in "garbled" or need to be sent again in "plain
>> text". Without fail it's because the person is using a client that
>> won't or can't deal with Base64. Yes, silly in this day and age.
> Well, yeah, of course the receiving party could be broken, but there's
> definitely all the information and rules it could use to decode the data.
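
For anyone who wants to see the Base64 point concretely, here is a minimal
sketch of my own, assuming nothing beyond Python's standard library. The
payload is made up; it only shows that the encoded form carries everything a
working receiver needs to get the original bytes back.

```python
import base64

# Arbitrary binary payload (hypothetical stand-in for an attachment).
original = bytes(range(256))

# What a sending client would put on the wire: plain ASCII text.
encoded = base64.b64encode(original)

# What a working receiving client does: decode back to the original bytes.
decoded = base64.b64decode(encoded)

print(decoded == original)
```

A client that shows the encoded text instead of decoding it is broken, but the
data itself was never damaged.
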
>> Perhaps I'm just jaded from working in libraries for too long, but
>> your examples assume some logical, consistent control through the
>> process of dealing with MARC data.
> Not really, just some consistency in working with XML.
>> while later they start raising alarms because of either: they see the
>> markup in the record and wonder what's happening and how to remove it
>> or one of those tools treats that area as text, another as xml
>> content, and somewhere along the way it gets messed up.
> I failed to make the statement that I definitely don't support adding
> HTML to MARC records (unless at least a clean and backwards-compatible
> way to do that is devised). My point was just that the transport
> mechanism is irrelevant.
>> database that was put there by the original MARC. The code is
>> horribly set up and hackish, and for certain fields it does not bother
>> to escape what it's retrieving from the record. You then go to import
>> into your
>> new ILS, which validates the MARCXML. It of course now croaks because
>> you have something like
>> <marc:subfield code="a"><div class="foo">pretty</div>.
> You're screwed then. You don't need embedded HTML to break it. For
> instance a simple < ("less than") character in a field would do. This is
> an example of very broken XML creation. I've done it myself too, but
> fixed it asap to avoid more embarrassment. All this, however, is not a
> useful argument in the HTML in MARC records debate.
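
A rough illustration of what Ere describes, as a sketch of my own in Python
using only the standard library. The helper names (build_subfield,
is_well_formed, subfield_text) and the sample values are hypothetical; the
namespace is the usual MARCXML "slim" one. It compares a subfield written
without escaping against one written with escaping.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

MARC_NS = "http://www.loc.gov/MARC21/slim"


def build_subfield(value, escaped):
    """Serialize one subfield by string pasting, with or without escaping."""
    body = escape(value) if escaped else value
    return f'<marc:subfield xmlns:marc="{MARC_NS}" code="a">{body}</marc:subfield>'


def is_well_formed(xml_text):
    """True if the XML parser accepts the text at all."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def subfield_text(xml_text):
    """Return the character data directly inside the subfield element."""
    return ET.fromstring(xml_text).text


less_than = "pages < 100"                # no HTML, just a "less than"
html = '<div class="foo">pretty</div>'   # the embedded markup from the example

# Unescaped "<": the result is not even well-formed XML.
print(is_well_formed(build_subfield(less_than, escaped=False)))

# Unescaped HTML: parses, but the div is now a child element rather than
# subfield data, so the subfield has no text of its own.
print(subfield_text(build_subfield(html, escaped=False)))

# Escaped: both values come back exactly as they went in.
print(subfield_text(build_subfield(less_than, escaped=True)) == less_than)
print(subfield_text(build_subfield(html, escaped=True)) == html)
```

So the breakage comes from skipping the escaping step, whether or not the
field happens to contain HTML.
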
>> Would you count on having someone on the staff who will be able to fix
>> those MARCXML files? Or did you have someone like that and they
>> burned out? How long before the support contract on your old ILS
>> forces you to abandon it?
> Ok, this is drifting off-topic, but if you have a way to get the
> original MARC records in ISO2709 format (or any other non-broken
> format), it's a very simple task to convert them to valid MARCXML.
>> I mainly sent out this email though because I don't think the folks
>> who have been pointing out issues are confused. It's not that we
>> don't understand that it should be able to "round-trip" or that we
>> haven't played around with HTML in other data formats. I think we've
>> used enough software in the library world not to trust that all the
>> layers will work as they should.
> And my response was to Jonathan simply trying to state that encoding the
> stuff in XML doesn't make a difference. I agree with you that there are
> interoperability issues, but there are also examples in the library
> world where stuff just works even when XML is involved, and the
> receiving party can get the data intact.