Jon Gorman wrote:
>> You could substitute XML with e.g. Base64 encoding if it makes thinking
>> about this stuff easier. For instance email clients often send binary files
>> in Base64, but it doesn't mean the file is ruined, as the receiving email
>> client can decode it back to the original binary.
>
> A bit of an ironic statement, considering a regular, constant
> complaint on several library-related mailing lists I'm on is that
> emails are coming in "garbled" or need to be sent again in "plain
> text". Without fail it's because the person is using a client that
> won't or can't deal with Base64. Yes, silly in this day and age.
Well, yeah, of course the receiving party could be broken, but it
definitely has all the information and rules it needs to decode the data.
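Just to illustrate (a minimal Python sketch, not tied to any particular
mail client; the payload is made up):

  import base64

  original = b"\x00\x01binary payload\xff"   # any binary content
  encoded = base64.b64encode(original)       # what the sending client does
  decoded = base64.b64decode(encoded)        # what the receiving client does
  assert decoded == original                 # nothing is lost in transit

The encoding is completely reversible; losing data takes a client that
skips the decoding step.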
> Perhaps I'm just jaded from working in libraries for too long, but
> your examples assume some logical, consistent control throughout the
> process of dealing with MARC data.
Not really, just some consistency in working with XML.
> while later they start raising alarms, either because they see the
> markup in the record and wonder what's happening and how to remove it,
> or because one of those tools treats that area as text, another as XML
> content, and somewhere along the way it gets messed up.
I failed to state clearly that I definitely don't support adding HTML
to MARC records (unless at least a clean and backwards-compatible way to
do that is devised). My point was just that the transport mechanism is
irrelevant.
> database that was put there by the original MARC. The code is
> horribly set up and hackish, and for certain fields it doesn't bother to
> escape what it's retrieving from the record. You then go to import into
> your new ILS, which validates the MARCXML. It of course now croaks because
> you have something like
> <marc:subfield code="a"><div class="foo">pretty</div>.
You're screwed then. You don't need embedded HTML to break it; for
instance, a simple < ("less than") character in a field would do. This is
an example of very broken XML creation. I've done it myself too, but
fixed it as soon as possible to avoid further embarrassment. All this,
however, is not a useful argument in the HTML-in-MARC-records debate.
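For what it's worth, any proper XML library handles the escaping for
you; the breakage comes from gluing strings together by hand. A rough
Python sketch (standard library only, not anyone's actual ILS code):

  import xml.etree.ElementTree as ET

  # Building the subfield through an XML API escapes reserved characters.
  subfield = ET.Element("subfield", code="a")
  subfield.text = 'A title with < and <div class="foo">pretty</div> in it'
  xml_bytes = ET.tostring(subfield)          # '<' is written out as '&lt;'
  assert ET.fromstring(xml_bytes).text == subfield.text   # round-trips intact

  # The broken pattern is concatenation without escaping, e.g.
  # '<subfield code="a">' + raw_value + '</subfield>'

Serialize and parse with the library and the markup-looking text survives
as plain character data; concatenate raw strings and a single < breaks
the document.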
> Would you count on having someone on the staff who will be able to fix
> those MARCXML files? Or did you have someone like that and they
> burned out? How long before the support contract on your old ILS
> forces you to abandon it?
OK, this is drifting off-topic, but if you have a way to get the
original MARC records in ISO2709 format (or any other non-broken
format), converting them to valid MARCXML is a very simple task.
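For example, with something like pymarc (a quick sketch; the file names
are made up and I'm assuming the library is installed):

  from pymarc import MARCReader, XMLWriter

  with open("original.mrc", "rb") as marc_in, open("records.xml", "wb") as xml_out:
      writer = XMLWriter(xml_out)        # emits the MARCXML collection wrapper
      for record in MARCReader(marc_in):
          writer.write(record)           # each record as valid, escaped MARCXML
      writer.close()                     # closes the collection element

Any comparable tool (MarcEdit, yaz-marcdump, etc.) does the same job; the
point is only that the conversion itself is trivial when the source
records aren't broken.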
> I mainly sent out this email though because I don't think the folks
> who have been pointing out issues are confused. It's not that we
> don't understand that it should be able to "round-trip" or that we
> haven't played around with HTML in other data formats. I think we've
> used enough software in the library world not to trust that all the
> layers will work as they should.
And my response to Jonathan was simply trying to state that encoding the
stuff in XML doesn't make a difference. I agree with you that there are
interoperability issues, but there are also examples in the library
world where stuff just works even when XML is involved, and the
receiving party can get the data intact.
--Ere