I agree with Ere that XML isn't the real issue here, in understanding why embedding HTML in MARC is nevertheless something to be avoided if at all possible. :)

Ere Maijala wrote:
> Jon Gorman wrote:
>
>>> You could substitute XML with e.g. Base64 encoding if it makes thinking
>>> about this stuff easier. For instance email clients often send binary files
>>> in Base64, but it doesn't mean the file is ruined, as the receiving email
>>> client can decode it back to the original binary.
>>>
>> A bit of an ironic statement, considering a regular, constant
>> complaint on several library-related mailing lists I'm on is that
>> emails are coming in "garbled" or need to be sent again in "plain
>> text". Without fail it's because the person is using a client that
>> won't or can't deal with Base64. Yes, silly in this day and age.
>>
>
> Well, yeah, of course the receiving party could be broken, but there's
> definitely all the information and rules it could use to decode the data.
>
>> Perhaps I'm just jaded from working in libraries for too long but
>> your examples assume some logical consistent control through the
>> process of dealing with MARC data.
>>
>
> Not really, just some consistency in working with XML.
>
>> while later they start raising alarms because of either: they see the
>> markup in the record and wonder what's happening and how to remove it
>> or one of those tools treats that area as text, another as xml
>> content, and somewhere along the way it gets messed up.
>>
>
> I failed to make the statement that I definitely don't support adding
> HTML to MARC records (unless at least a clean and backwards-compatible
> way to do that is devised). My point was just that the transport
> mechanism is irrelevant.
>
>> database that was put there by the original MARC. The code is
>> horribly set up and hackish and certain fields do not bother to escape
>> what it's retrieving from the record.
>> You then go to import to your
>> new ILS, which validates the MARCXML. It of course now croaks because
>> you have something like
>> <marc:subfield code="a"><div class="foo">pretty</div>.
>>
>
> You're screwed then. You don't need embedded HTML to break it. For
> instance a simple < ("less than") character in a field would do. This is
> an example of very broken XML creation. I've done it myself too, but
> fixed it asap to avoid more embarrassment. All this, however, is not a
> useful argument in the HTML in MARC records debate.
>
>> Would you count on having someone on the staff who will be able to fix
>> those MARCXML files? Or did you have someone like that and they
>> burned out? How long before the support contract on your old ILS
>> forces you to abandon it?
>>
>
> Ok, this is drifting off-topic, but if you have a way to get the
> original MARC records in ISO2709 format (or any other non-broken
> format), it's a very simple task to convert them to valid MARCXML.
>
>> I mainly sent out this email though because I don't think the folks
>> who have been pointing out issues are confused. It's not that we
>> don't understand that it should be able to "round-trip" or that we
>> haven't played around with html in other data formats. I think we've
>> used enough software in the library world to not trust all the layers
>> will work as they should.
>>
>
> And my response was to Jonathan simply trying to state that encoding the
> stuff in XML doesn't make a difference. I agree with you that there are
> interoperability issues, but there are also examples in the library
> world where stuff just works even when XML is involved, and the
> receiving party can get the data intact.
>
> --Ere
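
For what it's worth, here is a minimal Python sketch of the escaping point Ere makes above: a correct XML writer round-trips the field text intact, while pasting the value into the markup either fails to parse or silently turns the text into child elements. The function names are made up for illustration, and it uses only the standard library's ElementTree rather than any particular ILS or MARC toolkit.

```python
import xml.etree.ElementTree as ET

# Namespace of the MARC21 "slim" MARCXML schema.
MARC_NS = "http://www.loc.gov/MARC21/slim"


def build_subfield(value: str) -> str:
    """Serialize one subfield, letting the XML library escape the text."""
    ET.register_namespace("marc", MARC_NS)
    subfield = ET.Element(f"{{{MARC_NS}}}subfield", {"code": "a"})
    subfield.text = value
    return ET.tostring(subfield, encoding="unicode")


def build_subfield_naively(value: str) -> str:
    """Broken on purpose: pastes the value into the markup unescaped."""
    return f'<marc:subfield xmlns:marc="{MARC_NS}" code="a">{value}</marc:subfield>'


def read_subfield(xml_text: str) -> str:
    """Parse a serialized subfield and return its text content."""
    return ET.fromstring(xml_text).text or ""


if __name__ == "__main__":
    html = '<div class="foo">pretty</div>'

    # Proper serialization: the markup is escaped and comes back unchanged.
    print(build_subfield(html))
    print(read_subfield(build_subfield(html)) == html)

    # Naive concatenation: the div is parsed as a child element, so the
    # subfield's own text is lost instead of being returned as a string.
    print(repr(read_subfield(build_subfield_naively(html))))

    # Naive concatenation with a bare "<": the result is not well-formed XML.
    try:
        read_subfield(build_subfield_naively("a < b"))
    except ET.ParseError as err:
        print("parse error:", err)
```

The two naive cases correspond to the two failure modes discussed in the thread: one tool treating the content as XML rather than text, and a validating importer rejecting the record outright.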