> You could substitute XML with e.g. Base64 encoding if it makes thinking
> about this stuff easier. For instance email clients often send binary files
> in Base64, but it doesn't mean the file is ruined, as the receiving email
> client can decode it back to the original binary.

A bit of an ironic statement, considering that a constant complaint on several library-related mailing lists I'm on is that emails are coming in "garbled" or need to be sent again in "plain text". Without fail, it's because the person is using a client that won't or can't deal with Base64. Yes, silly in this day and age.

Perhaps I'm just jaded from working in libraries for too long, but your examples assume some logical, consistent control throughout the process of dealing with MARC data. Let's think of this scenario instead:

You're using your vendor's ILS system. You stick some html tags into a record. The vendor's ILS does several different things with it: indexing it, storing the complete record for later retrieval, and pulling data from the record into a semi-normalized scheme in a database. Now the librarians who have just enough training to run reports against these systems start running them via Access and start shifting the data around in Access, Excel, and Word. Then a little while later they start raising alarms, either because they see the markup in the record and wonder what's happening and how to remove it, or because one of those tools treats that area as text, another as xml content, and somewhere along the way it gets messed up.

The above is not really all that uncommon a scenario.

Or how about this scenario: You add some html internally to a MARC record. You then add it to your ILS system. A few years later you go to export and decide to do it in MARCXML. Unknown to you, the ILS doesn't do a sane translation process, but rather rebuilds the MARCXML from information in the database that was put there by the original MARC.
The code is horribly set up and hackish, and certain fields do not bother to escape what they retrieve from the record. You then go to import into your new ILS, which validates the MARCXML. It of course now croaks because you have something like <marc:subfield code="a"><div class="foo">pretty</div>. Would you count on having someone on staff who will be able to fix those MARCXML files? Or did you have someone like that and they burned out? How long before the support contract on your old ILS forces you to abandon it?

Plus there are still unresolved questions. Let's take RSS as an example of a format that has been abused by html in the past. If you find "html" in RSS, you can't be sure if it is valid or well-formed. Frequently there's no way to know what version of html it is. Yes, in the end you can just throw it at an html parser or a browser and hope for the best, but we have to consider the input mechanisms here. Are folks going to be entering the html by hand? Are there going to be some sort of macros? Some sort of batch change process? Each has a different level of risk of producing bad html. How much extra processing are you going to want to do for each record each time you might end up displaying it? What do you do with a mistake? Let your parser decide, or their browser?

It's great to say we should simply re-write our tools, but many of us work with tools supplied by vendors. We may be trying to move to more open tools and the like, but ultimately we're constrained by what our upper management dictates. There are both practical issues (untrustworthy systems) and more abstract issues (how do we communicate which version? namespaces? etc.) at play here.

Ultimately I do agree that, if it can't be avoided, it's worth trying to put the html into the record itself. A gain in usability and functionality over a couple of years is probably worth it, as the chance of a large issue later on is quite small. (Higher chance of small issues, though.)
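To make the MARCXML export scenario above a bit more concrete, here is a rough sketch in Python. It is only an illustration: the helper names (subfield_xml, child_tags, is_well_formed) are made up for this email, and the standard-library parser only checks well-formedness, whereas a real ILS import would presumably validate against the MARCXML schema. It compares an exporter that forgets to escape the html with one that escapes it, and then shows what happens when the embedded html is the sloppy kind that browsers tolerate.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

MARC_NS = "http://www.loc.gov/MARC21/slim"
html = '<div class="foo">pretty</div>'


def subfield_xml(content):
    """Wrap content in a single MARCXML-style subfield (hypothetical helper)."""
    return (f'<marc:subfield xmlns:marc="{MARC_NS}" code="a">'
            f'{content}</marc:subfield>')


def child_tags(xml_text):
    """Return the tags of any elements nested inside the subfield."""
    return [child.tag for child in ET.fromstring(xml_text)]


def is_well_formed(xml_text):
    """True if the text parses as XML at all (not a schema check)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Exporter that does not escape: the div becomes a stray child element,
# which a schema-validating importer would have grounds to reject.
raw = subfield_xml(html)
print(child_tags(raw))            # ['div']
print(ET.fromstring(raw).text)    # None, the subfield has no text of its own

# Exporter that escapes: the markup survives as plain text and round-trips.
safe = subfield_xml(escape(html))
print(child_tags(safe))                   # []
print(ET.fromstring(safe).text == html)   # True

# Sloppy html, unescaped: the export is no longer even well-formed XML.
print(is_well_formed(subfield_xml("first line<br>second line")))  # False
```

The escaped version is the "round-trip" case everyone agrees should work. The other two are what you get when one layer in the chain doesn't hold up its end.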

I mainly sent out this email, though, because I don't think the folks who have been pointing out issues are confused. It's not that we don't understand that it should be able to "round-trip" or that we haven't played around with html in other data formats. I think we've used enough software in the library world to not trust that all the layers will work as they should.

Jon Gorman