Jonathan Rochkind wrote:
> Ere Maijala wrote:
>>
>> That shouldn't be a problem as any sane OAI-PMH provider, unAPI or ATOM
>> serializer would escape the contents. Things that resemble HTML tags
>> could be present in MARC records without any HTML-in-MARC too.
>>
> Sure, and then, if you have html tags in your marc, that system doing
> the re-use is going to present content to users with escaped HTML in it,
> which isn't desirable either!

How the content is stored in the transport format is separate from how it is used. Whatever the re-using system does with the data is unrelated to how the data was transferred to it. When it extracts the content from the XML, it will of course unescape it, but what happens after that is up to the system and has nothing to do with the transport mechanism.

So here is an example of the whole process:

MARC with embedded HTML
-> OAI-PMH provider escapes the MARC in some XML format
-> OAI-PMH harvester (the re-using system) unescapes the data from the XML format
-> Something is done with the data

It's the same as if the source system stores the data internally in MARCXML. The content must be escaped so that it can be stored in MARCXML without messing up the markup, but when the system uses the data, e.g. for display, it first retrieves it from the XML and unescapes it, and only after that massages it into the desired display format.

If you use DOM to do the XML manipulation, all this will happen automatically. You just write and read strings, and the DOM takes care of escaping and unescaping.

You could substitute XML with e.g. Base64 encoding if it makes thinking about this stuff easier. For instance, email clients often send binary files in Base64, but that doesn't mean the file is ruined, as the receiving email client can decode it back to the original binary.

--Ere
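
A minimal sketch of the DOM round trip described above, in Python with the standard library's xml.dom.minidom. The field content and the element name "subfield" are made up for illustration; this is not real MARCXML or the output of any particular OAI-PMH provider.

```python
from xml.dom import minidom

# Hypothetical field content: MARC data that happens to contain HTML.
original = 'See <a href="http://example.org/">the guide</a> & notes'

# Writing side (the provider): hand the raw string to the DOM as a text node.
doc = minidom.Document()
field = doc.createElement("subfield")  # element name chosen for illustration
field.appendChild(doc.createTextNode(original))
doc.appendChild(field)

# Serializing escapes the markup characters so the XML stays well-formed.
serialized = doc.toxml()

# Reading side (the harvester): parsing unescapes the text again.
parsed = minidom.parseString(serialized)
recovered = parsed.documentElement.firstChild.data

print(serialized)
print(recovered == original)
```

The serialized form carries the tag as escaped text rather than as markup, and the string read back from the parsed document is identical to the one that was written. What the harvester then does with that string (display it as HTML, strip the tags, escape it for a web page) is a separate decision.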