Jonathan Rochkind wrote:
> Ere Maijala wrote:
>>
>> That shouldn't be a problem as any sane OAI-PMH provider, unAPI or ATOM
>> serializer would escape the contents. Things that resemble HTML tags
>> could be present in MARC records without any HTML-in-MARC too.
>>   
> Sure, and then, if you have html tags in your marc, that system doing 
> the re-use is going to present content to users with escaped HTML in it, 
> which isn't desirable either!

How the content is stored in the transport format is separate from how 
it is used. Whatever the re-using system does is not related to how the 
data was transferred to it. When it extracts the content from the XML, 
it will of course unescape it, but what happens after that is up to the 
system and unrelated to the transport mechanism. So here is an example 
of the whole process:

MARC with embedded HTML
->
OAI-PMH provider escapes the MARC data into some XML format
->
OAI-PMH harvester (the re-using system) unescapes the data from the XML 
format
->
Something is done with the data
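To make the steps above concrete, here is a minimal sketch using only the 
Python standard library. The MARC value and the simplified, 
namespace-less <subfield> element are hypothetical stand-ins for real 
MARCXML; the point is only that the provider escapes and the harvester's 
XML parser unescapes, so the harvested string equals the original:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical MARC subfield value that contains embedded HTML.
marc_value = 'Summary with <b>bold</b> text & a <a href="http://example.org/">link</a>'


def provider_serialize(value: str) -> str:
    """Provider side: escape &, < and > so the value is plain character data."""
    # Simplified, hypothetical MARCXML-like fragment (no namespace).
    return '<subfield code="a">' + escape(value) + "</subfield>"


def harvester_extract(xml_text: str) -> str:
    """Harvester side: parse the XML; the parser unescapes the text for us."""
    return ET.fromstring(xml_text).text or ""


wire = provider_serialize(marc_value)
print(wire)  # the HTML tags travel as &lt;b&gt; etc.

harvested = harvester_extract(wire)
print(harvested == marc_value)  # the original value survives the round trip
```

What the harvester then does with `harvested` (display it as HTML, strip 
the tags, index it) is its own decision and independent of the escaping.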

It's the same as if the source system stores the data internally in 
MARCXML. The content must be escaped so that it can be stored in MARCXML 
without messing up the markup, but when the system uses the data, e.g. 
for display, it's first retrieved from the XML and unescaped, and only 
then massaged into the desired display format. If you use a DOM API for 
the XML manipulation, all of this happens automatically: you just write 
and read strings, and the DOM takes care of escaping and unescaping.
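As an illustration of the DOM point, a small sketch with Python's 
xml.dom.minidom (the element name and the sample value are hypothetical). 
The code never escapes anything by hand: it sets a plain string as a text 
node, serializes, parses, and reads the same plain string back:

```python
from xml.dom.minidom import Document, parseString

# Hypothetical MARC contents note containing HTML markup.
marc_value = "Contents: <i>Part one</i> & <i>Part two</i>"

# Write: store the raw string as a text node; no manual escaping.
doc = Document()
subfield = doc.createElement("subfield")
subfield.setAttribute("code", "a")
subfield.appendChild(doc.createTextNode(marc_value))
doc.appendChild(subfield)
serialized = doc.documentElement.toxml()  # DOM escapes < > & on output
print(serialized)

# Read: parse the XML and read the text node; DOM unescapes on input.
parsed = parseString(serialized)
root = parsed.documentElement
root.normalize()  # merge any adjacent text nodes into one
read_back = root.firstChild.data
print(read_back == marc_value)
```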

You could substitute XML with e.g. Base64 encoding if that makes it 
easier to think about. For instance, email clients often send binary 
files Base64-encoded, but that doesn't mean the file is ruined, as the 
receiving email client can decode it back to the original binary.
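The Base64 analogy can be checked the same way. In this sketch (sample 
bytes are made up) the encoded form contains none of the original 
characters, yet decoding restores the data exactly, just as unescaping 
restores the original MARC text:

```python
import base64

# Made-up payload: HTML-ish text plus a few raw, non-text bytes.
original = b"<p>MARC note with <em>HTML</em></p>\x00\xff"

encoded = base64.b64encode(original)  # only A-Z, a-z, 0-9, +, / and =
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == original)  # the transport encoding is fully reversible
```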

--Ere