Jonathan Rochkind wrote:
> Ere Maijala wrote:
>>
>> That shouldn't be a problem as any sane OAI-PMH provider, unAPI or ATOM
>> serializer would escape the contents. Things that resemble HTML tags
>> could be present in MARC records without any HTML-in-MARC too.
>>
> Sure, and then, if you have html tags in your marc, that system doing
> the re-use is going to present content to users with escaped HTML in it,
> which isn't desirable either!
How the content is stored in the transport format is separate from how
it is used. Whatever the re-using system does is not related to how the
data was transferred to it. If it extracts the stuff from the XML, it
will of course unescape the content, but what happens after that is up
to the system and unrelated to the transport mechanism. So here is an
example of the whole process:
MARC with embedded HTML
->
OAI-PMH provider escapes the MARC in some XML format
->
OAI-PMH harvester (the re-using system) unescapes the data from the XML
format
->
Something is done with the data
It's the same as if the source system stores the data internally in
MARCXML. The content must be escaped so that it can be stored in MARCXML
and doesn't mess up the markup, but when the system uses the data e.g. for
display, it's first retrieved from XML and unescaped, and massaged to
the desired display format only after that. If you use DOM to do the XML
manipulation, all this will happen automatically. You just write and
read strings and DOM manipulation takes care of escaping and unescaping.
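
To make the round trip concrete, here is a minimal sketch using Python's standard xml.dom.minidom. The element name "subfield" and the sample value are made up for illustration, and this is not real MARCXML, just a single element holding a string that contains HTML.

```python
from xml.dom.minidom import Document, parseString

# Hypothetical field value with embedded HTML (illustrative only).
original = 'See <a href="http://example.org/">the guide</a> & notes'

# Writing: hand the DOM a plain string; no manual escaping needed.
doc = Document()
field = doc.createElement("subfield")  # made-up element name
field.appendChild(doc.createTextNode(original))
doc.appendChild(field)

# Serializing escapes the markup characters in the text node.
serialized = doc.toxml()

# Reading: the parser unescapes, so we get the original string back.
parsed = parseString(serialized)
recovered = "".join(node.data for node in parsed.documentElement.childNodes)

assert recovered == original
```

The serialized form carries &lt;a ...&gt; and &amp; instead of the raw characters, yet the harvester ends up with exactly the string the provider started with. What it then does with that string is its own business.
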
You could substitute XML with e.g. Base64 encoding if it makes thinking
about this stuff easier. For instance email clients often send binary
files in Base64, but it doesn't mean the file is ruined, as the
receiving email client can decode it back to the original binary.
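
The same analogy as a small sketch with Python's standard base64 module. The byte string here is an arbitrary stand-in for a binary attachment, not anything from a real mail message.

```python
import base64

# Arbitrary binary payload standing in for an attachment (illustrative only).
binary = bytes(range(256))

# Transport encoding: the payload becomes plain ASCII for the trip.
encoded = base64.b64encode(binary)

# The receiver decodes and gets back exactly the original bytes.
decoded = base64.b64decode(encoded)

assert decoded == binary
```

The encoded form looks nothing like the original, but nothing is lost; it only exists for the duration of the transfer.
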
--Ere