Sounds like a good plan to me.

Best regards,

Jason Bengtson, MLIS, MA
Head of Library Computing and Information Systems
Assistant Professor, Graduate College
Department of Health Sciences Library and Information Management
University of Oklahoma Health Sciences Center
405-271-2285, opt. 5
405-271-3297 (fax)
[log in to unmask]
http://library.ouhsc.edu
www.jasonbengtson.com

NOTICE:
This e-mail is intended solely for the use of the individual to whom it is addressed and may contain information that is privileged, confidential or otherwise exempt from disclosure. If the reader of this e-mail is not the intended recipient or the employee or agent responsible for delivering the message to the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this communication in error, please immediately notify us by replying to the original message at the listed email address. Thank You.

On Dec 9, 2013, at 7:48 PM, Roy Tennant <[log in to unmask]> wrote:

> For my money, the text transform should look only for exact matches (e.g.,
> "&aacute;", "&nbsp;", "&copy;") and replace them with their numeric
> counterparts.
> Roy
> 
> 
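A minimal sketch of the exact-match approach described above, assuming Python and its standard html.entities table (neither is mentioned in the thread). The function name named_to_numeric and the choice to skip the five entities XML predefines are illustrative assumptions, not anyone's actual script.

```python
import re
from html.entities import name2codepoint

# Entities that XML itself predefines; these need no DTD, so leave them alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Match only complete references of the form &name; (exact matches).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text):
    """Replace known named entities with numeric character references."""
    def swap(match):
        name = match.group(1)
        # Unknown names and XML-predefined names are returned unchanged.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#{};".format(name2codepoint[name])
    return ENTITY_RE.sub(swap, text)


print(named_to_numeric("&aacute; &nbsp; &copy; &amp;"))
# prints: &#225; &#160; &#169; &amp;
```

Because only complete, known names are rewritten, stray ampersands and unrecognized entities are left in place for a separate review pass.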
> On Mon, Dec 9, 2013 at 5:41 PM, jason bengtson <[log in to unmask]> wrote:
> 
>> For testing purposes I just nixed them. As I noted, to rework the file a
>> person would probably want to use a more critical eye with find and
>> replace. Totally doable.
>> 
>> 
>> On Dec 9, 2013, at 7:37 PM, Jon Gorman <[log in to unmask]> wrote:
>> 
>>> How did you fix the ampersands? I ask, because if you just did a simple
>>> text transform from & to &amp;, it would mask the problem of the entity
>>> escaping I think...
>>> 
>>> Not at work, so I don't have a good example and the file is downloading
>>> very slowly here, so I'll try to do one from memory.
>>> 
>>> There were several &aacute; in the XML which mapped to an accent
>>> character in the DTD via the Entity.
>>> 
>>> If you just substituted & with &amp;, you'd get &amp;aacute;, which would
>>> render inline as &aacute;. It would superficially solve the issue since
>>> browsers would no longer give the errors about the DTD since it wouldn't
>>> be trying to load entities from the DTDs. And depending on how you did it,
>>> you likely could also replace a correctly encoded one to make &amp;amp;,
>>> leading to some very odd stuff.
>>> 
>>> I wouldn't be surprised to find some unescaped ampersands, but the
>>> solution I posted will essentially replace the entities with their text,
>>> hopefully causing most characters to appear correctly. You definitely
>>> still need to fix some of the other stuff. (I suspect it never worked for
>>> most browsers and XML systems, most likely only IE).
>>> 
>>> Jon Gorman
>>> University of Illinois
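
The double-escaping problem raised in the quoted message can be reproduced in a few lines. This is a hedged sketch in Python (not a tool used in the thread); naive_escape, escape_bare_ampersands, and the sample string are hypothetical names and data chosen for illustration.

```python
import re


def naive_escape(text):
    """Blindly turn every & into &amp;, including those that start an entity."""
    return text.replace("&", "&amp;")


# An ampersand that does NOT already begin a named entity (&name;)
# or a numeric character reference (&#123; or &#x7B;).
BARE_AMP_RE = re.compile(
    r"&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)"
)


def escape_bare_ampersands(text):
    """Escape only the ampersands that are not part of an existing reference."""
    return BARE_AMP_RE.sub("&amp;", text)


sample = "Juli&aacute;n &amp; Sons, R&D"

print(naive_escape(sample))
# prints: Juli&amp;aacute;n &amp;amp; Sons, R&amp;D

print(escape_bare_ampersands(sample))
# prints: Juli&aacute;n &amp; Sons, R&amp;D
```

The naive version hides the undefined-entity error but leaves literal "&aacute;" text in the rendered output and double-escapes the already correct "&amp;". The second version leaves existing references untouched, so the named entities still have to be resolved or converted separately.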