Thanks all! Yes, I was expecting to need to replace those text strings with the numeric entities.

Wendy

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Roy Tennant
Sent: Monday, December 09, 2013 7:48 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] problem in old etd xml files

For my money, the text transform should look only for exact matches (e.g., "&aacute;", "&nbsp;", "&copy;") and replace them with their numeric counterparts.
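
A minimal sketch of that kind of exact-match replacement, in Python. It assumes the named entities in the files are the standard HTML ones listed in the standard library's html.entities.name2codepoint table. The function name named_to_numeric and the XML_PREDEFINED set are made up for illustration and are not from the thread. It leaves the five XML predefined entities and any bare ampersands alone, so those would still need a separate look.

```python
import re
from html.entities import name2codepoint

# Entities that XML defines itself; these must stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a complete named entity reference such as &aacute; only.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text):
    """Replace known named entities with numeric character references."""

    def repl(match):
        name = match.group(1)
        # Leave XML's own entities and unrecognized names untouched.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return ENTITY_RE.sub(repl, text)


if __name__ == "__main__":
    print(named_to_numeric("caf&aacute;&nbsp;&copy; &amp; AT&T"))
```

Because only whole references of the form &name; are matched, an already escaped &amp; is not touched and no &amp;amp; is produced.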
Roy


On Mon, Dec 9, 2013 at 5:41 PM, jason bengtson <[log in to unmask]> wrote:

> For testing purposes I just nixed them. As I noted, to rework the file 
> a person would probably want to use a more critical eye with find and 
> replace. Totally doable.
>
>
> On Dec 9, 2013, at 7:37 PM, Jon Gorman <[log in to unmask]> wrote:
>
> > How did you fix the ampersands? I ask, because if you just did a 
> > simple text transform from & to &amp;, it would mask the problem of 
> > the entity escaping I think...
> >
> > Not at work, so I don't have a good example and the file is 
> > downloading very slowly here, so I'll try to do one from memory.
> >
> > There were several &aacute; in the XML which mapped to an accent
> > character in the DTD via the Entity.
> >
> > If you just substituted & with &amp;, you'd get &amp;aacute;, which
> > would render inline as &aacute;. It would superficially solve the
> > issue since browsers would no longer give the errors about the DTD
> > since it wouldn't be trying to load entities from the DTDs. And
> > depending how you did it, you likely could also replace a correctly
> > encoded one to make &amp;amp;, leading to some very odd stuff.
> >
> > I wouldn't be surprised to find some unescaped ampersands, but the
> > solution I posted will essentially replace the entities with their
> > text, hopefully causing most characters to appear correctly. You
> > definitely still need to fix some of the other stuff. (I suspect it
> > never worked for most browsers and XML systems, most likely only IE).
> >
> > Jon Gorman
> > University of Illinois
>
> Best regards,
>
> Jason Bengtson, MLIS, MA
> Head of Library Computing and Information Systems
> Assistant Professor, Graduate College
> Department of Health Sciences Library and Information Management
> University of Oklahoma Health Sciences Center
> 405-271-2285, opt. 5
> 405-271-3297 (fax)
> [log in to unmask]
> http://library.ouhsc.edu
> www.jasonbengtson.com
>
> NOTICE:
> This e-mail is intended solely for the use of the individual to whom 
> it is addressed and may contain information that is privileged, 
> confidential or otherwise exempt from disclosure. If the reader of 
> this e-mail is not the intended recipient or the employee or agent 
> responsible for delivering the message to the intended recipient, you 
> are hereby notified that any dissemination, distribution, or copying 
> of this communication is strictly prohibited. If you have received 
> this communication in error, please immediately notify us by replying 
> to the original message at the listed email address. Thank You.
>