Thanks all! Yes, I was expecting to need to replace those text strings with the numeric entities.
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Roy Tennant
Sent: Monday, December 09, 2013 7:48 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] problem in old etd xml files
For my money, the text transform should look only for exact matches (e.g., "&aacute;", "&nbsp;", "&copy;") and replace them with their numeric counterparts.
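A minimal sketch of that exact-match approach, in Python. The mapping table and the function name are hypothetical and cover only the three examples above; a real ETD file would need an entry for every entity its DTD declares. Anything not in the table, including bare ampersands and already-correct &amp;, is left untouched.

```python
import re

# Hypothetical mapping: entity name -> numeric character reference.
# Only the three examples from this thread are listed here.
NAMED_TO_NUMERIC = {
    "aacute": "&#225;",  # U+00E1, a with acute accent
    "nbsp": "&#160;",    # U+00A0, no-break space
    "copy": "&#169;",    # U+00A9, copyright sign
}

# Matches only text shaped exactly like &name; (no bare ampersands).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_entities(text):
    """Replace known named entities with numeric references."""
    def swap(match):
        # Names missing from the table (such as amp, lt, gt) are
        # returned unchanged rather than guessed at.
        return NAMED_TO_NUMERIC.get(match.group(1), match.group(0))
    return ENTITY_RE.sub(swap, text)


sample = "<title>Caf&aacute;&nbsp;&copy; R&amp;D & more</title>"
print(to_numeric_entities(sample))
```

By contrast, a blanket sample.replace("&", "&amp;") would rewrite every ampersand in the sample, including the ones that begin valid entities.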
On Mon, Dec 9, 2013 at 5:41 PM, jason bengtson <[log in to unmask]> wrote:
> For testing purposes I just nixed them. As I noted, to rework the file
> a person would probably want to use a more critical eye with find and
> replace. Totally doable.
> On Dec 9, 2013, at 7:37 PM, Jon Gorman <[log in to unmask]> wrote:
> > How did you fix the ampersands? I ask because if you just did a
> > simple text transform from & to &amp;, it would mask the problem of
> > the entity escaping, I think...
> > Not at work, so I don't have a good example and the file is
> > downloading very slowly here, so I'll try to do one from memory.
> > There were several &aacute; in the XML which mapped to an accent
> > in the DTD via the entity.
> > If you just substituted & with &amp;, you'd get &amp;aacute;, which
> > would render inline as &aacute;. It would superficially solve the
> > issue, since browsers would no longer give the errors about the DTD
> > because they wouldn't be
> > trying to load entities from the DTDs. And depending on how you did it,
> > you likely could also replace a correctly encoded one to make
> > &amp;amp;, leading to some very odd stuff.
> > I wouldn't be surprised to find some unescaped ampersands, but the
> > I posted will essentially replace the entities with their text,
> > hopefully causing most characters to appear correctly. You
> > definitely still need to fix some of the other stuff. (I suspect it
> > never worked for most browsers and XML systems, most likely only IE).
> > Jon Gorman
> > University of Illinois
> Best regards,
> Jason Bengtson, MLIS, MA
> Head of Library Computing and Information Systems
> Assistant Professor, Graduate College
> Department of Health Sciences Library and Information Management
> University of Oklahoma Health Sciences Center
> 405-271-2285, opt. 5
> 405-271-3297 (fax)
> [log in to unmask]